top | item 28210966

charlesdaniels | 4 years ago

At the end of the day, even if you assume good faith on Google's part (which I think is quite a leap), causing the user to present more entropy to the site will make them easier to fingerprint.

256 topics would be ceil(log2(256)) = 8 bits of entropy

30,000 topics would be ceil(log2(30000)) = 15 bits of entropy

As a reminder, there are ~ 10 billion people on earth, so if you have 34 bits of entropy or so, you can uniquely identify each person.
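The arithmetic above can be checked in a few lines (a sketch; the ~10 billion figure is the round number used here, not an exact population count):

```python
import math

def entropy_bits(n_states: int) -> int:
    """Bits needed to distinguish n_states equally likely outcomes."""
    return math.ceil(math.log2(n_states))

print(entropy_bits(256))             # 8 bits for 256 topics
print(entropy_bits(30_000))          # 15 bits for 30,000 topics
print(entropy_bits(10_000_000_000))  # 34 bits to single out ~10 billion people
```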

So really, the way to think of this is "Google considers making FLoC 20% less effective at fingerprinting users", and that's not even considering other sources of entropy, like user agent or screen size.


josefx|4 years ago

> and that's not even considering other sources of entropy, like user agent or screen size.

As a reminder: Chrome sends 16 bits of x-client-data with every HTTP request aimed at Google servers. So they already have half the bits they need to uniquely identify your system without FLoC.

kleene_op|4 years ago

>256 topics would be ceil(log2(256)) = 8 bits of entropy

Unless several topics can be assigned to a person (which seems to be implied in the article), in which case that's 256 bits of entropy available to classify each person.
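The single-label vs. bit-vector distinction can be made concrete (a sketch of the counting argument, not of how any browser actually assigns topics):

```python
import math

n_topics = 256

# One topic per person: the identifier is a single label out of 256,
# so there are only 256 possible states.
single_label_bits = math.log2(n_topics)   # 8.0

# Any subset of topics per person: the identifier is a 256-bit vector,
# one bit per topic, giving 2^256 possible combinations.
subset_bits = math.log2(2 ** n_topics)    # 256.0

print(single_label_bits, subset_bits)
```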

>As a reminder, there are ~ 10 billion people on earth, so if you have 34 bits of entropy or so, you can uniquely identify each person.

Yeah, well theoretically you could. But that assumes that browsers are able to extract and balance some very arbitrary and very specific information from the browsing habits of all people on earth in a perfect decision tree.

In practice, lots of browsing habits overlap, making this decision tree far less discriminating and powerful than the theoretically optimal one.

Though I think you are absolutely correct that in practice the number of bits needed to build a classifier able to uniquely classify each person must be pretty low. Maybe a few hundred.

That may very well be possible with those 256 topics mentioned in that article.

Also I don't understand the difference between cohorts and topics, apart from the fact that topics are less numerous and can have appealing names?

charlesdaniels|4 years ago

> Unless several topics can be assigned to a person (which seems to be implied in the article), in which case that's 256 bits of entropy available to classify each person.

Good catch, forgot this was a bit-vector not a single key.

> Yeah, well theoretically you could. But that assumes that browsers are able to extract and balance some very arbitrary and very specific information from the browsing habits of all people on earth in a perfect decision tree.

Not really, people have found in the past that combinations of user agent, screen resolution, installed fonts, installed extensions, and things of that sort can come very close to uniquely identifying individual people.

> Though I think you are absolutely correct that in practice the number of bits to build up a classifier able to uniquely classify each person must be pretty low. Maybe a few hundreds.

Exactly. It might not narrow it down to one person, but perhaps a relatively small pool.

ggggtez|4 years ago

I'm not exactly sure what you're doing with your math there, but I think you probably should include what you think is the current entropy of your browsing sessions...

Considering you're already aware of screen size and user agent, and other forms of fingerprinting, you should probably realize that in the pre-FLoC world, you're likely already 100% identified by numerous ad networks.

omegalulw|4 years ago

While your general assertion is true (log2(10 billion), ceil'd, is indeed 34), it is also misleading, for it assumes that your identifier will be almost completely uniformly distributed. That is very hard to achieve; in fact, every team that's trying to track users is effectively trying to solve this problem.

omginternets|4 years ago

Where can I read an introduction to the concept of "entropy" in the sense that you're using it?

I understand it's an information-theoretical concept, and also understand it's somehow related to randomness, but I'm not sure exactly how, and I would like to have a more precise understanding.

Seirdy|4 years ago

I'm going to copy something I sent to a 13 year old to explain entropy in simple terms. It came up when we were talking about encryption. Reading forwards goes from dense/mathematical to conceptual; reading sections in reverse order does the opposite. This probably won't be useful to you but I have found it useful in other situations.

N bits of entropy refers to 2^N possible states.

Cryptanalysis:

AES-128 has a key size of 128 bits, so there are 2^128 possible AES-128 keys. A brute-force attack capable of testing 2^128 keys can break any AES-128 key with certainty.

Fingerprinting:

If a website measures your "uniqueness", saying "one in over 14 thousand people" isn't a great way to measure uniqueness because that number changes exponentially. Since we're dealing with possible states, i.e. possible combinations of screen size, user-agent, etc., we instead take the base-2 logarithm of this to get a count of entropy bits (~13.8 bits).
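The "one in N" to bits conversion is just a base-2 logarithm (a sketch; the 14,000 figure is the example number used above):

```python
import math

def uniqueness_bits(one_in_n: float) -> float:
    """Convert a 'one in N people' uniqueness figure to entropy bits."""
    return math.log2(one_in_n)

print(round(uniqueness_bits(14_000), 1))  # ~13.8 bits
```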

Thermal physics:

The second law of thermodynamics states that spontaneous changes in a system should move from a low- to a high-entropy state. Hot particles are far apart and moving a lot; there are many possible states. Cold particles are moving around less and can't change as easily; there are fewer possible states. Heat cannot move from cold things to hot things on its own, but it can move from hot things to cold things. Think of balls on a billiards table moving apart rather than together.

Entropy of the whole universe is perpetually on the rise. In an unimaginably long time, the most popular understanding is that particles will all be so far apart that they'll never interact. The universe will look kind of like white noise: an endless sea of random-like movement, where everything adds up to nothing, everywhere and forever.