Stop recommending UUID for deviceId/groupId #682

dontcallmedom · 2020-04-07T16:20:45Z

Because UUIDs are unique, and until double-keyed storage is widely available, having deviceId (and groupId) be UUIDs creates a tracking opportunity for getUserMedia() callers embedded as third-party iframes.

I think having deviceIds be simple monotonically incremented integers should be sufficient for our use case. I'm not sure how prescriptive we need to be here, beyond no longer recommending the use of UUIDs as ids.

/cc @pes10k since we were discussing this while reviewing progress on privacy-related work on the spec


youennf · 2020-04-08T07:45:04Z

Related to or a dupe of #607.

> deviceIds be simple monotonically incremented integers should be sufficient for our use case

This has been discussed in the past.
If you are incrementing integers and try to persist them so that they can be used by web pages across navigations, these integers will end up becoming a tracker for those users that use non built-in camera/microphone devices. These integers might be unique to the user and not even tied to an origin.

If you do not have double-keyed storage, users are being tracked anyway, with or without device IDs (given that IDs are regenerated any time website data such as IDB is cleared). The spec now mentions double-keyed storage; maybe the wording should be made stronger?

Note also that the spec now mandates that device IDs only be exposed after the page is capturing.
This makes them difficult to use as a tracker in practice. And once the page is capturing, device IDs are probably not the most personal information leaking to the page.

dontcallmedom · 2020-04-08T07:49:23Z

thanks @youennf for clarifying the current thinking on that particular sub-issue.

@pes10k - wdyt?

pes10k · 2020-04-08T21:46:54Z

> these integers will end up becoming a tracker for those users

I don't believe this is correct. If you are reusing the same identifiers across users, they're by definition not identifying that user :) They're only "identifying" when you already have an identifying key to join against (in which case, all bets are off).

The goal of making these less identifying is:

- To remove more foot guns for vendors who are managing storage for users (for privacy reasons) in ways that aren't easily compatible with the "all or nothing" definitions keyed off in the spec. The different implementations of the Storage Access API do this to a degree (especially in storage upgrade cases), Safari's ITP does this in places, Brave does this in places, etc. There are places where storage is cleared or changed, and not having unique identifiers here makes it much easier to reason about the privacy boundaries in such cases (and to prevent the device ids from being the unique key used to join tracking session data).
- In general, we should have a hard line against adding unique identifiers to the platform. This is the only case I'm aware of where this happens, and it would be very good to address it too.

> the spec now mandates device Ids to only be exposed after the page is capturing

This is extremely good news and very appreciated :)

jan-ivar · 2020-04-16T20:34:45Z

> If you are incrementing integers and try to persist them so that they can be used by web pages across navigations, these integers will end up becoming a tracker for those users that use non built-in camera/microphone devices.

I agree with @youennf here. E.g. everything may look benign at first:

Your USB camera, mic & speaker ids may be 0, 1 and 2. Your bluetooth headset 3 and 4.

Cut to 6 months later: Think of every device you've plugged into your system since then, even for a brief moment: conference room external speakers, a friend's camera, bluetooth headset of every family member. The "new id" counter may now be much higher.

Say you purchase a new USB camera, mic, and external speaker after trying a couple in a store, and you got a new bluetooth headset a couple of months ago. Their ids may now be 34, 35, 42, 14, 15. These bits may now help correlate you across origins you visit, because they'll be the same across all origins you visit (even in first-party pages, no iframes needed). They may not be enough to identify you uniquely, but along with other bits they might.

We should weigh that risk against today's origin-unique id, which is not correlatable across first party pages, and prevented from showing up in iframes by default, or may have its iframe storage partitioned in the near future (along with equally damaging JS-created ids in local storage).
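A minimal sketch of the counter scenario described above, under stated assumptions: `GlobalCounterIds`, the hardware keys, and the example origins are hypothetical names invented for illustration, and no browser is known to allocate ids this way. It only shows why a browser-wide counter yields the same value on every origin.

```python
from itertools import count


class GlobalCounterIds:
    """Hypothetical allocator: one counter shared by every origin."""

    def __init__(self):
        self._next = count()   # 0, 1, 2, ...
        self._ids = {}         # hardware key -> integer id

    def device_id(self, origin, hardware_key):
        # The origin is deliberately ignored: the id is not tied to an origin.
        if hardware_key not in self._ids:
            self._ids[hardware_key] = next(self._next)
        return self._ids[hardware_key]


browser = GlobalCounterIds()

# Months of devices plugged in, even briefly, each consume one id.
for n in range(34):
    browser.device_id("https://a.example", f"old-device-{n}")

# A newly purchased camera gets the next id, identical on every origin.
seen_by_a = browser.device_id("https://a.example", "new-usb-camera")
seen_by_b = browser.device_id("https://b.example", "new-usb-camera")
```

In this sketch both origins observe the same integer for the new camera, which is the cross-origin correlation being described.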

jan-ivar · 2020-04-16T20:41:54Z

TL;DR: The bits will be based on your USB insertion/removal activity and shopping behavior, which may be quite unique. Sites may even deduce you've made a new purchase since last visit.

guest271314 · 2020-04-19T15:55:29Z

> Because UUID are unique, and until we have double-keyed storage widely available, having deviceId (and groupId) being UUID creates a tracking opportunity for getUserMedia() callers embedded as third-party iframes.

How? What would be tracked?

pes10k · 2020-04-20T22:40:18Z

In general, I understand the points being made that a UUID-based identifier isn't in all (many) cases a tracking vector on its own (we might disagree, but I understand the argument). What I'm not getting is any positive argument in favor of a UUID. If we can agree that UUIDs are (at best) a privacy risk that needs to be mediated through other means, why use them? At best it's a foot gun…

@jan-ivar
I take your point, and yes, using (say) just increasing ints would be its own privacy risk (still way better than a UUID, but not perfect, to be sure). That was a straw proposal that's hung around since our conversations before TPAC, so I apologize for it. Here is a slightly less straw suggestion:

- device ids are integers, chosen at random from the range [0,255], without replacement
- if needed for web compat reasons, device IDs can be packed into the same shape as UUIDs (e.g. 25500000-0000-0000-0000-000000000000)
- device ids are in all other ways treated as the standard currently describes (dual-keyed on platforms that support it, reset on storage clear events, etc.)
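A rough sketch of how this suggestion could look, not a proposed implementation. `SmallIdPool` and `pack_as_uuid_shape` are hypothetical names, and the packing rule (three zero-padded decimal digits followed by zeros) is an assumption chosen so that 255 reproduces the example string above while keeping small values unambiguous. In practice one pool would exist per origin (or per storage key), per the last bullet.

```python
import secrets


class SmallIdPool:
    """Hypothetical pool: ids drawn at random from [0, 255] without replacement."""

    def __init__(self):
        self._free = list(range(256))   # ids not handed out yet
        self._assigned = {}             # hardware key -> id

    def device_id(self, hardware_key):
        if hardware_key not in self._assigned:
            if not self._free:
                raise RuntimeError("all 256 ids are in use")
            pick = secrets.randbelow(len(self._free))
            # pop() removes the id from the pool, so it cannot be drawn twice
            self._assigned[hardware_key] = self._free.pop(pick)
        return self._assigned[hardware_key]


def pack_as_uuid_shape(value):
    """Format a small integer in the 8-4-4-4-12 shape (not a valid RFC 4122 UUID)."""
    digits = f"{value:03d}".ljust(32, "0")   # assumed packing rule, see lead-in
    return "-".join(
        [digits[:8], digits[8:12], digits[12:16], digits[16:20], digits[20:]]
    )


pool = SmallIdPool()
camera = pool.device_id("usb-camera")
mic = pool.device_id("usb-mic")
packed = pack_as_uuid_shape(255)
```

The zero-padding matters: left-justifying the bare decimal value would map 25 and 250 to the same string.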

@guest271314

I don't think I fully understand your question, but the claim is that (i) having UUIDs when they are not needed is bad practice in general, (ii) browsers do increasingly sophisticated and clever things to maintain and minimize storage for users (for privacy, among other reasons), so the idea of a single "clear storage" event increasingly doesn't exist, and (iii) it's important not to let one of these device IDs be the key used to rejoin sessions when cookies or other identifiers have been cleared.

guest271314 · 2020-04-21T02:07:01Z

@pes10k

> i dont think i fully understand your question

This is what I was referring to:

> (iii) its important to not let one of these device IDs be the key used to rejoin sessions when cookies or other identifiers have been cleared.

The case is not clear. How exactly could or would that happen using only a deviceId or groupId?

If that were the case then, if I gather the gist of the access point theory correctly, any site where a MediaStreamTrack is used could get the deviceId or groupId and do exactly what with those strings?

youennf · 2020-04-21T09:04:43Z

> What I'm not getting is any positive argument in favor of a UUID.

I think this is historical. We probably all agree this is not a great model in general and we do not want new APIs to adopt this model.

The current approach is passive fingerprinting neutral, without any additional mitigation.
The additional mitigations we are talking about make it fully fingerprinting neutral. These mitigations are also much more urgently needed by other web technologies, and the cost of applying them to device IDs is very low.

I personally feel like we fixed this particular issue and would prefer focusing on other existing privacy issues, like device/track labels.

jan-ivar · 2020-04-21T13:24:04Z

> What I'm not getting is any positive argument in favor of a UUID

Note the spec doesn't actually mandate a UUID, only a "generated unique identifier", so your idea would be compliant.

I don't actually know why the spec recommends UUIDs. I suspect most browsers use hashes, not an actual UUID generator. A positive argument for hashes, to answer your question, is efficient implementation (we store one key per origin vs. one id per device per origin).

The benefit of the large entropy is not worrying about collisions (though I agree it is probably larger than necessary in most browsers). If you have an algorithm with less entropy with the same storage needs that might be interesting.

But since the title of this issue is "Stop recommending UUID for deviceId/groupId", and since I haven't heard anyone defend the UUID recommendation specifically, I would actually be in favor of removing the recommendation.
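To make the storage argument concrete, here is a minimal sketch of the "one key per origin" approach, under stated assumptions: `PerOriginKeyIds`, the 16-byte key size, and the use of HMAC-SHA-256 are illustrative choices, not what any particular browser is confirmed to do. The only persisted state is one random key per origin; each deviceId is derived on demand.

```python
import hashlib
import hmac
import secrets


class PerOriginKeyIds:
    """Hypothetical scheme: store one random key per origin, derive ids by hashing."""

    def __init__(self):
        self._origin_keys = {}   # origin -> random key (the only stored state)

    def _key(self, origin):
        if origin not in self._origin_keys:
            self._origin_keys[origin] = secrets.token_bytes(16)  # assumed size
        return self._origin_keys[origin]

    def device_id(self, origin, hardware_key):
        # Keyed hash of the underlying device identifier; nothing stored per device.
        mac = hmac.new(self._key(origin), hardware_key.encode(), hashlib.sha256)
        return mac.hexdigest()

    def clear_site_data(self, origin):
        # Dropping the key rotates every deviceId for that origin at once.
        self._origin_keys.pop(origin, None)


browser = PerOriginKeyIds()
id_on_a = browser.device_id("https://a.example", "usb-camera")
id_on_a_again = browser.device_id("https://a.example", "usb-camera")
id_on_b = browser.device_id("https://b.example", "usb-camera")

browser.clear_site_data("https://a.example")
id_on_a_after_clear = browser.device_id("https://a.example", "usb-camera")
```

The id is stable within an origin, differs across origins, and changes once the origin's key is discarded. The output width here is far larger than needed, which is the entropy point raised above.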

youennf · 2020-04-21T14:01:56Z

> A positive argument for hashes, to answer your question, is efficient implementation (we store one key per origin vs. one id per device per origin).

Right, Safari is using that strategy.

> I would actually be in favor of removing the recommendation.

I am fine with that too.

pes10k · 2020-04-21T18:20:47Z

@youennf @jan-ivar Do you know the amount of entropy that goes into those per-origin seeds (that the hashes are generated from)? If it's not a huge number of bits, maybe that's the way to solve this issue.

Again, my main concern here isn't (mainly) situations where storage (and so device identifiers) is dual-keyed; it's the majority of browsers where it isn't, where people use a variety of methods to try to reduce third-party storage (extensions, for example), and so where storage may rotate on a different interval than deviceIds. In those cases a highly unique device id is a way of linking the storage-based identifiers together. (This covers a large number of people on Gecko- and Blink-based browsers, particularly privacy-conscious users.)

So, hashes instead of UUIDs sounds fine; but constraining the entropy of the input to those hashes would be a very useful step that seems like it could be some middle ground / way forward?
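A small sketch of the linking concern described in this comment, using made-up data: `link_sessions` and the observation tuples are hypothetical. It assumes a third party that logs a storage-based identifier next to the deviceId it sees, and a user whose extension rotates the cookie while the deviceId stays the same.

```python
def link_sessions(observations):
    """Group storage-based identifiers by the deviceId observed alongside them."""
    by_device = {}
    for cookie_id, device_id in observations:
        by_device.setdefault(device_id, set()).add(cookie_id)
    return by_device


# Hypothetical log kept by an embedded third party.
observations = [
    ("cookie-A", "f3a9c1"),   # first visit
    ("cookie-B", "f3a9c1"),   # cookie was cleared, deviceId was not rotated
    ("cookie-C", "07bd22"),   # a different user
]

linked = link_sessions(observations)
```

Here the two cookies that were meant to be unlinkable end up in the same group because they share a highly unique deviceId. With a low-entropy id shared by many users, the same join would be far less reliable.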

youennf · 2020-04-21T20:04:13Z

> where storage may rotate on a different interval than deviceIds

The specification mandates this rotation. Maybe there is a bug in some browsers?
Or this rotation mechanism might not kick in if extensions implement this clean-up purely by injecting JS that deletes all the databases (what about the HTTP cache though?).

> So, hashes instead of UUIDs sounds fine; but constraining the entropy of the input to those hashes would be a very useful step that seems like it could be some middle ground / way forward?

This is fine to me as long as we do not add needless constraints to browsers implementing partitioning. This seems somehow hard to spec though.

pes10k · 2020-04-24T03:13:41Z

> The specification mandates this rotation. Maybe there is a bug in some browsers?
> Or this rotation mechanism might not kick in if extensions implement this clean-up purely by injecting JS that deletes all the databases (what about the HTTP cache though?).

Yes, you put it better than I could! Clearing and managing storage is in practice more complicated and less binary than the spec seems to imagine. Even setting aside possible bugs, there are ways of clearing storage (injected JS extension code is just one possible example) that won't (and couldn't, given the diversity of possible policies) be mapped into the browser as "storage clear". Privacy in depth really matters here; I don't mean this as a theoretical kind of concern.

> This is fine to me as long as we do not add needless constraints to browsers implementing partitioning. This seems somehow hard to spec though.

I appreciate your point here, and again, I am not requesting any particular mitigation. Whatever is easy enough to spec and prevents device ids from being unique / identifier-joining material would be terrific. I thought picking identifiers from [0,255] without replacement would be an easy-to-specify, privacy-preserving option, but if that's not the case, point taken.

I also appreciate that this issue is long, and I really don't mean to be throwing sand in the gears so close to transition; I really appreciate the work you all do! And I think you did a better job of stating the concern (the first quote above) than I managed to do in a TPAC meeting and a couple thousand rambling words above.

jan-ivar · 2020-04-24T18:38:19Z

> The specification mandates this rotation. Maybe there is a bug in some browsers?
> Or this rotation mechanism might not kick in if extensions implement this clean-up purely by injecting JS that deletes all the databases (what about the HTTP cache though?).

This seems like something browsers should fix. The sole purpose of all this is recognizing deviceIds in site storage. If browsers can detect sites without storage they should rotate deviceIds.
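One way to read this suggestion, sketched under loud assumptions: `RotatingIds` and its `site_storage` dictionary are a hypothetical model of a browser's view of site data, not an existing API. The idea shown is only that if an origin has no stored data left, nothing can reference the old ids, so the browser may hand out fresh ones.

```python
import hashlib
import secrets


class RotatingIds:
    """Hypothetical model: rotate an origin's ids whenever its storage is empty."""

    def __init__(self):
        self.site_storage = {}    # origin -> dict of stored items (assumed model)
        self._origin_keys = {}    # origin -> random key used to derive ids

    def device_id(self, origin, hardware_key):
        has_storage = bool(self.site_storage.get(origin))
        if not has_storage or origin not in self._origin_keys:
            # No stored data could be holding an old id, so a new key is safe.
            self._origin_keys[origin] = secrets.token_hex(16)
        material = self._origin_keys[origin] + hardware_key
        return hashlib.sha256(material.encode()).hexdigest()


browser = RotatingIds()
origin = "https://a.example"
browser.site_storage[origin] = {"preferred-camera": "placeholder"}

first = browser.device_id(origin, "usb-camera")
second = browser.device_id(origin, "usb-camera")

# An extension deletes everything the site stored, without a "clear" event.
browser.site_storage[origin].clear()
third = browser.device_id(origin, "usb-camera")
```

This covers only the "all storage is gone" case; partial deletions, as raised later in the thread, are not handled by such a check.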

pes10k · 2020-04-24T22:55:02Z

> This seems like something browsers should fix. … If browsers can detect sites without storage they should rotate deviceIds

I'm not sure what you mean. There is an endlessly diverse set of reasons browser extensions will modify storage; increasingly, browsers are doing so too. Sometimes they might delete all storage, sometimes they may delete or modify some storage values and not others.

My point is that the spec seems to imagine there are only two cases: a) the browser clears all storage, or b) storage as usual. There are many situations in between, and an increasing number of them, where browsers do storage-related interventions that are more than nothing but less than "clear everything", and in those cases having highly identifying identifiers is where the privacy harm occurs.

The two solutions I can see are to either a) be more specific about when deviceIds should be rotated, or b) make the deviceIds less identifying. I've been suggesting "B" since it seems much the easier of the two, but if you think "A" is the better path, that could work too.

pes10k · 2020-06-30T05:52:06Z

I do not agree that #687 addresses this concern.

- That PR suggests identifiers around 32 bits in length. That is enough bits to identify ~4 billion devices. Why recommend so much identifiability when the common-case user will have < 10 devices on their machine? This seems like far more privacy risk than is warranted.
- I appreciate the new text describing the "lower-entropy alternative". However, since this is presented as an alternative (and not the main recommendation), it would be worth describing why this more privacy-friendly approach is not the main suggestion. The text says "storage", but that seems odd, given that the amount of storage needed is minuscule (every device identifier, for every site I've visited, would take less storage than my cool Marge avatar).
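The arithmetic behind the first point can be checked directly. This is only a worked calculation; the variable names are illustrative, and "10 devices" is the rough figure used in the comment.

```python
import math

# Distinct values a 32-bit identifier can take (~4.29 billion).
ids_with_32_bits = 2 ** 32

# Bits needed to tell apart 10 devices on one machine.
bits_for_ten_devices = math.ceil(math.log2(10))

# Bits carried by an id drawn from [0, 255], as suggested earlier in the thread.
bits_for_0_to_255 = (256 - 1).bit_length()
```

So a 32-bit id carries 28 more bits than are needed to distinguish 10 local devices, and 24 more than an id drawn from [0, 255].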

Put differently, I appreciate that we disagree on how much privacy gain there is from using a less identifying device identifier, but I think it's hard to argue that there isn't at least some privacy improvement (for the reasons given in #682 (comment), among others). If the WG is going to recommend an approach that isn't the most privacy-preserving (and equally user-serving), I think it's important to say why, beyond a trivial storage difference.

(I'm not trying to draw out this disagreement, but I think it's important to fully explain the "why" here.)

dontcallmedom added the privacy-tracker label Apr 7, 2020

This was referenced Apr 10, 2020

Stop recommending UUID for deviceId/groupId w3cping/tracking-issues#82

Open

Stop recommending UUID for deviceId/groupId w3cping/tracking-issues#86

Closed

jan-ivar self-assigned this Apr 23, 2020

jan-ivar mentioned this issue Apr 24, 2020

Remove UUID recommendation for deviceId and groupId. #687

Merged

alvestrand added the PR exists label Apr 30, 2020

w3cbot added the privacy-needs-resolution label and removed the privacy-tracker label Apr 30, 2020

jan-ivar closed this as completed in #687 Apr 30, 2020

youennf mentioned this issue Jun 18, 2020

fixed, per origin, device ID creates tracking risk #607

Closed

jan-ivar added the privacy-tracker label Oct 9, 2020

w3cbot removed the privacy-tracker label Oct 13, 2020
