Stop recommending UUID for deviceId/groupId #682
Related to or a dupe of #607.
This has been discussed in the past. Without double-keyed storage, users can be tracked anyway, with or without device IDs (given that IDs are regenerated whenever website data such as IndexedDB is cleared). The spec now mentions double-keyed storage; maybe the wording should be made stronger? Note also that the spec now mandates that device IDs only be exposed after the page has started capturing.
I don't believe this is correct. If you are reusing the same identifiers across users, they're by definition not identifying that user :) They're only "identifying" when you already have an identifying key to join against (in which case, all bets are off). The goal of making these less identifying is:
This is extremely good news and very appreciated :)
I agree with @youennf here. Everything may look benign at first: your USB camera, mic & speaker ids may be […]. Cut to 6 months later: think of every device you've plugged into your system since then, even for a brief moment: conference room external speakers, a friend's camera, the bluetooth headset of every family member. The "new id" counter may now be much higher. Say you purchase a new USB camera, mic, and external speaker after trying a couple in a store, and you got a new bluetooth headset a couple of months ago. Their ids may now be […]. We should weigh that risk against today's origin-unique id, which is not correlatable across first-party pages, is prevented from showing up in iframes by default, and may have its iframe storage partitioned in the near future (along with equally damaging JS-created ids in local storage).
TL;DR: The bits will be based on your USB insertion/removal activity and shopping behavior, which may be quite unique. Sites may even deduce that you've made a new purchase since your last visit.
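A minimal sketch of this concern, assuming a hypothetical scheme in which the counter advances for every device the browser ever sees (the comment doesn't spell out the exact scheme, and the device names below are illustrative only):

```python
class CounterDeviceIds:
    """Hypothetical scheme: each newly seen device gets the next integer."""

    def __init__(self) -> None:
        self._next = 1
        self._ids: dict[str, int] = {}

    def device_id(self, raw_device: str) -> int:
        # Assign the next counter value the first time a device is seen;
        # afterwards the id stays stable.
        if raw_device not in self._ids:
            self._ids[raw_device] = self._next
            self._next += 1
        return self._ids[raw_device]


ids = CounterDeviceIds()

# Day 1: the user's own camera, mic and speaker.
day_one = [ids.device_id(d) for d in ("camera", "mic", "speaker")]

# Six months of briefly attached devices, including ones tried in a store.
for d in ("conference-speaker", "friends-camera", "headset-1", "headset-2",
          "store-camera-1", "store-camera-2", "new-headset"):
    ids.device_id(d)

# The new purchases now carry ids that reflect everything plugged in before.
later = [ids.device_id(d) for d in ("new-camera", "new-mic", "new-speaker", "new-headset")]

print(day_one)  # [1, 2, 3]
print(later)    # [11, 12, 13, 10]
```

Under this assumed scheme, the new camera's id of 11 tells a site that ten devices were seen before it, and a site that remembered the old ids [1, 2, 3] can infer that new hardware appeared since the last visit.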
How? What would be tracked?
In general, I understand the point being made that a UUID-based identifier isn't, in all (or many) cases, a tracking vector on its own (we might disagree, but I understand the argument). What I'm not getting is any positive argument in favor of a UUID. If we can agree that UUIDs are (at best) a privacy risk that needs to be mitigated through other means, why use them? At best it's a footgun… @jan-ivar
I don't think I fully understand your question, but the claim is that (i) having UUIDs when they aren't needed is bad practice in general, (ii) browsers do increasingly sophisticated and clever things to maintain and minimize storage for users (for privacy, among other reasons), and the idea of a single "clear storage" event increasingly doesn't exist, and (iii) it's important not to let one of these device IDs be the key used to rejoin sessions when cookies or other identifiers have been cleared.
This is what I was referring to:
The case is not clear. How exactly could or would that happen using only a […]? If that was the case, any site where a […]
I think this is historical. We probably all agree this is not a great model in general, and we do not want new APIs to adopt it. The current approach is neutral with respect to passive fingerprinting, without any additional mitigation. I personally feel that we fixed this particular issue and would prefer to focus on other existing privacy issues, like device/track labels.
Note the spec doesn't actually mandate a UUID, only a "generated unique identifier", so your idea would be compliant. I don't actually know why the spec recommends UUIDs. I suspect most browsers use hashes, not an actual UUID generator. A positive argument for hashes, to answer your question, is efficient implementation (we store one key per origin vs. one id per device per origin). The benefit of the large entropy is not having to worry about collisions (though I agree it is probably larger than necessary in most browsers). If you have an algorithm with less entropy and the same storage needs, that might be interesting. But since the title of this issue is "Stop recommending UUID for deviceId/groupId", and since I haven't heard anyone defend the UUID recommendation specifically, I would actually be in favor of removing the recommendation.
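The storage argument for hashes can be made concrete with a small sketch. It assumes HMAC-SHA256 as the hash and a hypothetical raw platform identifier (raw_device_id); it is not taken from any browser's source. Only one random key per origin is persisted, and every deviceId is recomputed from it on demand:

```python
import hashlib
import hmac
import secrets


class HashedDeviceIds:
    """Sketch: store one random key per origin and derive deviceIds from it."""

    def __init__(self) -> None:
        # The only persisted state: origin -> 32-byte random key.
        self._origin_keys: dict[str, bytes] = {}

    def _key_for(self, origin: str) -> bytes:
        if origin not in self._origin_keys:
            self._origin_keys[origin] = secrets.token_bytes(32)
        return self._origin_keys[origin]

    def device_id(self, origin: str, raw_device_id: str) -> str:
        # HMAC-SHA256 of the (hypothetical) raw device id, keyed per origin.
        # Nothing is stored per device; ids are recomputed each time.
        mac = hmac.new(self._key_for(origin), raw_device_id.encode("utf-8"), hashlib.sha256)
        return mac.hexdigest()


store = HashedDeviceIds()
cam_a = store.device_id("https://a.example", "raw-camera-0")
cam_b = store.device_id("https://b.example", "raw-camera-0")
assert cam_a == store.device_id("https://a.example", "raw-camera-0")  # stable per origin
assert cam_a != cam_b  # differs across origins
```

The 256-bit output illustrates the entropy point above: collisions are a non-issue, but each id is also far more unique than device selection requires.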
Right, Safari is using that strategy.
I am fine with that too.
@youennf @jan-ivar Do you know how much entropy goes into those per-origin seeds (that the hashes are generated from)? If it's not a huge number of bits, maybe that's the way to solve this issue. Again, my main concern here isn't situations where storage (and so device identifiers) is dual-keyed; it's the majority of browsers where it isn't, where people use a variety of methods to try to reduce third-party storage (extensions, for example). There, storage may rotate on a different interval than deviceIds, and a highly unique device id is a way of linking the storage-based identifiers together (this affects a large number of people on Gecko- and Blink-based browsers, particularly privacy-conscious users). So, hashes instead of UUIDs sounds fine; but constraining the entropy of the input to those hashes would be a very useful step, and it seems like it could be a middle ground / way forward?
The specification mandates this rotation. Maybe there is a bug in some browsers?
This is fine with me as long as we do not add needless constraints for browsers implementing partitioning. This seems somewhat hard to spec, though.
Yes, you put it better than I could! Clearing / managing storage is in practice more complicated and non-binary than the spec seems to imagine. Even setting aside possible bugs, there are ways of clearing storage (injected JS extension code is just one possible example) that won't (and couldn't, given the diversity of possible policies) be mapped into the browser as a "storage clear". Privacy in depth really matters here; I don't mean this as a theoretical kind of concern.
I appreciate your point here, and again, am not requesting any particular mitigation. Whatever is easy enough to spec and prevents device ids from being unique / identifier-joining material would be terrific. I thought picking identifiers from [0,255] without replacement would be an easy-to-specify, privacy-preserving option, but if that's not the case, point taken. I also appreciate that this issue is long, and I really don't mean to be throwing sand in the gears so close to transition; I really appreciate the work you all do! And I think you did a better job of stating the concern (the first quote above) than I managed to do in a TPAC meeting and a couple thousand rambling words above.
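The "[0,255] without replacement" idea can be sketched as follows. This is one possible reading, assuming the draw happens independently per origin and that ids are never reused within an origin; the class and method names are hypothetical:

```python
import random

ID_SPACE = 256  # identifiers are drawn from [0, 255]


class SmallRangeDeviceIds:
    """Sketch: per-origin deviceIds drawn from [0, 255] without replacement."""

    def __init__(self) -> None:
        self._rng = random.SystemRandom()
        self._assigned: dict[str, dict[str, int]] = {}
        self._unused: dict[str, list[int]] = {}

    def device_id(self, origin: str, raw_device_id: str) -> int:
        assigned = self._assigned.setdefault(origin, {})
        if raw_device_id not in assigned:
            pool = self._unused.get(origin)
            if pool is None:
                pool = self._unused[origin] = list(range(ID_SPACE))
            if not pool:
                raise RuntimeError("more than 256 devices exposed to one origin")
            # Remove a uniformly random unused value, so two devices never
            # share an id within the same origin.
            assigned[raw_device_id] = pool.pop(self._rng.randrange(len(pool)))
        return assigned[raw_device_id]


ids = SmallRangeDeviceIds()
first = ids.device_id("https://a.example", "raw-camera-0")
assert 0 <= first < ID_SPACE
assert first == ids.device_id("https://a.example", "raw-camera-0")
```

A single id then carries at most 8 bits (log2 256), so each value is shared by many users; the combination of several device ids seen together still carries more.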
This seems like something browsers should fix. The sole purpose of all this is recognizing deviceIds in site storage. If browsers can detect sites without storage, they should rotate deviceIds.
I'm not sure what you mean. There are endlessly diverse reasons why browser extensions modify storage, and increasingly browsers do so too. Sometimes they might delete all storage; sometimes they may delete or modify some storage values and not others. My point is that the spec seems to imagine there are only two cases: a) the browser clears all storage, b) storage as usual. There are many situations in between, and an increasing number of them, where browsers perform storage-related interventions above nothing but below "clear everything", and in those cases highly identifying identifiers are where the privacy harm occurs. The two solutions I can see are to either a) be more specific about when deviceIds should be rotated, or b) make the deviceIds less identifying. I've been suggesting "b" since it seems the easier of the two, but if you think "a" is the better path, that could work too.
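Option a) could look roughly like the sketch below: a per-origin key that is rotated on any storage removal the browser observes, not only on a full clear. The hook name on_storage_removed is hypothetical, and HMAC-SHA256 is an assumed derivation. Note that it only covers interventions the browser can actually see; changes made by injected extension JS would still be missed, which is the objection raised above.

```python
import hashlib
import hmac
import secrets


class RotatingDeviceIds:
    """Sketch of option a): rotate an origin's deviceIds on any storage removal."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def device_id(self, origin: str, raw_device_id: str) -> str:
        if origin not in self._keys:
            self._keys[origin] = secrets.token_bytes(32)
        mac = hmac.new(self._keys[origin], raw_device_id.encode("utf-8"), hashlib.sha256)
        return mac.hexdigest()

    def on_storage_removed(self, origin: str) -> None:
        # Hypothetical hook, called for any observed removal (a single
        # cookie, one IndexedDB database, or everything), not just a full
        # clear. Dropping the key changes every deviceId for the origin.
        self._keys.pop(origin, None)


ids = RotatingDeviceIds()
before = ids.device_id("https://a.example", "raw-camera-0")
ids.on_storage_removed("https://a.example")  # e.g. only one cookie was deleted
after = ids.device_id("https://a.example", "raw-camera-0")
assert before != after  # old and new ids cannot be joined
```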
I do not agree that #687 addresses this concern.
Put differently, I appreciate that we disagree on how much privacy gain there is in using a less identifying device identifier, but I think it's hard to argue that there isn't at least some privacy improvement (for the reasons given in #682 (comment), among others). If the WG is going to recommend an approach that isn't the most privacy-preserving one (and that is equally user-serving), I think it's important to say why, beyond a trivial storage difference. (I'm not trying to draw out this disagreement, but I think it's important to fully explain the "why" here.)
Because UUIDs are unique, and until double-keyed storage is widely available, having deviceId (and groupId) be UUIDs creates a tracking opportunity for getUserMedia() callers embedded as third-party iframes.
I think having deviceIds be simple monotonically incremented integers should be sufficient for our use case. I'm not sure how specific we need to be here, beyond no longer recommending UUIDs as ids.
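A minimal sketch of the integer proposal, assuming ids are assigned per origin in the order each device is first exposed to that origin (the issue doesn't specify this, and the names are hypothetical):

```python
class PerOriginCounterIds:
    """Sketch of the proposal: per-origin, monotonically incremented integers."""

    def __init__(self) -> None:
        # origin -> (raw device id -> small integer id)
        self._ids: dict[str, dict[str, int]] = {}

    def device_id(self, origin: str, raw_device_id: str) -> int:
        ids = self._ids.setdefault(origin, {})
        if raw_device_id not in ids:
            # Entries are never removed, so len(ids) + 1 only ever grows.
            ids[raw_device_id] = len(ids) + 1
        return ids[raw_device_id]


# Two different users, each with a single camera, see the same id on a site.
alice, bob = PerOriginCounterIds(), PerOriginCounterIds()
print(alice.device_id("https://a.example", "alice-camera"))  # 1
print(bob.device_id("https://a.example", "bob-camera"))      # 1
```

Because many users end up with the same small values, an id alone cannot single out a user; how much the counter reveals then depends on which devices advance it.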
/cc @pes10k since we were discussing this while reviewing progress on privacy-related work on the spec