[selectors] Solve :visited once and for all #3012

Open
tabatkins opened this issue Aug 12, 2018 · 44 comments
Labels: selectors-4, Current Work

@tabatkins
Member

In w3c/css-houdini-drafts#791 @deian explains a new channel for high-bandwidth leaking of :visited state by observing repaints with the Houdini Paint API. There doesn't appear to be a reasonable way to shut this down under the current regime of partial-censoring of :visited - the current fix for Chrome is to just disallow the paint() function from being used at all on any <a> element or its descendants.

We really need to finally define a sensible model of :visited-state visibility, based on what information would have already leaked to the page via standard, unpluggable channels; then we can finally drop all the silliness around :visited and just treat it as a plain, ordinary pseudo-class that allows all properties to be used in the standard fashion.

  1. At minimum, same-origin visitedness is always visible to the page, as the server can track its own cross-links, assuming standard tracking mechanisms exist (cookies, sufficiently high-entropy user identification, etc). So all same-origin links should report :visited.

  2. Cross-origin inbound links are always visible to the page if the Referer header was sent in the request.

  3. Cross-origin outbound links are always visible to the page if the user visited that link from this origin, as there are a multitude of ways to track outbound links (JS auditing, <a ping>, link shorteners, etc).

  4. Any others?


If a link matches one of the conditions above, and is visited, it's allowed to match :visited; otherwise it never matches :visited. So, what's the cost/benefit of each of the conditions above?
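To make the rule concrete, here is a minimal Python sketch of how a browser might decide whether a link may match :visited under conditions 1 to 3. It assumes a hypothetical per-visit history record (`Visit`) holding the loaded URL, the URL the navigation started from, and whether a Referer header was sent; `origin_of()` and `may_match_visited()` are illustrative names, not any browser's API, and the origin computation is simplified.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    # Simplified origin: scheme://host[:port]. Real origins also handle default ports, opaque origins, etc.
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


@dataclass(frozen=True)
class Visit:
    url: str                 # page that was loaded
    from_url: Optional[str]  # page the navigation started from (None if typed, bookmarked, ...)
    referrer_sent: bool      # whether the request carried a Referer header


def may_match_visited(link_url: str, page_origin: str, visits: List[Visit]) -> bool:
    # A link that was never visited never matches :visited.
    if not any(v.url == link_url for v in visits):
        return False
    # 1. Same-origin links: the site can already track its own cross-links.
    if origin_of(link_url) == page_origin:
        return True
    for v in visits:
        # 3. Cross-origin outbound: the user followed link_url from a page on this origin.
        if v.url == link_url and v.from_url is not None and origin_of(v.from_url) == page_origin:
            return True
        # 2. Cross-origin inbound: the user arrived on this origin from link_url with a Referer.
        if origin_of(v.url) == page_origin and v.from_url == link_url and v.referrer_sent:
            return True
    return False


# Example history: a news page, a click-through to a blog, and a separately typed-in site.
visits = [
    Visit("https://news.example/a", None, False),
    Visit("https://blog.example/post", "https://news.example/a", True),
    Visit("https://other.example/", None, False),
]
print(may_match_visited("https://blog.example/post", "https://news.example", visits))  # True (outbound)
print(may_match_visited("https://other.example/", "https://news.example", visits))     # False
```

Note that the news page sees the typed-in site as unvisited even though it is in the history: that is exactly the information the proposal stops exposing.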

1 is easy to apply and non-controversial. It also probably represents at least half of the benefit of :visited styling for the user - they can tell when they've already visited a given page on a site.

But I don't think it's a whole lot more than half - there is a lot of benefit to knowing what links you've visited from a given page, regardless of origin. (Think of clicking thru each of a list of outbound links, such as in a forum post, or in Google search.) So I think 3 is most of the rest of the benefit, but probably the most controversial in terms of theoretical privacy (even if it's nil in practical privacy for the vast majority of users). I think it's reasonable for browsers to nix this condition if the user is blocking script, as that's the primary tracking mechanism.

I think 2 is of relatively minor benefit, but it lies in between 1 and 3 in terms of privacy leaking. Some UAs do offer the ability to block Referer, and of course they would then block this visitedness channel, but otherwise it's common and not a big deal. It's just that most cross-origin pages won't have a link back to the page you visited from; the exception is things like weird cross-site web puzzles, or old-school webrings.

Are there any other conditions that would allow us to safely expose :visited state unreservedly?

@GreLI

GreLI commented Aug 12, 2018

What about visitedness in local files (with file:/// URLs)? I've seen some concerns that someone could save a malicious web page or SVG image and leak data with embedded scripts.

@tabatkins
Member Author

Browsers already block Referer when the starting page is a local file, so that blocks 2, and browsers already have some magic for assigning local files an origin (I think each file is just in a unique origin?) so that blocks 1. 3 is still available, but only if you open the local file and then click links in it, which yeah, is already observable by a hostile local file.

@astearns
Member

Just so I'm clear on the trade off here - I'm reading this as breaking backwards compatibility for some subset of :visited matching in order to allow custom paint on links and more expressive styling for :visited. Is that correct?

@emilio
Collaborator

emilio commented Aug 12, 2018

@tabatkins
Member Author

@astearns Yes on the cost, but the benefit is more than just more expressive styling; the current :visited hacks aren't totally secure, and still offer some ways to exfiltrate visitedness. This would be finally closing those holes, limited though they may be. It also simplifies styling code, at the cost of more complex visitedness tracking.

@deian
Member

deian commented Aug 12, 2018

To add to #3012 (comment) we found several new timing attacks, so the Paint API is not really at fault here. I think your suggestion @tabatkins is great! (With something like this, we'd even be able to expose a getHistory JavaScript API without impacting privacy [not that we should].)

@AmeliaBR
Contributor

@astearns This isn't just about the Paint API.

The paper describes other attacks, which can be summarized as: create a CSS rendering context which is very slow to re-render (SVG with lots of overlapping polygons, extensive nested 3D transforms), then trigger a relatively neutral style property change if :visited applies (either by changing the :visited style, or by changing the link href), while using a requestAnimationFrame() loop to test whether a repaint was triggered.

And yes, there are browser mitigations that could address that, too. Maybe force a repaint every time a link's visited status needs to be re-evaluated, regardless of computed style changes. Or maybe never re-evaluate existing links' visited status or :visited styles except in response to navigation actions.
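A toy Python model of the first mitigation (not any engine's real invalidation logic), assuming the link was previously unvisited and its :visited style differs from its :link style. It only shows why repainting unconditionally removes the difference a requestAnimationFrame() timing loop would observe; the price is an extra repaint on every re-evaluation. The function names are hypothetical.

```python
def repaints(is_visited: bool, always_repaint: bool) -> bool:
    """Does re-evaluating a link (e.g. after its href changes) trigger a repaint?

    Assumes the link was previously unvisited and its :visited style differs from :link.
    """
    if always_repaint:
        # Mitigation: repaint unconditionally, so repaint timing no longer depends on history.
        return True
    # Status quo: repaint only if the computed style changed, i.e. only if the new target is visited.
    return is_visited


def observable_difference(always_repaint: bool) -> bool:
    """Could a timing loop tell a visited target from an unvisited one?"""
    return repaints(True, always_repaint) != repaints(False, always_repaint)


print(observable_difference(always_repaint=False))  # True: the repaint itself leaks the bit
print(observable_difference(always_repaint=True))   # False: both cases cost one repaint
```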

But that still doesn't address other problems with :visited, like using color or blend modes (see w3c/fxtf-drafts#18) to make :visited links more or less visible to the user & therefore trick users into revealing which ones are visited through their interactions.

The Paint API vulnerability was unique because of its high-throughput. But the fundamental problem is that rendering of a web page relies on information that the web page authors should not be able to access. And adding mitigations for each rendering pathway seems like trying to patch leaks with duct tape when you could just turn off the water at the faucet.

And all that duct tape just makes a sticky tangle of so many language features. E.g., other open bugs for :visited include #2263 #2844 #2884 #2037

@AmeliaBR
Contributor

Going back to Tab's post:

I agree that those 3 cases cover the main situations where :visited is both useful and of little privacy concern.

The first case should also be of minor implementation cost: you only need to filter out the visited list by the current page origin when matching the pseudoclass.

The other cases would require more implementation overhead, because the history list would need to track referrers for every page. So that might be best left up to browser discretion.

An alternative option would be for the browser to safelist certain origins (e.g., your favourite search engine) on which :visited could apply to cross-origin links without privacy masking. But again, that would need to be at browser discretion, and hopefully with plenty of transparency to the user about the impact of marking an origin as "safe" in this context.

All that said: For authors, having a fully functional, no-silliness, :visited pseudoclass for same-origin links would help do things like showing "unread" badges on article lists without hacking around with white text on white backgrounds.

@yoavweiss

Excited to see this issue finally being tackled! This has been a recurring hurdle when talking about exposing paint timing information, as well as other timing APIs.

Regarding 2, it seems like it should somehow be bound to cookie state and request credentials mode (reliable tracking using Referer also requires having some user ID, as their IP could be shared with other users). Like @tabatkins said, I'm not sure supporting it will add many use-cases, and it will certainly add complexity.
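One way to express that refinement of condition 2, as a Python sketch: count an inbound navigation only if it carried both a Referer and credentials. `InboundNavigation` and its boolean fields are hypothetical simplifications; Fetch's actual credentials mode (`omit`, `same-origin`, `include`) and browser cookie-blocking settings are richer than a single flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundNavigation:
    referrer_sent: bool  # a Referer header reached the destination origin
    credentialed: bool   # cookies (or other credentials) were sent, so the server can tie it to a user


def inbound_link_exposable(nav: InboundNavigation) -> bool:
    # The Referer identifies the source page but not the user (an IP address may be shared),
    # so only treat the inbound visit as already known when the request was also credentialed.
    return nav.referrer_sent and nav.credentialed


print(inbound_link_exposable(InboundNavigation(referrer_sent=True, credentialed=False)))  # False
```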

@astearns - some of the use-cases for :visited will indeed be lost. I think the main one is the "cool links of the week" case, where users will no longer be able to know which of those links they already visited when seeing them fly by on their favorite social network.
While regrettable, I think it's a small price to pay to fix this issue once and for all.

@MatsPalmgren

MatsPalmgren commented Aug 13, 2018

  3. [...] as there are a multitude of ways to track outbound links (JS auditing, <a ping>, link shorteners, etc).

While it is true that tracking users on the server side is technically possible as you suggest, please note that it is now illegal to do so on EU citizens without their explicit consent (thanks to GDPR). So I don't think your argument that "the server can track this anyway" holds because most sites don't have that consent.

@htstudios

The sensible thing to do is to deprecate :visited for the general profile and hide it under a user flag that can be enabled for stand-alone web-based software (such as web tech based mobile apps), or manually by the user (globally or per domain).

-> By default no :visited
-> If you enable it, full :visited support without any magic that attempts to hack-fix the issue

@yoavweiss

@MatsPalmgren knowing which links the user visited from a site's origin can also be tracked in JS on the client side, without tracking the user specifically. The :visited changes @tabatkins proposed seem similar to that. (But I'm not a lawyer, so I'm not sure where either would fall as far as GDPR goes.)

@inoas

inoas commented Aug 13, 2018

That's a no-op argument. What JavaScript brings in terms of problems to the user is a different domain. JavaScript is meant to tinker with behavioral aspects, and one can opt to disable it, and sensible web authors can opt to still degrade gracefully.

Arguing that "because js may do evil, then css should be able to be evil, too" means users get no choice. Separating css from js is not only a historical design accident (remember NN 4.0, which only ran css if you had js enabled) but is also a logical one with benefits for the user.

If css is kept declarative and non-behavioral then privacy concerns will be small/smaller. The domain of js and privacy is another and should be solved over at the js-and-related camp (I certainly hope for a separation of js-std-libs into different parts that can one by one be enabled/disabled by the user, allowing for some js and disallowing other kinds of js).

Again: Why not keep :visited like it used to be, even relax its restrictions, but hide it from the general browser profile to be enabled by users/web-apps? Little tech-debt and no privacy issues (with :visited).

@MatsPalmgren

@yoavweiss I'm not a lawyer either, but I did read through a GDPR overview a while ago (I'm a EU citizen and interested to know my new rights) and my understanding is that the intent of the law is that collecting/storing any personal data is illegal without prior explicit consent. I haven't read the actual law text, but I'd assume that it's written in general terms to be future-proof against changes in technology, and against attempts to circumvent it. So the technical details of how or where the data is collected/stored doesn't matter.

I'm pretty sure UA vendors have had actual lawyers look at GDPR to make sure storing link history in the UA itself is legal. I don't know what the GDPR says about unintentionally leaking that data to unauthorised parties though, for example through :visited as suggested above. A quick web search on the topic suggests that UA vendors are legally responsible for not doing that. This page claims that "companies have to report a data leak within 72 hours after the leak occured" and that "the fines are severe and the same as with an unauthorised usage of personal data". If I understand that correctly, UA vendors can be fined "€20 million or 4% of global annual turnover (whichever is greater)" if they leak :visited history. Again, I am not a lawyer, so I don't know for sure.

@slightlyoff

slightlyoff commented Aug 13, 2018

@astearns @yoavweiss: to the point about the cost and breakage, when I designed this "directed visitedness" approach a few years back (a side-effect of Intersection Observers), the research plan called for quantifying the size of the potential breakage. Several factors we discussed could impact this:

  • what % of sites mark :visited links the same colour as :link or :active? Engines should be able to detect this and report back. The fraction of links coloured this way will be unaffected, subtracting from the impact of the change. Anecdotally, many high-traffic sites already colour their links this way (e.g., Facebook).
  • what fraction of forward navigations are to visibly differentiated :visited links that are not in the same back-forward navigation chain? That is, what fraction of the time do users rely on :visited to re-visit sites at some time in the future from their original back/forward clicking?
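The first measurement could be sketched roughly like this in Python, assuming an engine records, for each sampled link, the computed color it would have as :link and as :visited. `LinkSample` and `fraction_undifferentiated()` are hypothetical; the sketch compares only color and ignores :active and other properties, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinkSample:
    link_color: str     # computed color of the link when matching :link
    visited_color: str  # computed color the page's styles give the link when matching :visited


def fraction_undifferentiated(samples: List[LinkSample]) -> float:
    """Share of sampled links whose :visited color equals their :link color.

    Such links look the same either way, so restricting :visited would not change them.
    """
    if not samples:
        return 0.0
    same = sum(1 for s in samples if s.link_color == s.visited_color)
    return same / len(samples)


# Made-up sample data, for illustration only.
samples = [
    LinkSample("rgb(0, 0, 238)", "rgb(85, 26, 139)"),
    LinkSample("rgb(59, 89, 152)", "rgb(59, 89, 152)"),
    LinkSample("rgb(0, 0, 0)", "rgb(0, 0, 0)"),
    LinkSample("rgb(0, 0, 0)", "rgb(0, 0, 0)"),
]
print(fraction_undifferentiated(samples))  # 0.75
```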

The last one is hard to reason about. A thorough analysis will require user studies to understand if :visited is frequently used in the negative sense: users relying on it to avoid re-visiting sites they've already seen.

I suppose an apology is in order: I had intended to start this work several years ago but didn't, leaving us with relatively little information to attack the problem with today. Will see if I can make the original design doc public.

@kornelski

Browsers could have a database of visited links per origin. In other words, instead of looking up is_visited(url), browsers could look up is_visited(url + origin)

(this is equivalent to Tab's 1 and 3. I guess 2 could be added as a special case).
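A minimal Python sketch of the double-keyed lookup, assuming the browser records, at navigation time, which origins may see each URL. `PerOriginHistory` and `record_navigation()` are hypothetical names, the origin computation is simplified, and case 2 is added as the special case mentioned above, keyed off whether a Referer was sent.

```python
from typing import Optional, Set, Tuple
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    # Simplified origin: scheme://host[:port].
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


class PerOriginHistory:
    """Visited-link store keyed by (viewing origin, url) instead of url alone."""

    def __init__(self) -> None:
        self._visible: Set[Tuple[str, str]] = set()

    def record_navigation(self, to_url: str, from_url: Optional[str] = None,
                          referrer_sent: bool = False) -> None:
        # Case 1: the destination's own origin may always see it.
        self._visible.add((origin_of(to_url), to_url))
        if from_url is not None:
            # Case 3: the origin the user navigated from may see the outbound link.
            self._visible.add((origin_of(from_url), to_url))
            if referrer_sent:
                # Case 2 as a special case: the destination learned the source page via Referer.
                self._visible.add((origin_of(to_url), from_url))

    def is_visited(self, url: str, origin: str) -> bool:
        # is_visited(url + origin) rather than is_visited(url).
        return (origin, url) in self._visible


history = PerOriginHistory()
history.record_navigation("https://b.example/post", from_url="https://a.example/list", referrer_sent=True)
print(history.is_visited("https://b.example/post", "https://a.example"))  # True
print(history.is_visited("https://b.example/post", "https://c.example"))  # False
```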

@spinda

spinda commented Aug 13, 2018

@kornelski This is essentially what we suggest in the paper presenting the attacks mentioned in the first post (p. 10, "Defense"). Tag history entries with the origin(s) to which they should be visible. To make it robust against side channels, this should probably be implemented at the lowest level possible (i.e., do the filtering at the storage engine responsible for executing history data queries, rather than at the renderer).
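The "filter at the storage layer" idea might look like this as a Python sketch: the privileged store hands each renderer a lookup already bound to that renderer's origin, so the renderer never holds the full history. `HistoryStore`, `grant()` and `lookup_for()` are hypothetical, and a Python closure is not a security boundary; in a browser the boundary would be the process and IPC interface.

```python
from typing import Callable, Set, Tuple


class HistoryStore:
    """Stand-in for the privileged history storage (e.g. living outside the renderer)."""

    def __init__(self) -> None:
        self._visible: Set[Tuple[str, str]] = set()

    def grant(self, origin: str, url: str) -> None:
        # Called by navigation code when an (origin, url) pair becomes visible.
        self._visible.add((origin, url))

    def lookup_for(self, origin: str) -> Callable[[str], bool]:
        # The renderer for `origin` receives only this bound lookup, so every query is
        # filtered at the storage layer and it cannot ask about another origin's view.
        def is_visited(url: str) -> bool:
            return (origin, url) in self._visible
        return is_visited


store = HistoryStore()
store.grant("https://a.example", "https://b.example/x")
renderer_lookup = store.lookup_for("https://a.example")
print(renderer_lookup("https://b.example/x"))                        # True
print(store.lookup_for("https://c.example")("https://b.example/x"))  # False
```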

@MatsPalmgren

MatsPalmgren commented Aug 13, 2018

@kornelski @spinda that doesn't really solve the underlying problem though, it only constrains the problem to the origin (granted, it certainly helps limit the damage). But, as I understand the GDPR, it is illegal for the UA to leak personal data to the page script in the first place, even to the origin. If the page extracts data from the :visited link database without my consent (illegally) then the UA vendor is liable. I don't see why it matters from a legal POV that the origin could have recorded that data itself (illegally) through other means.

@slightlyoff

@MatsPalmgren: it's perhaps not worth speculating without more expertise. Someone like @lknik would be in a better position to represent the situation accurately.

@tabatkins
Member Author

More explicitly: as non-lawyers, we can do nothing more than vaguely speculate about the legal implications of anything in this domain, particularly for something as wide-ranging but new-and-untested as GDPR compliance in a corner domain like this.

I suggest we stop speculating and ask our actual lawyers about this, if we believe regulation-compliance is relevant for this.

@MatsPalmgren

Yup, I'm merely speculating, since I'm not a lawyer. Seems worth raising the point though since you're proposing that the UA can share personal data with the origin without explicit user consent, IIUC. Precisely what the GDPR is intended to prevent.

@lknik

lknik commented Aug 13, 2018

This is an interesting idea building on the 20 years of problems of history stealing. But what those years have taught us so far is that there are quite a lot of ways to exfiltrate this kind of information, some even quite unexpected.

GDPR-wise it's about intent to a large degree (to simplify). So try boiling it down to simple rules: just because you technically can does not mean you should. But on the other hand: deploy tech in a way that there are no leaks (state of the art). Still, that would be in the hands of end users. I would suggest not getting into regulations here in the first place, but focusing on a security/privacy model to solve the problem, then seeing the implications.

And the general starting point ought to be that sites should be unable to allow the detection of (other) sites the user has visited.

@dbaron
Member

dbaron commented Aug 13, 2018

A few thoughts here:

  1. While this discussion seems to have been triggered by the paper cited in [css-paint-api] CSS Paint API leaks browsing history css-houdini-drafts#791, it's not clear to me that that paper brings anything particularly new to the discussion about which approach is best for addressing visited link privacy issues introduced through CSS. I think there are two parts to this: (a) I think the high-bandwidth attack that it introduces is fixable as I described in [css-paint-api] CSS Paint API leaks browsing history css-houdini-drafts#791 (comment), and based on discussions we'd had about how we would implement the CSS Paint API in Gecko, I think we would likely have had that fix by default for performance reasons and it wouldn't have needed to be something special for visitedness. (It was also my understanding about how invalidation was intended to interact with the Paint API.) (b) Second, I think the other CSS-related attacks are variants of attacks that focus on performance of repainting in ways that cause differences resulting from link state to delay requestAnimationFrame cycles, allowing one bit of information per possibly-delayed cycle. I think these attacks are probably fixable using the combination of what roc suggested in 2013 in Mozilla bug 557579 and what @AmeliaBR suggested above in [selectors] Solve :visited once and for all #3012 (comment), i.e., by taking an approach that does both repainting whenever it might have happened as a result of an href change or equivalent. I think we haven't focused on fixing these because they didn't appear (I think) to be substantially higher bandwidth than things like cache timing attacks or other attacks at the network level of the platform, but if we had a coherent strategy for mitigating visited leaks throughout the platform (not just in CSS) then I think it would be worth spending more effort on.

  2. Despite (1), I think it's worth considering the approach Tab suggests above in [selectors] Solve :visited once and for all #3012 (comment) (and which I've previously heard suggested by other Googlers) because of the complexity the existing approach to dealing with preventing link history leaks through CSS imposes on adding features to the Web. That complexity has multiple problems: both the time we spend on it, and what happens when we get things wrong. But if we go down this path, I think we should consider the tradeoff against the value that visited links provide to users (and the possibility we could improve that value, for example by counting links that differ only with certain query parameters as equivalent).

  3. I'd note that there is still no proposed remedy within the existing model to the one-bit-per-user-interaction leaks involved in building a game or other user interface where elements are hidden or not depending on whether links were visited. (This was the final bullet point in Jesse Ruderman's original report of CSS visited history sniffing in 2000.) That is clearly an advantage of switching to the alternative model suggested here, though it's not clear to me how important it is given the low rate of information leakage. (For example, if we worry about that... should we also worry about blocking exfiltration of data about the user's mouse movements, finger touches, or scrolling rate while reading text, from which one could infer a good bit about the user's reaction to the text? In other words, where is the limit of what the browser is responsible for blocking?)

  4. I think if we're going to spend efforts making the CSS side more secure here, we should also get commitments from maintainers of other parts of the platform to fix issues in other areas, i.e., commit to making this a solid part of the platform's security model.

  5. I'm not going to speculate on legal issues.
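dbaron's suggestion in point 2 (treating links that differ only in certain query parameters as equivalent) could be sketched in Python by normalizing URLs before the visitedness lookup. The `IGNORED_PARAMS` set and `visitedness_key()` are hypothetical examples; which parameters to ignore, if any, is an open question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters treated as irrelevant to visitedness; which parameters
# (if any) to ignore is an open design question, not something any spec defines.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def visitedness_key(url: str) -> str:
    """Normalize a URL so links differing only in ignored query parameters compare equal."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(visitedness_key("https://a.example/p?id=3&utm_source=feed"))  # https://a.example/p?id=3
```

The history would then store and look up `visitedness_key(url)` instead of the raw URL, at the cost of occasionally marking a genuinely different page as visited if a site uses one of the ignored parameters meaningfully.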

@ewilligers
Contributor

An implementation-specific motivation for redefining :visited is that a browser's sandboxed renderer process handling requests for an origin might be compromised. The process might currently be able to query the user's full set of visited links, regardless of inbound and outbound origins.

Ideally, :visited would be defined such that if an origin isn't allowed to know something, a page renderer for that origin doesn't need to know it either.

@deian
Member

deian commented Aug 14, 2018

though it's not clear to me how important it is given the low rate of information leakage. (For example, if we worry about that... should we also worry about blocking exfiltration of data about the user's mouse movements, finger touches, or scrolling rate while reading text, from which one could infer a good bit about the user's reaction to the text? In other words, where is the limit of what the browser is responsible for blocking?)

I think there is a distinction between the channels you mention. In particular, as the attacker I don't need to snoop on the event loop to learn history data from a different context. (Arguably this will remain a challenge even with Tab's proposal and not something I think we need to (yet) tackle.) Today, I can just perform the computation on sensitive data myself and sniff your history.

@dbaron
Member

dbaron commented Aug 14, 2018

I was talking about those as other sorts of data that are intrinsically sensitive on their own, not as other channels for leaking history data.

@therealglazou
Contributor

Just to understand fully, if I already visited site A directly and then I search for it on google.com, the link to A in google results might become unvisited because of the proposed cross-origin outbound rule above?

@yoavweiss

Just to understand fully, if I already visited site A directly and then I search for it on google.com, the link to A in google results might become unvisited because of the proposed cross-origin outbound rule above?

Yes. That's one of the use-cases for which support will get dropped, in favor of better user-privacy and improved visited styles where it is available.

@AmeliaBR
Contributor

@therealglazou

Just to understand fully, if I already visited site A directly and then I search for it on google.com, the link to A in google results might become unvisited because of the proposed cross-origin outbound rule above?

Yes, based on Tab's proposal.

This is why I suggested an additional option of letting the user safe-list domains for which the full browsing history would be used.
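A Python sketch of how such a safelist could combine with the per-origin model, using therealglazou's scenario: site A was visited directly, and only the safelisted search origin sees it as visited. All names, origins, and data structures here are hypothetical.

```python
from typing import Set, Tuple


def visited_for_page(url: str, page_origin: str, safelist: Set[str],
                     full_history: Set[str], per_origin: Set[Tuple[str, str]]) -> bool:
    """Safelisted origins see the full history; everyone else gets the origin-keyed view."""
    if page_origin in safelist:
        return url in full_history
    return (page_origin, url) in per_origin


# Site A was visited directly, so only its own origin can see it in the per-origin view.
full_history = {"https://site-a.example/"}
per_origin = {("https://site-a.example", "https://site-a.example/")}
safelist = {"https://search.example"}
print(visited_for_page("https://site-a.example/", "https://search.example",
                       safelist, full_history, per_origin))  # True
print(visited_for_page("https://site-a.example/", "https://forum.example",
                       safelist, full_history, per_origin))  # False
```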

@lknik

lknik commented Aug 15, 2018

exfiltration of data about the user's mouse movements, finger touches, or scrolling rate while reading text

Actually, I'm all for solving those too, @dbaron

@deian
Member

deian commented Aug 15, 2018

@AmeliaBR:

This is why I suggested an additional option of letting the user safe-list domains for which the full browsing history would be used.

I like this in principle but am generally scared of asking users to intervene (though the current state is worse, so I'd be +1 on this). Are you thinking of it as a permission-like model or browser configuration?

@AmeliaBR
Contributor

@deian I was thinking browser configuration (like: I have 3rd-party cookies turned off, but I safelist certain domains that can use them). But that really only benefits superusers who poke around in their browser settings.

Another option would be an API method that could be called from the web page that pops up a permission request, if the website author thinks history access would be beneficial. This seems most practical in the context of a larger JS API.

@css-meeting-bot
Member

The Working Group just discussed Solve :visited once and for all.

The full IRC log of that discussion
<dael> Topic: Solve :visited once and for all
<dael> github: https://github.com//issues/3012
<dael> TabAtkins: Over the weekend there was a new attack on :visited using the Houdini API for timing channel attacks. Apparently Chromium is mitigating by disallowing paint on links. dbaron had proposals to reduce bandwidth of channels. I think we should solve this
<dael> TabAtkins: This is leaking history information. I suggest we limit what is exposed to the page to things it can have observed, and then we can make :visited an ordinary pseudo-class
<dael> TabAtkins: Same-origin pages visited are visible. Links that have navigated into your origin, so that could be exposed. Finally, any links that are visited from your origin are observable through a number of channels. That's the basis of the entire ad industry. That's reasonable to expose
<dael> TabAtkins: I think that gives you all the usefulness of :visited but limits privacy exposure to things where we've already lost the battle. Only thing we're losing is cool-links-of-the-week pages; you won't be able to tell you've visited it before. Most cases are links you've visited in the same origin or in search. That's preserved
<dael> TabAtkins: Concerns by Mats that the sort of tracking from the 3rd case with outbound links may violate GDPR. I can't comment on legal issues. I've reached out to our lawyers. In the meantime, does this seem reasonable? Is this a promising area to push, turning :visited back into a normal pseudo-class?
<dael> dbaron: I think Mats wants both sets of restrictions. Adding what you propose without removing existing restrictions.
<fantasai> What are the new restrictions?
<AmeliaBR> One addition to Tab's comment: All of this origin-specific history data would need to be tied into the ability of users to clear their cookies etc.
<fantasai> I can't tell from the minutes
<dael> TabAtkins: That would add a lot of complexity and not give people anything useful. Reduces information leakage, but I'd like to get all the way over the finish line
<dael> astearns: New restrictions. I think it's that :visited only applies to a certain subset of links that follow the restrictions TabAtkins said
<dael> TabAtkins: Any same origin is visible. Page nav into origin or pages nav from origin and out to. That's all observable already
<smfr> i think we’ll need to talk about this internally at Apple before we can give an OK to breaking link coloring for “links of the week” pages; that seems like a serious usability regression
<dael> fantasai: So when a browser records if something is in history it also needs to see where you click... you visited w3c and you have to record everywhere that I came in from as well. All ways I clicked to it from would all be recorded together.
<dael> dbaron: Simpler way to think about it. What you're doing is you're keeping separate history for each origin.
<dael> TabAtkins: It's per origin, not per page. Yes. Separate history database, basically
<dael> astearns: smfr responded on IRC [reads]
<dael> TabAtkins: It would eliminate that use case, yes. That's the major casualty
<dael> astearns: He says they'd have to talk internally before giving an opinion
<dael> TabAtkins: okay
<dael> astearns: Other objections or reservations?
<dael> astearns: I'm not convinced, personally, but don't have an objection to investigating
<dael> TabAtkins: Anyone reviewing with teams, when weighing please do so against the current status quo where you can basically just do link coloring. And there will always be timing channel hacks in the current model, but this would stop that entirely. Benefits of killing the status quo are reasonable. Make sure you weigh that against losing particular use cases
<dael> dbaron: I think "always going to be timing channel attacks" is a bit strong. I think there are fixes
<dael> TabAtkins: So we can make repaint always observable?
<dael> dbaron: No, you always repaint no matter if you visited
<dael> TabAtkins: Doesn't stop user interaction based
<dael> dbaron: Other trade off is some browsers double key the cache. They may have different trade offs
<dael> astearns: We're at the hour. Sounds like people are interested in discussing so let's continue in the issue.

@tabatkins
Member Author

Yeah, I could maybe see a JS requestPermission() option for asking you to give up all of your browsing history to a site; we give approximately the same permission to any extension that can modify all sites. It's a scary permission to give, but I'd turn it on for Google search, for instance.

@astearns astearns removed the Agenda+ label Aug 15, 2018
@xlf1024

xlf1024 commented Aug 16, 2018

Idea: In case :visited gets removed or restricted, show the info in the browser status bar, next to where the URL currently is shown when hovering. Of course, this would require the user to hover the link, but it would keep at least some functionality.

@inoas

inoas commented Aug 16, 2018

I like that idea a lot - however mobile/touch devices will not work.

@valtlai
Contributor

valtlai commented Aug 16, 2018

I like that idea a lot - however mobile/touch devices will not work.

Maybe it can be shown in the context menu (long-press menu) – like alt texts for images (Chrome for Android). Not very neat, I know, but better than nothing.

[Screenshot: Chrome for Android image context menu showing alt text]

@LJWatson
Contributor

LJWatson commented Aug 19, 2018

@AmeliaBR

I was thinking browser configuration (like: I have 3rd-party cookies turned off, but I safelist certain domains that can use them). But that really only benefits superusers who poke around in their browser settings.

Another option would be an API method that could be called from the web page that pops up a permission request, if the website author thinks history access would be beneficial. This seems most practical in the context of a larger JS API.

A risk of asking permission is that the user gives it without understanding the extent of what they're being asked for. Making it a browser configuration protects against that.

We're starting to use permissions more and more to push responsibility for decisions onto users, who for the most part won't understand the ramifications of what they're being asked. To give permission for something like this requires some fairly arcane knowledge that the majority of users simply won't have at their disposal.

@AmeliaBR
Contributor

@LJWatson

A risk of asking permission is that the user gives it without understanding the extent of what they're being asked for. Making it a browser configuration protects against that.

Agreed. That's part of why I didn't think a permission prompt for just :visited CSS made sense. It would need to be worded like "Do you want this website to have access to the entire history of which websites you have visited since [insert date the history was last cleared]?"

And asking for that only makes sense in the context of an API that adds more functionality than just different link colors.

@css-meeting-bot
Member

The CSS Working Group just discussed Solve visited once and for all.

The full IRC log of that discussion
<emilio> Topic: Solve visited once and for all
<emilio> github: https://github.com//issues/3012
<fantasai> ScribeNick: fantasai
<fantasai> ScribeNick: emilio
<emilio> TabAtkins: So we discussed this about 6 months ago with no conclusion
<emilio> TabAtkins: visited is bad and leaks no matter what we do
<emilio> TabAtkins: we should just fix this, only limiting visited to stuff JS can already observe, and making it a regular pseudo-class
<fantasai> s/what we do/what we do, there's always some way to invoke timing channel attacks/
<emilio> TabAtkins: <reads over the issue>
<emilio> TabAtkins: Solving the three points on the issue solves the use cases that I think people care about, and doesn't expose more privacy information for most people
<fantasai> At minimum, same-origin visitedness is always visible to the page, as the server can track its own cross-links, assuming standard tracking mechanisms exist (cookies, sufficiently high-entropy user identification, etc). So all same-origin links should report :visited.
<fantasai> Cross-origin inbound links are always visible to the page if the Referer header was sent in the request.
<fantasai> Cross-origin outbound links are always visible to the page if the user visited that link from this origin, as there are a multitude of ways to track outbound links (JS auditing, <a ping>, link shorteners, etc).
<fantasai> Any others?
<emilio> TabAtkins: these three should be safe to expose to :visited
<fantasai> </paste>
<emilio> Thanks fantasai :-)
<emilio> TabAtkins: Last time Mozilla had some opinions on this
<fantasai> emilio: I think the general position is that we should try this, but there were some concerns from other Mozilla people like Mats that not keeping the existing restriction would also not be GDPR compliant
<fantasai> dbaron: I think I can try to represent Mats's position
<fantasai> dbaron: Basic idea is that in collecting the data about what sites people have visited, browsers are collecting a substantial pool of privacy-sensitive data.
<fantasai> dbaron: They have an obligation to protect that data as much as they can.
<fantasai> dbaron: In many cases, the sites themselves have not gathered that data.
<fantasai> dbaron: Given that we have a mechanism for protecting that data right now
<fantasai> dbaron: we don't want to expose that pool of data to sites right now, even if they could collect it because we haven't.
<fantasai> TabAtkins: But how much is because we know we can extract this info right now?
<fantasai> TabAtkins: anything new you could get from this, you could get today via timing attacks.
<fantasai> TabAtkins: Defeating timing channel attacks here means running everything slower
<fantasai> TabAtkins: Doing the rendering work for visited all the time even if it's not being used on the page, etc.
<hober> q+
<fantasai> TabAtkins: Remember, the attack is running 10,000 stacked DOM elements with a filter on them if it's visited
<fantasai> fantasai: visited is on filter or?
<hober> q-
<fantasai> TabAtkins: visited below, 10000 visitors above it
<emilio> hober: I'm a little concerned about the usage of sites whose primary purpose is showing a bunch of links
<emilio> hober: it's pretty common to visually filter out the things that are visited
<emilio> hober: so it'd decrease the usefulness of sites we know are very popular
<emilio> TabAtkins: this is the only use case we kill
<emilio> TabAtkins: and I don't see a way to keep it
<emilio> AmeliaBR: my proposal is adding a safelist for history access, the same way browsers expose a setting for third-party cookies
<emilio> AmeliaBR: I don't think that possibility would need to be defined in the spec though
<emilio> TabAtkins: I'm concerned with trying to ask the user to usefully decide whether Reddit should have access to all their browser history
<emilio> AmeliaBR: otherwise we get back to the same complications
<emilio> heycam: somebody suggested exposing the visited state in some way outside of the page
<emilio> heycam: like a little hover status-bar or such
<emilio> heycam: so there may be some way to expose this in the UI if keeping this use case were important
<hober> s/a bunch of links/a bunch of links, e.g. reddit or hacker news/
<emilio> fantasai: I don't think I'd want to carefully hover over all the links when I'm searching for something
<emilio> TabAtkins: It'd handle search, since most of the links are found via search anyway, but it'd break link dumps and such
<emilio> florian: even via search, you might want to find something you've visited and you look at the purple link
<emilio> TabAtkins: yeah, but I don't think we can plug this privacy hole
<emilio> fantasai: if you turn off JavaScript, it can apply in more cases
<emilio> fremy: You may execute a timing attack measuring loading time? Though the network may not be reliable enough generally
<emilio> heycam: so this issue seems to have two parts, changing how :visited matches, and changing the restriction of the properties that apply to it
<emilio> TabAtkins: there's no point in keeping the restrictions if we limit what's exposed
<emilio> florian: except the other argument about sites not having collected the data yet
<emilio> heycam: so last time we (Mozilla) discussed this internally, we said that we'd be happy to experiment with some restriction like that, but not with lifting the property restrictions
<emilio> TabAtkins: I don't see the point
<emilio> hober: compat with existing content, maybe
<emilio> AmeliaBR: do we have some general policy to deal with this "zombie CSS case"?
<emilio> TabAtkins: trying it
<emilio> fremy: I remember some weirdness with javascript links
<emilio> fremy: I think there's a fourth case, which is `javascript:` links; I think currently the link becomes visited only until you refresh the page
<emilio> emilio: so same as links and `#hash` links
<emilio> TabAtkins: so dbaron mentioned it was feasible to mitigate side-channel attacks, how feasible do you think it is?
<emilio> dbaron: I think we could reduce the band-width of some of them, but never get rid of them entirely.
<emilio> dbaron: the amount of effort we could spend on this depends on how it competes with extracting the same data via other attacks like cache timing attacks
<emilio> TabAtkins: I'll try to push internally to do some experimentation in this regard
<emilio> TabAtkins: I know that Alex Russell is the original author of this idea and he'd be really happy
<emilio> AmeliaBR: I think it depends on how much users hate to break the search results use cases and such, but it'd give way more flexibility for authors
<emilio> AmeliaBR: if it's going to break major sites with user focus you can explain it, but I don't know what the reaction of the average user is
<emilio> hober: besides cleaning up and simplifying :visited, what's the argument for removing the restrictions?
<emilio> TabAtkins: it'd make :visited a regular pseudo-class for authors
<fantasai> Current spec: “Since it is possible for style sheet authors to abuse the :link and :visited pseudo-classes to determine which sites a user has visited without the user’s consent, UAs may treat all links as unvisited links or implement other measures to preserve the user’s privacy while rendering visited and unvisited links differently.”
<emilio> hober: it's weird, but do authors actually complain about that?
<emilio> AmeliaBR and TabAtkins: Yes
<emilio> AmeliaBR: there are use-cases and hacks to show or hide the "unread" using the color matching the background of the text
<emilio> AmeliaBR: and despite all the restrictions we're still leaking the history
<emilio> AmeliaBR: just because CSS is so complex that if anything changes rendering, somebody smart can figure it out
<emilio> florian: so we're annoying people for no good reason
<emilio> fantasai: <quotes the spec> (see above)
<emilio> TabAtkins: that's because my patch was not accepted, because reality is much more complex
<emilio> dbaron: somebody said for no good reason, I think there's one other reason to think about which is a distinction between attacks that are clearly detectable and ones that are not.
<emilio> dbaron: a site can learn about your visited links via somewhat normal code, or via code that is obviously querying your history, and I think it's a distinction it's worth thinking about
<emilio> florian: so there's no technical distinction but maybe legal ones
<emilio> florian: I'd add "Javascript is off" to the list of "safe" scenarios, because then why not?
<emilio> dbaron: some attacks work without javascript, like loading images or fonts
<emilio> florian: alright, then not...
<fantasai> emilio: One question is, one of the objections from Mats was that websites haven't collected this data, and now we're exposing it
<fantasai> emilio: If we change how it works, a lot of existing history....
<fantasai> emilio: In order to implement this, you need to change how you store history. It stops being a giant table of all the links you stored.
<fantasai> emilio: You need to track from/to lists.
<fantasai> emilio: That's new data, nobody has collected it yet.
<fantasai> TabAtkins: Implementation-wise it'll be: you start collecting data now, but then don't switch over for a few months
<fantasai> AmeliaBR: That's why Tab split it into 3 parts; we can have different levels of support
<fantasai> AmeliaBR: E.g. SHOULD support :visited on same-origin
<myles_> q+
<fantasai> AmeliaBR: You can do that with info you currently have
<fantasai> AmeliaBR: Next steps could be smarter
<xfq> ack myles_
<Rossen> ack myles_
<emilio> fantasai: so I think one of the points under discussion is that something that doesn't match any of the categories does not get visited styling at all
<emilio> fantasai: so for same-origin you should be able to use whatever restriction you have
<emilio> *whatever styling you want
<emilio> fremy: that doesn't work because it's observable via timing attacks, and you still need to run styling twice to avoid them
<fantasai> fantasai^: We don't have to do that right now. We could do something more limited for right now while we figure it out
<emilio> AmeliaBR: so right now we have this visited styles and we ignore the properties, and we could check whether it's a same-origin link
<emilio> fremy: so memory-wise you double the cost of styling
<emilio> dbaron: only for link subtrees
<dbaron> dbaron: It's not all elements that need duplicate data, it's just links and their descendants.
<fantasai> fremy was talking about how right now need to store double styling for links, one for unvisited and one for visited.
<fantasai> current duplicated set is limited to just the properties that are allowed for visited; allowing all would mean duplicating all properties
<fantasai> ...
<fantasai> fremy: The other thing I wanted to say is that even if you double the memory, store all the properties twice, and do everything twice...
<fantasai> fremy: you can have nested links
<fantasai> fremy: one same-origin and one not
<fantasai> fremy: Then you have to keep track of whether the difference in style is because of the visited styling of the first link or the nested one
<fantasai> emilio: when you have nested link, from the pov of the nested link and its descendant, the nested link is the only link that could be visited on the page
<fantasai> fremy: That's the restriction we have now. But going forward
<fantasai> emilio: Why I think this wouldn't work is you could detect the performance of styling a same-origin link inside a visited..
<fantasai> emilio: Let's say you have a cross-origin, and a same-origin link inside it
<fantasai> emilio: If you don't apply restrictions to that...
<AmeliaBR> Nested links don't really exist. If you create them from the DOM, browsers are a mass of incompatibility in all sorts of ways. But, you could have a `:visited + :visited` selector which could get into a mess of confusion...
<fantasai> emilio: ... as long as links are treated independently ... I have to think harder than this
<fantasai> fremy: Even if the thing we do now works, we have to have special exception so that when you do selector matching, if it's the first link that you encounter from the base...
<fantasai> fremy: Right now this is what browsers do. it's quite messy
<fantasai> fremy: If you allow some to keep all properties and others not, then you have to keep track. I don't think it's a good idea.
<fantasai> TabAtkins: I see why it would be complex at the minimum
<fantasai> AmeliaBR: Gets rid of one of the arguments for these changes, which is that it would simplify style matching
<fantasai> myles_: Timing attacks: one way to solve them is to make repaints more predictable, either more or less often
<fantasai> myles_: Why not pursue that solution?
<fantasai> myles_: instead of making things more expressive
<fantasai> TabAtkins: I'm not sure how changing timing of repaints can really solve this
<fantasai> TabAtkins: E.g. on :visited it activates 10,000 filters
<fantasai> emilio: ...
<fantasai> emilio: You need to repaint every time the href changes
<fantasai> emilio: dbaron tried that, it was a big perf regression
<fantasai> myles_: That was a perf regression, but performing style selection /cascade wasn't?
<fantasai> dbaron: It wasn't the whole tree, just the links. And they usually don't have many descendants
<fantasai> myles_: so recomputing style is cheap but recomputing pixels is not cheap?
<fantasai> dbaron: I think the repainting patch that was a perf regression was to do more repainting than emilio said
<fantasai> dbaron: It repainted whenever an async history lookup finished
<fantasai> dbaron: You start a lookup, you get a result
<fantasai> dbaron: A lot of timing attacks could be resolved by repainting all links instead of just the one that changed.
<fantasai> dbaron: but that's really expensive
<fantasai> dbaron: At the time I wrote this, repaint was sync, async landed a week after
<dbaron> s/repaint/history lookup/
<fantasai> myles_: If we're allowing :visited to become more expressive, then we're not breaking any navigation sites
<fantasai> TabAtkins: The proposal was to allow :visited to do more by restricting where it can be used.
<fantasai> AmeliaBR: Changes the balance
<fantasai> AmeliaBR: Some cases get easier, others get impossible
<fantasai> AmeliaBR: Wrt just fixing timing attack level
<fantasai> AmeliaBR: Every time we introduce a new property, someone comes up with a new example
<fantasai> AmeliaBR: Also not all are timing attacks. Some are abusing user interaction
<fantasai> AmeliaBR: Taking properties we've got, making some elements invisible or visible
<fantasai> AmeliaBR: Using the fact that there's a rendering change and then using people to reveal what they're seeing on the screen
<fantasai> iank_: A nasty one is to have a full-page pop-up and position the X differently.
<fantasai> fremy: Which X do you see?
<fantasai> fremy: That's the one you click on
<fantasai> TabAtkins: So even if we solve timing attacks, don't solve all the attacks
<fantasai> TabAtkins: That's why I want to do this in the first place
<fantasai> AmeliaBR: So going back to earlier discussion that, OK, it comes down to what are users going to say if we break the one use case
<fantasai> AmeliaBR: Are any browser teams willing to do some experimentation with that and try to see how many complaints you get?
<fantasai> TabAtkins: I think working with Alex Russell we can try something
<fantasai> Rossen: Do you feel like you have enough, Tab?
<fantasai> TabAtkins: Yeah.
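A minimal sketch of the three exposure rules TabAtkins read out during the discussion, assuming the browser keeps the from/to navigation records emilio describes (a flat table of visited URLs only supports the same-origin rule). `Navigation`, `FromToHistory`, and `mayExposeVisited` are hypothetical names for illustration, not any browser's API, and the sketch ignores many real-world details such as referrer policy.

```ts
// Hypothetical sketch of the three :visited exposure rules from this issue.
// None of these names are a real browser API.

interface Navigation {
  fromUrl: string | null; // page the user navigated from; null for typed URLs, bookmarks, etc.
  toUrl: string;          // page the user navigated to
  refererSent: boolean;   // whether the request carried a Referer header
}

class FromToHistory {
  private readonly navigations: Navigation[] = [];

  record(nav: Navigation): void {
    this.navigations.push(nav);
  }

  // True if :visited may match a link to `linkUrl` on a page at `pageUrl`.
  mayExposeVisited(pageUrl: string, linkUrl: string): boolean {
    const pageOrigin = new URL(pageUrl).origin;
    const link = new URL(linkUrl);

    // Rule 1: same-origin links report :visited whenever the URL was visited at all.
    if (link.origin === pageOrigin) {
      return this.navigations.some((n) => new URL(n.toUrl).href === link.href);
    }

    return this.navigations.some((n) => {
      if (n.fromUrl === null) return false;
      const from = new URL(n.fromUrl);
      const to = new URL(n.toUrl);
      // Rule 2: cross-origin inbound. The user came from linkUrl to this origin
      // and a Referer was sent. (Under common default referrer policies only the
      // origin is sent cross-origin; this sketch ignores that detail.)
      const inbound = n.refererSent && from.href === link.href && to.origin === pageOrigin;
      // Rule 3: cross-origin outbound. The user followed linkUrl from this origin.
      const outbound = from.origin === pageOrigin && to.href === link.href;
      return inbound || outbound;
    });
  }
}
```

Rules 2 and 3 both depend on knowing which origin a navigation came from, which is why this model needs new per-navigation data rather than the existing list of visited URLs.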

@deian
Member

deian commented Feb 26, 2019

All: this is great!
@tabatkins @slightlyoff if it's helpful to have us involved at any point let me know.

@zcorpan
Member

zcorpan commented Apr 3, 2024

EyeDropper can also be used to leak :visited (though low-bandwidth). WICG/eyedropper-api#34

What is the status here?

@yoavweiss

^^ @miketaylr who showed interest in this at some point.

@kyraseevers

@zcorpan @yoavweiss We are currently implementing partitioned :visited links in Chromium, where partitioning means storing visited-link history keyed by a partition key rather than by the link URL alone. In our implementation, the partition key is: <link url, top-level site, frame origin>. The best place for more info on the design is the explainer: https://github.com/kyraseevers/Partitioning-visited-links-history. Feel free to reach out if you have any questions!
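A minimal sketch of what keying visited-link history by the triple <link url, top-level site, frame origin> means for matching. `PartitionKey` and `PartitionedVisitedLinks` are hypothetical names, not Chromium's code; computing a real "site" requires the Public Suffix List, so the sketch assumes callers pass a precomputed site string. The explainer linked above is the authoritative description of the actual design.

```ts
// Hypothetical sketch of partitioned visited-link storage; not Chromium's code.
// "Site" computation needs the Public Suffix List, so callers pass it precomputed.

interface PartitionKey {
  linkUrl: string;      // URL of the link target
  topLevelSite: string; // site of the top-level page, e.g. "https://news.example"
  frameOrigin: string;  // origin of the frame the link is in
}

class PartitionedVisitedLinks {
  private readonly visited = new Set<string>();

  // Serialize the triple as a JSON array so the composite key is unambiguous.
  // The link URL is normalized through the URL parser (e.g. host is lowercased).
  private static keyOf(k: PartitionKey): string {
    return JSON.stringify([new URL(k.linkUrl).href, k.topLevelSite, k.frameOrigin]);
  }

  // In this sketch, call when a navigation to linkUrl starts from a frame in this partition.
  recordVisit(k: PartitionKey): void {
    this.visited.add(PartitionedVisitedLinks.keyOf(k));
  }

  // :visited matches only if this exact triple was recorded.
  isVisited(k: PartitionKey): boolean {
    return this.visited.has(PartitionedVisitedLinks.keyOf(k));
  }
}
```

In this model a visit recorded while browsing one site never makes the same URL match :visited on a different top-level site, or inside a third-party frame on the same site, so a page can only observe visits that happened in its own partition.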
