Specify speculative HTML parsing (preload scanner) #5624
Comments
This is also relevant for whatwg/fetch#590 and |
Here is a very high-level description of the Chromium preload scanner, from @nyaxt: At the high-level, there are two classes involved:
+@richard-townsend-arm to comment on whether any of the above has changed with the new synchronous parser |
Thanks! What is NoStatePrefetch? |
I'm not sure it's a good idea to fully specify it. The preload scanner is a heuristic for faster page loading, and it needs to be possible to evolve it as implementations learn new things about performance, and possibly to evolve it in implementation-specific ways. If observable behavior differences are causing real-world interop problems, then it may make sense to specify constraints on what can be done. Ideally, there would be no preloading that is mandatory, and as few cases as possible that are forbidden. It should be like the http cache, where there are constraints but also enough flexibility to do innovative things like racing disk and network. That said, I'm not the top expert on this area of WebKit. Tagging @cdumez as a person who may know more and @hober to make sure we follow up internally with relevant folks who are not on GitHub. |
https://developers.google.com/web/updates/2018/07/nostate-prefetch |
My understanding is that Gecko's speculative parser is capable of looking further ahead than WebKit's that was inherited into Blink. I'm not interested in making Gecko's speculative parser less capable in order to match WebKit and Blink. IIRC, the reported interop-sensitive problem has been rather niche and isolated: Trying to do server-side responsive images by setting resolution information in a cookie via JS and being unhappy that the browser already fetched the images whose URLs were in the HTML source. However, with a multiprocess architecture, it's increasingly problematic to give guarantees about when script-set cookies are reflected in the network requests that are made, so I think we shouldn't cater to tricks like this. |
@othermaciej thanks for your thoughts. I agree that a full specification might not be desirable. I think the spec can still acknowledge that this optimization exists and specify some rules around how it should behave. This could serve web platform predictability and interoperability with regard to which URLs are speculatively prefetched. @hsivonen thanks. Not wanting to regress on speculative parser capability is understandable. On the script-set cookie case, that's indeed something that doesn't work well together with the preload scanner optimization in particular. For observable behavior, apart from the case @hsivonen mentioned, I'm aware of a few other cases that have been a source of complaints from web developers:
There are likely more issues, this was from a quick search. To facilitate the discussion, I've written a few tentative tests today and a test generation script (so it's easy to add more tests): web-platform-tests/wpt#24521 Test results: https://wpt.fyi/results/html/syntax/preload-scanner.tentative/page-load?sha=ed26ec1897&label=pr_head |
Acknowledging that the optimization exists, and perhaps giving constraints that would resolve some specific differences in observable behavior, seems like probably a good idea. |
I've written some more tests. Here are some things I learned:
The pass conditions assume that the correct behavior is to fetch (for a selected set of resource kinds) what would be fetched had the real HTML parser continued without the current script doing any "destructive" Latest test results: https://wpt.fyi/results/html/syntax/speculative-parsing/generated?label=pr_head&max-count=1&pr=24521 I realize that not all of the cases I've tested are based on complaints from web developers. I wanted to find and highlight fundamental differences and possible bugs. What to specify is open for discussion. |
I think it makes sense to specify the preloader, in order to make it easier to specify things that rely on it:
While the preload scanner may not be a major point of interop pain, the differences in behavior are highly observable through Resource Timing, Service Workers, and the servers fielding the actual requests. I agree with @othermaciej and @hsivonen that we wouldn't want such a specification to limit future optimizations, but at this point I feel this optimization is mature enough that we could specify its basics and let user agents innovate on top of that. |
Thanks @yoavweiss! New info for me to consider and test:
|
I believe that to simplify the implementation, it prevents any speculative fetches period (at least in Chromium). Might be good to see what other implementations are doing, and maybe do something smarter. |
Here are two demos with From what I can tell with the Network tab devtools, in Safari and Firefox, both cases speculatively fetch the image, ignoring the CSP |
Filed. Thanks! Also: The name of the image test suggests that it's testing |
@hsivonen thank you! The tests are here web-platform-tests/wpt#24521 There are multiple image tests, both |
I looked into trying to specify the integration points of speculative parsing in the spec in #5959 , and I have some questions. Specifically, I'm a bit confused about how to specify speculative parsing for the
|
Isn't it that once we're back in the "outer" tree construction stage:
|
I thought there was special handling for In particular, the document write steps insert input into the input stream, but if the speculative parser is already parsing, it will be beyond the insertion point and thus not parse input without special handling. |
Are you speaking about the Chromium implementation?
Agree that if the speculative parser has already gotten to the end of the input and then document.write happens, special handling will be needed. Is it any more complicated than something equivalent to "the speculative parser begins again at the new areas added to the input stream if it has completed already"? |
I think it’s more that the speculative parser might have to “rewind” back to the insertion point, since it might be past it. |
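To make the "rewind" idea concrete, here is a minimal toy model in Python. It is only a sketch of the discussion above, not how any engine implements it: the class `ToyInputStream` and its methods are hypothetical names, the input stream is modeled as a plain string, and the speculative parser is reduced to a single offset into that string.

```python
class ToyInputStream:
    """Toy model: one string, an insertion point, and a speculative offset."""

    def __init__(self, text, insertion_point):
        self.text = text
        self.insertion_point = insertion_point  # where the real parser is blocked
        self.speculative_pos = insertion_point  # how far speculation has looked

    def speculate(self, n_chars):
        # The speculative parser only ever moves forward on its own.
        self.speculative_pos = min(len(self.text), self.speculative_pos + n_chars)

    def document_write(self, written):
        # Insert the written text at the insertion point; the insertion point
        # then sits just after the newly written text.
        start = self.insertion_point
        self.text = self.text[:start] + written + self.text[start:]
        self.insertion_point = start + len(written)
        # If speculation already ran past the old insertion point, it never saw
        # the written text, so it goes back to where that text begins.
        rewound = self.speculative_pos > start
        if rewound:
            self.speculative_pos = start
        return rewound


blocking = '<script src="x.js"></script>'
stream = ToyInputStream(blocking + '<img src="a.png">', insertion_point=len(blocking))
stream.speculate(5)  # speculation is now past the insertion point
rewound = stream.document_write('<img src="b.png">')
print(rewound, stream.text[stream.speculative_pos:])
```

In this model a single speculatively consumed character is enough to force a rewind, which matches the observation that the speculative parser is past the insertion point almost immediately.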
@mfreed7 right. It will be past it as soon as it has speculatively parsed a single character. It's possible that Gecko has a different strategy with
|
Draft spec for review

There is now a draft specification for review in #5959. (Preview of the generated spec) At a high level, the optimization itself is optional, but if it's implemented, it needs to speculatively parse HTML as if it was following the normal HTML parser rules, and only fetch resources that would be fetched from normal processing (if the blocking script does nothing). Which kinds of resources to fetch is implementation-defined.

TPAC discussion

We discussed this topic a few days ago in the WHATWG TPAC breakout session. Minutes here: https://www.w3.org/2020/10/26-whatwg-minutes.html#preload Commentary inline:
This about
Henri filed https://bugzilla.mozilla.org/show_bug.cgi?id=1673407
I haven't specified this hash table. |
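As a rough illustration of the high-level behavior described in the draft (scan the not-yet-parsed input and collect URLs that normal processing would fetch), here is a toy sketch in Python. Several things in it are assumptions: `ToySpeculativeScanner` and `speculative_urls` are hypothetical names, the choice of `img`, `script`, and stylesheet `link` is arbitrary since the set of resource kinds is implementation-defined, and Python's `html.parser` is not the HTML Standard's parsing algorithm, so it only approximates "following the normal HTML parser rules".

```python
from html.parser import HTMLParser


class ToySpeculativeScanner(HTMLParser):
    """Collects candidate URLs from markup without building a DOM."""

    def __init__(self):
        super().__init__()
        self.urls = []      # speculative fetch candidates, in source order
        self._seen = set()  # avoids listing the same URL twice

    def _add(self, url):
        if url and url not in self._seen:
            self._seen.add(url)
            self.urls.append(url)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("img", "script"):
            self._add(a.get("src"))
        elif tag == "link":
            # Only stylesheet links are treated as fetch candidates here.
            rel_tokens = (a.get("rel") or "").lower().split()
            if "stylesheet" in rel_tokens:
                self._add(a.get("href"))


def speculative_urls(remaining_input):
    """Return URLs found in the input that follows a blocking script."""
    scanner = ToySpeculativeScanner()
    scanner.feed(remaining_input)
    scanner.close()
    return scanner.urls


pending = (
    '<link rel="stylesheet" href="a.css">'
    '<img src="b.png">'
    '<script src="c.js"></script>'
    '<img src="b.png">'
)
print(speculative_urls(pending))
```

The sketch deliberately ignores everything that makes the real problem hard, such as `document.write`, `<base>`, `<template>`, and CSP delivered via `<meta>`.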
@mfreed7 Is there any article introduced |
I wrote this article many years ago, but I believe it's still mostly relevant |
@yoavweiss Thanks |
…e HTML parsing, a=testonly Automatic update from web-platform-tests HTML: Add tentative tests for speculative HTML parsing See whatwg/html#5624 -- wpt-commits: 9afacb73a04c1f33837eb0a7ffcd9ec16c9d477f wpt-pr: 24521
From #1349 (comment), @chrishtr proposes spec edits to explicitly specify a preload scanner (as one part of a larger proposal on behavior for stylesheet loading). But the proposal doesn't say which exact approach to standardize. I believe this is implemented in slightly different ways:
Gecko: https://web.archive.org/web/20201021003137/https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
Chromium: #5624 (comment)
WebKit: ?
Are there documents that explain how this is implemented in Chromium and WebKit?
Is there interest in aligning on the observable behavior here?
cc @whatwg/html-parser @othermaciej @hsivonen @richard-townsend-arm @lilles @mfreed7