Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blob: URLs and range requests #1070

Closed
mkruisselbrink opened this issue Aug 6, 2020 · 31 comments
Closed

blob: URLs and range requests #1070

mkruisselbrink opened this issue Aug 6, 2020 · 31 comments
Labels
interop Implementations are not interoperable with each other

Comments

@mkruisselbrink
Copy link
Collaborator

The current fetch spec says that all blob: URL requests should result in a 200/OK response (a separate issue, it should probably also mention how to reply if the URL's blob url entry is null). That doesn't entirely match at least chrome's implementation though. In chrome we also support range requests on blob URLs, as long as the range consists of a single range. In that case we reply with a 206 and just the requested range. If multiple ranges are passed in we fail the request with a network error.

I haven't tried what other browsers do for range requests to blob URLs.

@jakearchibald
Copy link
Collaborator

fwiw, I'd like to make this work for the cache API too at some point.

@youennf
Copy link
Collaborator

youennf commented Aug 7, 2020

I haven't tested it but I would hope this is working in Safari as well.
WPT tests would be nice.

@annevk
Copy link
Member

annevk commented Aug 17, 2020

@mkruisselbrink by design the URL's blob URL entry cannot be null. The URL parser does the lookup. I guess perhaps something could still go amiss if it has to transfer process boundaries (OOM) and we should account for that somehow. That should be a network error.

Is this specific to blob or should this also work for data: URLs and about:blank?

@mkruisselbrink
Copy link
Collaborator Author

I'm not sure why URL's blob URL entry wouldn't be able to be null? Step 4 in https://url.spec.whatwg.org/#url-parsing "Set url’s blob URL entry to [...], and null otherwise." There needs to be some way to deal with resolving/fetching blob URLs that don't actually correspond to blobs after all...

I haven't looked into what chrome does for range requests data: URLs or about:blank. Those are at least very different code paths in chrome.

@annevk
Copy link
Member

annevk commented Aug 18, 2020

Thanks, I split out null handling into #1077.

@annevk annevk added the good first issue Ideal for someone new to a WHATWG standard or software project label May 18, 2022
@annevk
Copy link
Member

annevk commented May 18, 2022

This is a good issue for someone with some experience writing web-platform-tests. There's a couple steps toward solving this:

  1. Write tests for range requests to blob: URLs (and ideally also data: URLs and about:blank) and use those tests to tease out what browsers are doing.
  2. If browsers agree, great. We probably just want to specify what they do. If they don't agree, let's discuss the results in this issue.
  3. Adjust the requirements in the specification accordingly. (This might be a little involved and as such I can help out here once it gets to this stage.)

@dlrobertson
Copy link
Member

dlrobertson commented May 25, 2022

This looks like an interesting issue, so I started writing some wpt tests for this. I haven't tried data: or about: urls yet, but I see the following for blob::

Engine Basic Range Multiple Ranges
Gecko 200/ignore range 200/ignore range
Webkit 206/correct range 416
Chrome 206/correct range fetch failed

@dlrobertson
Copy link
Member

blob: turned out to be the most interesting of the bunch. Firefox, Safari, and Chrome all returned the same results for the tests I wrote for data: and about:blank. I'll post a PR to wpt for the two schemes that have some agreement, then I'll post a follow-up PR with blob: tests.

annevk pushed a commit to web-platform-tests/wpt that referenced this issue May 30, 2022

Verified

This commit was signed with the committer’s verified signature.
kianenigma Kian Paimani
See whatwg/fetch#1070 for context.
@annevk
Copy link
Member

annevk commented May 30, 2022

Does anyone have any strong opinions on the answer for multiple ranges? I'm somewhat inclined to side with Chromium given that's how we normally guard things that we might support in the future, but WebKit's approach is reasonable too.

For the specification side of things we'll need to:

  1. Split out the guts of https://fetch.spec.whatwg.org/#simple-range-header-value into a simple Range header parser (perhaps "extract a simple range"?). (If we could find a better word than "simple" that would be great.)
    1. Note that this parser needs to take a header list for consistency with "extract a MIME type" and friends. That also makes it clear that if you have multiple Range headers what value ends up being operated on and why that ends up failing (the comma).
    2. We should also test the various edge cases of this parser, including multiple Range headers and such.
  2. Invoke that parser in scheme fetch and use the return values to slice the blob.
    1. Unfortunately https://w3c.github.io/FileAPI/#slice-method-algo doesn't call into an internal algorithm. This is a general problem with the File API specification. We can either use some language ~"act as if you're using the original unmodified version of this method" or fix it upstream.
  3. We'll then read the sliced blob the same way we read the blob now. And we also need to setup the 206 response, presumably including a Content-Range header. (Tests also need to assert all aspects of this.)

@dlrobertson
Copy link
Member

dlrobertson commented Jun 4, 2022

According to HTTP-RANGE Section 2.1:

The last-byte-pos value gives the byte-offset of the last byte in the range; that is, the byte positions specified are inclusive.

But according to File API 3.3.1:

slice() method returns a new Blob object with bytes ranging from the optional start parameter up to but not including the optional end parameter

This will not impact parsing, but could impact how we read the sliced blob etc. Currently both chrome and safari use the range like an HTTP Range. I think an inclusive range makes sense, as this would be more intuitive to the end user and we can use rangeStart and rangeEnd + 1 to slice the blob.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jun 9, 2022

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
…k with a range request, a=testonly

Automatic update from web-platform-tests
Fetch: add tests for data: and about:blank with a range request

See whatwg/fetch#1070 for context.
--

wpt-commits: 333c51b8d345b00b05854d9774b56bc95098f1c3
wpt-pr: 34213
jamienicol pushed a commit to jamienicol/gecko that referenced this issue Jun 10, 2022

Verified

This commit was signed with the committer’s verified signature.
kianenigma Kian Paimani
…k with a range request, a=testonly

Automatic update from web-platform-tests
Fetch: add tests for data: and about:blank with a range request

See whatwg/fetch#1070 for context.
--

wpt-commits: 333c51b8d345b00b05854d9774b56bc95098f1c3
wpt-pr: 34213
annevk added a commit that referenced this issue Sep 26, 2022

Verified

This commit was signed with the committer’s verified signature.
kianenigma Kian Paimani
This provides infrastructure for #1070 by refactoring an existing CORS-related check.

Co-authored-by: Anne van Kesteren <[email protected]>
@dlrobertson
Copy link
Member

Engine Basic Range Multiple Ranges
Gecko 200/ignore range 200/ignore range
Webkit 206/correct range 416
Chrome 206/correct range fetch failed

Chrome updated to ignore the range request when multiple ranges are provided.

@annevk
Copy link
Member

annevk commented Oct 12, 2022

@dlrobertson pointed out in #1501 (review) that whitespace handling is inconsistent between CORS and this. Does that mean implementations use separate parsers for CORS and blob ranges? @rayankans do you have any insights here?

@rayankans
Copy link

Is the question about the chromium implementation? If so it uses the regular CORS parser with extra restrictions, based on the spec changes here. My understanding from the reading of the spec is that white spaces (among other things like suffix ranges) are disallowed. @jakearchibald might be able to provide more insight here.

@annevk
Copy link
Member

annevk commented Oct 14, 2022

@rayankans yeah, in particular whether Chromium uses the same range request parser for CORS and blob URLs.

The discrepancy is that the HTTP specification does allow whitespace in a number of places (although this is not obvious). But it seems okayish if we don't as we already subset the value space. @dlrobertson what do you think? We should add a note about the difference with HTTP though.

@dlrobertson
Copy link
Member

Both Safari and Chromium pass A simple blob range request with whitespace. in web-platform-tests/wpt#34384. This seems to indicate that they both handle whitespace as described in HTTP-RANGE.

The discrepancy is that the HTTP specification does allow whitespace in a number of places (although this is not obvious). But it seems okayish if we don't as we already subset the value space.

Yeah, as a start I think it is reasonable to not handle whitespace.

In the long run I'd guess we'll need a "single range header" and "http-range header" parser. Since both implementations of blob range requests do currently handle whitespace (and it is reasonable to do so for a blob range request). Would it be a significant effort to add support for whitespace in the blob range request case but not in the CORS-safelisted request-header check? Could we add a "no whitespace for me please!" input value to the singe range parser? Alternatively are we able to split up the parser a bit? e.g.

Parse single range header:

  • parse bytes=
  • parse a range

Parse HTTP range header:

  • parse bytes=
  • consume whitespace
  • parse a range

This is where my lack of experience in working on these specs becomes very evident 😄 Feedback on what sorts of things are encouraged and improve readability in cases like this are very much welcomed!

@annevk
Copy link
Member

annevk commented Oct 14, 2022

Both of those approaches are reasonable, but we should also consider being more lenient in the CORS case with regards to whitespace as it seems rather harmless and would allow for more reuse and less branching. Allowing some whitespace with CORS is probably what I'd prefer.

@mnot
Copy link
Member

mnot commented Dec 21, 2022

@annevk what whitespace are you talking about here? There's no OWS in the spec.

@dlrobertson
Copy link
Member

@annevk what whitespace are you talking about here? There's no OWS in the spec.

@mnot I thought the same thing originally, but section 2.1 states:

It also uses a list extension, defined in Section 5.6.1, that allows for compact definition of comma-separated lists using a "#" operator...

The relevant text from Section 5.6.1 states:

A construct "#" is defined, similar to "*", for defining comma-delimited lists of elements. The full form is "#element" indicating at least and at most elements, each separated by a single comma (",") and optional whitespace (OWS, defined in Section 5.6.3).

So range-set could have optional whitespace due to the use of the list extension.

Please let me know if I've missed something!

@mnot
Copy link
Member

mnot commented Dec 21, 2022

the OWS is only allowed around the commas.

@dlrobertson
Copy link
Member

the OWS is only allowed around the commas.

I do see this in Sender Requirements, which we could enforce since we're only parsing a single range. Note that this is not currently done by WebKit or Safari for blob range request parsing.

@annevk
Copy link
Member

annevk commented Dec 21, 2022

@mnot I thought allowing tabs and spaces directly after bytes= due to range-set being defined as 1#range-spec, but I guess I got that wrong. (I think I also confused myself with how whitespace was implicit in earlier iterations of HTTP.)

However, implementations reportedly do allow whitespace after bytes=? (Though not with CORS.) Do they allow whitespace in other places?

@dlrobertson
Copy link
Member

However, implementations reportedly do allow whitespace after bytes=? (Though not with CORS.) Do they allow whitespace in other places?

Yeah, this does appear to be the case with blob range requests. This test passes on WebKit and Blink.

@annevk
Copy link
Member

annevk commented Dec 21, 2022

Thanks @dlrobertson! I added a comment there with a couple other cases we should add coverage for.

I'm somewhat hopeful this is a thing we can still fix (by disallowing whitespace in these places). @youennf @rayankans thoughts on that?

@annevk
Copy link
Member

annevk commented Dec 21, 2022

@mnot so I had a look at https://www.rfc-editor.org/rfc/rfc2616 and unless I'm missing something y'all changed the grammar at some point.

implied *LWS
The grammar described by this specification is word-based. Except
where noted otherwise, linear white space (LWS) can be included
between any two adjacent words (token or quoted-string), and
between adjacent words and separators, without changing the
interpretation of a field. At least one delimiter (LWS and/or
separators) MUST exist between any two tokens (for the definition
of "token" below), since they would otherwise be interpreted as a
single token.

And then in https://www.rfc-editor.org/rfc/rfc2616#section-14.35.1 it does not note otherwise as far as I can tell.

cc @reschke

@reschke
Copy link

reschke commented Dec 21, 2022

My vague recollection is that we looked at a few implementations, and they did not handle LWS consistently, thus we left it out here.

@royfielding
Copy link

Look at the example for range-set:

  • The first, middle, and last 1000 bytes:

    bytes= 0-999, 4500-5499, -1000

That's why it has been implemented differently. The ABNF does not allow a SP after = but one of the examples used a space after the bytes= so that's what people implement. That's an errata.

Probably the right fix is

ranges-specifier = range-unit "=" OWS range-set

though we'd probably have to check other servers to see if any fail on OWS there.

@royfielding
Copy link

I introduced that extra space in httpwg/http-core#217

@dlrobertson
Copy link
Member

Look at the example for range-set:

* The first, middle, and last 1000 bytes:
  bytes= 0-999, 4500-5499, -1000

@royfielding Sorry, but I don't see this example. Which section is this in?

though we'd probably have to check other servers to see if any fail on OWS there.

Is this something we should check before making a decision on what to do here? I'm happy to give running some tests a shot if so. Any info/tips here on running such tests would be helpful if this is something we need to do.

@annevk In cases like this, what do we tend to do? Next steps feel a bit unclear to me, so any feedback would be very helpful and appreciated. It seems like it would be easier to relax the stance in fetch to allow whitespace in the future if something were to change upstream in HTTP rather than add a restriction in the future. That being said I'm new here 😃

@annevk
Copy link
Member

annevk commented Jan 13, 2023

The example is at https://httpwg.org/specs/rfc9110.html#rfc.section.14.1.2 (search for "1000 bytes").

HTTP 2616 allowed whitespace:

  • Before =
  • After =
  • Before -
  • After -
  • Before ,
  • After ,

My inclination would be to keep doing that as browsers implemented that correctly and in theory HTTP does not make incompatible changes except if there is a security concern (and even then it's controversial), which I don't think there is in this case.

Given that we only consume this for blob: URLs which is outside the realm of HTTP I think that is okay.

At the same time, keeping it no-whitespace-allowed for CORS makes sense to me as in that case it's in the realm of HTTP so being stricter will result in more interoperability.

Producing is already no-whitespace-allowed: https://fetch.spec.whatwg.org/#concept-request-add-range-header.

@dlrobertson
Copy link
Member

Thanks, this seems to make sense. So then essentially for blob: URLs we're essentially following the Recipient Requirements (allows whitespace), but for CORS we're following the stricter Sender Requirements. AFAIK the infrastructure for this is implemented in #1564, but perhaps a extra note or comment should be added to clarify things.

@annevk annevk added interop Implementations are not interoperable with each other and removed good first issue Ideal for someone new to a WHATWG standard or software project labels Feb 6, 2023
@annevk annevk closed this as completed in dc0f9e7 Feb 20, 2023
@dlrobertson
Copy link
Member

\o/ Thanks folks for all the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interop Implementations are not interoperable with each other
Development

No branches or pull requests

9 participants