Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: net/http: basic seek support for io.FS #61791

Closed
oliverpool opened this issue Aug 6, 2023 · 11 comments
Closed

proposal: net/http: basic seek support for io.FS #61791

oliverpool opened this issue Aug 6, 2023 · 11 comments

Comments

@oliverpool
Copy link

oliverpool commented Aug 6, 2023

Support for io.FS in net/http was added in 7211694

I have been experimenting lately with serving the content of a tar or a zip file via the http.FS. Supporting Seek completely is the most challenging part (and a bad experience, because it breaks at runtime, on specific files - when the MIME type must be detected).

However after analyzing the code of net/http, it appears that Seek is only called in 2 cases:

  • sniff the MIME ContentType and seek back to the beginning afterward
  • for range requests

Instead of properly implementing Seek I attempted an alternative approach, faking just enough of the Seek method to support:

  • sniff the MIME ContentType and seek at the start afterward (by buffering the first 512 bytes)
  • seek forward for the ranges requests (by discarding the bytes read and failing on non-ascending ranges)

Non-ascending multi-range request will fail on the first request going backward (except if the previous ranges were within sniffLen). And a malicious actor could request the last byte as a range forcing the server to read the whole file (kind of an amplification attack).

I have created an importable package to experiment with this approach:
https://git.sr.ht/~oliverpool/exp/tree/main/item/seekfaker/seekfaker.go

It seems to work (the tests do work at least :), but strongly relies on the internal of net/http (making it quite brittle).

I think it would be a nice addition to the stdlib, to:

  • make it stable
  • improve the integration of io.FS (by reducing drastically the work of io.FS implementers)

The comment added in afd792f could then be reworded to state // The files provided by fsys must implement io.Seeker to efficiently support range requests.

Draft for the changes to the stdlib: https://git.sr.ht/~oliverpool/go/tree/httpfs_seekable/item/src/net/http/fsys.go (adapted from the fs.go file).

2023-08-07: updated wording to address the misunderstanding of #61791 (comment)

@AlexanderYastrebov
Copy link
Contributor

AlexanderYastrebov commented Aug 6, 2023

sniff the MIME ContentType and seek at the start afterward (by buffering the first 512 bytes)

Isn't it seeking to the start already?

go/src/net/http/fs.go

Lines 238 to 248 in 460dc37

if ctype == "" {
// read a chunk to decide between utf-8 text and binary
var buf [sniffLen]byte
n, _ := io.ReadFull(content, buf[:])
ctype = DetectContentType(buf[:n])
_, err := content.Seek(0, io.SeekStart) // rewind to output whole file
if err != nil {
Error(w, "seeker can't seek", StatusInternalServerError)
return
}
}

(Update) This is the problem (also explained in #48781):

go/src/net/http/fs.go

Lines 792 to 798 in 460dc37

func (f ioFile) Seek(offset int64, whence int) (int64, error) {
s, ok := f.file.(io.Seeker)
if !ok {
return 0, errMissingSeek
}
return s.Seek(offset, whence)
}

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Aug 7, 2023
@ianlancetaylor
Copy link
Member

CC @neild @bradfitz

AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 7, 2023
@neild
Copy link
Contributor

neild commented Aug 7, 2023

I don't think we want the complexity of wrapping a non-io.Seeker in something that tries to implement Seek.

If you want to serve a non-seekable filesystem, like the contents of a tar, you can do so fairly simply with a handler that opens the file and uses io.Copy to serve it. This won't handle Range requests (serving the entire file in response to one), but I believe it will handle Content-Type detection.

But it doesn't seem unreasonable to relax the requirement that fs.File files served by net/http don't need to implement io.Seeker, especially since #51971 is adding ServeFileFS.

If we do that, then I think the way is to change serveContent to not require a seekable input. It can defer content-type detection to ResponseWrite.Write (I'm not sure why it isn't doing that now--historical accident, or is there a better reason?), and either ignore Range requests for non-seekable files or implement them by discarding the skipped portions of the file.

@AlexanderYastrebov
Copy link
Contributor

I think this could also be achieved by a wrapper FS that would wrap Files to implement "stream seek" (forward seek by discarding or backwards seek not further than size of buffered last written bytes).

@neild
Copy link
Contributor

neild commented Aug 7, 2023

I'm not sure why it isn't doing that now--historical accident, or is there a better reason?

Answering my own question: ServeContent can detect a content type when the response doesn't include the first bytes of the file, such as when responding to a HEAD request or a range request.

@oliverpool
Copy link
Author

oliverpool commented Aug 7, 2023

If we do that, then I think the way is to change serveContent to not require a seekable input. It can defer content-type detection to ResponseWrite.Write (I'm not sure why it isn't doing that now--historical accident, or is there a better reason?), and either ignore Range requests for non-seekable files or implement them by discarding the skipped portions of the file.

I think my prototype is implementing this suggestion.

The ioFileSeekFaker struct is just a way to have this functionality implemented without having to add more logic to the already quite complex serveContent. Besides it only slows down the non-seeker case. Implementing this inside serveContent would mean something like:

  • move the DetectContentType buffer at the top level, to be able to serve it later (sendContent = io.MultiReader(bytes.NewReader(ctypeBuf), content))
  • add the io.CopyN(io.Discard, ...) on "simple" range request
  • add the io.CopyN(io.Discard, ...) on "multiple" range request and stop on non-ascending range (or respond with StatusRequestedRangeNotSatisfiable immediately)

PS: I think supporting DetectContentType and "simple" range request would cover most usecases (I have never seen multi-range requests in the wild).

AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue Aug 7, 2023
@rsc
Copy link
Contributor

rsc commented Aug 9, 2023

I don't believe this is a good idea. Essentially everything passed to net/http should implement Seek. Otherwise range requests fail, and if range requests fail, then downloads can't be resumed, structured files can't be fetched incrementally, and so on.

Chrome and most browsers that preview PDF files fetch the specific sections use range requests to fetch the specific file sections they need to render the current page, instead of having to download the entire file just to show page 1. Long ago, before Go handled range requests well, I noticed that reading PDFs on Go servers was incredibly slow. It turned out this was because Chrome was assuming that range requests are always supported, and so it would do a range request for a specific block of the file, Go would send back the entire file, and Chrome would extract the one block it wanted. Then the next block that needed to be read, same thing. It turned viewing a single page from what should have been sub-linear amounts of bandwidth to almost quadratic. Maybe Chrome has been fixed now, but that experience taught me that in the modern world you're just not a real web server if you don't support range requests.

If we did the "seek by reading and discarding" implementation of range requests, that would fix the bandwidth issue in the PDF failure but not the overall cost. It would create a nice low-bandwidth denial-of-service for the server: a client could connect and ask for the very last byte of the file and cause the server to process the entire compressed data stream.

Given that modern web servers must support range requests and range requests must have Seek to be implemented efficiently, it seems OK to me to leave things as they are and strongly encourage fs.File implementations used with net/http to implement Seek.

@rsc
Copy link
Contributor

rsc commented Aug 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Aug 9, 2023
@oliverpool
Copy link
Author

Chrome and most browsers that preview PDF files fetch the specific sections use range requests to fetch the specific file sections they need to render the current page, instead of having to download the entire file just to show page 1.

Apparently it only performs "single-range" requests. Or do you know of any "multiple-range" requests widely used?

It would create a nice low-bandwidth denial-of-service for the server

Yes, this has been acknowledged in the first message. A mitigation would be to limit the max possible seek, however it would mean that the "good" actors will have to download the entire file in case they make such a range-request (making for a terrible user experience, as you noted).

strongly encourage fs.File implementations used with net/http to implement Seek

Currently it is more of an obligation (than just a mere encouragement :)


My current understanding is that such "hack" should be better served by a package outside of the stdlib (with sufficient warning regarding DoS/bad user experience).

And regarding my first argument "make it stable", it is probably not even relevant since the numbers of bytes read for MIME sniffing is unlikely to change in the future.

@rsc
Copy link
Contributor

rsc commented Aug 16, 2023

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals Aug 16, 2023
@rsc rsc moved this from Likely Decline to Declined in Proposals Aug 30, 2023
@rsc
Copy link
Contributor

rsc commented Aug 30, 2023

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed Aug 30, 2023
@golang golang locked and limited conversation to collaborators Aug 29, 2024
@rsc rsc removed this from Proposals Sep 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants