Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Response.content iterates in needlessly small chunks #3186

Open
vfaronov opened this issue May 11, 2016 · 5 comments
Open

Response.content iterates in needlessly small chunks #3186

vfaronov opened this issue May 11, 2016 · 5 comments

Comments

@vfaronov
Copy link

vfaronov commented May 11, 2016

Response.content iterates over the response data in chunks of 10240 bytes. The number 10240 was set in commit 62d2ea8.

After tracing the source code of urllib3 and httplib, I can’t see a reason for this behavior. It all ultimately goes through httplib’s HTTPResponse.readinto, which automatically limits the read size according to Content-Length or the chunked framing.

Therefore, it seems that, if you simply set CONTENT_CHUNK_SIZE to a much larger number (like 10240000), nothing should change, except Response.content will become more efficient on large responses.

Update: it seems like httplib allocates a buffer of the requested size (to be read into), so simply setting CONTENT_CHUNK_SIZE to a large value will cause large chunks of memory to be allocated, which is probably a bad idea.

This is not a problem for me and I have not researched it thoroughly. I’m filing this issue after investigating a Stack Overflow question where this caused an unexpected slowdown for the poster, and a subsequent IRC exchange with @Lukasa. Feel free to do (or not do) whatever you think is right here.

@Lukasa
Copy link
Member

Lukasa commented May 11, 2016

It's good to know that httplib allocates a buffer of that size. I think we can probably stretch to double that buffer though: 20kb of buffer is unlikely to be the end of the world.

At the very least, though, we should understand how this works so that we can write documentation to explain this.

@kennethreitz
Copy link
Contributor

Originally, I iterated over a chunk size of 1 :)

@nateprewitt
Copy link
Member

While we're on the topic, we have 4 different default chunk_sizes between all of our iterator functions in Requests. Some I can find reasoning for (CONTENT_CHUNK_SIZE vs.
ITER_CHUNK_SIZE
), but others like __iter__ and the default for iter_content aren't entirely clear.

I'm not saying these are wrong, just curious why __iter__ is declared as 128 instead of ITER_CHUNK_SIZE and if there's a reason for still having such a default of 1 on iter_content. Is it related to blocking or file-objects not returning without a full read?

@Lukasa
Copy link
Member

Lukasa commented Nov 17, 2016

There is a long, long issue to look at in the backlog. Anyone wanting to make progress on this needs to read and understand #844. Safe to say this is not a good choice for someone who doesn't want to find a really tough slog of a job.

@morotti
Copy link

morotti commented Jul 8, 2024

just a ping back from the pip project on this 12 years old bug. :)

the iter_content() was set to 10240 bytes 12 years ago in requests. it's a needlessly small size and incur a lot of overhead.
linked bug ticket: real bug in pip, 30% of the time taken by pip to download packages, was just overhead because of using this default chunk size.

I'm quite curious if there is any reason that prevents from updating CONTENT_CHUNK_SIZE to something more reasonable nowadays?
64k-128k-256k would be reasonable values for I/O.

On Linux, the network read buffer was increased to 64k in kernel v4.20, year 2018, the read and write buffer were 16k historically before that.
(they're resized dynamically with the TCP window up to 4MB write 6M read, but let's not get into TCP window sizing, see sysctl_tcp_rmem sysctl_tcp_wmem)
linux code: https://github.com/torvalds/linux/blame/master/net/ipv4/tcp.c#L4775
commit Sep 2018: torvalds/linux@a337531

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants