Response.content iterates in needlessly small chunks #3186
`Response.content` iterates over the response data in chunks of 10240 bytes. The number 10240 was set in commit 62d2ea8.

After tracing the source code of urllib3 and httplib, I can't see a reason for this behavior. It all ultimately goes through httplib's `HTTPResponse.readinto`, which automatically limits the read size according to `Content-Length` or the `chunked` framing. Therefore, it seems that, if you simply set `CONTENT_CHUNK_SIZE` to a much larger number (like 10240000), nothing should change, except `Response.content` will become more efficient on large responses.

Update: it seems like httplib allocates a buffer of the requested size (to be read into), so simply setting `CONTENT_CHUNK_SIZE` to a large value will cause large chunks of memory to be allocated, which is probably a bad idea.

This is not a problem for me and I have not researched it thoroughly. I'm filing this issue after investigating a Stack Overflow question where this caused an unexpected slowdown for the poster, and a subsequent IRC exchange with @Lukasa. Feel free to do (or not do) whatever you think is right here.
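A minimal sketch of the workaround implied above: stream the response and pass an explicit `chunk_size` to `iter_content()` rather than relying on `Response.content`. The URL and the 64 KiB chunk size here are illustrative assumptions, not values taken from this issue.

```python
import requests

# Sketch: read the body in larger chunks than the 10240-byte
# CONTENT_CHUNK_SIZE that Response.content uses internally.
# The URL and the 64 KiB figure are illustrative assumptions.
resp = requests.get("https://example.com/large-file", stream=True)
resp.raise_for_status()
body = b"".join(resp.iter_content(chunk_size=64 * 1024))
```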
Comments

It's good to know that httplib allocates a buffer of that size. I think we can probably stretch to double that buffer though: 20kb of buffer is unlikely to be the end of the world. At the very least, though, we should understand how this works so that we can write documentation to explain this.
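For reference, a simplified model of the allocation behavior being discussed, patterned on CPython's `http.client.HTTPResponse.read` (a sketch, not the verbatim stdlib code):

```python
# Simplified model of http.client.HTTPResponse.read(amt) in Python 3:
# a bytearray of the requested size is allocated up front and filled via
# readinto(), which caps the read at Content-Length or the current chunk
# boundary. Every call therefore pays for an amt-sized buffer, even if
# far fewer bytes actually arrive.
def read(self, amt):
    b = bytearray(amt)                  # allocation proportional to amt
    n = self.readinto(b)                # may fill much less than amt
    return memoryview(b)[:n].tobytes()  # copy out only the bytes received
```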
Originally, I iterated over a chunk size of …

While we're on the topic, we have 4 different default chunk_sizes between all of our iterator functions in Requests. Some I can find reasoning for (…). I'm not saying these are wrong, just curious why …
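For concreteness, the divergent defaults look roughly like this (paraphrased from requests' models.py around the time of this issue; exact values and signatures may differ between versions):

```python
# Paraphrased from requests/models.py circa this issue (not verbatim):
CONTENT_CHUNK_SIZE = 10 * 1024  # used by Response.content
ITER_CHUNK_SIZE = 512           # default for Response.iter_lines()

class Response:
    def iter_content(self, chunk_size=1, decode_unicode=False):
        # yet another default when iter_content() is called directly
        ...
```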
There is a long, long issue to look at in the backlog. Anyone wanting to make progress on this needs to read and understand #844. Safe to say this is not a good choice for anyone who isn't up for a really tough slog of a job.

Just a ping back from the pip project on this 12-year-old bug. :) The iter_content() chunk size was set to 10240 bytes 12 years ago in requests. It's a needlessly small size and incurs a lot of overhead. I'm quite curious whether there is any reason that prevents updating it. On Linux, the network read buffer was increased to 64k in kernel v4.20 (2018); the read and write buffers were historically 16k before that.
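The kernel defaults cited here are easy to check; a small sketch (assumes Linux, and values vary with kernel version and sysctl configuration):

```python
import socket

# Print the kernel's default receive/send buffer sizes for a fresh TCP
# socket. Per the comment above, the read-buffer default grew to 64k in
# kernel v4.20; this just reports whatever the running system uses.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print("SO_RCVBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
print("SO_SNDBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
s.close()
```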