Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable short reads when chunked (fixes #28) #35

Merged

Conversation

mutantmonkey
Copy link
Contributor

When chunked transfer encoding is used, the size of the chunk is prepended to each chunk. When calling read1 on the raw file-pointer, this data is not stripped off and breaks the JSON formatting.

I have successfully tested this with http://stream.pushshift.io/ which uses chunked encoding and was failing.

When chunked transfer encoding is used, the size of the chunk is
prepended to each chunk. When calling read1 on the raw file-pointer,
this data is not stripped off and breaks the JSON formatting.
@mutantmonkey mutantmonkey force-pushed the disable_short_reads_when_chunked branch from 972921a to 7b6323b Compare August 11, 2019 04:10
@TheSandDoctor
Copy link
Collaborator

TheSandDoctor commented Feb 27, 2020

This seems like a good fix to merge. Thoughts @btubbs ?

Sorry for all the pings recently, you are just the only one with write access as far as I can tell. Have you thought of moving some of these projects to an organization or giving write access to some others (e.g. anyone) so that PRs etc can be more or less managed w/o having to bother you as much?

@vshirikian
Copy link

+1 @btubbs, what's preventing this from merging? It looks good to me too. Fixes the bug that forces me to use a very old version of this nice lib.

@btubbs
Copy link
Owner

btubbs commented May 9, 2020

No opposition to this fix per se, though I worry about there not being a test in the test suite on chunked encoding. We've had enough accidental breakages over the last year or so around optimizations and corner cases that I'd hate to have this in there without some assurance that the next PR won't break it.

@mutantmonkey
Copy link
Contributor Author

mutantmonkey commented Nov 15, 2020

@btubbs I've started working on a test case for chunked encoding, however there are problems with using chunked encoding in practice that I think are going to make reliably testing this difficult. In particular, let's assume I have the following encoded data:

a\r\nevent: hel\r\na\r\nlo\ndata: h\r\na\r\nello1\n\n\nev\r\na\r\nent: hello\r\na\r\n\ndata: hel\r\na\r\nlo2\n\nevent\r\na\r\n: hello\nda\r\na\r\nta: hello3\r\n2\r\n\n\n\r\n0\r\n\r\n

In this example, I've set the chunk size to 10 (a in hex). But when parsing this, we get the event hel which immediately fails, expecting to receive hello. But since event: hel is a correctly-formed message, it's not really wrong to emit an event with the message hel. Any thoughts on how to deal with this? Or do we just want to have a very basic test case to ensure that chunked encoding won't include the chunk size bits as seen in #28?

@grantcurell
Copy link

Checking - is this repo effectively dead? Seems like this fixes a critical bug but it still hasn't been integrated.

@TheSandDoctor TheSandDoctor merged commit 2ddfa4a into btubbs:master May 31, 2023
@TheSandDoctor
Copy link
Collaborator

@grantcurell thanks for pinging on this. I've merged it.

@grantcurell
Copy link

Wanted to follow up - thanks again @TheSandDoctor. I was working this for a customer. They pulled the updated version and it worked like a champ. Really appreciate your efforts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants