-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sseclient 0.0.24 loses events due to Unterminated string... ValueError #28
Comments
Digging deeper into this matter, this failure was introduced with 0.0.23 if using this line |
As noted in issue btubbs#28 json.loads(event.data) fails sometimes with ValueError. This error was introduced with release 0.0.23. This test will indicate whether the sseclient implementation works as expected without losing data. See also https://phabricator.wikimedia.org/T222885 Change-Id: I2d042ca1e17f7ec57ad569ce062413f58e1fcd96
I made a pull request PR#29 to check this ugly behaviour |
Running into this exact same issue as well with |
Okay, so I think it comes down to this. Calling Problem is, SSE streams require context. If you dump the headers returned from connecting to an SSE Endpoint, you'll probably notice a What gets pre-pended to every chunk? Size of the chunk.
Using So for example, if you're feeding JSON data through an SSE stream, unless it always matches up perfectly to a chunk, using There's a reason you shouldn't poke into the guts. |
Thanks for digging into that @OverlordQ. Looks like we probably need to revert the "short read" change from @mutantmonkey from a few months ago. (See #8 and 8585bac.) |
Regarding
If this is the issue, then it sounds like we'll need to handle chunked encoding directly in this library, or detect when it is being used and disable the short read functionality. @OverlordQ Do you happen to have an example SSE endpoint that uses chunked encoding that I would be able to test with? I understand the concern with poking into the guts, but isn't the purpose of this library to support server-sent events without requiring a low-level understanding of HTTP? There's no other way to properly deal with timely delivery of messages shorter than the specified |
I would guess that the Wikimedia test case @xqt gave will exhibit this, but http://stream.pushshift.io/ was where I was running into this behavior. And yes while it does say should, it doesn't say must, so it is a case that would to be handled |
When chunked transfer encoding is used, the size of the chunk is prepended to each chunk. When calling read1 on the raw file-pointer, this data is not stripped off and breaks the JSON formatting.
@OverlordQ I just pushed a branch that falls back to the old method when chunked encoding is enabled. Can you give it a try and confirm that it fixes your issue? https://github.com/mutantmonkey/sseclient/tree/disable_short_reads_when_chunked Ideally, chunked encoding support should be implemented directly in this library, but that's going to be more involved. Just disabling short reads when using chunked encoding is a much quicker fix since this behavior is broken. |
When chunked transfer encoding is used, the size of the chunk is prepended to each chunk. When calling read1 on the raw file-pointer, this data is not stripped off and breaks the JSON formatting.
@OverlordQ please see the above? |
I hit the same issue as @OverlordQ today, and stumbled onto this. The "disable_short_reads_when_chunked" fixed the issue for me. I can't speak to the internals because I haven't spent the time digging in that Overlord did, because I found this pretty quickly and its working... but its working. |
mutantmonkeys fix works for me also. |
Disable short reads when chunked (fixes btubbs#28)
I checked this again with sseclient 0.0.23 and had 136 such issues within 10 minutes. During the same time I runned the same script with sseclient 0.0.22 and had no such error. Seems this is caused by this change. |
Yes, I advise to try mutantmonkey's patch (#35), it fixed the issue for me |
There are two patches which would solve it but they are pending for 3 years. |
Can confirm that this bug is still present in May 2023 |
This is still happening in sseclient-0.0.27. Mini repo import json
import sseclient
for event in sseclient.SSEClient('https://stream.wikimedia.org/v2/stream/recentchange'):
if event.event == 'message' and event.data:
try:
change = json.loads(event.data)
except json.JSONDecodeError as err:
print(event, err)
raise Fails with
This appears to be due to a partial read as the input is incomplete JSON. |
Yes. Issue still present. |
I abandoned trying to use this sseclient and did the following which worked well enough for me. It seems to me not hard to introduce incomplete chunk reading into the unit tests but decided against investing in that here as the project seems unmaintained.
|
sseclient 0.0.24 fails randomly when json.load is processed at the EventSource's data. Randomly means that you can find this failure pretty sure if you process say 50 events from stream. The problem occures with Python 3.7; I've not tested other Python releases. This problem does not occur with sseclient 0.0.22.
I've have implemented a test suite to check this behavior, see https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/509132/2/tests/eventstreams_tests.py
Additional details you can find on our bug tracker https://phabricator.wikimedia.org/T222885
The text was updated successfully, but these errors were encountered: