S3 object/bucket XML parsing issues 2.0.34 #764
@DanielRedOak I'm really interested in determining the cause and providing a fix. Would you be able to provide a complete XML document for the list object response that causes the failure? I would like to reproduce this locally, and am currently unable to. I'd like to get this patched ASAP.
Probably. I'll need to sanitize things first; I'll see about doing that shortly.
I was able to produce a failure myself. No need to send the output. No clue what the cause is yet, but I'll update as I find out more.
OK, was just about to ask for an email. Let me know if you need anything else!
I found the root cause of the issue. It appears that Nokogiri may send multiple text events to the SAX handler for a single element. For example, given the following XML element: <LastModified>2012-12-06T18:33:01.000Z</LastModified> It will typically trigger a single text event with the full value, but it can instead yield the value in multiple fragments. The fix is to have the stack frame collect all of the yielded text values and then have them parsed when the result is accessed, joining the parts first. This explains why other string values might be truncated (as only the last fragment would be returned). This appears to only affect the Nokogiri backend, but I'm going to test the others. I need to add a test case that triggers this and then post a fix. I hope to have this patched this afternoon. Thank you for reporting the issue!
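A minimal sketch of the buffering approach described above: a stack frame that collects every text event and joins the parts only when the value is read. The names here (`Frame`, `set_text`, `value`) are illustrative, not the SDK's actual API.

```ruby
# Hypothetical sketch: buffer text fragments from a SAX parser and
# join them lazily, so a value split across events is reassembled.
class Frame
  def initialize
    @text_parts = []
  end

  # A SAX backend may call this several times for one element.
  def set_text(value)
    @text_parts << value
  end

  # Join the buffered fragments only when the result is accessed.
  def value
    @text_parts.join
  end
end

frame = Frame.new
# Nokogiri may split "2012-12-06T18:33:01.000Z" across events:
frame.set_text('2012-12-06T18:33:01.000')
frame.set_text('Z')
frame.value # => "2012-12-06T18:33:01.000Z"
```

Joining lazily (rather than type-casting on each event) is what prevents the "only the last fragment survives" truncation described above.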
The fix was pretty straightforward. In the SAX parser, when an element had a text value and that element was a known scalar type (e.g. string, timestamp, integer, boolean, etc.), it was being type-cast in response to each text event. Now the text values are collected into a list and joined when the element's value is accessed. I added the ability to inject a parsing engine, and the tests now exercise a dummy parser engine that splits the sample text into individual characters, emitting each as its own text event. This initially broke all of the tests; once the related change was made, they went green again. I'm happy with the change and am inclined to ship this later today. Please feel free to share any feedback you have. Also, if you have a chance to check out the master branch to test the fix, that would be great.
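A sketch of the test strategy described above: a dummy engine that emits each character of the sample text as its own text event, which only passes if the handler reassembles fragments correctly. The class and method names here are hypothetical stand-ins for the SDK's internals.

```ruby
# Hypothetical handler that accumulates text events (as the fixed
# stack frame does) and joins them when the result is read.
class CollectingHandler
  def initialize
    @parts = []
  end

  def text(value)
    @parts << value
  end

  def result
    @parts.join
  end
end

# Hypothetical dummy engine: emits every character of the sample as a
# separate text event, mimicking a SAX backend that fragments text.
class CharSplitEngine
  def parse(sample, handler)
    sample.each_char { |ch| handler.text(ch) }
  end
end

handler = CollectingHandler.new
CharSplitEngine.new.parse('2012-12-06T18:33:01.000Z', handler)
handler.result # => "2012-12-06T18:33:01.000Z"
```

Injecting the engine makes this worst-case fragmentation reproducible in tests without depending on any particular XML backend's behavior.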
This is available now in version v2.0.36. |
Just tested again with a very large data set and it seemed to work correctly now. Thanks for knocking this one out quickly and with a detailed explanation! |
Seeing some bad parsing after upgrading from 2.0.33 to 2.0.34 when listing larger buckets. Reproduced with jruby 1.7.16.1 (1.9.3p392), which is what Logstash uses, where I am ultimately consuming this code.
Error:
Here is some debugging I've added to aws-sdk-core-2.0.34/lib/aws-sdk-core/xml/parser/stack.rb to show what it's receiving.
Output:
Added some debugging in aws-sdk-core-2.0.34/lib/aws-sdk-core/xml/parser.rb to validate the XML. Everything looks correct there, so it's somewhere down the line within the parsing.
The error only surfaces when parsing the time fields, since there is validation on that field, but I see other inconsistencies as well: the key field seems to be chopped into multiple pieces, causing further issues.
Not sure what's going on at this point. It only seems to happen when listing a large bucket (which worked up until 2.0.34), and that is when the parsing fails.