Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for serde_json arbitrary_precision in arrow-json TapeSerializer #5069

Open
ryanaston opened this issue Nov 12, 2023 · 3 comments
Open
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@ryanaston
Copy link

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Using the decoder provided by arrow-json ReaderBuilder to serialize a serde_json number which was deserialized with the arbitrary_precision feature enabled results in an error. This is due to serde_json's use of a struct to represent the number to avoid loss in precision (serialization and deserializiation). The TapeSerializer understandably sees this as a struct which the associated decoders don't know how to handle.

Describe the solution you'd like
Serialize serde_json arbitrary_precision numbers as strings rather than structs. This will allow the decoders to properly parse the value.

Describe alternatives you've considered
N/A

Additional context
N/A

@tustvold
Copy link
Contributor

tustvold commented Nov 12, 2023

Would it be possible to use strings instead of the arbitrary precision "feature"? I'm not entirely sure about special casing a somewhat peculiar behaviour of one particular serde implementation?

Further is there a reason for going via serde_json's representation instead of decoding the JSON with the arrow-json Reader/Decoder? This will be significantly faster?

@ryanaston
Copy link
Author

Would it be possible to use strings instead of the arbitrary precision "feature"? I'm not entirely sure about special casing a somewhat peculiar behaviour of one particular serde implementation?

Further is there a reason for going via serde_json's representation instead of decoding the JSON with the arrow-json Reader/Decoder? This will be significantly faster?

This is essentially how I got around the issue myself, but there are libraries such as delta-rs which use serde_json to build objects in memory then pass those serde_json values into the serialize method.

I've personally forked delta-rs to serialize the serde_json objects first, then pass those into an arrow-json reader, but with the popularity of serde_json I thought it might be worth addressing here. If not feel free to close.

@tustvold
Copy link
Contributor

tustvold commented Nov 12, 2023

Aah, that's tricky. IMO that is a bug in delta-rs, writers should really not be relying on JSON numbers to reliably round-trip anything other than f64. I can understand accepting data from writers that don't observe this, but delta-rs should probably not be writing such data...

If there is a straighrforward way to support this without regressing performance or introducing false positives, I'd be happy to review it, just observing that the arbitrary_precision feature is disabled by default for a reason 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

2 participants