-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Rust structures --> RecordBatch
by adding Serde
support to RawDecoder
(#3949)
#3979
Conversation
arrow-json/src/raw/serializer.rs
Outdated
} | ||
|
||
fn serialize_f64(self, v: f64) -> Result<(), SerializerError> { | ||
serialize_lexical!(self, f64, v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting to a string only to parse it back again is rather wasteful, a future PR can likely tweak the tape representation to allow storing raw bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serialize_lexical!(self, f64, v) | |
// Formatting to a string only to parse it back again is rather wasteful, | |
// could likely tweak the tape representation to allow storing raw bytes | |
serialize_lexical!(self, f64, v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍
@@ -233,6 +235,38 @@ impl RawDecoder { | |||
self.tape_decoder.decode(buf) | |||
} | |||
|
|||
/// Serialize `rows` to this [`RawDecoder`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a more compelling example would be to implement some struct
struct MyRow {
field: i32,
name: String
}
And demonstrate how to turn that into a RecordBatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a fairly comprehensive example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is very cool. However, I think it is very hard to find (given that it is on "RawDecoder" that is part of arrow_json).
I suggest we add a sentence / link to this example from the front page (and maybe even bring a simpler example to the front page): https://docs.rs/arrow/latest/arrow/index.html#io
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really compelling @tustvold , thank you for taking the time to consider and implement an approach not using Decoder
!
RecordBatch
by adding Serde
support to RawDecoder
(#3949)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very cool -- I took it for a whirl
Basically all my comments are related to improving the docs / making this feature discoverable -- it is very cool.
I also tried to rename the title of this PR to make it clearer what this feature really is doing. I think the fact that it is buried in the json decoder will make it hard to for people to discover without some more breadcrumbs
🦾 -- very nice
arrow-json/src/raw/serializer.rs
Outdated
} | ||
|
||
fn serialize_f64(self, v: f64) -> Result<(), SerializerError> { | ||
serialize_lexical!(self, f64, v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serialize_lexical!(self, f64, v) | |
// Formatting to a string only to parse it back again is rather wasteful, | |
// could likely tweak the tape representation to allow storing raw bytes | |
serialize_lexical!(self, f64, v) |
@@ -233,6 +235,38 @@ impl RawDecoder { | |||
self.tape_decoder.decode(buf) | |||
} | |||
|
|||
/// Serialize `rows` to this [`RawDecoder`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is very cool. However, I think it is very hard to find (given that it is on "RawDecoder" that is part of arrow_json).
I suggest we add a sentence / link to this example from the front page (and maybe even bring a simpler example to the front page): https://docs.rs/arrow/latest/arrow/index.html#io
@@ -271,6 +271,52 @@ | |||
//! | |||
//! Parquet is published as a [separate crate](https://crates.io/crates/parquet) | |||
//! | |||
//! # Serde Compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
Which issue does this PR close?
Closes #3949
Rationale for this change
Whilst this is necessarily inefficient, the UX of being able to take an arbitrary serde-compatible data structure and convert it to arrow is pretty compelling.
What changes are included in this PR?
Are there any user-facing changes?