Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Rust structures --> RecordBatch by adding Serde support to RawDecoder (#3949) #3979

Merged
merged 8 commits into from
Apr 5, 2023

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Mar 30, 2023

Which issue does this PR close?

Closes #3949

Rationale for this change

Whilst this is necessarily inefficient, the UX of being able to take an arbitrary serde-compatible data structure and convert it to arrow is pretty compelling.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 30, 2023
}

fn serialize_f64(self, v: f64) -> Result<(), SerializerError> {
serialize_lexical!(self, f64, v)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting to a string only to parse it back again is rather wasteful, a future PR can likely tweak the tape representation to allow storing raw bytes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
serialize_lexical!(self, f64, v)
// Formatting to a string only to parse it back again is rather wasteful,
// could likely tweak the tape representation to allow storing raw bytes
serialize_lexical!(self, f64, v)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

@@ -233,6 +235,38 @@ impl RawDecoder {
self.tape_decoder.decode(buf)
}

/// Serialize `rows` to this [`RawDecoder`]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a more compelling example would be to implement some struct

struct MyRow  {
  field: i32,
  name:  String
}

And demonstrate how to turn that into a RecordBatch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a fairly comprehensive example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is very cool. However, I think it is very hard to find (given that it is on "RawDecoder" that is part of arrow_json).

I suggest we add a sentence / link to this example from the front page (and maybe even bring a simpler example to the front page): https://docs.rs/arrow/latest/arrow/index.html#io

Copy link
Contributor

@rtyler rtyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really compelling @tustvold , thank you for taking the time to consider and implement an approach not using Decoder!

@tustvold tustvold marked this pull request as ready for review March 31, 2023 13:22
@tustvold tustvold requested a review from alamb April 2, 2023 16:05
@alamb alamb changed the title Add serde support to RawDecoder (#3949) Support Rust structures --> RecordBatch by adding Serde support to RawDecoder (#3949) Apr 3, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool -- I took it for a whirl

Basically all my comments are related to improving the docs / making this feature discoverable -- it is very cool.

I also tried to rename the title of this PR to make it clearer what this feature really is doing. I think the fact that it is buried in the json decoder will make it hard to for people to discover without some more breadcrumbs

🦾 -- very nice

arrow-json/Cargo.toml Show resolved Hide resolved
}

fn serialize_f64(self, v: f64) -> Result<(), SerializerError> {
serialize_lexical!(self, f64, v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
serialize_lexical!(self, f64, v)
// Formatting to a string only to parse it back again is rather wasteful,
// could likely tweak the tape representation to allow storing raw bytes
serialize_lexical!(self, f64, v)

arrow-json/src/raw/serializer.rs Outdated Show resolved Hide resolved
arrow-json/src/raw/tape.rs Show resolved Hide resolved
@@ -233,6 +235,38 @@ impl RawDecoder {
self.tape_decoder.decode(buf)
}

/// Serialize `rows` to this [`RawDecoder`]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is very cool. However, I think it is very hard to find (given that it is on "RawDecoder" that is part of arrow_json).

I suggest we add a sentence / link to this example from the front page (and maybe even bring a simpler example to the front page): https://docs.rs/arrow/latest/arrow/index.html#io

arrow-json/src/raw/mod.rs Outdated Show resolved Hide resolved
@tustvold tustvold merged commit 2b354a3 into apache:master Apr 5, 2023
@@ -271,6 +271,52 @@
//!
//! Parquet is published as a [separate crate](https://crates.io/crates/parquet)
//!
//! # Serde Compatibility
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Deserializing Serde DataTypes to Arrow
3 participants