-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using Borrow<Value> on infer_json_schema_from_iterator #3728
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a suggestion on how to avoid the Box.
FWIW the Decoder
that accepts Value
is intended to be deprecated soon, see #3718, I'd be interested to hear your thoughts on this
arrow-json/src/reader.rs
Outdated
where | ||
I: Iterator<Item = Result<Value, ArrowError>>, | ||
I: Iterator<Item = Result<V, ArrowError>>, | ||
V: AsRef<Value>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
V: AsRef<Value>, | |
V: Borrow<Value>, |
I think this should remove the need for the boxing
Thank you for both suggestions, I really appreciate them! I didn't remember about Borrow when coding it. About the RawReader, awesome, I'll start using it right away! However, I still need to use the Then I save this schema into a file to be used later, but I'm using tokio::fs to work with files, it'd be nicer to have the RawReader also support async work. It's not a huge issue, though, I can use std fs for this particular operation. On a separate note, I don't think exposing Values is a bad thing, sure a user doesn't need it, but when we're building a library on top of Arrow, it's nice to have access to lower level apis. |
…the json and get the schema. This avoids unnecessary cloning the entire json.
0328bce
to
34e4986
Compare
You might want to try out RawDecoder in that case, it is designed for use in async contexts. You could possibly use it in conjunction with AsyncBufRead. FYI no operating systems currently support async file IO, tokio actually just calls
The reason for moving away from |
Thank you |
Benchmark runs are scheduled for baseline = 221533f and contender = 59016e5. 59016e5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Thank you so much! I'll take a look at the RawDecoder. Because I'm also streaming multiple json files before saving them locally into a single file, which I'm providing to the RawReader. I couldn't test everything yet, I'm certain there are improvements I need to do before finalizing everything. |
Which issue does this PR close?
None
Rationale for this change
In a project I'm working on, we need to save the json data and the schema into a different file. Today we need to clone the original json, where it'd be much easier to just use a read-only reference.
Are there any user-facing changes?
No, using Borrow makes any owned instance still work here.