-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arrow engine data cleanup #153
Conversation
|
||
use crate::client::arrow_data::ArrowEngineData; | ||
|
||
pub(crate) struct SyncJsonHandler; | ||
|
||
fn try_create_from_json(schema: SchemaRef, location: Url) -> DeltaResult<ArrowEngineData> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just moved from arrow_data.rs
}; | ||
|
||
pub(crate) struct SyncParquetHandler; | ||
|
||
fn try_create_from_parquet(schema: SchemaRef, location: Url) -> DeltaResult<ArrowEngineData> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just moved from arrow_data.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, just a few nits.
if field.data_type() != &ArrowDataType::Utf8 { | ||
return Err(Error::UnexpectedColumnType(format!( | ||
"On {}: Only support maps of String->String", | ||
field.name() | ||
))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside: Looks like another set of use cases for #148 require!
macro...
kernel/src/client/sync/parquet.rs
Outdated
client::{ | ||
arrow_data::ArrowEngineData, | ||
arrow_utils::{generate_mask, get_requested_indices, reorder_record_batch}, | ||
}, | ||
schema::SchemaRef, | ||
DeltaResult, Error, Expression, FileDataReadResultIterator, FileMeta, ParquetHandler, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: These deeply nested use {}
blocks aren't super readable IMO...
use crate::client::arrow_data::ArrowEngineData;
use crate::client::arrow_utils::{generate_mask, get_requested_indices, reorder_record_batch};
use crate::schema::SchemaRef;
use crate::{DeltaResult, Error, Expression, FileDataReadResultIterator, FileMeta, ParquetHandler};
kernel/src/client/sync/parquet.rs
Outdated
let mut reader = builder.build()?; | ||
let data = reader | ||
.next() | ||
.ok_or(Error::generic("No data found reading parquet file"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: avoid the cost of constructing the error string?
.ok_or(Error::generic("No data found reading parquet file"))?; | |
.ok_or_else(|| Error::generic("No data found reading parquet file"))?; |
(several others, we should probably audit the code and fix them all)
(not sure if there's a way to make clippy warn about it these?)
kernel/src/client/sync/parquet.rs
Outdated
schema.clone(), | ||
location, | ||
); | ||
let d = try_create_from_parquet(schema.clone(), location); | ||
d.map(|d| Box::new(d) as _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: fits comfortably on a single line?
try_create_from_parquet(schema.clone(), location).map(|d| Box::new(d) as _)
let file = File::open( | ||
location | ||
.to_file_path() | ||
.map_err(|_| Error::generic("can only read local files"))?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside: Wow Url::to_file_path... I've never seen Err(())
before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, I know @hntd187 LOVES this error type :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, Nick has heard a good amount of my strong dislike of this error signature, I had hoped they would at least move to something like Infallible
or !
then it comes to stable, but yeah, still ()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think Err(())
is Infallible
? It just means the error Result
carries no information other than "I'm an error"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT
This PR does a couple of small things:
EngineData
impl for Arrow and returns errors if it's not all StringsLargeList
types (offsets stored as i64. probably not needed, but trivial to add)GetData
impls forArrowEngineData
into thesrc/client
dir, since that's not specific to thesync
client. Previous code would actually have failed to build if you tried to build only the default client without the sync one.try_create_from_json
andtry_create_from_parquet
functions into the sync client files where they are used, since they are specific to that.closes #121