-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unwrap on datetime cast for CSV writer #3570
Conversation
arrow-csv/src/writer.rs
Outdated
let mut writer = Writer::new(&mut file); | ||
let batches = vec![&batch, &batch]; | ||
for batch in batches { | ||
writer.write(batch).map_err(|e| { dbg!(e.to_string()); assert!(e.to_string().ends_with(invalid_cast_error("arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Date64Type>".to_owned(), 1, 1).to_string().as_str()))}).unwrap_err(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let me know if error message like
Cannot cast to arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Date64Type> at row index: 1, col index: 1
is enough user friendly. The long type name concerns me a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about printing the DataType instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor nits
arrow-csv/src/writer.rs
Outdated
@@ -88,6 +88,35 @@ where | |||
lexical_to_string(c.value(i)) | |||
} | |||
|
|||
fn invalid_cast_error(dt: String, col_index: usize, row_index: usize) -> ArrowError { | |||
let mut s = String::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format! might be a nicer way to write this, should also perform better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tustvold
Reg to https://github.com/hoodie/concatenation_benchmarks-rs the format macro is 5x slower than string push. But you right, the exception is evauluated in ok_or_else
whic is lazy and won't affect performance on millions of rows, Will fix it
ArrowError::CastError(s) | ||
} | ||
|
||
macro_rules! write_temporal_value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a macro or can it be a generic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generic was too complex, as we need to use different conversion functions depending on the type, I have tried generics first and then falled back to macros
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah I missed the $f
arrow-csv/src/writer.rs
Outdated
let mut writer = Writer::new(&mut file); | ||
let batches = vec![&batch, &batch]; | ||
for batch in batches { | ||
writer.write(batch).map_err(|e| { dbg!(e.to_string()); assert!(e.to_string().ends_with(invalid_cast_error("arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Date64Type>".to_owned(), 1, 1).to_string().as_str()))}).unwrap_err(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about printing the DataType instead?
arrow-csv/src/writer.rs
Outdated
@@ -88,6 +88,25 @@ where | |||
lexical_to_string(c.value(i)) | |||
} | |||
|
|||
fn invalid_cast_error(dt: String, col_index: usize, row_index: usize) -> ArrowError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn invalid_cast_error(dt: String, col_index: usize, row_index: usize) -> ArrowError { | |
fn invalid_cast_error(dt: &str, col_index: usize, row_index: usize) -> ArrowError { |
Might simplify some things
arrow-csv/src/writer.rs
Outdated
}}; | ||
} | ||
/// A CSV writer | ||
#[derive(Debug)] | ||
pub struct Writer<W: Write> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
}}; | |
} | |
/// A CSV writer | |
#[derive(Debug)] | |
pub struct Writer<W: Write> { | |
}}; | |
} | |
/// A CSV writer | |
#[derive(Debug)] | |
pub struct Writer<W: Write> { |
arrow-csv/src/writer.rs
Outdated
&self.date_format, | ||
col_index, | ||
row_index, | ||
value_as_datetime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value_as_datetime, | |
value_as_date, |
Done folks, thanks for the review |
Benchmark runs are scheduled for baseline = acaba0a and contender = 24e5dae. 24e5dae is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #3547 .
Rationale for this change
CSV writer panics if cast cannot be done, we should handle this gracefully
What changes are included in this PR?
Remove unwrap on datetime cast for CSV writer
Are there any user-facing changes?