-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazy array display (#3638) #3647
Conversation
arrow-cast/src/display.rs
Outdated
macro_rules! timestamp_display { | ||
($($t:ty),+) => { | ||
$(impl<'a> DisplayIndexState for &'a PrimitiveArray<$t> { | ||
type State = Option<Tz>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the motivation for the "deviation" from the proposal in #3638, we can parse the timezone once and use this for formatting
09e620b
to
0a87adf
Compare
@@ -1067,7 +1067,7 @@ impl<T: ArrowPrimitiveType> From<ArrayData> for PrimitiveArray<T> { | |||
} | |||
} | |||
|
|||
impl<T: DecimalType + ArrowPrimitiveType> PrimitiveArray<T> { | |||
impl<T: DecimalType> PrimitiveArray<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive by cleanup
} | ||
} | ||
|
||
macro_rules! primitive_display { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately the lack of trait specialization means we are forced to use macros here
0a87adf
to
1ad5dc1
Compare
1ad5dc1
to
790f1c9
Compare
790f1c9
to
58d8872
Compare
d9afe59
to
fba93d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think this PR looks quite good and make the code cleaner and easier to understand. Thank you so much @tustvold
I didn't quite get through the entire display.rs
file yet but what I did get through is quite neat and looks good so far.
I believe this is related to #3638
Date64 prints out the encoded time, previously it only printed the date which was incorrect.
While I think I understand the rationale for this, I think it is worth double checking and maybe considering offering an escape valve formatting option if people want the old behavior.
arrow-json/src/writer.rs
Outdated
| DataType::Time32(_) | ||
| DataType::Time64(_) | ||
| DataType::Duration(_) => { | ||
// TODO: Set options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a TODO for this PR or for some future PR? It seems like the default options work well enough for this?
@@ -937,7 +892,7 @@ mod tests { | |||
|
|||
assert_json_eq( | |||
&buf, | |||
r#"{"date32":"2018-11-13","date64":"2018-11-13","name":"a"} | |||
r#"{"date32":"2018-11-13","date64":"2018-11-13T17:11:10.011","name":"a"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea for this change that since Date64 can repreent milliseconds it should be formatted like a timestamp even thought that may be surprising?
I admit the docs aren't very helpful in this respect: https://docs.rs/arrow/32.0.0/arrow/array/type.Date64Array.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps Date64 represents a DateTIme? It is a rather strange data type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#3678 <-- I tried to clarify a little
@@ -290,7 +290,7 @@ pub enum IntervalUnit { | |||
} | |||
|
|||
// Sparse or Dense union layouts | |||
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] | |||
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord, Copy)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
"+---------------------+", | ||
"| f |", | ||
"+---------------------+", | ||
"| 2005-03-18T01:58:20 |", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same change as in the json code -- I realize Date64 has precision greater than a day, but I wonder if people will be surprised by this change 🤔
| {"list": [{"leaf_a": 6, "leaf_b": null}, {"leaf_a": 7, "leaf_b": null}, {"leaf_a": 8, "leaf_b": null}, {"leaf_a": 9, "leaf_b": 1}]} | | ||
| {"list": [{"leaf_a": 10, "leaf_b": null}]} | | ||
+-------------------------------------------------------------------------------------------------------------------------------------+ | ||
+-------------------------------------------------------------------------------------------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change certainly looks nicer to me -- I wonder if someone wants the old behavior (which looks like JSON), they should use the JSON writer, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if someone wants actual JSON, they should use the JSON writer, what this previously produced was quasi-JSON that wouldn't illegal escape characters
} | ||
|
||
impl<'a> FormatOptions<'a> { | ||
/// If set to `true` any formatting errors will be written to the output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
|
||
/// Implements [`Display`] for a specific array value | ||
pub struct ValueFormatter<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is quite clever 👏
arrow-cast/src/display.rs
Outdated
impl<'a> ValueFormatter<'a> { | ||
/// Writes this value to the provided [`Write`] | ||
/// | ||
/// Note: this ignores [`FormatOptions::with_display_error`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Note: this ignores [`FormatOptions::with_display_error`] | |
/// Note: this ignores [`FormatOptions::with_display_error`] and | |
/// will return an error on formatting issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finished reading this PR -- it is really nicely done @tustvold 🏅
|
||
macro_rules! decimal_display { | ||
($($t:ty),+) => { | ||
$(impl<'a> DisplayIndexState<'a> for &'a PrimitiveArray<$t> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am impressed that you got the types to work out with references within macros 🤯
/// Get the value at the given row in an array as a String. | ||
/// | ||
/// Note this function is quite inefficient and is unlikely to be | ||
/// suitable for converting large arrays or record batches. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// suitable for converting large arrays or record batches. | |
/// suitable for converting large arrays or record batches. | |
/// | |
/// Please see [`ArrayFormatter`] for a better interface | |
} | ||
|
||
/// A string formatter for an [`Array`] | ||
pub struct ArrayFormatter<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand this is the main new interface. I think if we added an example to the docs here it would be great and help this usability a lot. A follow on PR would be fine (and I would be happy to help write it as well)
/// | ||
#[derive(Debug, Clone)] | ||
pub struct FormatOptions<'a> { | ||
safe: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no comment explaining safe
Benchmark runs are scheduled for baseline = 9b48f34 and contender = a3b344d. a3b344d is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?
The output for nested types has been made more consistent, fields are no longer enclosed in
"
without any escaping, and the same null sentinel is used as elsewhere.Invalid timezones will result in an error, as opposed to rendering the error to the output string.
Date64 prints out the encoded time, previously it only printed the date which was incorrect.