-
Notifications
You must be signed in to change notification settings - Fork 838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: coerce_primitive for serde decoded data #5101
Conversation
Signed-off-by: fan <[email protected]>
10d1942
to
98c6e43
Compare
arrow-json/src/reader/mod.rs
Outdated
{"a": 1, "b": "A", "c": "T"}, | ||
{"a": 2, "b": "BB", "c": "F"}, | ||
{"a": 3, "b": 123, "c": false} | ||
]"#; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could test with some numbers that are larger than i32::MAX, i.e. require i64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could test with some numbers that are larger than i32::MAX, i.e. require i64
done, add i32::MAX and like i64::MAX tests.
@@ -89,6 +97,12 @@ impl<O: OffsetSizeTrait> ArrayDecoder for StringArrayDecoder<O> { | |||
TapeElement::Number(idx) if coerce_primitive => { | |||
builder.append_value(tape.get_string(idx)); | |||
} | |||
TapeElement::I64(n) | TapeElement::I32(n) if coerce_primitive => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handling of I64 is incorrect here, this will only use the low bits. There are some examples elsewhere in how to interpret this tape element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handling of I64 is incorrect here, this will only use the low bits. There are some examples elsewhere in how to interpret this tape element
I'm a little confused about the design here. Why we don't use a i64
to save the value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the TapeElement would then need to increase in size from the current 64-bits to 128-bits (for alignment reasons), tape decoding is exceptionally hot and so concerns like this actually matter 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the TapeElement would then need to increase in size from the current 64-bits to 128-bits (for alignment reasons), tape decoding is exceptionally hot and so concerns like this actually matter 😅
I get it. Just for performance. 😺
Signed-off-by: fan <[email protected]>
_ => unreachable!(), | ||
} | ||
} | ||
TapeElement::I32(n) if coerce_primitive => { | ||
builder.append_value(n.to_string()); | ||
} | ||
TapeElement::F32(n) | TapeElement::F64(n) if coerce_primitive => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do the same for f64 please 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do the same for f64 please 😄
done.
Signed-off-by: fan <[email protected]>
TapeElement::I64(n) | TapeElement::I32(n) if coerce_primitive => { | ||
data_capacity += n.to_string().len(); | ||
} | ||
TapeElement::F32(n) | TapeElement::F64(n) if coerce_primitive => { | ||
data_capacity += n.to_string().len(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TapeElement::I64(n) | TapeElement::I32(n) if coerce_primitive => { | |
data_capacity += n.to_string().len(); | |
} | |
TapeElement::F32(n) | TapeElement::F64(n) if coerce_primitive => { | |
data_capacity += n.to_string().len(); | |
} | |
TapeElement::I64(n) | TapeElement::I32(n) | TapeElement::F32(n) | TapeElement::F64(n) => { | |
data_capacity += 10; // An arbitrary estimate | |
} |
This avoids serializing to a string multiple times, this could be improved if it becomes a bottleneck, but this is probably good enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This improvement looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good thank you, just a minor nit
Signed-off-by: fan <[email protected]>
Which issue does this PR close?
Closes #5095
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?