-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Decimal32/Decimal64 support #683
Conversation
8fceed3
to
950b840
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking this on! Just one preliminary note about avoiding n_words == 0
and a suggestions about the tests 🙂
(I think you've seen it, but a heads up that a PR just merged making Arrow C++ optional!)
@@ -916,13 +922,14 @@ static inline void ArrowDecimalInit(struct ArrowDecimal* decimal, int32_t bitwid | |||
memset(decimal->words, 0, sizeof(decimal->words)); | |||
decimal->precision = precision; | |||
decimal->scale = scale; | |||
// n_words will be 0 for bitwidth == 32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to avoid the n_words == 0
situation? (Or could it be explained in a comment somewhere how it works?).
Alternatively, could we special-case the bit in ArrowArrayViewGetDecimalUnsafe()
to populate the ArrowDecimal
as if it were a 64-bit decimal by copying the appropriate bytes? Or update the words to be smaller (seems hard)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can definitely add an explanation in a comment for it, but I don't know how to avoid the n_words == 0
situation without updating words to be smaller, personally. I can do that if you want, but I agree it would be hard given how often it is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arrow C++ still operates against 64 bit words with decimals right? Guessing the 32 bit implementation upstream still needs to be worked out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arrow C++ utilizes templates and other techniques to represent Decimal32 as a 32-bit integer while using 64-bit words for everything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see!
Can you add a note to the documentation for n_words
explaining that a value of 0 is special-cased for 32-bit decimals?
arrow-nanoarrow/src/nanoarrow/common/inline_types.h
Lines 893 to 894 in 253b7ec
/// \brief An array of 64-bit integers of n_words length defined in native-endian order | |
uint64_t words[4]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note for the documentation of both n_words
and words
to indicate the behavior for 32-bit decimal values.
@@ -3821,6 +3901,80 @@ TEST(ArrayViewTest, ArrayViewTestGetIntervalMonthDayNano) { | |||
ArrowArrayRelease(&array); | |||
} | |||
|
|||
TEST(ArrayViewTest, ArrayViewTestGetDecimal32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way we can avoid some of the copy/paste here by something like:
void TestArrayViewGetDecimal(ArrowType type, int precision, int scale) { ... }
void TestArrayViewDecimalArrowRoundtrip(ArrowType type, int precision, int scale, BuilderT builder) { ... }
TEST(ArrayViewTest, ArrayViewTestGetDecimal32) {
TestArrayViewGetDecimal(...);
#if defined(BUILD_TESTS_WITH_ARROW)
TestArrayViewDecimalArrowRoundtrip();
#endif
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll do this in a follow-up (not your fault I didn't know how to use GTest properly when I wrote the first two!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I can try to figure out something now if you prefer. But i would definitely prefer it as a follow-up lol
@paleolimbot Any idea why the R build is failing? |
My guess is that you inserted the type ids in the middle of the list, which broke: Lines 433 to 479 in 2930787
=While we don't guarantee the stability of those values (i.e., we could just update the values in the .R file), it's probably safer to add new types to the end of the enum. |
Also a note that you should be able to check big endian locally by running:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Just one note about the documentation for ArrowDecimal::n_words
if you don't mind updating that before merging 🙂
@@ -3821,6 +3901,80 @@ TEST(ArrayViewTest, ArrayViewTestGetIntervalMonthDayNano) { | |||
ArrowArrayRelease(&array); | |||
} | |||
|
|||
TEST(ArrayViewTest, ArrayViewTestGetDecimal32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll do this in a follow-up (not your fault I didn't know how to use GTest properly when I wrote the first two!)
@@ -916,13 +922,14 @@ static inline void ArrowDecimalInit(struct ArrowDecimal* decimal, int32_t bitwid | |||
memset(decimal->words, 0, sizeof(decimal->words)); | |||
decimal->precision = precision; | |||
decimal->scale = scale; | |||
// n_words will be 0 for bitwidth == 32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see!
Can you add a note to the documentation for n_words
explaining that a value of 0 is special-cased for 32-bit decimals?
arrow-nanoarrow/src/nanoarrow/common/inline_types.h
Lines 893 to 894 in 253b7ec
/// \brief An array of 64-bit integers of n_words length defined in native-endian order | |
uint64_t words[4]; |
Initial implementation of Decimal32/Decimal64 support in nanoarrow.
Tests will be added in a bit, but I figured I'd get a draft up so people can take a look in the meantime.