feat: add Decimal32/Decimal64 support #683

zeroshade · 2024-11-15T23:06:05Z

Initial implementation of Decimal32/Decimal64 support in nanoarrow.

Tests will be added in a bit, but I figured I'd get a draft up so people can take a look in the meantime.

paleolimbot

Thank you for taking this on! Just one preliminary note about avoiding n_words == 0 and a suggestion about the tests 🙂

(I think you've seen it, but a heads up that a PR just merged making Arrow C++ optional!)

paleolimbot · 2024-11-19T16:59:25Z

src/nanoarrow/common/inline_types.h

@@ -916,13 +922,14 @@ static inline void ArrowDecimalInit(struct ArrowDecimal* decimal, int32_t bitwid
  memset(decimal->words, 0, sizeof(decimal->words));
  decimal->precision = precision;
  decimal->scale = scale;
+  // n_words will be 0 for bitwidth == 32


Is there any way to avoid the n_words == 0 situation? (Or could a comment somewhere explain how it works?)

Alternatively, could we special-case the bit in ArrowArrayViewGetDecimalUnsafe() to populate the ArrowDecimal as if it were a 64-bit decimal by copying the appropriate bytes? Or update the words to be smaller (seems hard)?

zeroshade · 2024-11-19

I can definitely add a comment explaining it, but personally I don't know how to avoid the n_words == 0 situation without making words smaller. I can do that if you want, but I agree it would be hard given how often words is used.

WillAyd · 2024-11-20

Arrow C++ still operates on 64-bit words for decimals, right? I'm guessing the 32-bit implementation upstream still needs to be worked out?

https://github.com/apache/arrow/blob/cd3321b28b0c9703e5d7105d6146c1270bbadd7f/cpp/src/arrow/util/decimal.cc#L527

zeroshade · 2024-11-20

Arrow C++ uses templates and other techniques to represent Decimal32 as a 32-bit integer while using 64-bit words for everything else.

paleolimbot · 2024-11-20T16:16:59Z

I see!

Can you add a note to the documentation for n_words explaining that a value of 0 is special-cased for 32-bit decimals?

arrow-nanoarrow/src/nanoarrow/common/inline_types.h

Lines 893 to 894 in 253b7ec

/// \brief An array of 64-bit integers of n_words length defined in native-endian order

uint64_t words[4];

zeroshade · 2024-11-20

Added a note to the documentation of both n_words and words describing the behavior for 32-bit decimal values.
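For readers following the thread, here is a minimal sketch of the convention being discussed: a 32-bit value occupies only the first four bytes of the 64-bit word array, and n_words == 0 marks that case. SketchDecimal and its functions are hypothetical stand-ins written for this illustration, limited to 32- and 64-bit widths; they are not nanoarrow's struct ArrowDecimal or its actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for illustration; not nanoarrow's struct ArrowDecimal. */
struct SketchDecimal {
  uint64_t words[4]; /* native-endian storage */
  int n_words;       /* 0 is the assumed marker for a 32-bit value */
};

/* Assumed mapping from bit width to word count (32 and 64 bits only). */
static void SketchDecimalInit(struct SketchDecimal* d, int bitwidth) {
  assert(bitwidth == 32 || bitwidth == 64);
  memset(d->words, 0, sizeof(d->words));
  d->n_words = bitwidth / 64; /* integer division yields 0 for bitwidth == 32 */
}

/* Store a value; the 32-bit case copies only 4 bytes into the front of words. */
static void SketchDecimalSetInt(struct SketchDecimal* d, int64_t value) {
  if (d->n_words == 0) {
    int32_t v32 = (int32_t)value;
    memcpy(d->words, &v32, sizeof(v32));
    return;
  }
  memcpy(&d->words[0], &value, sizeof(value));
}

/* Read the value back, sign-extending the 32-bit case to 64 bits. */
static int64_t SketchDecimalGetInt(const struct SketchDecimal* d) {
  int64_t v64;
  if (d->n_words == 0) {
    int32_t v32;
    memcpy(&v32, d->words, sizeof(v32));
    return v32;
  }
  memcpy(&v64, &d->words[0], sizeof(v64));
  return v64;
}
```

The memcpy calls sidestep strict-aliasing concerns when reinterpreting the leading bytes of words. Every code path that loops over n_words has to treat 0 as a special case, which is why the reviewers ask for it to be documented.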

paleolimbot · 2024-11-19T17:07:38Z

src/nanoarrow/common/array_test.cc

@@ -3821,6 +3901,80 @@ TEST(ArrayViewTest, ArrayViewTestGetIntervalMonthDayNano) {
  ArrowArrayRelease(&array);
 }

+TEST(ArrayViewTest, ArrayViewTestGetDecimal32) {


Is there any way we can avoid some of the copy/paste here by something like:

void TestArrayViewGetDecimal(ArrowType type, int precision, int scale) { ... }

void TestArrayViewDecimalArrowRoundtrip(ArrowType type, int precision, int scale, BuilderT builder) { ... }

TEST(ArrayViewTest, ArrayViewTestGetDecimal32) {
  TestArrayViewGetDecimal(...);
#if defined(BUILD_TESTS_WITH_ARROW)
  TestArrayViewDecimalArrowRoundtrip();
#endif
}

paleolimbot · 2024-11-20T16:15:25Z

We'll do this in a follow-up (not your fault I didn't know how to use GTest properly when I wrote the first two!)

zeroshade · 2024-11-20

Thanks, I can try to figure something out now if you prefer, but I would definitely prefer it as a follow-up lol

zeroshade · 2024-11-19T17:29:44Z

@paleolimbot Any idea why the R build is failing?

paleolimbot · 2024-11-19T17:43:40Z

My guess is that you inserted the type ids in the middle of the list, which broke:

arrow-nanoarrow/r/R/type.R

Lines 433 to 479 in 2930787

    
# These values aren't guaranteed to stay stable between nanoarrow versions,
# so we keep them internal but use them in these functions to simplify the
# number of C functions we need to build all the types.
NANOARROW_TYPE <- list(
  UNINITIALIZED = 0,
  "NA" = 1L,
  BOOL = 2L,
  UINT8 = 3L,
  INT8 = 4L,
  UINT16 = 5L,
  INT16 = 6L,
  UINT32 = 7L,
  INT32 = 8L,
  UINT64 = 9L,
  INT64 = 10L,
  HALF_FLOAT = 11L,
  FLOAT = 12L,
  DOUBLE = 13L,
  STRING = 14L,
  BINARY = 15L,
  FIXED_SIZE_BINARY = 16L,
  DATE32 = 17L,
  DATE64 = 18L,
  TIMESTAMP = 19L,
  TIME32 = 20L,
  TIME64 = 21L,
  INTERVAL_MONTHS = 22L,
  INTERVAL_DAY_TIME = 23L,
  DECIMAL128 = 24L,
  DECIMAL256 = 25L,
  LIST = 26L,
  STRUCT = 27L,
  SPARSE_UNION = 28L,
  DENSE_UNION = 29L,
  DICTIONARY = 30L,
  MAP = 31L,
  EXTENSION = 32L,
  FIXED_SIZE_LIST = 33L,
  DURATION = 34L,
  LARGE_STRING = 35L,
  LARGE_BINARY = 36L,
  LARGE_LIST = 37L,
  INTERVAL_MONTH_DAY_NANO = 38L,
  RUN_END_ENCODED = 39L,
  BINARY_VIEW = 40L,
  STRING_VIEW = 41L
)

While we don't guarantee the stability of those values (i.e., we could just update the values in the .R file), it's probably safer to add new types to the end of the enum.
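To illustrate why appending is the safer choice, here is a small sketch contrasting the two approaches. All enum names and the SketchBindingName helper are hypothetical; the numeric ids 24 to 28 are taken from the R list above, and 42/43 are simply the next values after STRING_VIEW = 41, assumed here for illustration rather than quoted from nanoarrow's header.

```c
#include <stddef.h>

/* Hypothetical ids for illustration only; these are not nanoarrow's enum. */

/* Variant A: new ids inserted mid-list, so every later id shifts by two. */
enum SketchInserted {
  INS_DECIMAL32 = 24,
  INS_DECIMAL64,  /* 25 */
  INS_DECIMAL128, /* 26, previously 24 */
  INS_DECIMAL256, /* 27, previously 25 */
  INS_LIST        /* 28, previously 26 */
};

/* Variant B: new ids appended after the last existing id (41 in the list above). */
enum SketchAppended {
  APP_DECIMAL128 = 24,
  APP_DECIMAL256 = 25,
  APP_LIST = 26,
  APP_DECIMAL32 = 42,
  APP_DECIMAL64 = 43
};

/* Stand-in for a binding that hard-codes the old numbering, like the R list. */
static const char* SketchBindingName(int id) {
  switch (id) {
    case 24: return "DECIMAL128";
    case 25: return "DECIMAL256";
    case 26: return "LIST";
    case 27: return "STRUCT";
    case 28: return "SPARSE_UNION";
    default: return NULL; /* id unknown to the binding */
  }
}
```

With variant A, a binding that still uses the old numbers would read INS_DECIMAL128 (26) as LIST. With variant B, every existing id keeps its meaning and only the new ids are unknown to the binding until it is updated.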

paleolimbot · 2024-11-19T17:45:04Z

Also a note that you should be able to check big endian locally by running:

export NANOARROW_ARCH=s390x
docker compose run verify

paleolimbot

Thank you!

Just one note about the documentation for ArrowDecimal::n_words if you don't mind updating that before merging 🙂

zeroshade requested review from bkietz and paleolimbot November 15, 2024 23:06

zeroshade added 7 commits November 19, 2024 11:57

feat: add Decimal32/Decimal64 support

80c7cf2

run pre-commit

786eba8

handle strict-aliasing

b134bc2

type-punning

932b819

fix ipc decoder

5d02ef8

first round of tests

afb58ec

fix negate test

950b840

zeroshade force-pushed the decimal32-decimal64 branch from 8fceed3 to 950b840 Compare November 19, 2024 17:02

zeroshade marked this pull request as ready for review November 19, 2024 17:03

paleolimbot reviewed Nov 19, 2024

zeroshade added 6 commits November 19, 2024 12:10

update with ifdefs

fd84705

linting

478b2e5

update to arrow 18

2fe4f4f

verify arrow major version

ebb1325

more major version checks

c65fd95

more version 18 checks

286b7c0

use ArrowDecimalSetBytes

53af35f

zeroshade added 5 commits November 19, 2024 15:02

add to the end of the enum to avoid breaking R

fbf8f6f

linting

ecefc50

forgot to release arrays with #else

88c935d

clang-tidy + valgrind

8413726

handle valgrind and GetInt

6fd916d

paleolimbot approved these changes Nov 20, 2024

zeroshade added 2 commits November 20, 2024 14:30

add comment for special-case of n_words

4f1255a

update doc for words array

5b20f2f

linting

28ebfb8

paleolimbot merged commit e54b7df into main Nov 20, 2024
58 checks passed

zeroshade deleted the decimal32-decimal64 branch November 20, 2024 22:16

WillAyd mentioned this pull request Jan 23, 2025

MINOR: [Docs] Update implementation status for nanoarrow apache/arrow#45333

Open
