Skip to content

Commit

Permalink
apacheGH-45393: [C++][Compute] Fix wrong decoding for 32-bit column i…
Browse files Browse the repository at this point in the history
…n row table (apache#45473)

### Rationale for this change

The failure reported in apache#45393 seems to be caused by a careless parentheses typo introduced in apache#45336:
https://github.com/apache/arrow/blob/e32f56b478171fc4b53dc2042c4cf5d37c97e351/cpp/src/arrow/compute/row/encode_internal.cc#L281-L282

And unfortunately our `Grouper` UT doesn't have cases covering this particular code path (the questioning code path is only triggered in `Grouper` with very restrictive conditions: the row table is fixed-length, a 32-bit key is encoded after some other keys).

### What changes are included in this PR?

An UT to reproduce the issue and the fix.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

None.

* GitHub Issue: apache#45393

Authored-by: Rossi Sun <[email protected]>
Signed-off-by: Rossi Sun <[email protected]>
  • Loading branch information
zanmato1984 authored Feb 10, 2025
1 parent e32f56b commit d3c4676
Show file tree
Hide file tree
Showing 2 changed files with 23 additions and 1 deletion.
2 changes: 1 addition & 1 deletion cpp/src/arrow/compute/row/encode_internal.cc
Original file line number Diff line number Diff line change
Expand Up @@ -279,7 +279,7 @@ void EncoderInteger::Decode(uint32_t start_row, uint32_t num_rows,
case 4:
for (uint32_t i = 0; i < num_rows; ++i) {
reinterpret_cast<uint32_t*>(col_base)[i] = *reinterpret_cast<const uint32_t*>(
rows.fixed_length_rows(start_row + i + offset_within_row));
rows.fixed_length_rows(start_row + i) + offset_within_row);
}
break;
case 8:
Expand Down
22 changes: 22 additions & 0 deletions cpp/src/arrow/compute/row/grouper_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -825,6 +825,28 @@ TEST(Grouper, DictKey) {
g.grouper_->Consume(dict_span));
}

// GH-45393: Test combinations of numeric type keys of different lengths.
TEST(Grouper, MultipleIntKeys) {
auto types = NumericTypes();
for (auto& t0 : types) {
ARROW_SCOPED_TRACE("t0=", t0->ToString());
for (auto& t1 : types) {
ARROW_SCOPED_TRACE("t1=", t1->ToString());
for (auto& t2 : types) {
ARROW_SCOPED_TRACE("t2=", t2->ToString());
TestGrouper g({t0, t1, t2});

g.ExpectConsume(R"([[0, 1, 2], [0, 1, 2]])", "[0, 0]");
g.ExpectConsume(R"([[0, 1, 2], [null, 1, 2]])", "[0, 1]");
g.ExpectConsume(R"([[0, 1, 2], [0, null, 2]])", "[0, 2]");
g.ExpectConsume(R"([[0, 1, 2], [0, 1, null]])", "[0, 3]");

g.ExpectUniques("[[0, 1, 2], [null, 1, 2], [0, null, 2], [0, 1, null]]");
}
}
}
}

TEST(Grouper, StringInt64Key) {
TestGrouper g({utf8(), int64()});

Expand Down

0 comments on commit d3c4676

Please sign in to comment.