Dense reader: fix user buffer offset computation for multi-index queries. #3002

KiterLuc · 2022-03-24T08:30:27Z

The previous implementation of the dense reader ordered the results in
the user buffer by N-dimensional range index. For example, with 2 ranges
set per dimension on a 2D array, the data was ordered from N-dimensional
range 1 to range 4. That is incorrect, because it does not return the
data in the expected row-/column-major order.

This change computes the correct offset for the data in the user buffers
and returns the data in true row/column major order.

TYPE: IMPROVEMENT
DESC: Dense reader: fix user buffer offset computation for multi-index queries.
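
To make the difference concrete, here is a minimal sketch of the two orderings for a selection with 2 ranges per dimension on a 2D array. It is not the code from this PR: `Range`, `row_major_offset`, `range_order_offset` and `example_dims` are hypothetical names, and the sketch assumes the old behaviour enumerated the N-dimensional ranges in row-major order of their per-dimension range indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: an inclusive 1D range [start, end].
struct Range {
  uint64_t start;
  uint64_t end;
};

uint64_t range_len(const Range& r) {
  return r.end - r.start + 1;
}

// Example selection: 2 ranges per dimension on a 2D array.
std::vector<std::vector<Range>> example_dims() {
  std::vector<Range> rows = {Range{1, 2}, Range{3, 4}};
  std::vector<Range> cols = {Range{1, 2}, Range{3, 4}};
  return {rows, cols};
}

// Offset of a cell when the whole result is laid out in row-major order.
// range_idx[d] selects the range on dimension d, pos[d] is the cell's
// position inside that range.
uint64_t row_major_offset(
    const std::vector<std::vector<Range>>& dims,
    const std::vector<size_t>& range_idx,
    const std::vector<uint64_t>& pos) {
  assert(range_idx.size() == dims.size() && pos.size() == dims.size());
  uint64_t offset = 0;
  uint64_t mult = 1;
  // The last dimension varies fastest, so walk from last to first.
  for (size_t d = dims.size(); d-- > 0;) {
    // Coordinate within the concatenation of all ranges on dimension d.
    uint64_t coord = pos[d];
    for (size_t r = 0; r < range_idx[d]; ++r)
      coord += range_len(dims[d][r]);
    offset += coord * mult;
    // Multiplier for the next (slower) dimension: total cells on this one.
    uint64_t total = 0;
    for (const Range& r : dims[d])
      total += range_len(r);
    mult *= total;
  }
  return offset;
}

// Offset of the same cell when results are written one N-D range after
// another (the ordering described as incorrect above).
uint64_t range_order_offset(
    const std::vector<std::vector<Range>>& dims,
    const std::vector<size_t>& range_idx,
    const std::vector<uint64_t>& pos) {
  assert(range_idx.size() == dims.size() && pos.size() == dims.size());
  const size_t n = dims.size();
  // Flat index of this N-D range among all N-D ranges.
  size_t flat = 0;
  for (size_t d = 0; d < n; ++d)
    flat = flat * dims[d].size() + range_idx[d];
  // Count the cells of every N-D range that comes before it.
  uint64_t offset = 0;
  for (size_t k = 0; k < flat; ++k) {
    size_t rem = k;
    uint64_t cells = 1;
    for (size_t d = n; d-- > 0;) {
      cells *= range_len(dims[d][rem % dims[d].size()]);
      rem /= dims[d].size();
    }
    offset += cells;
  }
  // Row-major position inside the current N-D range.
  uint64_t in_range = 0;
  for (size_t d = 0; d < n; ++d)
    in_range = in_range * range_len(dims[d][range_idx[d]]) + pos[d];
  return offset + in_range;
}
```

With `example_dims()`, the cell at row 1, column 3 (row range 0, column range 1, position 0 in both) lands at offset 2 in row-major order, but at offset 4 when the buffer is filled range by range.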

shortcut-integration · 2022-03-24T08:30:30Z

This pull request has been linked to Shortcut Story #16010: [CZI] Different query results between 2.6.4 and 2.7.1.

test/src/unit-capi-dense_array.cc

ypatia · 2022-03-24T19:48:57Z

tiledb/sm/query/dense_reader.cc

+  // Compute the correct multipliers.
+  uint64_t mult = 1;
+  if (subarray.layout() == Layout::COL_MAJOR) {
+    for (int32_t d = 0; d < static_cast<int32_t>(dim_num); d++) {


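Why int32_t and not uint32_t? Wouldn't auto d work here?
The same applies to the next for loop as well.

KiterLuc · 2022-03-25

In the loop below, the condition is d >= 0. If d were a uint32_t, d-- at 0 would wrap around to the maximum value instead of going negative, so the condition would always hold and the loop would never exit. I could change this loop, but I left it as int32_t for consistency with the one below.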
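
The wrap-around concern can be illustrated with a small sketch. The function names are hypothetical and the loops only mirror the shape discussed here (a descending loop with a `d >= 0` condition), not the actual code in dense_reader.cc. The second function shows one common way to iterate downwards with an unsigned index, should that ever be preferred.

```cpp
#include <cstdint>
#include <vector>

// Descending loop with a signed index: d >= 0 becomes false once d hits -1.
std::vector<uint32_t> reverse_signed(uint32_t dim_num) {
  std::vector<uint32_t> visited;
  for (int32_t d = static_cast<int32_t>(dim_num) - 1; d >= 0; d--)
    visited.push_back(static_cast<uint32_t>(d));
  return visited;
}

// Unsigned alternative: test first, then decrement, so d never goes
// below zero inside the loop body.
std::vector<uint32_t> reverse_unsigned(uint32_t dim_num) {
  std::vector<uint32_t> visited;
  for (uint32_t d = dim_num; d-- > 0;)
    visited.push_back(d);
  return visited;
}

// What a naive `for (uint32_t d = n - 1; d >= 0; d--)` runs into:
// decrementing an unsigned zero wraps to the largest value.
uint32_t decrement_from_zero() {
  uint32_t d = 0;
  d--;
  return d;
}
```

Both loops visit the indices in the same descending order; `decrement_from_zero()` returns 4294967295, which is why an unsigned `d >= 0` test can never fail.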

ypatia · 2022-03-24T19:51:58Z

tiledb/sm/query/dense_reader.cc

+  if (subarray.layout() == Layout::COL_MAJOR) {
+    for (int32_t d = 0; d < static_cast<int32_t>(dim_num); d++) {
+      auto saved = mult;
+      mult *= range_info[d].multiplier_;


Can this overflow?

KiterLuc · 2022-03-25

I think we are fine here for the dense case.
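
For reference, if a guard were ever wanted, a checked multiplication could look like the sketch below. It is a generic illustration, not a suggested change to this PR (the reply above considers the dense case safe); `checked_mul` and `product_of_multipliers` are hypothetical names, and `std::optional` requires C++17.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Returns a * b, or std::nullopt if the product does not fit in uint64_t.
std::optional<uint64_t> checked_mul(uint64_t a, uint64_t b) {
  if (a != 0 && b > std::numeric_limits<uint64_t>::max() / a)
    return std::nullopt;
  return a * b;
}

// Accumulates a running product the way `mult *= multiplier` does,
// but stops as soon as a step would overflow.
std::optional<uint64_t> product_of_multipliers(
    const std::vector<uint64_t>& multipliers) {
  uint64_t mult = 1;
  for (uint64_t m : multipliers) {
    std::optional<uint64_t> next = checked_mul(mult, m);
    if (!next)
      return std::nullopt;
    mult = *next;
  }
  return mult;
}
```

The division-based test avoids performing the overflowing multiplication in the first place, so it stays portable across compilers.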


ihnorton

Thanks, LGTM

github-actions · 2022-03-28T21:01:57Z

The backport to release-2.7 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-2.7 release-2.7
# Navigate to the new working tree
cd .worktrees/backport-release-2.7
# Create a new branch
git switch --create backport-3002-to-release-2.7
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 740c589c6d9df044b0197cd9fba08c061363231e
# Push it to GitHub
git push --set-upstream origin backport-3002-to-release-2.7
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-2.7

Then, create a pull request where the base branch is release-2.7 and the compare/head branch is backport-3002-to-release-2.7.

github-actions · 2022-03-28T21:01:58Z

The backport to release-2.8 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-2.8 release-2.8
# Navigate to the new working tree
cd .worktrees/backport-release-2.8
# Create a new branch
git switch --create backport-3002-to-release-2.8
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 740c589c6d9df044b0197cd9fba08c061363231e
# Push it to GitHub
git push --set-upstream origin backport-3002-to-release-2.8
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-2.8

Then, create a pull request where the base branch is release-2.8 and the compare/head branch is backport-3002-to-release-2.8.


…on for multi-inde… (#3017)

* [2.7] Dense reader: fix user buffer offset computation for multi-index queries. (#3002) (cherry picked from commit 740c589)
  * Dense reader: fix user buffer offset computation for multi-index queries.
  * Addressing feedback from ihnorton.
  * Making sure ranges are sorted.
* Fix docs CI breakage due to sphinx/jinja2 (new) incompatible release (#3003) (cherry picked from commit 2c61486)
  * Fix docs CI breakage due to sphinx/jinja2 (new) incompatible release
  * Pin jinja for GCS emulator installation
  Co-authored-by: Seth Shelnutt
* [ci] Pin werkzeug to fix GCS emulator startup (cherry picked from commit f584fb3)

Co-authored-by: KiterLuc

* [2.8] Dense reader: fix user buffer offset computation for multi-index queries. (#3002) (cherry picked from commit 740c589)
  * Dense reader: fix user buffer offset computation for multi-index queries.
  * Addressing feedback from ihnorton.
  * Making sure ranges are sorted.
* [ci] Pin werkzeug to fix GCS emulator startup (cherry picked from commit f584fb3)

Co-authored-by: KiterLuc

Following up on #3002, it was determined that we should not sort the input ranges. If the user has a dense vector with domain 1-10 and requests ranges 3-4 and then 1-2, we should return the ranges in that order.

TYPE: IMPROVEMENT
DESC: Dense reader: do not sort input ranges.

The same message also appears with the trailer "(cherry picked from commit 8aa752c)".
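
A small sketch of the behaviour described in this follow-up: results come back in the order the ranges were requested, not in sorted order. It uses a plain `std::vector` as a stand-in for a dense vector with domain 1-10; `make_dense_vector` and `read_ranges` are hypothetical helpers, not TileDB API calls.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for a dense vector with domain 1..10, where cell i holds 10 * i.
std::vector<int> make_dense_vector() {
  std::vector<int> data;
  for (int i = 1; i <= 10; ++i)
    data.push_back(10 * i);
  return data;
}

// Copies the requested inclusive ranges into the output, preserving the
// order in which the ranges were given (no sorting of the input ranges).
std::vector<int> read_ranges(
    const std::vector<int>& data,
    uint64_t domain_start,
    const std::vector<std::pair<uint64_t, uint64_t>>& ranges) {
  std::vector<int> out;
  for (const auto& range : ranges) {
    for (uint64_t c = range.first; c <= range.second; ++c)
      out.push_back(data[c - domain_start]);
  }
  return out;
}
```

Requesting ranges 3-4 and then 1-2 yields the cells 3, 4, 1, 2 (values 30, 40, 10, 20 in this stand-in), in that order.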

KiterLuc requested a review from ihnorton March 24, 2022 08:30

ihnorton requested a review from ypatia March 24, 2022 14:41

ihnorton reviewed Mar 24, 2022


ypatia reviewed Mar 24, 2022


KiterLuc added 3 commits March 28, 2022 08:17

Addressing feedback from @ihnorton.

14c1654

Making sure ranges are sorted.

976cb7b

KiterLuc force-pushed the lr/dense-multi-index-fix/ch16010 branch from 2061051 to 976cb7b March 28, 2022 07:17

ihnorton approved these changes Mar 28, 2022


ihnorton added backport release-2.7 backport release-2.8 labels Mar 28, 2022

ihnorton mentioned this pull request Mar 28, 2022

[WIP] add new hypothesis test for 2d dense comparing refactored and legacy TileDB-Inc/TileDB-Py#1003

Open

ihnorton merged commit 740c589 into dev Mar 28, 2022

ihnorton mentioned this pull request Mar 29, 2022

[Backport release-2.7] Dense reader: fix user buffer offset computation for multi-inde… #3017

Merged

KiterLuc mentioned this pull request Mar 31, 2022

Dense reader: do not sort input ranges. #3036

Merged

KiterLuc deleted the lr/dense-multi-index-fix/ch16010 branch March 31, 2022 12:24


KiterLuc commented Mar 24, 2022

shortcut-integration bot commented Mar 24, 2022

ypatia Mar 24, 2022

KiterLuc Mar 25, 2022

ypatia Mar 24, 2022

KiterLuc Mar 25, 2022

ihnorton left a comment

github-actions bot commented Mar 28, 2022

github-actions bot commented Mar 28, 2022
