Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dense reader: fix user buffer offset computation for multi-index queries. #3002

Merged
merged 3 commits into from
Mar 28, 2022

Conversation

KiterLuc
Copy link
Contributor

The previous implementation of the dense reader would order the results
in the user buffer by N-dimension range index, for example, if there are
2 ranges set per dimensions, the data would be ordered from N-dimension
range 1 to 4. This implementation is incorrect as it doesn't really
return the data in the expected row/column major.

This change computes the correct offset for the data in the user buffers
and returns the data in true row/column major order.


TYPE: IMPROVEMENT
DESC: Dense reader: fix user buffer offset computation for multi-index queries.

@KiterLuc KiterLuc requested a review from ihnorton March 24, 2022 08:30
@shortcut-integration
Copy link

This pull request has been linked to Shortcut Story #16010: [CZI] Different query results between 2.6.4 and 2.7.1.

@ihnorton ihnorton requested a review from ypatia March 24, 2022 14:41
// Compute the correct multipliers.
uint64_t mult = 1;
if (subarray.layout() == Layout::COL_MAJOR) {
for (int32_t d = 0; d < static_cast<int32_t>(dim_num); d++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why int32_t and not uint32_t ? auto d won't work here?
applies to next for as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the one below, the condition is d >= 0, if this was a uint32_t, after 0, d-- would just wrap around, so the for loop would never exit. I could fix this one but left it as is for consistency.

if (subarray.layout() == Layout::COL_MAJOR) {
for (int32_t d = 0; d < static_cast<int32_t>(dim_num); d++) {
auto saved = mult;
mult *= range_info[d].multiplier_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this overflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are fine here for dense.

…ies.

The previous implementation of the dense reader would order the results
in the user buffer by N-dimension range index, for example, if there are
2 ranges set per dimensions, the data would be ordered from N-dimension
range 1 to 4. This implementation is incorrect as it doesn't really
return the data in the expected row/column major.

This change computes the correct offset for the data in the user buffers
and returns the data in true row/column major order.

---
TYPE: IMPROVEMENT
DESC: Dense reader: fix user buffer offset computation for multi-index queries.
@KiterLuc KiterLuc force-pushed the lr/dense-multi-index-fix/ch16010 branch from 2061051 to 976cb7b Compare March 28, 2022 07:17
Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM

@github-actions
Copy link
Contributor

The backport to release-2.7 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-2.7 release-2.7
# Navigate to the new working tree
cd .worktrees/backport-release-2.7
# Create a new branch
git switch --create backport-3002-to-release-2.7
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 740c589c6d9df044b0197cd9fba08c061363231e
# Push it to GitHub
git push --set-upstream origin backport-3002-to-release-2.7
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-2.7

Then, create a pull request where the base branch is release-2.7 and the compare/head branch is backport-3002-to-release-2.7.

@github-actions
Copy link
Contributor

The backport to release-2.8 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-2.8 release-2.8
# Navigate to the new working tree
cd .worktrees/backport-release-2.8
# Create a new branch
git switch --create backport-3002-to-release-2.8
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 740c589c6d9df044b0197cd9fba08c061363231e
# Push it to GitHub
git push --set-upstream origin backport-3002-to-release-2.8
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-2.8

Then, create a pull request where the base branch is release-2.8 and the compare/head branch is backport-3002-to-release-2.8.

ihnorton pushed a commit that referenced this pull request Mar 29, 2022
…x queries. (#3002)

* Dense reader: fix user buffer offset computation for multi-index queries.

The previous implementation of the dense reader would order the results
in the user buffer by N-dimension range index, for example, if there are
2 ranges set per dimensions, the data would be ordered from N-dimension
range 1 to 4. This implementation is incorrect as it doesn't really
return the data in the expected row/column major.

This change computes the correct offset for the data in the user buffers
and returns the data in true row/column major order.

* Addressing feedback from ihnorton.

* Making sure ranges are sorted.

---
TYPE: IMPROVEMENT
DESC: Dense reader: fix user buffer offset computation for multi-index queries.

(cherry picked from commit 740c589)
ihnorton pushed a commit that referenced this pull request Mar 29, 2022
…x queries. (#3002)

* Dense reader: fix user buffer offset computation for multi-index queries.

The previous implementation of the dense reader would order the results
in the user buffer by N-dimension range index, for example, if there are
2 ranges set per dimensions, the data would be ordered from N-dimension
range 1 to 4. This implementation is incorrect as it doesn't really
return the data in the expected row/column major.

This change computes the correct offset for the data in the user buffers
and returns the data in true row/column major order.

* Addressing feedback from ihnorton.

* Making sure ranges are sorted.

---
TYPE: IMPROVEMENT
DESC: Dense reader: fix user buffer offset computation for multi-index queries.

(cherry picked from commit 740c589)
Shelnutt2 pushed a commit that referenced this pull request Mar 30, 2022
…on for multi-inde… (#3017)

* [2.7] Dense reader: fix user buffer offset computation for multi-index queries. (#3002)

* Dense reader: fix user buffer offset computation for multi-index queries.

The previous implementation of the dense reader would order the results
in the user buffer by N-dimension range index, for example, if there are
2 ranges set per dimensions, the data would be ordered from N-dimension
range 1 to 4. This implementation is incorrect as it doesn't really
return the data in the expected row/column major.

This change computes the correct offset for the data in the user buffers
and returns the data in true row/column major order.

* Addressing feedback from ihnorton.

* Making sure ranges are sorted.

---
TYPE: IMPROVEMENT
DESC: Dense reader: fix user buffer offset computation for multi-index queries.

(cherry picked from commit 740c589)

* Fix docs CI breakage due to sphinx/jinja2 (new) incompatible release (#3003)

* Fix docs CI breakage due to sphinx/jinja2 (new) incompatible release

* Pin jinja for GCS emulator installation

Co-authored-by: Seth Shelnutt <[email protected]>
(cherry picked from commit 2c61486)

* [ci] Pin werkzeug to fix GCS emulator startup

(cherry picked from commit f584fb3)

Co-authored-by: KiterLuc <[email protected]>
Shelnutt2 pushed a commit that referenced this pull request Mar 30, 2022
* [2.8] Dense reader: fix user buffer offset computation for multi-index queries. (#3002)

* Dense reader: fix user buffer offset computation for multi-index queries.

The previous implementation of the dense reader would order the results
in the user buffer by N-dimension range index, for example, if there are
2 ranges set per dimensions, the data would be ordered from N-dimension
range 1 to 4. This implementation is incorrect as it doesn't really
return the data in the expected row/column major.

This change computes the correct offset for the data in the user buffers
and returns the data in true row/column major order.

* Addressing feedback from ihnorton.

* Making sure ranges are sorted.

---
TYPE: IMPROVEMENT
DESC: Dense reader: fix user buffer offset computation for multi-index queries.

(cherry picked from commit 740c589)

* [ci] Pin werkzeug to fix GCS emulator startup

(cherry picked from commit f584fb3)

Co-authored-by: KiterLuc <[email protected]>
KiterLuc added a commit that referenced this pull request Mar 31, 2022
Following up on #3002, it was
determined we should not sort the input ranges. If the user has a dense
vector of domain 1-10 and requests ranges 3-4 then 1-2, we should
return the ranges in that order.

---
TYPE: IMPROVEMENT
DESC: Dense reader: do not sort input ranges.
KiterLuc added a commit that referenced this pull request Mar 31, 2022
Following up on #3002, it was
determined we should not sort the input ranges. If the user has a dense
vector of domain 1-10 and requests ranges 3-4 then 1-2, we should
return the ranges in that order.

---
TYPE: IMPROVEMENT
DESC: Dense reader: do not sort input ranges.
@KiterLuc KiterLuc deleted the lr/dense-multi-index-fix/ch16010 branch March 31, 2022 12:24
ihnorton pushed a commit that referenced this pull request Apr 5, 2022
Following up on #3002, it was
determined we should not sort the input ranges. If the user has a dense
vector of domain 1-10 and requests ranges 3-4 then 1-2, we should
return the ranges in that order.

---
TYPE: IMPROVEMENT
DESC: Dense reader: do not sort input ranges.
ihnorton pushed a commit that referenced this pull request Apr 6, 2022
Following up on #3002, it was
determined we should not sort the input ranges. If the user has a dense
vector of domain 1-10 and requests ranges 3-4 then 1-2, we should
return the ranges in that order.

---
TYPE: IMPROVEMENT
DESC: Dense reader: do not sort input ranges.

(cherry picked from commit 8aa752c)
ihnorton pushed a commit that referenced this pull request Apr 7, 2022
Following up on #3002, it was
determined we should not sort the input ranges. If the user has a dense
vector of domain 1-10 and requests ranges 3-4 then 1-2, we should
return the ranges in that order.

---
TYPE: IMPROVEMENT
DESC: Dense reader: do not sort input ranges.

(cherry picked from commit 8aa752c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants