-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: issue #3118 and #3185 #3193
fix: issue #3118 and #3185 #3193
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my eyes, this looks like a good solution, though a lot of tests are failing.
The original PR #3119 had to change a test expectation: see this, which explains one of the test failures, but it looks like there are more.
Another might be related to the == 0
check with typetracer arrays; I made a suggestion below.
Making UnionArray.project
non-lazy shouldn't require many tests to change, since toggling it in the other direction only upset one test. We should only see the same one complaining again when we toggle it back the other way. Unless UnionArray.project
and UnionArray.simplified
need to be toggled in tandem?
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good and ready to merge; the tests @pfackeldey added resolves the known_data
versus unknown_length
question that @agoose77 and I were discussing. Also, it's nice that it completely reverts PR #3119 to the state that the code has been in for years. The probability of another unforeseen consequence is smaller than it was with PR #3119.
I noted a few surprising diffs below and even made a (minor) suggested change in one that was actually implemented in PR #3191 earlier today. There's nothing left to do for them, unless @maxgalli wants to make the minor suggested change as a new PR. You can click "resolved" and merge the PR.
…indices for IndexOptionArrays)
…noncanonical.py'
b88ad3c
to
c16d846
Compare
@ikrommyd could you give this fix a try? |
Seems to be working. Removes overtouching from both the original diff --git a/src/awkward/contents/unionarray.py b/src/awkward/contents/unionarray.py
index bb3b6ca9..04cb1f54 100644
--- a/src/awkward/contents/unionarray.py
+++ b/src/awkward/contents/unionarray.py
@@ -702,7 +702,7 @@ class UnionArray(UnionMeta[Content], Content):
nextcarry = ak.index.Index64(
tmpcarry.data[: lenout[0]], nplike=self._backend.index_nplike
)
- return self._contents[index]._carry(nextcarry, True)
+ return self._contents[index]._carry(nextcarry, False)
@staticmethod
def regular_index( introduces overtouching again as expected. |
@all-contributors please add @pfackeldey for code |
I've put up a pull request to add @pfackeldey! 🎉 |
After a longer discussion with @jpivarski and @maxgalli we came up with a solution for issue #3118.
This PR addresses issue #3118 and by trimming the
Index*Arrays
safely before concatenation and thus avoiding memory growth of the underlyingNumpyArrays
.In addition, it allows to re-enable laziness (disabled recently in #3119) for
UnionArray.simplified
and thus avoids touching too many columns as described in #3185 and #3119 (comment).Best, Peter