Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: issue #3118 and #3185 #3193

Conversation

pfackeldey
Copy link
Collaborator

After a longer discussion with @jpivarski and @maxgalli we came up with a solution for issue #3118.

This PR addresses issue #3118 and by trimming the Index*Arrays safely before concatenation and thus avoiding memory growth of the underlying NumpyArrays.

In addition, it allows to re-enable laziness (disabled recently in #3119) for UnionArray.simplified and thus avoids touching too many columns as described in #3185 and #3119 (comment).

Best, Peter

@pfackeldey pfackeldey changed the title Fix issue #3118 and #3185 fix: issue #3118 and #3185 Jul 31, 2024
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my eyes, this looks like a good solution, though a lot of tests are failing.

The original PR #3119 had to change a test expectation: see this, which explains one of the test failures, but it looks like there are more.

Another might be related to the == 0 check with typetracer arrays; I made a suggestion below.

Making UnionArray.project non-lazy shouldn't require many tests to change, since toggling it in the other direction only upset one test. We should only see the same one complaining again when we toggle it back the other way. Unless UnionArray.project and UnionArray.simplified need to be toggled in tandem?

src/awkward/contents/indexedarray.py Outdated Show resolved Hide resolved
src/awkward/contents/indexedoptionarray.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Jul 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.32%. Comparing base (b749e49) to head (c16d846).
Report is 131 commits behind head on main.

Additional details and impacted files
Files Coverage Δ
src/awkward/contents/indexedarray.py 82.35% <100.00%> (+1.31%) ⬆️
src/awkward/contents/indexedoptionarray.py 88.54% <100.00%> (+0.32%) ⬆️
src/awkward/contents/unionarray.py 84.67% <100.00%> (-0.58%) ⬇️

... and 98 files with indirect coverage changes

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good and ready to merge; the tests @pfackeldey added resolves the known_data versus unknown_length question that @agoose77 and I were discussing. Also, it's nice that it completely reverts PR #3119 to the state that the code has been in for years. The probability of another unforeseen consequence is smaller than it was with PR #3119.

I noted a few surprising diffs below and even made a (minor) suggested change in one that was actually implemented in PR #3191 earlier today. There's nothing left to do for them, unless @maxgalli wants to make the minor suggested change as a new PR. You can click "resolved" and merge the PR.

src/awkward/contents/numpyarray.py Outdated Show resolved Hide resolved
src/awkward/operations/ak_from_buffers.py Outdated Show resolved Hide resolved
@pfackeldey pfackeldey force-pushed the fix/issue_3118_prevent_exponential_memory_growth_in_unionarray branch from b88ad3c to c16d846 Compare July 31, 2024 23:09
@pfackeldey pfackeldey merged commit 55a9a43 into scikit-hep:main Aug 1, 2024
41 checks passed
@lgray
Copy link
Contributor

lgray commented Aug 1, 2024

@ikrommyd could you give this fix a try?

@ikrommyd
Copy link

ikrommyd commented Aug 1, 2024

Seems to be working. Removes overtouching from both the original q8_hist reproducer and the your more minimal one.
Naturally doing this:

diff --git a/src/awkward/contents/unionarray.py b/src/awkward/contents/unionarray.py
index bb3b6ca9..04cb1f54 100644
--- a/src/awkward/contents/unionarray.py
+++ b/src/awkward/contents/unionarray.py
@@ -702,7 +702,7 @@ class UnionArray(UnionMeta[Content], Content):
         nextcarry = ak.index.Index64(
             tmpcarry.data[: lenout[0]], nplike=self._backend.index_nplike
         )
-        return self._contents[index]._carry(nextcarry, True)
+        return self._contents[index]._carry(nextcarry, False)

     @staticmethod
     def regular_index(

introduces overtouching again as expected.
I believe #3185 is to be closed.
Thanks for the fix y'all and if everything is good, I'd be happy to see this in a release soon.

@jpivarski
Copy link
Member

@all-contributors please add @pfackeldey for code

Copy link
Contributor

@jpivarski

I've put up a pull request to add @pfackeldey! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants