Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prevent exponential memory growth in UnionArray #3119

Merged

Conversation

jpivarski
Copy link
Member

I decided to just make UnionArray.simplified not allow lazy carries. It should always return correct results and it can only have a performance impact on wide RecordArrays that go through UnionArray.simplified (perhaps via ak.concatenate).

The new test would fail for the old code: for 5 loops, the NumpyArray length gets up to 32, and we assert <= 2.

I only had to change one test, which was hard-coded to a particular Form. It was for testing allow_noncanonical_form=True in ak.from_buffers. I think that feature is still being tested with the updated test.

@agoose77, am I missing anything here? Would you do anything differently?

@jpivarski jpivarski requested a review from agoose77 May 14, 2024 22:15
@jpivarski jpivarski linked an issue May 14, 2024 that may be closed by this pull request
Copy link
Collaborator

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpivarski - it looks like the tests on Mac are segfaulting and not only for this PR. I'm checking locally.

@agoose77
Copy link
Collaborator

@jpivarski I agree with your reasoning. I haven't given much more thought to alternative solutions, but we can always revisit this in future!

@ianna
Copy link
Collaborator

ianna commented May 15, 2024

@jpivarski - it looks like the tests on Mac are segfaulting and not only for this PR. I'm checking locally.

I can confirm that I can reproduce the segfault locally on MacOS 11.6 with pyarrow 16.1.0 (released yesterday). It happens in tests/test_0080_flatpandas_multiindex_rows_and_columns.py

Changing back to pyarrow 16.0.0 fixes the issue.

@jpivarski
Copy link
Member Author

Although we could hold back pyarrow<16.1.0, that would have to be temporary; we'd have to remove the constraint after pyarrow gets fixed. We can only expect pyarrow to get fixed if the error is reported to them, so I tried to do that, but #3122.

@ianna
Copy link
Collaborator

ianna commented May 16, 2024

Although we could hold back pyarrow<16.1.0, that would have to be temporary; we'd have to remove the constraint after pyarrow gets fixed. We can only expect pyarrow to get fixed if the error is reported to them, so I tried to do that, but #3122.

I uninstalled both awkward and awkward-cpp and have a minimum reproducer of a segfault with:

% python
Python 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
zsh: segmentation fault  python

The test_pyarrow.py has only one line:
import pyarrow

 % lldb python test_pyarrow.py 
(lldb) target create "python"
Current executable set to '/Users/yana/anaconda3/bin/python' (x86_64).
(lldb) settings set -- target.run-args  "test_pyarrow.py"
(lldb) run
Process 34536 launched: '/Users/yana/anaconda3/bin/python' (x86_64)
Process 34536 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x0000000129c8685e libarrow.1601.dylib`malloc_conf_init_helper + 302
libarrow.1601.dylib`malloc_conf_init_helper:
->  0x129c8685e <+302>: movzbl (%rbx), %eax
    0x129c86861 <+305>: testb  %al, %al
    0x129c86863 <+307>: je     0x129c8936e               ; <+11326>
    0x129c86869 <+313>: movq   %rbx, %rcx
(lldb) 

@jpivarski
Copy link
Member Author

That reproducer must depend on the environment because it didn't segfault immediately on import for me. I also noticed in the output that this is x86-64. My laptop is ARM (M2). That's likely relevant.

Could you send as much information as you can to the Arrow developers? Since we also need a short-term solution, we can put a version cap on Arrow in our tests for now. There are a few pull requests that need it.

@ianna
Copy link
Collaborator

ianna commented May 16, 2024

That reproducer must depend on the environment because it didn't segfault immediately on import for me. I also noticed in the output that this is x86-64. My laptop is ARM (M2). That's likely relevant.

Could you send as much information as you can to the Arrow developers? Since we also need a short-term solution, we can put a version cap on Arrow in our tests for now. There are a few pull requests that need it.

I've opened an issue apache/arrow#41696

@jpivarski
Copy link
Member Author

Awesome! It looks like they're dealing with it right away.

@jpivarski jpivarski merged commit 28a89da into main May 28, 2024
41 checks passed
@jpivarski jpivarski deleted the jpivarski/prevent_exponential_memory_growth_in_unionarray branch May 28, 2024 14:43
@lgray
Copy link
Contributor

lgray commented Jul 17, 2024

This PR introduces massive data inefficiencies for HEP analyses that involve combinations of multiple flavors of leptons, or generally gluing multiple tables together. We should reconsider. I would propose we revert this PR for now, there doesn't seem to be an issue this fixes, but rather one that might happen. I appreciate that, but we should be more careful and reconcile this with the needs of the primary user base.

return self._contents[index]._carry(nextcarry, True)
return self._contents[index]._carry(nextcarry, False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced lazy projection with eager projection here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exponential memory growth in UnionArray broadcasting
4 participants