-
Notifications
You must be signed in to change notification settings - Fork 932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Acquire spill lock in to/from_arrow #13646
Acquire spill lock in to/from_arrow #13646
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think this is just an oversight. Adding more acquire_spill_lock
calls was always intended to be an ongoing task IIRC, although @madsbk can correct me if I've forgotten something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, since cudf::to_arrow
returns a arrow::Table
that owns the data, we can spill.
Good catch @shwina
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this is resolving a real problem that needs to be fixed, just wanted to report here that it doesn't resolve the P2P shuffle issue we've seen and we still end up with errors such as below:
MemoryError: std::bad_alloc: CUDA error at: /datasets/pentschev/miniconda3/envs/cudf-invalid-address-src/include/rmm/mr/device/cuda_async_view_memory_resource.hpp:121: cudaErrorIllegalAddress an illegal memory access was encountered
With that said, the illegal address errors are probably due to something else.
cc @quasiben @rjzamora @VibhuJawa for awareness, it doesn't resolve the illegal address errors as I mentioned in #13646 (review) . |
/merge |
Description
Currently, calling
to_arrow
will render a cuDF Series/DataFrame unspillable. I believe it's just an oversight that we don't call to/from_arrow in anacquire_spill_lock
context. Neither of those methods should expose the device pointer to a cuDF object externally.cc: @madsbk @vyasr -- could you take a look please?
Checklist