-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change memory object USM allocation ownership, and make execution asynchronous #1705
Conversation
The _Memory object acquires additional field void * _opaque_ptr. This pointer is the pointer to std::shared_ptr<void> with a deleter class to delete USM allocations. The pointer is opaque to retain C-API of Py_MemoryObject. If _opaque_ptr is NULL, refobj field is assumed to be responsible for USM deallocation. api Memory_GetOpaquePointer is added to access the opaque pointer. A test in test_sycl_usm is modified to reflect factual changes in the behavior of _Memory type.
The manager stores a list of events for tasks submitted so far, as well as list of events for host_tasks submitted so far. Every opperation of the manager prunes lists of completed events. `dpctl.utils.SequentialOrderManager` class instance is Python API to work with this class. Every offloading operation calls `.submitted_events` property to get dependent events, and adds computational event and host_task event to the manager using `.add_to_both_events(ht_ev, comp_ev)` method. The destructor of manager synchronizes on outstanding events.
Remove pervasive use of SyclEvent.wait in favor of using SequentialOrderManager to maintain sequential order semantics via ordering of submitted tasks using events.
Report caugh exception to std::cerr by ignore it otherwise.
… scalars Also synchronize in __sycl_usm_array_interface__ attribute, to ensure that `dpctl.memory.as_usm_memory(ary).copy_to_host()` produces expected results.
Move implementation from dpctl/utils/__init__.py into dedicated files.
Instead of relying on SUAI attribute which has to synchronize to get the offset, use `X._element_offset` attribute directly.
f885a0f
to
1930287
Compare
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞 |
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_50 ran successfully. |
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_51 ran successfully. |
1 similar comment
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_51 ran successfully. |
Incidentally, the added test also found a bug
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_53 ran successfully. |
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_55 ran successfully. |
What's missing from this PR that must be addressed before merging:
|
I suggest we expand the documentation with explaining the asynchronous execution in a separate PR. The important change to realize is that execution of Submission preserves sequential ordering of operations as executed by Python interpreter. |
f31fb28
to
f6a1f06
Compare
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_57 ran successfully. |
1 similar comment
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_57 ran successfully. |
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_58 ran successfully. |
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_59 ran successfully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked over the PR and this LGTM, I think that is a good change overall, and bring tangible performance improvements to the project.
Thank you for the work on this @oleksandr-pavlyk !
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_60 ran successfully. |
This is PR changes the size and content of the struct behind
dpctl.memory._Memory
object.This is backwards incompatible change, and would require rebuilding of downstream projects with native extensions (
cython
, orpybind11
) that usedpctl
.The
_Memory
object no longer explicitly frees USM allocations it made, but delegates this job to a smart pointer. This allowshost_task
jobs that ensure deferment of USM deallocation till after offloaded tasks that operate on it complete execution to do their job by capturing the smart pointer in the callable passed to thehost_task
instead needing to rely on Python reference counting, and thus need to acquire GIL in that callable.Furthermore, the PR introduces mechanism of ensuring sequential order amongst tasks implementing offloading Python API, such as
dpctl.tensor
implementing array API compliant tensor library.One consequence of this change is that timing of
dpctl.tensor
functions must synchronize the execution queue before taking the final timestamp reading.