Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change memory object USM allocation ownership, and make execution asynchronous #1705

Merged
merged 25 commits into from
Jun 5, 2024

Conversation

oleksandr-pavlyk
Copy link
Collaborator

This is PR changes the size and content of the struct behind dpctl.memory._Memory object.

This is backwards incompatible change, and would require rebuilding of downstream projects with native extensions (cython, or pybind11) that use dpctl.

The _Memory object no longer explicitly frees USM allocations it made, but delegates this job to a smart pointer. This allows host_task jobs that ensure deferment of USM deallocation till after offloaded tasks that operate on it complete execution to do their job by capturing the smart pointer in the callable passed to the host_task instead needing to rely on Python reference counting, and thus need to acquire GIL in that callable.

Furthermore, the PR introduces mechanism of ensuring sequential order amongst tasks implementing offloading Python API, such as dpctl.tensor implementing array API compliant tensor library.

One consequence of this change is that timing of dpctl.tensor functions must synchronize the execution queue before taking the final timestamp reading.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

The _Memory object acquires additional field void * _opaque_ptr.
This pointer is the pointer to std::shared_ptr<void> with a deleter
class to delete USM allocations. The pointer is opaque to retain C-API
of Py_MemoryObject.

If _opaque_ptr is NULL, refobj field is assumed to be responsible for
USM deallocation.

api Memory_GetOpaquePointer is added to access the opaque pointer.

A test in test_sycl_usm is modified to reflect factual changes in
the behavior of _Memory type.
The manager stores a list of events for tasks submitted so far,
as well as list of events for host_tasks submitted so far.

Every opperation of the manager prunes lists of completed events.

`dpctl.utils.SequentialOrderManager` class instance is Python
API to work with this class.

Every offloading operation calls `.submitted_events` property to get
dependent events, and adds computational event and host_task
event to the manager using `.add_to_both_events(ht_ev, comp_ev)`
method.

The destructor of manager synchronizes on outstanding events.
Remove pervasive use of SyclEvent.wait in favor of using
SequentialOrderManager to maintain sequential order semantics
via ordering of submitted tasks using events.
Report caugh exception to std::cerr by ignore it otherwise.
… scalars

Also synchronize in __sycl_usm_array_interface__ attribute, to ensure
that `dpctl.memory.as_usm_memory(ary).copy_to_host()` produces
expected results.
Move implementation from dpctl/utils/__init__.py into
dedicated files.
Instead of relying on SUAI attribute which has to synchronize to get
the offset, use `X._element_offset` attribute directly.
Copy link

github-actions bot commented Jun 4, 2024

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

Copy link

github-actions bot commented Jun 4, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_50 ran successfully.
Passed: 890
Failed: 11
Skipped: 91

@coveralls
Copy link
Collaborator

coveralls commented Jun 4, 2024

Coverage Status

coverage: 88.057% (+0.1%) from 87.911%
when pulling f4e3b6f on memory-work
into c6f5f79 on master.

Copy link

github-actions bot commented Jun 4, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_51 ran successfully.
Passed: 890
Failed: 11
Skipped: 91

1 similar comment
Copy link

github-actions bot commented Jun 4, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_51 ran successfully.
Passed: 890
Failed: 11
Skipped: 91

Copy link

github-actions bot commented Jun 4, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_53 ran successfully.
Passed: 890
Failed: 11
Skipped: 91

Copy link

github-actions bot commented Jun 4, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_55 ran successfully.
Passed: 890
Failed: 11
Skipped: 91

@oleksandr-pavlyk
Copy link
Collaborator Author

oleksandr-pavlyk commented Jun 4, 2024

What's missing from this PR that must be addressed before merging:

  • dpctl_capi documentation must be expanded with new CAPI added, one for usm_ndarray, and one for usm_memory
  • Execution model explanation in documentation must now reflect that execution is asynchronous
  • Examples of dpctl for pybind11 and cython must be updated.

@oleksandr-pavlyk oleksandr-pavlyk marked this pull request as ready for review June 5, 2024 19:22
@oleksandr-pavlyk
Copy link
Collaborator Author

I suggest we expand the documentation with explaining the asynchronous execution in a separate PR.

The important change to realize is that execution of tensor operations is done asynchronously, so to get a proper timing values, one must use queue.wait() before taking the final time stamp reading.

Submission preserves sequential ordering of operations as executed by Python interpreter.

Copy link

github-actions bot commented Jun 5, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_57 ran successfully.
Passed: 890
Failed: 11
Skipped: 91

1 similar comment
Copy link

github-actions bot commented Jun 5, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_57 ran successfully.
Passed: 890
Failed: 11
Skipped: 91

Copy link

github-actions bot commented Jun 5, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_58 ran successfully.
Passed: 890
Failed: 11
Skipped: 91

Copy link

github-actions bot commented Jun 5, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_59 ran successfully.
Passed: 890
Failed: 11
Skipped: 91

Copy link
Collaborator

@ndgrigorian ndgrigorian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked over the PR and this LGTM, I think that is a good change overall, and bring tangible performance improvements to the project.

Thank you for the work on this @oleksandr-pavlyk !

Copy link

github-actions bot commented Jun 5, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_60 ran successfully.
Passed: 889
Failed: 12
Skipped: 91

@oleksandr-pavlyk oleksandr-pavlyk merged commit e24e263 into master Jun 5, 2024
60 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the memory-work branch June 5, 2024 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants