-
Notifications
You must be signed in to change notification settings - Fork 755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][HIP] Add AMDGPU reflect pass to choose between safe and unsafe AMDGPU atomics #11467
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely happy about introducing another reflect pass, but I appreciate that we'll likely get better perf than unconditionally prefetching.
What happens to __oclc_amdgpu_reflect if this pass isn't run?
The func will remain in the module and a linking error will result at some stage. Which is the same behaviour as the LLVM reflect pass |
379d3a6
to
e5657aa
Compare
e5657aa
to
b4617ef
Compare
b4617ef
to
57ae613
Compare
5a8c9ac
to
9051df1
Compare
1872497
to
bad104b
Compare
0eda761
to
982ea7a
Compare
982ea7a
to
4d17f4f
Compare
df15e57
to
7f2771b
Compare
- Change getNumOperands to arg size - Move size check above the for loop
@frasercrmck all your comments have been addressed. Thanks for review! In terms of function vs module pass - I understand that it might be slightly more optimal to make this a module pass, however I also think for comprehensibility this pass should not diverge too much from NVVMReflect, which is a func pass. Let me know if you think that this is OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Friendly ping @intel/dpcpp-tools-reviewers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good modulo some minor code nits
- Use a vector of CallInsts instead of Instructions. - Change assert(fasle) to report_fatal_error.
Thanks @ldrumm changes made |
6c5ef4e
to
c4ab1ed
Compare
- Use auto - Use drop_back to remove null byte - Replace hip_be with hip
Change opt test to use update_test_checks.py
AMDGPU reflect pass is needed to choose between safe and unsafe atomics
at the libclc level. In the long run we will delete this patch as work
is being done to ensure correct lowering of atomic instructions. See
patches:
llvm/llvm-project#85052
llvm/llvm-project#69229
This work is necessary as malloc shared atomics rely on PCIe atomics
which can have patchy and unreliable support. Therefore, we want to be
able to choose at compile time whether we should use safe atomics using
CAS (which PCIe should support), or if we want to rely of the
availability of the newest PCIe atomics, if malloc shared atomics are
desired.
Also changes the implementation of
atomic_or
,atomic_and
so that theycan choose between the safe or unsafe version based on the AMDGPU
reflect value.