-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BACKEND] Add Address Sanitizer Pass #5127
Conversation
@antiagainst can you update LLVM hash to atleast: |
First step: #5179. |
Second step: #5180 |
@antiagainst don't merge this yet. @AlexAUT is looking into why this seems to sometimes hang the MI300 CI. So far it appears random so we need to look into it a bit. |
@@ -304,7 +304,8 @@ def compile(src, target=None, options=None): | |||
# This is needed to safely finalize threads pool inside context: if current process forks before | |||
# python GC deletes context object, thread pool in child process will be invalid, which could | |||
# lead to child crash or hang. | |||
context.disable_multithreading() | |||
if not os.environ.get("TRITON_ENABLE_ASAN", "0") == "1": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should we enable multi-threading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disabling multi-threading causes ASAN to create a deadlock. We're still hunting down why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a TODO here to mention the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@antiagainst this commit from ~6 months ago: |
@@ -304,7 +304,8 @@ def compile(src, target=None, options=None): | |||
# This is needed to safely finalize threads pool inside context: if current process forks before | |||
# python GC deletes context object, thread pool in child process will be invalid, which could | |||
# lead to child crash or hang. | |||
context.disable_multithreading() | |||
if not os.environ.get("TRITON_ENABLE_ASAN", "0") == "1": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a TODO here to mention the issue?
memory leak and out of bounds access detection. Currently only supported on the AMD | ||
backend. This must be run using the ASAN libraries documented [here](https://rocm.docs.amd.com/projects/llvm-project/en/latest/conceptual/using-gpu-sanitizer.html). | ||
|
||
When enabling the address sanitizer it is recommended to disable various memory caching strategies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can simplify by pointing to the test: "When enabling the address sanitizer it is recommended to disable various memory caching strategies both within the ROCm stack and PyTorch. See this test for it."
This way it's also always up to date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to do this after this is merged since it will immediately create a broken sym link and break the CI.
7d2ea8c
to
b02ebad
Compare
Add address sanitizer pass to LLVM pass pipeline. Known limitations for this PR: - Currently only the AMD backend is supported - Source code line support not implemented here, coming in follow up patch
Add address sanitizer pass to LLVM pass pipeline.
Known limitations for this PR: