Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BACKEND] Add Address Sanitizer Pass #5127

Merged
merged 45 commits into from
Jan 3, 2025
Merged

Conversation

CRobeck
Copy link
Contributor

@CRobeck CRobeck commented Nov 12, 2024

Add address sanitizer pass to LLVM pass pipeline.

Known limitations for this PR:

  • Currently only the AMD backend is supported
  • Source code line support not implemented here, coming in follow up patch

@CRobeck CRobeck changed the title [Draft][Backend] Add Address Sanitizer Pass [Draft][Backend] Add LLVM Level Address Sanitizer Pass Nov 12, 2024
@CRobeck CRobeck changed the title [Draft][Backend] Add LLVM Level Address Sanitizer Pass [Draft][Backend] Add Address Sanitizer Pass Nov 12, 2024
@CRobeck
Copy link
Contributor Author

CRobeck commented Nov 15, 2024

@antiagainst can you update LLVM hash to atleast:
llvm/llvm-project@bd9145c

@antiagainst
Copy link
Collaborator

First step: #5179.

@antiagainst
Copy link
Collaborator

Second step: #5180

@CRobeck CRobeck changed the title [Draft][Backend] Add Address Sanitizer Pass [Draft][BACKEND] Add Address Sanitizer Pass Nov 22, 2024
@AlexAUT
Copy link
Contributor

AlexAUT commented Nov 29, 2024

Do not merge. Depends on #5230

@CRobeck #5230 is merged

@CRobeck CRobeck changed the title [Draft][BACKEND] Add Address Sanitizer Pass [BACKEND] Add Address Sanitizer Pass Dec 2, 2024
@CRobeck CRobeck marked this pull request as ready for review December 2, 2024 21:07
.github/workflows/integration-tests.yml Outdated Show resolved Hide resolved
.github/workflows/integration-tests.yml Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
include/triton/Tools/Sys/GetEnv.hpp Outdated Show resolved Hide resolved
third_party/amd/backend/compiler.py Show resolved Hide resolved
python/test/unit/test_address_sanitizer.py Outdated Show resolved Hide resolved
python/test/unit/address_sanitizer_helper.py Show resolved Hide resolved
python/src/llvm.cc Outdated Show resolved Hide resolved
python/src/llvm.cc Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
@CRobeck
Copy link
Contributor Author

CRobeck commented Dec 13, 2024

@antiagainst don't merge this yet. @AlexAUT is looking into why this seems to sometimes hang the MI300 CI. So far it appears random so we need to look into it a bit.

@@ -304,7 +304,8 @@ def compile(src, target=None, options=None):
# This is needed to safely finalize threads pool inside context: if current process forks before
# python GC deletes context object, thread pool in child process will be invalid, which could
# lead to child crash or hang.
context.disable_multithreading()
if not os.environ.get("TRITON_ENABLE_ASAN", "0") == "1":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we enable multi-threading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabling multi-threading causes ASAN to create a deadlock. We're still hunting down why.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a TODO here to mention the issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@CRobeck
Copy link
Contributor Author

CRobeck commented Dec 23, 2024

@antiagainst this commit from ~6 months ago:
#4169
disables multithreading in the MLIR context - this causes the ASAN pass to hang (i.e. if we don't have multithreading enabled). I don't have a good explanation for this outside of the MLIR/LLVM pass manager being multithreaded by default. What are your thoughts about having this be enabled only if we invoked the ASAN pass?

@@ -304,7 +304,8 @@ def compile(src, target=None, options=None):
# This is needed to safely finalize threads pool inside context: if current process forks before
# python GC deletes context object, thread pool in child process will be invalid, which could
# lead to child crash or hang.
context.disable_multithreading()
if not os.environ.get("TRITON_ENABLE_ASAN", "0") == "1":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a TODO here to mention the issue?

python/test/unit/test_address_sanitizer.py Show resolved Hide resolved
memory leak and out of bounds access detection. Currently only supported on the AMD
backend. This must be run using the ASAN libraries documented [here](https://rocm.docs.amd.com/projects/llvm-project/en/latest/conceptual/using-gpu-sanitizer.html).

When enabling the address sanitizer it is recommended to disable various memory caching strategies
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can simplify by pointing to the test: "When enabling the address sanitizer it is recommended to disable various memory caching strategies both within the ROCm stack and PyTorch. See this test for it."

This way it's also always up to date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to do this after this is merged since it will immediately create a broken sym link and break the CI.

@CRobeck CRobeck force-pushed the rocm_asan branch 2 times, most recently from 7d2ea8c to b02ebad Compare January 3, 2025 00:48
@CRobeck CRobeck requested a review from antiagainst January 3, 2025 00:52
@antiagainst antiagainst merged commit dc261bf into triton-lang:main Jan 3, 2025
7 checks passed
@CRobeck CRobeck deleted the rocm_asan branch January 8, 2025 02:36
makslevental pushed a commit to makslevental/triton that referenced this pull request Jan 13, 2025
Add address sanitizer pass to LLVM pass pipeline. 

Known limitations for this PR:
- Currently only the AMD backend is supported
- Source code line support not implemented here,
  coming in follow up patch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants