[AMD] Disable block merging to avoid block argument explosion #4176

Merged
merged 2 commits into from
Jun 21, 2024

Conversation

giuseros
Contributor

@giuseros giuseros commented Jun 20, 2024

This PR disables block merging when running `convert-builtin-func-to-llvm`.

The reason is that, for now, block merging can double the number of arguments of the merged blocks. After repeated merges, this can lead to a block argument "explosion" that hangs the compiler.
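To give a rough sense of the growth, here is a small sketch. It assumes the worst case, where every round of merging doubles the argument count; the function name and the starting count of 4 are made up for illustration and are not taken from the compiler.

```python
def block_args_after_merges(initial_args: int, merge_rounds: int) -> int:
    """Worst-case argument count, assuming each merge round doubles it."""
    return initial_args * 2 ** merge_rounds


# Illustrative numbers only: a block that starts with 4 arguments.
for rounds in (0, 5, 10, 20):
    print(rounds, block_args_after_merges(4, rounds))
```

Under that assumption the count grows exponentially with the number of rounds, which is consistent with the compiler appearing to hang rather than failing quickly.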

I am working on this ticket: llvm/llvm-project#63230 to make block merging better, but in the meantime, we should stop merging blocks to avoid compiler hangs.

I added a minimal test that reproduces the explosion. For now, the test checks that we don't try to merge blocks.

@giuseros giuseros marked this pull request as draft June 20, 2024 10:32
@giuseros
Contributor Author

cc @antiagainst @zhanglx13 for review

@antiagainst antiagainst marked this pull request as ready for review June 21, 2024 14:51
@antiagainst antiagainst requested review from ThomasRaoux and zahimoud and removed request for ptillet June 21, 2024 14:51
@antiagainst
Collaborator

@zahimoud and @ThomasRaoux FYI.

@antiagainst antiagainst changed the title Disable block merging to avoid block argument explosion [AMD] Disable block merging to avoid block argument explosion Jun 21, 2024
@antiagainst antiagainst merged commit cf2ad02 into triton-lang:main Jun 21, 2024
6 checks passed
@zhanglx13
Collaborator

@giuseros I ran the test without this PR and it also passed. Can you double check?

zhanglx13 pushed a commit that referenced this pull request Jun 22, 2024
…4186)

This is a follow-up to
#4176 (comment)

I am now counting the number of blocks with (17) and without (31) block
merging. I double checked to make sure this does not pass when we use an
aggressive region simplification strategy.
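
The idea of counting blocks can be sketched outside the test suite as follows. This is a hypothetical Python helper, not the check used in the repository; it assumes the IR is available as text and that non-entry blocks are introduced by labels of the form `^bbN`. The sample IR is hand-written for illustration.

```python
import re


def count_block_labels(ir_text: str) -> int:
    """Count lines that start a labeled block, e.g. `^bb1:` or `^bb2(%a: i32):`."""
    return len(re.findall(r"^[ \t]*\^bb\d+", ir_text, flags=re.MULTILINE))


# Hand-written sample: three labeled blocks plus the unlabeled entry block.
sample = """
llvm.func @f(%arg0: i1) {
  llvm.cond_br %arg0, ^bb1, ^bb2
^bb1:
  llvm.br ^bb3
^bb2:
  llvm.br ^bb3
^bb3:
  llvm.return
}
"""

print(count_block_labels(sample))
```

References to a block inside a branch instruction are not counted, because only labels at the start of a line match. The entry block carries no label, so it is not counted either.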
Jokeren pushed a commit that referenced this pull request Jul 1, 2024
…4186)

Jokeren added a commit that referenced this pull request Jul 3, 2024
Update

Add a more meaningful check to make sure we are not merging blocks (#4186)

[AMD] Skip mfma layout in maybeDuplicate (#4170)

The workaround introduced in
#4048 "forgot" to skip mfma
layout.

[TEST] Merge duplicate `max_num_imprecise_acc` tests and improve code (#4191)

[DOCS][NFC] Fix doc formatting problems (#4195)

1. f-string cannot be used as docstrings in Python.
2. URLs should follow the reStructuredText format.
3. Code snippets in a code block should be indented.

Tested and passed on a local machine.
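
Point 1 can be checked with a few lines of plain Python. The function names and `NAME` below are illustrative and are not taken from the Triton sources.

```python
NAME = "kernel"


def with_fstring():
    f"""Docs for {NAME}."""  # evaluated as an ordinary expression, not stored as a docstring


def with_plain_string():
    """Docs for kernel."""


print(with_fstring.__doc__)       # None
print(with_plain_string.__doc__)  # Docs for kernel.
```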

[BACKEND] Fix regression in pipeliner pre-checks. (#4196)

During some previous refactoring we changed the logic and started
pipelining cases that had incompatible shared encoding. This was missed
because one of the lit tests had not been updated :(

Remove tl.multiple_of call from tma persistent kernel (#4198)

[AMD] Guard against null in `BypassEpilogueSMEM` (#4203)

`val.getDefiningOp()` can return `nullptr`. In this case, we must fail
the `BypassEpilogueSMEM` rewrite pass for the given op. This prevents
run-time crashes.

[FRONTEND][NFC] Fix type checking, conditional logic, and loop structures for improved readability and performance (#4208)

Document TRITON_HOME (#4210)

Document the existence of `TRITON_HOME` environment variable.

The `TRITON_HOME` variable controls the location of the `.triton`
directory that stores, among other things, the files downloaded during a
`pip install -e python` virtualenv build. By default, this is located in
the user's home directory, at `~/.triton`.
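
As a rough sketch of the behavior described above (an assumption based on this description, not a copy of Triton's own lookup code), the directory could be resolved like this. `triton_dir` is a hypothetical helper name.

```python
import os
from pathlib import Path


def triton_dir(env=None) -> Path:
    """Return the `.triton` directory, honoring TRITON_HOME when it is set."""
    env = os.environ if env is None else env
    # Assumption: TRITON_HOME replaces the home directory as the parent of `.triton`.
    base = env.get("TRITON_HOME") or str(Path.home())
    return Path(base) / ".triton"


print(triton_dir({"TRITON_HOME": "/data/big"}))
print(triton_dir({}))
```

In practice this means exporting `TRITON_HOME` to a path on a disk with enough space before running the build.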

I was trying to build Triton on my system on a large local disk, but
with limited network home directory space, and the `pip` command kept
failing with out of disk space errors. It turned out that during
installation, large files were downloaded to the `~/.triton` directory
causing failure.

After checking that it was not `pip` doing this, I found the
`TRITON_HOME` variable which allowed me to work around the issue and
build Triton successfully. After seconding #4007, I decided to
contribute this documentation fix.

Co-authored-by: sree <sree@buckyball>

[BACKEND] Fix regression in i1 reduction (#4215)

Recent refactoring broke i1 shared memory load.

[BUILD] update URL for LLVM tarballs (#4216)

[BACKEND] Fix divisibility analysis for shift ops (#4221)

Divisibility does not ensure that a value is not 0, therefore we cannot
use divisibility as a minimum shift value.
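
One way to read this, shown as a plain Python illustration rather than the analysis code itself: a shift amount of 0 is divisible by any number, so knowing "the shift amount is divisible by 4" does not imply "the shift amount is at least 4". The values below are made up.

```python
def is_divisible(value: int, divisor: int) -> bool:
    return value % divisor == 0


x = 3
shift = 0  # divisible by 4, yet smaller than 4

# A naive analysis would treat 4 as the minimum shift and conclude that
# x << shift is a multiple of 1 << 4. With shift == 0 that is wrong.
naive_divisor = 1 << 4

print(is_divisible(shift, 4))                   # True
print(is_divisible(x << shift, naive_divisor))  # False
```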

Support FP8 constant (#4222)

To unblock the compilation of kernels like below which don't operate
arithmetically on FP8.

```
import triton
import triton.language as tl

@triton.jit
def triton_poi_fused__scaled_mm__to_copy_constant_pad_nd_lift_fresh_2(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 400624
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex % 784
    x1 = (xindex // 784)
    x2 = xindex
    tmp0 = x0
    tmp1 = tl.full([1], 769, tl.int64)
    tmp2 = tmp0 < tmp1
    tmp3 = tl.load(in_ptr0 + (x0 + (769*x1)), tmp2 & xmask, other=0.0)
    tmp4 = tmp3.to(tl.float8e4nv)
    tmp5 = tl.full(tmp4.shape, 0.0, tmp4.dtype)
    tmp6 = tl.where(tmp2, tmp4, tmp5)
    tl.store(out_ptr0 + (x2), tmp6, xmask)
```

[INTERPRETER] Implement implicit tensor conversion for assignment operators (#4214)

joviliast added a commit to joviliast/triton that referenced this pull request Sep 2, 2024
joviliast added a commit to joviliast/triton that referenced this pull request Sep 6, 2024
…triton-lang#4176)"

This reverts commit cf2ad02
and enables the aggressive strategy to enable block merging.
antiagainst pushed a commit that referenced this pull request Sep 6, 2024
…#4176)" (#4631)

Turn the aggressive strategy back on by default to enable block merging,
now that the upstream fixes for it have been brought in by
#4619.

This reverts commit cf2ad02.
bertmaher pushed a commit to bertmaher/triton that referenced this pull request Dec 10, 2024
…-lang#4176)

bertmaher pushed a commit to bertmaher/triton that referenced this pull request Dec 10, 2024
…riton-lang#4186)

bertmaher pushed a commit to bertmaher/triton that referenced this pull request Dec 10, 2024
…triton-lang#4176)" (triton-lang#4631)
