
Merge OpenAI Triton commit f436c9e #3124

Merged 5 commits from whitneywhtsang/merge into main on Jan 9, 2025

Conversation

whitneywhtsang
Contributor

This PR changes the Triton base from 51dddd3 to f436c9e (Jan 8).
Pass rate: 99.86%

Please do not squash and merge this PR.

lezcano and others added 4 commits January 8, 2025 14:22
It was reported that Triton compilation times have increased heavily lately. The cause is that we very often create the associated LinearLayout to check properties of a given Layout. We do this thousands of times, and it gets very expensive.

In this PR, we implement a thread-safe cache for LinearLayouts. We clear this cache after we are done with the TTGIR -> LLVM conversion.

In the future, we will make `DistributedEncoding` inherit from
`LinearLayoutEncoding`, which will mean that `DistributedEncoding`s
will always have access to their associated LinearLayout. Even in this
scenario I still think that caching will be good, as there is no real
1-to-1 correspondence between `DistributedEncoding`s and `LinearLayout`s
due to broadcasting, where we tile a layout along the tensor or we make
it smaller. As such, I think this cache may also be useful in the
future.
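
The pattern is essentially memoization behind a lock, dropped once the lowering pipeline no longer needs it. Below is a minimal illustrative sketch of that pattern in Python; the real cache lives in Triton's C++ compiler, and the class and method names here are hypothetical.

```python
# Illustrative sketch only: a thread-safe memoization cache that is cleared
# once a compilation phase (here, TTGIR -> LLVM conversion) has finished.
# The names below are hypothetical and do not mirror Triton's C++ code.
import threading


class LinearLayoutCache:
    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}

    def get_or_create(self, key, create_fn):
        # Fast path: return the layout if we already converted it.
        with self._lock:
            cached = self._cache.get(key)
        if cached is not None:
            return cached
        # Slow path: build the value outside the lock, then publish it.
        value = create_fn()
        with self._lock:
            return self._cache.setdefault(key, value)

    def clear(self):
        # Called after the TTGIR -> LLVM conversion so the cache does not
        # outlive the compilation it was built for.
        with self._lock:
            self._cache.clear()
```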

Currently, torch is required for importing triton and performing
autotuning. This seems like a relatively heavy runtime dependency in the
context of the cpu backend, as numpy can easily be used instead.

Opening here as suggested in
triton-lang/triton-cpu#205 to minimize future
merge conflicts.

Ideally there would be a test for this, but with the cpu backend
out-of-tree this seems hard to test.

See also triton-lang/triton-cpu#204,
triton-lang/triton-cpu#205.
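
For illustration only, a torch-free benchmarking helper for a CPU backend could look like the sketch below; the names `cpu_do_bench` and `flush_cache` are hypothetical, and this is not the code in this PR.

```python
# Hypothetical sketch: time a callable on CPU using numpy and perf_counter
# instead of torch, flushing a large buffer between runs so each measurement
# starts from a comparable cache state.
import time

import numpy as np


def flush_cache(size_bytes=256 * 1024 * 1024):
    buf = np.zeros(size_bytes // 8, dtype=np.float64)
    buf += 1.0
    return buf.sum()  # keep the write from being optimized away


def cpu_do_bench(fn, warmup=3, rep=10):
    for _ in range(warmup):  # warm up JIT compilation, allocator, etc.
        fn()
    times = []
    for _ in range(rep):
        flush_cache()
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return float(np.median(times)) * 1e3  # median latency in milliseconds
```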

# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a comment.

- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).

- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because it is not (currently) easy to test, and the basic functionality should be covered by existing tests.

- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)
Unify the lowering path of `local_alloc` with that of `local_store` for the case where the shared memory layout has a `leadingOffset`.
whitneywhtsang self-assigned this Jan 9, 2025
whitneywhtsang marked this pull request as ready for review January 9, 2025 18:11
whitneywhtsang merged commit a15b458 into main Jan 9, 2025
5 of 6 checks passed
whitneywhtsang deleted the whitneywhtsang/merge branch January 9, 2025 18:26