[LAYOUTS] Cache LinearLayout creation #5542
Conversation
third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/TritonGPUToLLVM.cpp
Once CI is passing, it would be good to measure the compile time difference before and after for a few different kernels.
Agreed. In the kernel I was benchmarking, compilation time went from 5 min to 2:30 min. This is in line with the issue reported by Meta, where they had seen a 2x increase in compilation times in the last few months. Will benchmark others.
It was reported that Triton compilation times have increased heavily lately. The cause is that we very often create the associated LinearLayout to check properties of a given layout. We do this thousands of times, and it gets very expensive. In this PR, we implement a thread-safe cache for LinearLayouts. We clear this cache after we are done with the TTGIR -> LLVM conversion.

In the future, we will make `DistributedEncoding` inherit from `LinearLayoutEncoding`, which will mean that `DistributedEncoding`s always have access to their associated LinearLayout. Even in that scenario, I still think caching will be useful, as there is no true 1-to-1 correspondence between `DistributedEncoding`s and `LinearLayout`s due to broadcasting, where we tile a layout along the tensor or make it smaller. As such, this cache may also be useful in the future.
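For context, the pieces quoted in the review below fit together roughly as follows. This is a minimal sketch, not the PR's exact code: it assumes the `CacheKey` and `LinearLayout` types from the PR, assumes `CacheKey` has a `std::hash` specialization, and omits the cache-clearing hook mentioned above.

```cpp
#include "llvm/Support/RWMutex.h"
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>

class LinearLayoutCache {
public:
  // Lookup takes only a shared (reader) lock, so concurrent lookups
  // from parallel compilation threads do not serialize.
  std::optional<LinearLayout> get(const CacheKey &key) {
    std::shared_lock lock(mutex);
    auto it = cache.find(key);
    if (it != cache.end())
      return it->second;
    return std::nullopt;
  }

  // Insertion takes the exclusive (writer) lock. Populating the cache
  // is left to the caller, keeping the cache independent of
  // toLinearLayout (see the discussion below).
  void set(CacheKey key, LinearLayout layout) {
    std::unique_lock lock(mutex);
    cache.emplace(std::move(key), std::move(layout));
  }

private:
  std::unordered_map<CacheKey, LinearLayout> cache;
  llvm::sys::SmartRWMutex<true> mutex;
};
```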
In the internal test, it went from 9:11 min to 4:11 min.
```cpp
std::shared_lock lock(mutex);
auto it = cache.find(key);
if (it != cache.end()) {
  return it->second;
```
Just curious: why not add the entry to the cache on an unsuccessful find, rather than leaving it up to the user of the API?
I just wanted to keep the cache reasonably general, rather than have it depend on `toLinearLayout`, to split the responsibilities.
Is it for Peter's question instead of Pawel's? :)
IMO the `get` and `set` methods are separated because third-party backends may not have defined a corresponding linear layout conversion for some of their legacy layouts.
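So the intended call pattern is roughly the following. This is a hypothetical caller-side sketch: the `getCache()` helper, the `CacheKey` construction, and the exact `toLinearLayout(layout, shape)` signature are stand-ins, not the PR's actual code.

```cpp
// Hypothetical wrapper showing why get() and set() are separate calls:
// a backend that has no linear-layout conversion for a legacy layout
// can consult the cache without ever populating it.
LinearLayout toLinearLayoutCached(mlir::Attribute layout,
                                  llvm::ArrayRef<int64_t> shape) {
  CacheKey key{layout, shape};              // stand-in key construction
  if (auto cached = getCache().get(key))    // fast path: shared lock only
    return *cached;
  LinearLayout result = toLinearLayout(layout, shape); // expensive build
  getCache().set(key, result);              // slow path: exclusive lock
  return result;
}
```

Note that two threads can race past the `get` and both compute the same layout; with `emplace`, the first insertion wins and the duplicate work is wasted but harmless.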
```cpp
private:
  std::unordered_map<CacheKey, LinearLayout> cache;
  llvm::sys::SmartRWMutex<true> mutex;
```
NIT: Since you're not actually calling `lock_shared`, this could just be a plain mutex.
```diff
- llvm::sys::SmartRWMutex<true> mutex;
+ llvm::sys::SmartMutex<true> mutex;
```
I'm calling `shared_lock` in the `get` method.
Ah didn't see that the guard variable had changed type.
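For reference, the distinction this thread settled on (a sketch, not the PR's code): `std::shared_lock` acquires the reader side of the `SmartRWMutex`, which a plain `SmartMutex` would not provide.

```cpp
#include "llvm/Support/RWMutex.h"
#include <mutex>        // std::unique_lock
#include <shared_mutex> // std::shared_lock

static llvm::sys::SmartRWMutex<true> rwMutex;

void readerPath() {
  // Calls rwMutex.lock_shared(): many readers may hold this at once.
  std::shared_lock lock(rwMutex);
  // ... cache lookup ...
}

void writerPath() {
  // Calls rwMutex.lock(): exclusive; blocks readers and other writers.
  std::unique_lock lock(rwMutex);
  // ... cache insertion ...
}
```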