[PYTHON][KVCACHE] Enhance the thread limit for opencl (#2216)
Improves TIR-based paged attention performance by about 2x on OpenCL (Adreno).
krishnaraj36 authored Apr 25, 2024
1 parent 55b5c00 commit 85fffee
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion python/mlc_llm/nn/kv_cache.py
@@ -887,7 +887,7 @@ def _attention_decode(
     THREAD_LIMIT = 512
     TILE_SIZE_PER_BDX = 2
     if target.kind.name == "opencl" and "android" in str(target.host):
-        THREAD_LIMIT = 64
+        THREAD_LIMIT = 256
         TILE_SIZE_PER_BDX = 1
     max_num_threads_per_block = get_max_num_threads_per_block(target)
     thread_limit = min(max_num_threads_per_block, THREAD_LIMIT)
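
The selection logic in this hunk can be sketched in isolation as follows. This is a minimal illustration, not the code in `kv_cache.py`: `TargetInfo` and `select_decode_limits` are hypothetical names that stand in for the TVM target object and the inline logic, and the host strings in the usage lines are assumed examples.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TargetInfo:
    """Hypothetical stand-in for a TVM target (not tvm.target.Target)."""
    kind: str             # e.g. "opencl", "cuda"
    host: str             # host target string; an Android host contains "android"
    max_num_threads: int  # device limit on threads per block


def select_decode_limits(target: TargetInfo) -> Tuple[int, int]:
    """Return (thread_limit, tile_size_per_bdx) following the hunk's logic."""
    thread_limit_cap = 512
    tile_size_per_bdx = 2
    if target.kind == "opencl" and "android" in target.host:
        thread_limit_cap = 256  # the value was 64 before this commit
        tile_size_per_bdx = 1
    # The cap never exceeds what the device itself allows.
    return min(target.max_num_threads, thread_limit_cap), tile_size_per_bdx


# Assumed example targets, for illustration only.
android_opencl = TargetInfo("opencl", "llvm -mtriple=aarch64-linux-android", 1024)
desktop_cuda = TargetInfo("cuda", "llvm", 1024)
print(select_decode_limits(android_opencl))
print(select_decode_limits(desktop_cuda))
```

Because the result is the minimum of the cap and the device limit, raising the cap to 256 only takes effect on devices that report at least 256 threads per block.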