[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) #12222
Merged
simon-mo merged 7 commits into vllm-project:main from jinzhen-lin:optimize_moe_align_block_size on Jan 21, 2025
+58 -37
Commits
Commits on Jan 20, 2025