ci: reduce binary size #172

yzh119 · 2024-03-11T04:23:13Z

- Do not generate prefill kernels for `page_size=8`.
- Build with `-Xfatbin=-compress-all` to reduce binary size.
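The two changes above can be illustrated with a minimal Python sketch. It also shows why #173 later had to touch `DISPATCH_PAGE_SIZE`: once a page size is no longer instantiated, dispatch must not branch to it. The parameter axes (`PAGE_SIZES`, `HEAD_DIMS`), the kernel names, and the helper functions are all hypothetical. They stand in for the repository's real generator and C++ dispatch macros.

```python
import itertools

# Hypothetical kernel parameter space, for illustration only.
PAGE_SIZES = [1, 8, 16]
HEAD_DIMS = [64, 128, 256]

# Page sizes for which prefill kernels are no longer generated (per this PR).
SKIPPED_PREFILL_PAGE_SIZES = {8}

# nvcc flags: compress all fatbin entries. GPU targets are no longer listed
# here as -gencode flags; they come from TORCH_CUDA_ARCH_LIST instead.
NVCC_FLAGS = ["-O3", "-Xfatbin=-compress-all"]


def prefill_instantiations():
    """Return the (page_size, head_dim) pairs that get a compiled prefill kernel."""
    return [
        (page_size, head_dim)
        for page_size, head_dim in itertools.product(PAGE_SIZES, HEAD_DIMS)
        if page_size not in SKIPPED_PREFILL_PAGE_SIZES
    ]


def build_kernel_table():
    """Map each instantiated configuration to a (hypothetical) kernel name."""
    return {(p, d): f"paged_prefill_p{p}_d{d}" for p, d in prefill_instantiations()}


def dispatch(table, page_size, head_dim):
    """Dispatch only to kernels that were actually instantiated."""
    try:
        return table[(page_size, head_dim)]
    except KeyError:
        raise ValueError(
            f"no prefill kernel instantiated for page_size={page_size}, head_dim={head_dim}"
        ) from None
```

For example, `dispatch(build_kernel_table(), 16, 128)` returns `"paged_prefill_p16_d128"`, while asking for `page_size=8` raises `ValueError` instead of jumping to a branch with no compiled kernel behind it.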

Follow-up of #171. @Qubitium, the CUDA architectures to compile for can be controlled with the `TORCH_CUDA_ARCH_LIST` environment variable, so I removed the `-gencode`/arch flags that were specified in the compile args.
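To show roughly what `TORCH_CUDA_ARCH_LIST` replaces, here is a simplified sketch of how an arch list such as `"8.0 8.6+PTX"` maps to nvcc `-gencode` flags. This is an approximation written for illustration, not PyTorch's own code. It handles only numeric entries, optionally suffixed with `+PTX`, and ignores named architectures and default-detection logic. The function name `cuda_arch_flags` is hypothetical.

```python
import os


def cuda_arch_flags(arch_list=None):
    """Simplified mapping from a TORCH_CUDA_ARCH_LIST-style string to -gencode flags.

    Entries may be separated by spaces or semicolons. A '+PTX' suffix also
    embeds PTX for that architecture so newer GPUs can JIT-compile it.
    """
    if arch_list is None:
        arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", "")
    flags = []
    for arch in arch_list.replace(" ", ";").split(";"):
        if not arch:
            continue  # skip empty entries from repeated separators
        with_ptx = arch.endswith("+PTX")
        num = arch.split("+")[0].replace(".", "")  # "8.6+PTX" -> "86"
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if with_ptx:
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags
```

With this approach, building for only the GPUs you need (for example `TORCH_CUDA_ARCH_LIST="8.0 8.6+PTX"`) is left to the environment instead of being hard-coded in the compile args, which also keeps the binary smaller.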

…n dispatch (#173)

The release action [failed](https://github.com/flashinfer-ai/flashinfer/actions/runs/8227731974/job/22501369048) because the [action-gh-release](https://github.com/softprops/action-gh-release) action does not support uploading multiple large files at once (softprops/action-gh-release#353). This PR changes the release job to upload artifacts in multiple batches. Also, because #172 removes the instantiation of paged prefill kernels for `page_size=8`, this PR fixes the behavior of `DISPATCH_PAGE_SIZE` by removing the corresponding branches.
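The batching idea from #173 amounts to splitting the list of release artifacts into fixed-size groups and uploading each group in its own step. The minimal sketch below covers only that split. The batch size and file names are hypothetical, and how each batch is wired into a workflow step is not shown.

```python
def batched(items, batch_size):
    """Split a list of artifact paths into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical wheel names; each inner list would be uploaded by a separate step.
wheels = [f"wheel_{i}.whl" for i in range(5)]
for batch in batched(wheels, 2):
    print(batch)
```

Uploading smaller groups keeps each individual upload within what the action handles reliably, at the cost of a few more workflow steps.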

yzh119 added 2 commits March 11, 2024 04:17

upd

99b864d

upd

7a60f51

yzh119 merged commit bd5b60a into main Mar 11, 2024

yzh119 mentioned this pull request Mar 11, 2024

bugfix: Fix release wheel script and remove uninstantiated branches in dispatch #173

Merged

MasterJH5574 deleted the reduce-binary-size branch March 12, 2024 19:10
