- Statistics Department of JNU
- Guangzhou, China
- https://github.com/DefTruth
- https://www.zhihu.com/people/qyjdef
Pinned
- lite.ai.toolkit: A lite C++ toolkit of 100+ awesome AI models; supports ORT, MNN, NCNN, TNN, and TensorRT.
- vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs.
- Awesome-LLM-Inference: A curated list of awesome LLM/VLM inference papers with code, such as FlashAttention, PagedAttention, parallelism, etc.
- CUDA-Learn-Notes: 150+ Tensor/CUDA Cores kernels, including flash-attn-mma and hgemm with WMMA, MMA, and CuTe (98%~100% of cuBLAS/FA2 TFLOPS).
- Awesome-Diffusion-Inference: A curated list of awesome diffusion inference papers with code, such as sampling, caching, multi-GPU, etc.
- faster-prefill-attention: [WIP] FFPA: yet another faster flash prefill attention, with O(1) GPU SRAM complexity for headdim > 256; ~1.5x faster than SDPA EA.