DeepSeek Native Sparse Attention PyTorch implementation (Non-Official)
[Hand-Coding NSA] DeepSeek's New Work: Native Sparse Attention (Extra-Long Read, Code Included)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention