# Papers-Reading

## Algorithm

- Palindromic Tree
- An Introduction to Quantum Computing, Without the Physics
- The Berlekamp-Massey Algorithm revisited
- Video Stabilization Algorithm

## LLM

| Date | Paper | Key Words |
|------|-------|-----------|
| 2017.6.12 | Attention Is All You Need | Transformer & Attention |
| 2022.5.27 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Flash Attention |
| 2022.8.15 | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | LLM.int8 |
| 2023.7.18 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Flash Attention 2 |
| 2024.3.19 | When Do We Not Need Larger Vision Models? | Scaling on Scales |
| 2024.7.10 | PaliGemma: A versatile 3B VLM for transfer | Google small VLM: PaliGemma |
| 2024.7.12 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Flash Attention 3 (optimized for Hopper GPUs, e.g. H100) |
| 2024.7.28 | Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights | Advertising with Multimodal Representations |
| 2024.8.22 | NanoFlow: Towards Optimal Large Language Model Serving Throughput | A novel serving framework: NanoFlow |
| 2024.10.3 | SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | Sage Attention |
| 2024.11.17 | SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | Sage Attention 2 |

## About

🐬 Some papers & books I've read.
