# Papers-Reading

## Algorithm

- Karp's 21 NP-complete problems
- A Simple Near-Linear Pseudopolynomial Time Randomized Algorithm for Subset Sum
- Palindromic Tree: https://arxiv.org/abs/1506.04862
- An Introduction to Quantum Computing, Without the Physics: https://arxiv.org/abs/1708.03684
- The Berlekamp-Massey Algorithm revisited: http://hlombardi.free.fr/publis/BMAvar.pdf
- Video Stabilization Algorithm (FuSta): https://github.com/alex04072000/FuSta

## LLM

| Date | Paper | Key Words |
|------|-------|-----------|
| 2017.6.12 | Attention Is All You Need | Transformer & Attention |
| 2022.5.27 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Flash Attention |
| 2022.8.15 | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | LLM.int8 |
| 2023.7.18 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Flash Attention 2 |
| 2024.3.19 | When Do We Not Need Larger Vision Models? | Scaling on Scales |
| 2024.7.10 | PaliGemma: A versatile 3B VLM for transfer | Google small VLM: PaliGemma |
| 2024.7.12 | FlashAttention-3 | Flash Attention 3, optimized for Hopper GPUs (e.g. H100) |
| 2024.7.28 | Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights | Advertising with Multimodal |
| 2024.8.22 | NanoFlow: Towards Optimal Large Language Model Serving Throughput | A novel serving framework: NanoFlow |
| 2024.10.3 | SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | Sage Attention |
| 2024.11.17 | SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | Sage Attention 2 |
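Since most entries in the LLM table build on the attention mechanism, here is a minimal NumPy sketch of the scaled dot-product attention from "Attention Is All You Need" as a common reference point. The function name, toy shapes, and random inputs are illustrative assumptions; this naive version materializes the full attention matrix, which is exactly the memory traffic that the FlashAttention papers avoid by computing the same result tile by tile in on-chip SRAM.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Vanilla attention: softmax(Q K^T / sqrt(d_k)) V.

    Illustrative sketch only; FlashAttention 1/2/3 produce the same
    output without ever storing the full (n, n) score matrix.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n, n) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # (n, d_v) attention output

# Toy usage (hypothetical sizes): 4 positions, head dimension 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```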