We really appreciate the great turnout and engagement during our AAAI 25 tutorial! Given that it was a highly technical talk, we’re genuinely glad that the audience received our message well and asked so many great questions.
As promised, here are the slides: https://github.com/henryzhongsc/longctx_bench/blob/main/visualization/slides/aaai25_tutorial_tq08.pdf. This version is slightly different from the one we used in the talk — mostly just removing some joke slides to maintain a more serious online presence and fixing a few typos (since the one we used somehow wasn’t the final version).

I did my best to include as many citations as possible, but during the talk I also mentioned several works on the fly. Some were brought up spontaneously, so I can’t recall all of them; others I may have forgotten to mention but they are nevertheless good reads. Here are some relevant ones:
State of GPT | BRK216HFS | Andrej Karpathy
A great source for understanding the stages of LLM training, along with an excellent introduction to LLM training basics.
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis
Covers KV cache challenges in long context scenarios.
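To make the problem concrete, here is the kind of back-of-envelope KV cache estimate I find helpful (the 7B-class model shape below is a hypothetical example of my own choosing, not numbers from the paper):

```python
# Back-of-envelope KV cache memory estimate for a decoder-only transformer.
# The model shape below is a hypothetical 7B-class config, not from the paper.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # 2x accounts for storing both keys and values at every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes per element).
gb = kv_cache_bytes(32, 32, 128, seq_len=128_000, batch=1) / 1e9
print(f"KV cache at 128k context: {gb:.1f} GB")  # ~67 GB for a single sequence
```

At 128k tokens the cache alone dwarfs the roughly 14 GB of fp16 weights for a 7B model, which is roughly the tension the paper analyzes.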
SnapKV: LLM Knows What You are Looking for Before Generation
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
Mentioned in the context of more modern token-dropping-style methods that remain NIAH-capable (i.e., they can still pass needle-in-a-haystack retrieval tests); see the sketch below for the shared idea.
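These papers differ a lot in the details, but the common thread is scoring cached tokens by how much recent queries attend to them and evicting the rest. A deliberately minimal, method-agnostic sketch of that idea (the shapes and eviction policy are my simplification, not any single paper's algorithm):

```python
import torch

def evict_kv(keys, values, queries, keep: int):
    """Keep the `keep` cached positions that receive the most attention
    from a recent window of queries (a simplified, method-agnostic sketch).
    keys/values: [T, d]; queries: [w, d] (the observation window)."""
    d = keys.shape[-1]
    scores = torch.softmax(queries @ keys.T / d**0.5, dim=-1)  # [w, T]
    importance = scores.sum(dim=0)                             # [T]
    idx = importance.topk(keep).indices.sort().values          # keep original order
    return keys[idx], values[idx]

# Toy usage: 1024 cached tokens, a 32-query window, keep the top 256.
K, V, Q = torch.randn(1024, 64), torch.randn(1024, 64), torch.randn(32, 64)
K_small, V_small = evict_kv(K, V, Q, keep=256)
print(K_small.shape)  # torch.Size([256, 64])
```

Roughly speaking, the head-level methods above (RazorAttention, DuoAttention, Not All Heads Matter) apply this kind of budget per head, keeping retrieval heads intact rather than compressing uniformly.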
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
TurboAttention: Efficient Attention Approximation For High Throughput LLMs
MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
Nice complements to KIVI, the tuning-free 2-bit KV cache quantization work.
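The shared primitive in this line of work is asymmetric low-bit quantization of the cached K/V tensors (KIVI applies it per-channel for keys and per-token for values). A minimal sketch of that core step, with illustrative settings rather than any paper's actual recipe:

```python
import torch

def quantize_asym(x, bits=2, dim=0):
    """Asymmetric uniform quantization along `dim`. A simplified sketch:
    real methods add grouping, a full-precision residual window, bit packing."""
    qmax = 2**bits - 1
    xmin = x.amin(dim=dim, keepdim=True)
    scale = (x.amax(dim=dim, keepdim=True) - xmin).clamp(min=1e-8) / qmax
    q = ((x - xmin) / scale).round().clamp(0, qmax)
    return q, scale, xmin

def dequantize(q, scale, xmin):
    return q * scale + xmin

K = torch.randn(1024, 128)                  # [tokens, channels]
qK, s, z = quantize_asym(K, bits=2, dim=0)  # per-channel, KIVI-style for keys
err = (dequantize(qK, s, z) - K).abs().mean()
print(f"mean abs reconstruction error at 2 bits: {err:.3f}")
```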
An in-depth tutorial on linear attention; the latter explores how linear attention performs at scale.
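If you want the punchline in code: linear attention replaces softmax(QKᵀ)V with φ(Q)(φ(K)ᵀV), which a causal model can compute with running sums in O(n) time and constant state instead of a growing KV cache. A bare-bones sketch (φ = elu + 1 is one common feature-map choice; this is illustrative, not the tutorial's code):

```python
import torch

def causal_linear_attention(q, k, v):
    """Causal linear attention via running sums: O(n) time, constant state.
    q, k, v: [T, d]. The feature map elu(x) + 1 keeps scores positive."""
    phi = lambda x: torch.nn.functional.elu(x) + 1
    q, k = phi(q), phi(k)
    S = torch.zeros(k.shape[-1], v.shape[-1])  # running sum of outer(k_t, v_t)
    z = torch.zeros(k.shape[-1])               # running sum of k_t (normalizer)
    out = []
    for t in range(q.shape[0]):
        S = S + torch.outer(k[t], v[t])
        z = z + k[t]
        out.append((q[t] @ S) / (q[t] @ z + 1e-8))
    return torch.stack(out)

x = torch.randn(16, 8)
print(causal_linear_attention(x, x, x).shape)  # torch.Size([16, 8])
```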
Related to questions about positional embeddings and their robustness to different types of modifications.
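For the RoPE questions in particular, the core operation is just rotating each (even, odd) channel pair of queries and keys by a position-dependent angle, so that dot products depend only on relative position. A minimal sketch of standard RoPE (pairing conventions differ across implementations, so treat this as illustrative):

```python
import torch

def rope(x, base=10000.0):
    """Rotary position embedding on x of shape [T, d], d even.
    Channel pair (2i, 2i+1) at position t is rotated by t * base**(-2i/d)."""
    T, d = x.shape
    inv_freq = base ** (-torch.arange(0, d, 2).float() / d)     # [d/2]
    ang = torch.arange(T).float()[:, None] * inv_freq[None, :]  # [T, d/2]
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

print(rope(torch.randn(16, 8)).shape)  # torch.Size([16, 8])
```

Roughly speaking, many of the long-context modifications people asked about (position interpolation, NTK-aware scaling, YaRN) amount to rescaling these angles rather than changing the mechanism.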