Some improvements for KV caching #1891

mseeger · 2024-12-26T21:27:13Z

- Ensure that KVCache buffers are only as large as config.n_query_groups
- Shrink buffers returned by KVCache to just cover input_pos entries
- Clean up children of classes in model.py, in particular remove forward copies
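To make the first two points concrete, here is a rough sketch of the shape arithmetic only. It is not the litgpt implementation: `ToyConfig`, `kv_buffer_shape`, and `trimmed_length` are hypothetical names, and reading "just cover input_pos entries" as "keep the first max(input_pos) + 1 slots" is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the model config; only the fields used here.
    n_head: int
    n_query_groups: int
    head_size: int


def kv_buffer_shape(config, batch_size, max_seq_length):
    """Shape of one K (or V) buffer when sized by the number of KV groups."""
    return (batch_size, config.n_query_groups, max_seq_length, config.head_size)


def trimmed_length(input_pos):
    """Assumed rule: number of leading slots that cover every position seen so far."""
    return max(input_pos) + 1


config = ToyConfig(n_head=32, n_query_groups=8, head_size=128)

# Buffer sized per attention head versus per query group.
per_head = (1, config.n_head, 4096, config.head_size)
per_group = kv_buffer_shape(config, batch_size=1, max_seq_length=4096)
ratio = math.prod(per_head) // math.prod(per_group)

# After writing positions 0..4, only the first 5 slots need to be returned.
length = trimmed_length([0, 1, 2, 3, 4])
returned = per_group[:2] + (length,) + per_group[3:]
```

With grouped-query attention (n_query_groups smaller than n_head), the buffer shrinks by the factor n_head / n_query_groups, and the returned view grows with the sequence instead of always spanning the maximum length.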

mseeger · 2024-12-27T15:28:28Z

Can somebody help with failing tests? I don't understand why tests for Windows are failing, but pass for all other systems. And I also don't understand why the GPU tests are failing.

Andrei-Aksionov · 2024-12-28T19:03:50Z

Hello @mseeger

Thank you for another PR.

> Can somebody help with failing tests? I don't understand why tests for Windows are failing

Yeah, there is always something with Windows.
Perhaps we are lucky and a simple torch bump from #1893 might help 🤷

> And I also don't understand why the GPU tests are failing.

I'll check it tomorrow.

litgpt/adapter.py

litgpt/adapter_v2.py

litgpt/model.py

Andrei-Aksionov · 2024-12-30T21:37:48Z

Hello @mseeger

It's quite a PR 🫠 🙂.
I left a couple of comments.
Overall it looks really good. I like how the PEFT variants look now: it is much easier to focus on the differences 😊.

(I'll take a look at why the GPU tests are failing later.)

mseeger · 2024-12-31T10:21:57Z

OK, I reacted to comments. I also did a small change in lora.py, where the CausalSelfAttention.__init__ was still copy and paste, now it calls the superclass init.
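As an illustration of the pattern described here, the following toy sketch shows a subclass that calls the superclass init and then overrides only what differs, instead of repeating the whole constructor. The classes are hypothetical and are not the litgpt ones; they only share names with them for readability.

```python
class Linear:
    """Toy layer that only records its dimensions."""

    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


class LoRALinear(Linear):
    """Toy LoRA variant: same dimensions plus a rank."""

    def __init__(self, in_features, out_features, r=8):
        super().__init__(in_features, out_features)
        self.r = r


class CausalSelfAttention:
    """Toy base class that builds its layers in __init__."""

    def __init__(self, n_embd):
        self.n_embd = n_embd
        self.qkv = Linear(n_embd, 3 * n_embd)
        self.proj = Linear(n_embd, n_embd)


class LoRACausalSelfAttention(CausalSelfAttention):
    """Reuses the base init, then replaces only the layer that differs."""

    def __init__(self, n_embd, r=8):
        super().__init__(n_embd)
        self.qkv = LoRALinear(n_embd, 3 * n_embd, r=r)


attn = LoRACausalSelfAttention(64, r=4)
```

The subclass body now shows only the difference from the base class, which is what makes the variants easy to compare.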

litgpt/model.py

Andrei-Aksionov · 2024-12-31T12:36:20Z

> OK, I reacted to comments. I also did a small change in lora.py, where the CausalSelfAttention.__init__ was still copy and paste, now it calls the superclass init.

Cool, we are almost there 🙂.
There are a couple of unresolved comments left.

On my side I'll try to find and fix issues with failing GPU tests, hopefully this year 😃.

Andrei-Aksionov · 2024-12-31T18:11:51Z

Overall, the issue with GPU+Thunder is something specific to the latter.
I'll merge the PR as is and discuss it with the Thunder team later.

Thanks again for the PR (and for the patience 😊).

Happy New Year! 🚀

mseeger requested review from rasbt and lantiga as code owners December 26, 2024 21:27

mseeger force-pushed the kvcache_improvements4 branch from 69d6d6f to a65a96d Compare December 27, 2024 13:40

Andrei-Aksionov reviewed Dec 30, 2024

mseeger force-pushed the kvcache_improvements4 branch from 7f2c2ce to 3226323 Compare December 31, 2024 10:21

Andrei-Aksionov reviewed Dec 31, 2024

- Ensure that KVCache buffers are only as large as config.n_query_groups

3702b03

- Shrink buffers returned by KVCache to just cover input_pos entries
- Refactor child classes of model.py classes to avoid copy and paste

mseeger force-pushed the kvcache_improvements4 branch from 3226323 to 3702b03 Compare December 31, 2024 12:50

Andrei-Aksionov added 2 commits December 31, 2024 16:32

input_pos_maxp1 as torch tensor

1b1b592

Add batch dimension for cos, sin in unsloth test

e20b7d9

Andrei-Aksionov merged commit 17a58df into Lightning-AI:main Dec 31, 2024
8 of 9 checks passed

mseeger mentioned this pull request Jan 3, 2025

Improvements of KVCache and refactoring of subclasses of classes in model.py #1867

Closed
