Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some improvements for KV caching #1891

Merged
merged 3 commits into from
Dec 31, 2024

Conversation

mseeger
Copy link
Contributor

@mseeger mseeger commented Dec 26, 2024

  • Ensure that KVCache buffers are only as large as config.n_query_groups
  • Shrink buffers returned by KVCache to just cover input_pos entries
  • Clean up children of classes in model.py, in particular remove forward copies

@mseeger mseeger force-pushed the kvcache_improvements4 branch from 69d6d6f to a65a96d Compare December 27, 2024 13:40
@mseeger
Copy link
Contributor Author

mseeger commented Dec 27, 2024

Can somebody help with failing tests? I don't understand why tests for Windows are failing, but pass for all other systems. And I also don't understand why the GPU tests are failing.

@Andrei-Aksionov
Copy link
Collaborator

Hello @mseeger

Thank you for another PR.

Can somebody help with failing tests? I don't understand why tests for Windows are failing

yeah, there is always something with Windows.
Perhaps we are lucky and a simple torch bump from #1893 might help 🤷

And I also don't understand why the GPU tests are failing.

I'll check it tomorrow.

litgpt/adapter.py Show resolved Hide resolved
litgpt/adapter.py Outdated Show resolved Hide resolved
litgpt/adapter.py Show resolved Hide resolved
litgpt/adapter_v2.py Show resolved Hide resolved
litgpt/model.py Outdated Show resolved Hide resolved
litgpt/model.py Outdated Show resolved Hide resolved
litgpt/model.py Show resolved Hide resolved
litgpt/model.py Outdated Show resolved Hide resolved
litgpt/model.py Outdated Show resolved Hide resolved
litgpt/model.py Show resolved Hide resolved
@Andrei-Aksionov
Copy link
Collaborator

Hello @mseeger

It's quite a PR 🫠 🙂.
I left a couple of comments.
Overall it looks really good. I like how PEFT variants now look like, allows focusing on differences easily now 😊.

(I'll take a look why GPU tests are failing later.)

@mseeger mseeger force-pushed the kvcache_improvements4 branch from 7f2c2ce to 3226323 Compare December 31, 2024 10:21
@mseeger
Copy link
Contributor Author

mseeger commented Dec 31, 2024

OK, I reacted to comments. I also did a small change in lora.py, where the CausalSelfAttention.__init__ was still copy and paste, now it calls the superclass init.

litgpt/model.py Outdated Show resolved Hide resolved
@Andrei-Aksionov
Copy link
Collaborator

OK, I reacted to comments. I also did a small change in lora.py, where the CausalSelfAttention.init was still copy and paste, now it calls the superclass init.

Cool, we are almost there 🙂.
There are a couple of unresolved comments left.

On my side I'll try to find and fix issues with failing GPU tests, hopefully this year 😃.

- Shrink buffers returned by KVCache to just cover input_pos entries
- Refactor child classes of model.py classes to avoid copy and paste
@mseeger mseeger force-pushed the kvcache_improvements4 branch from 3226323 to 3702b03 Compare December 31, 2024 12:50
@Andrei-Aksionov
Copy link
Collaborator

Overall, the issue with GPU+Thunder is something specific to the latter.
I'll merge the PR as is and later discuss it with Thunder team.

Thanks again for the PR (and for the patience 😊).

Happy New Year! 🚀

@Andrei-Aksionov Andrei-Aksionov merged commit 17a58df into Lightning-AI:main Dec 31, 2024
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants