Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add real page pool tests for trie_attention_cache #902

Merged
merged 5 commits into from
Feb 5, 2025

Conversation

renxida
Copy link
Contributor

@renxida renxida commented Feb 3, 2025

Previously, we were testing with mocked page pools so the tests run faster. In this PR, I split trie_attention_cache_tests.py into 2 files:

trie_attention_cache/mock_pool_tests.py contains the old tests, and we continue to test with a mocked-up page pool to verify that the trie correctly does accounting for the pages and the evictions.

trie_attention_cache/real_pool_tests.py will contain new tests for page-copying prefix sharing, so that we won't have to recompute the entire last page's worth of KV if branching on a token. Since we're copying the page, the tests will need to not mock the page pool and actually allocate the buffer, which will make them slower. I opted to do this separately from the old tests so that we won't have to take 5-ish seconds to set up the buffer for each of the 30 ish tests.

This PR also replaces some of the nuisance print statements with logging.debug.

This is a step on the way to implement beam search (required by MLPerf).

Edit: MLPerf only requires beam search for GPT-J. Thanks @stbaione

@renxida renxida changed the title Add tests for partial-page caching in trie_attention_cache_test.py Add real page pool tests for trie_attention_cache Feb 4, 2025
@renxida renxida marked this pull request as ready for review February 4, 2025 01:29
@renxida renxida requested a review from stbaione February 4, 2025 01:29
@renxida renxida requested a review from stbaione February 4, 2025 16:42
@renxida renxida merged commit 17c8369 into nod-ai:main Feb 5, 2025
36 of 37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants