Add real page pool tests for trie_attention_cache #902

renxida · 2025-02-03T21:14:34Z

Previously, we tested with mocked page pools so that the tests would run faster. In this PR, I split trie_attention_cache_tests.py into two files:

trie_attention_cache/mock_pool_tests.py contains the old tests. These continue to use a mocked-up page pool to verify that the trie correctly accounts for pages and evictions.

trie_attention_cache/real_pool_tests.py will contain new tests for page-copying prefix sharing, so that we don't have to recompute the entire last page's worth of KV when branching on a token. Because we're copying the page, these tests can't mock the page pool and must actually allocate the buffer, which makes them slower. I kept them separate from the old tests so that we don't spend roughly 5 seconds setting up the buffer for each of the roughly 30 existing tests.
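To illustrate the idea behind page-copying prefix sharing, here is a minimal toy sketch. It is not the shortfin implementation: `ToyPagePool`, `fake_kv`, `branch_from`, and `TOKENS_PER_PAGE` are all hypothetical names invented for this example, and a "page" is just a Python list standing in for a slice of a real device buffer.

```python
from dataclasses import dataclass, field

TOKENS_PER_PAGE = 4  # hypothetical page size, chosen only for this example


@dataclass
class ToyPagePool:
    """Hypothetical stand-in for a page pool; each page is a list of per-token KV entries."""

    pages: list = field(default_factory=list)

    def allocate(self) -> int:
        """Allocate an empty page and return its index."""
        self.pages.append([])
        return len(self.pages) - 1

    def copy_page(self, src: int) -> int:
        """Allocate a new page holding a copy of page `src`."""
        dst = self.allocate()
        self.pages[dst] = list(self.pages[src])
        return dst


def fake_kv(token: int) -> tuple:
    """Placeholder for the expensive per-token KV computation."""
    return ("kv", token)


def branch_from(pool: ToyPagePool, src_page: int, shared_len: int, new_tokens: list):
    """Branch off a partially shared last page.

    Copies `src_page`, keeps its first `shared_len` entries, and computes KV
    only for `new_tokens`. Returns the new page index and how many KV entries
    had to be computed.
    """
    assert shared_len + len(new_tokens) <= TOKENS_PER_PAGE
    dst = pool.copy_page(src_page)
    del pool.pages[dst][shared_len:]  # drop entries past the branch point
    for tok in new_tokens:
        pool.pages[dst].append(fake_kv(tok))
    return dst, len(new_tokens)


# Usage: the base sequence ends with tokens 10, 11, 12 in its last page.
pool = ToyPagePool()
base = pool.allocate()
pool.pages[base] = [fake_kv(t) for t in (10, 11, 12)]

# A second sequence shares tokens 10 and 11, then branches with token 99.
branch, computed = branch_from(pool, base, shared_len=2, new_tokens=[99])
```

In this sketch only one KV entry is computed for the branch, and the base page is left untouched. A test of the real behavior needs an actual buffer to copy from, which is why a mocked pool is not enough.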

This PR also replaces some of the nuisance print statements with logging.debug.
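As a rough sketch of that kind of change (the function `evict_pages` and its message are hypothetical, not taken from the PR), a print call becomes a debug-level log record that stays silent unless debug logging is enabled:

```python
import logging

logger = logging.getLogger(__name__)


def evict_pages(page_ids):
    """Hypothetical helper, used only to show the print -> logging.debug swap."""
    # Before: print(f"evicting pages: {page_ids}")
    logger.debug("evicting pages: %s", page_ids)
    return len(page_ids)


evicted = evict_pages([3, 7])
```

With pytest, these messages can be surfaced on demand, for example with `--log-level=DEBUG`.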

~~This is a step on the way to implement beam search (required by MLPerf).~~

Edit: MLPerf only requires beam search for GPT-J. Thanks @stbaione

shortfin/tests/apps/llm/components/kvcache/trie_attention_cache/real_pool_tests.py

renxida force-pushed the test-page-copying branch from a46d26a to 7ee7d79 Compare February 4, 2025 00:34

put test in a separate no-mock file

0caa092

renxida force-pushed the test-page-copying branch from 54454e0 to 0caa092 Compare February 4, 2025 00:56

renxida added 2 commits February 4, 2025 00:56

remove unused imports

92d1d6e

add docstrings and make mock_pool_tests use logger

e3eb832

renxida changed the title ~~Add tests for partial-page caching in trie_attention_cache_test.py~~ Add real page pool tests for trie_attention_cache Feb 4, 2025

renxida added 2 commits February 4, 2025 01:07

rename to match shortfin/python dir structure

0907ddb

clean up some unnecessary dependencies copied over from the old tests

6e7cc74

renxida marked this pull request as ready for review February 4, 2025 01:29

renxida requested a review from stbaione February 4, 2025 01:29

stbaione reviewed Feb 4, 2025

shortfin/tests/apps/llm/components/kvcache/trie_attention_cache/real_pool_tests.py

renxida requested a review from stbaione February 4, 2025 16:42

stbaione approved these changes Feb 4, 2025

renxida merged commit 17c8369 into nod-ai:main Feb 5, 2025
36 of 37 checks passed
