[Feature Branch][LLM Testing] Full Testing Harness for LLMs #1216
Conversation
* initial commit
* finish creation of helper objects
* Update tests/conftest.py
* small refactor
* [Feature Branch][LLM Testing] LLM Testing Suite (#1227)
* Update README.md
* Update src/deepsparse/yolov8/README.md
* Update text_generation.py
* quality
* readability
* all tests passing
* added some full kv cache tests
* initial commit
* ready for review
* Delete tests/deepsparse/transformers/pipelines/proposal_text_generation_tests.md
How confident are we in our test coverage? Possibly add tests when running with deterministic off or with multiple input sequences?
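One way to cover those two cases is with pytest parametrization; this is only a sketch, and `run_pipeline` below is a hypothetical stand-in, not this repo's pipeline API:

```python
import pytest


def run_pipeline(prompts, deterministic=True):
    # Hypothetical stand-in for the deepsparse text-generation pipeline;
    # here it simply returns one output string per input prompt.
    return [f"generated:{p}" for p in prompts]


# Cover deterministic on/off and single vs. multiple input sequences.
@pytest.mark.parametrize("deterministic", [True, False])
@pytest.mark.parametrize(
    "prompts",
    [["Once upon a time"], ["Once upon a time", "A second input sequence"]],
)
def test_generation_variants(prompts, deterministic):
    outputs = run_pipeline(prompts, deterministic=deterministic)
    assert len(outputs) == len(prompts)
```

Each combination of the two parametrize decorators becomes its own test case, so all four variants show up separately in the CI report.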
…ithub.com/neuralmagic/deepsparse into feature/damian/llm_testing_feature_branch
- Remove ORT ground truth class and use deepsparse pipeline instead
LGTM overall. As discussed offline, this will need some refactors to move cleanly to a config-based method.
LGTM. Support for getting this to run on a nightly basis is still pending?
The implementation of the test harness for LLMs. By default, the tests are turned off so that we do not choke GHA.
To enable the tests, remove
`@pytest.mark.skip(reason="Those tests are too heavy to run as a normal part of the CI.")`
and run
`pytest tests/deepsparse/transformers/pipelines/test_text_generation.py`
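As an alternative to deleting the decorator by hand, a common pattern (a sketch only, not part of this PR) is to gate the skip on an environment variable, such as a hypothetical `RUN_LLM_TESTS` flag, so a nightly job can enable the suite without a code change:

```python
import os

import pytest

# Hypothetical opt-in flag, not part of this PR: set RUN_LLM_TESTS=1
# (e.g. in the nightly workflow) to run the heavy LLM tests.
RUN_LLM_TESTS = os.environ.get("RUN_LLM_TESTS", "0") == "1"


@pytest.mark.skipif(
    not RUN_LLM_TESTS,
    reason="Those tests are too heavy to run as a normal part of the CI.",
)
def test_text_generation():
    ...
```

With this in place, the regular CI leaves the variable unset and skips the tests, while a scheduled workflow exports `RUN_LLM_TESTS=1` before invoking pytest.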
Future consideration: adding config and utilizing small toy models to make tests extremely lightweight.
Includes PRs: