
Profile memory usage #59

Closed · zhuohan123 opened this issue May 3, 2023 · 0 comments · Fixed by #81

@zhuohan123 (Member)

No description provided.

zhuohan123 self-assigned this on May 3, 2023
WoosukKwon added the P0 label on May 10, 2023
yukavio pushed a commit to yukavio/vllm that referenced this issue Jul 3, 2024
SUMMARY:
* update the whl generation workflow to add testing and `testmo` integration
* add a top-level "generate whls" workflow

TEST PLAN:
ran manually ...

---------

Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
alixiaodi mentioned this issue Aug 2, 2024
pi314ever pushed a commit to pi314ever/vllm that referenced this issue Jan 17, 2025
remove expert_max hard code (vllm-project#47)
vLLM-Ext: Full enabling of ALiBi (vllm-project#34)
Add version inference via setuptools-scm (vllm-project#58)
Revert "vLLM-Ext: Full enabling of ALiBi (vllm-project#34)" (vllm-project#59)
Remove punica_hpu.py from vllm_hpu_extension (vllm-project#66)
Removed previous (not-pipelined) pa implementation (vllm-project#72)
Add flag to enable running softmax in fp32 (vllm-project#71)
Update calibration readme link (vllm-project#73)
allow lm_head quantization in calibration process (vllm-project#65)
Pad to bmin if value is less (vllm-project#67)
Update pyproject.toml (HabanaAI#75)

---------

Co-authored-by: Michał Kuligowski <[email protected]>
maxdebayser pushed a commit to maxdebayser/vllm that referenced this issue Feb 13, 2025
This PR enables the Spyre tests to run as a GitHub Action.

I realized that the model we were using for the tests, `llama-194m`, is not
available on the HF Hub, but if we want to run the tests externally we need a
model that is available. I've replaced it with this one:
https://huggingface.co/JackFram/llama-160m

Note that I haven't actually changed the model name in the tests; for now I've
"hacked" it with a soft link in the Docker container. This is because there is
ongoing work to introduce environment variables to control the tests, and I
don't want to complicate things.
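
For illustration only, here is a minimal sketch of that kind of aliasing; the alias path is a hypothetical placeholder, not what the Docker image actually uses:

```python
# Hypothetical sketch: alias the old model name to a locally available model.
# The alias path below is a placeholder, not the path used in the real image.
import os
from huggingface_hub import snapshot_download

real_model = snapshot_download("JackFram/llama-160m")  # fetch into the local HF cache
alias_path = "/models/llama-194m"                      # name the tests still expect (assumed)

os.makedirs(os.path.dirname(alias_path), exist_ok=True)
if not os.path.islink(alias_path):
    os.symlink(real_model, alias_path)  # the old name now resolves to the new weights
```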

For this model I see some rather odd behaviour: the tokens produced by vLLM
and HF Transformers are identical, but the decoded text is slightly different
(the strings match up to a leading space). I don't think this difference is
related to Spyre, so I've changed the test to compare token ids instead.
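
As a rough illustration (not the actual Spyre test; the prompt, output length, and greedy settings are assumptions), a token-id comparison with a standard vLLM install could look like this:

```python
# Rough sketch of comparing token ids instead of decoded text (assumed setup).
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "JackFram/llama-160m"
prompt = "Hello, my name is"

# HF Transformers reference, greedy decoding
tok = AutoTokenizer.from_pretrained(MODEL)
hf_model = AutoModelForCausalLM.from_pretrained(MODEL)
enc = tok(prompt, return_tensors="pt")
hf_out = hf_model.generate(**enc, max_new_tokens=8, do_sample=False)
hf_new_ids = hf_out[0, enc.input_ids.shape[1]:].tolist()  # keep only generated tokens

# vLLM output, greedy decoding
llm = LLM(model=MODEL)
vllm_out = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=8))
vllm_ids = list(vllm_out[0].outputs[0].token_ids)

# Compare ids, not strings: decoded text can differ by a leading space.
assert vllm_ids == hf_new_ids
```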

---------

Signed-off-by: Thomas Parnell <[email protected]>