[Model]: Add transformers backend support #11330

Merged (+528 −9)

Changes from 55 commits (105 commits in total)

Commits:
0bb5519 Merge (ArthurZucker)
8e238f7 Merge branch 'main' into transformers-backend (ArthurZucker)
6d8f1fd revert some changes (ArthurZucker)
fb37617 changes are now merged with main of transformers (ArthurZucker)
2d0c128 revert more changes (ArthurZucker)
31c16a1 Merge remote-tracking branch 'upstream/main' into fix-history (hmellor)
a49aa81 Undo whitespace changes (hmellor)
ae2e1cf Merge remote-tracking branch 'upstream/main' into fix-history (hmellor)
ff19ade Update transformers pin (hmellor)
038604b Remove unreachable code (hmellor)
882ef81 Remove dead code (hmellor)
f254f2c Update to latest attention interface (hmellor)
5a1a833 Always try to load `TransformersModel` if model isn't explicitly supp… (hmellor)
b7de34d Temporarily remove Llama from registry (hmellor)
49c4616 Deduplicate registry code slightly (hmellor)
071246d Fix profiling of Attentions (hmellor)
6190591 Run `./format.sh` on `transformers.py` (hmellor)
7ae8262 Fix spelling (hmellor)
988586d Undo changes to `chat.py` (hmellor)
5313551 tests + md (ArthurZucker)
f127a03 test helium (ArthurZucker)
9baefd2 fix dtype issue (ArthurZucker)
aff205a Make model implementation configurable (hmellor)
4efcac8 FIx previous commit (hmellor)
5d3afac `format.sh` (hmellor)
20f4d48 Handle alternative vocab embed layer names (hmellor)
013f880 Undo removel of `LlamaForCausalLM` (hmellor)
19dc1f8 Add `RMSNorm` replacement (hmellor)
7b5f146 bnb and `SupportsLoRA` (ArthurZucker)
e1d1e33 Merge branch 'fix-history' of github.com:ArthurZucker/vllm into fix-… (ArthurZucker)
c805f9d Change log (hmellor)
aadfb1b Formatting (hmellor)
544ba2d Disable vLLM RMS Norm implementation for now (hmellor)
06347f8 Only throw TP error if user is trying to use TP (hmellor)
3fe40d1 Add some tests for TransformersModel (hmellor)
d37fd9b remove replace norm, cleanup (ArthurZucker)
86dc357 Merge branch 'fix-history' of github.com:ArthurZucker/vllm into fix-h… (ArthurZucker)
4cbea32 linting and test mark (Isotr0py)
96f0a3a revert example modification (Isotr0py)
91e6037 fix wrong llm.model (Isotr0py)
754124a Merge remote-tracking branch 'upstream/main' into fix-history (Isotr0py)
554df59 use apply_model (Isotr0py)
319cf97 Update docs/source/models/supported_models.md (ArthurZucker)
88d679a Merge branch 'main' into fix-history (ArthurZucker)
d346637 Update docs/source/models/supported_models.md (ArthurZucker)
0f15f09 move the check to normalized arch (ArthurZucker)
f4c41eb Merge branch 'fix-history' of github.com:ArthurZucker/vllm into fix-h… (ArthurZucker)
2a4fc4f fix (ArthurZucker)
ceabb51 revert try inspect changes (ArthurZucker)
50b218a Update test (ArthurZucker)
c8aac87 style (ArthurZucker)
f6cb8fe Merge branch 'main' into fix-history (ArthurZucker)
1896af7 style update (ArthurZucker)
b42e464 Merge branch 'fix-history' of github.com:ArthurZucker/vllm into fix-h… (ArthurZucker)
1983511 Merge branch 'main' of https://github.com/vllm-project/vllm into fix-… (ArthurZucker)
869934a fix normalize arch (ArthurZucker)
df1c8b2 update test, fix gpu marker and remove trust remote as it's True by d… (ArthurZucker)
ffd6dce update test (ArthurZucker)
9704287 for now use `model_config.hf_config.auto_map["AutoModel"]` (ArthurZucker)
9a871af fix remote models (ArthurZucker)
4f33ff8 nits (ArthurZucker)
fc6a7e9 remove unused kwarg class (ArthurZucker)
0ab2f82 fix weight loading (ArthurZucker)
44f78ef fix test (ArthurZucker)
4847836 update test! (ArthurZucker)
0b348e4 Nits (ArthurZucker)
2132dcf update (ArthurZucker)
20bc901 remove print (ArthurZucker)
62540f2 update (ArthurZucker)
cfeaaae Fix fallback, dict keys != attrs (hmellor)
ecf2990 cleanup (ArthurZucker)
8cbd02e Merge branch 'fix-history' of github.com:ArthurZucker/vllm into fix-h… (ArthurZucker)
e30000d pre-commit (ArthurZucker)
7fd638f nit (ArthurZucker)
57c5dbf Merge remote-tracking branch 'origin/main' into fix-history (hmellor)
e714c05 pre-commit (hmellor)
fc62d7d Remove unused line (hmellor)
5475b5b Remove `kv_caches` and update scale if it's passed (hmellor)
255ed6c eager tests do work for now (ArthurZucker)
be6f244 Merge branch 'fix-history' of github.com:ArthurZucker/vllm into fix-h… (ArthurZucker)
4a855ea Respond to comments (hmellor)
e416227 fix failing test on phi: not all remote code have AutoModel (ArthurZucker)
8e92304 Merge branch 'fix-history' of github.com:ArthurZucker/vllm into fix-h… (ArthurZucker)
7758ea2 Merge branch 'main' of https://github.com/vllm-project/vllm into fix-… (ArthurZucker)
b74886e remove enforce eager for CI test (ArthurZucker)
5dabda8 remove BNB and LORA (ArthurZucker)
a1bd892 remove quantized test (ArthurZucker)
15327e3 update buildkite to run transformers test (ArthurZucker)
3fad390 Update vllm/model_executor/model_loader/utils.py (ArthurZucker)
5663a0c fix pre-commit (ArthurZucker)
17c6e02 Merge branch 'fix-history' of github.com:ArthurZucker/vllm into fix-h… (ArthurZucker)
5679d4d update (ArthurZucker)
03f1844 Fix failing registry test (hmellor)
d001748 temp: run transformers tests first (hmellor)
4741ab2 Update transformers pin in `requirements-test.txt` (hmellor)
9a29e46 update deps (ArthurZucker)
073ac5e Merge branch 'fix-history' of github.com:ArthurZucker/vllm into fix-h… (ArthurZucker)
5f6668f make v1 work (Isotr0py)
90be3b9 Merge branch 'main' into fix-history (Isotr0py)
95c1916 fix custom model test (Isotr0py)
2906626 fix incorrect backend fallback (Isotr0py)
8c33bd6 fix oot registration test (Isotr0py)
ccbff79 add transformers tp test (Isotr0py)
3647766 Update vllm/model_executor/model_loader/utils.py (Isotr0py)
f68af01 clean up (Isotr0py)
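
The user-facing switch introduced by this PR is the `model_impl` argument (see commit aff205a, "Make model implementation configurable"); the tests below pass it as "transformers" or "auto". A minimal usage sketch, assuming `vllm.LLM` accepts the same `model_impl` values that the test runner forwards:

from vllm import LLM, SamplingParams

# Sketch, not part of this PR's diff: force the Transformers backend
# instead of a native vLLM implementation. With model_impl="auto" the
# engine would prefer a native implementation and fall back to
# Transformers only when none exists.
llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct",
          model_impl="transformers")

# Greedy decoding, mirroring check_implementation in the test file below.
outputs = llm.generate(["The capital of France is"],
                       SamplingParams(temperature=0.0, max_tokens=32))
print(outputs[0].outputs[0].text)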
File: tests/models/test_transformers.py (new file, +107 lines)

"""Test the functionality of the Transformers backend.

Run `pytest tests/models/test_transformers.py`.
"""
from contextlib import nullcontext
from typing import Type

import pytest

from vllm.model_executor.models import ModelRegistry

from ..conftest import HfRunner, VllmRunner
from ..utils import multi_gpu_marks
from .utils import check_logprobs_close

# Delete Llama from registry so we can pretend vLLM doesn't support it
del ModelRegistry.models["LlamaForCausalLM"]


# Code used to generate the ilama model:
# from transformers import AutoConfig, AutoModel, LlamaConfig, LlamaModel
#
# class IlamaConfig(LlamaConfig):
#     model_type = "ilama"
#
# class IlamaModel(LlamaModel):
#     config_class = IlamaConfig
#
# AutoConfig.register("ilama", IlamaConfig)
# AutoModel.register(IlamaConfig, IlamaModel)
#
# base_model = LlamaModel.from_pretrained("meta-llama/Llama-3.2-1B",
#                                         torch_dtype="auto")
# remote_model = IlamaModel._from_config(base_model.config)
# remote_model.load_state_dict(base_model.state_dict())
# remote_model.push_to_hub("ArthurZ/Ilama-3.2-1B")


def check_implementation(
    hf_runner: Type[HfRunner],
    vllm_runner: Type[VllmRunner],
    example_prompts: list[str],
    model: str,
    **kwargs,
):
    max_tokens = 32
    num_logprobs = 5

    with vllm_runner(model, **kwargs) as vllm_model:
        vllm_outputs = vllm_model.generate_greedy_logprobs(
            example_prompts, max_tokens, num_logprobs)

    with hf_runner(model, **kwargs) as hf_model:
        hf_outputs = hf_model.generate_greedy_logprobs_limit(
            example_prompts, max_tokens, num_logprobs)

    check_logprobs_close(
        outputs_0_lst=hf_outputs,
        outputs_1_lst=vllm_outputs,
        name_0="hf",
        name_1="vllm",
    )


@pytest.mark.parametrize(
    "model,model_impl,trust_remote_code",
    [("openai-community/gpt2", "transformers", None),
     ("meta-llama/Llama-3.2-1B-Instruct", "auto", None),
     ("ArthurZ/Ilama-3.2-1B", "auto", True)])
def test_models(hf_runner,
                vllm_runner,
                example_prompts,
                model,
                model_impl,
                trust_remote_code) -> None:

    # Forcing the Transformers backend for GPT-2 is expected to fail at
    # the time of this PR, so the test asserts the error instead.
    maybe_raises = nullcontext()
    if model == "openai-community/gpt2" and model_impl == "transformers":
        maybe_raises = pytest.raises(
            ValueError,
            match="The Transformers implementation.*not compatible with vLLM")

    with maybe_raises:
        check_implementation(hf_runner,
                             vllm_runner,
                             example_prompts,
                             model,
                             model_impl=model_impl,
                             trust_remote_code=trust_remote_code)


@multi_gpu_marks(num_gpus=2)
def test_distributed(
    hf_runner,
    vllm_runner,
    example_prompts,
):
    kwargs = {"model_impl": "transformers", "tensor_parallel_size": 2}
    check_implementation(hf_runner, vllm_runner, example_prompts,
                         "meta-llama/Llama-3.2-1B-Instruct", **kwargs)


def test_quantized(
    hf_runner,
    vllm_runner,
    example_prompts,
):
    kwargs = {"model_impl": "transformers"}
    check_implementation(hf_runner, vllm_runner, example_prompts,
                         "unsloth/Llama-3.2-1B-Instruct-bnb-4bit", **kwargs)
Review comment: It would be worth documenting which features users should not expect to be supported, such as quantization.