New microsoft/bloom-deepspeed-inference-fp16 weights not working with DeepSpeed MII #50
Comments
Seems like there is a check in place that is not letting the new weights work with MII.
Any updates on this?
The same thing also happens with bigscience/bloom-350m for some reason. I just ran the example in the README and got the same error.
https://github.com/huggingface/transformers-bloom-inference/blob/abe365066fec6e03ce0ea2cc8136f2da1254e2ea/bloom-inference-server/ds_inference/grpc_server.py#L33 I know this is not the most elegant way to do this :(
Thanks for the response, @mayank31398!
@mrwyattii I believe your commit yesterday has fixed this?
This is the error I got today while trying int8 inference with Bloom.
Hi @TahaBinhuraib, I think MII doesn't support int8 models. https://github.com/huggingface/transformers-bloom-inference/tree/main/bloom-inference-server
The fp16 Bloom weights are now supported. Int8 models are also supported, but currently the DeepSpeed sharded int8 weights for the Bloom model will throw an error. I'm working on a fix for this and for automatic loading of the sharded weights (so you don't have to manually download the weights and define the checkpoint file list). Those changes will come in #69 and likely another PR.
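Until that automatic loading lands, the manual step described above (defining the checkpoint file list for pre-downloaded shards) can be sketched as follows. This is a stdlib-only illustration; the helper name and the JSON fields (`type`, `base_dir`, `checkpoints`, `version`) are assumptions modeled on the published Bloom DeepSpeed-inference checkpoint repos, not a documented schema:

```python
import json
from pathlib import Path


def build_checkpoints_json(ckpt_dir, out_name="checkpoints.json"):
    """Assemble a checkpoint file list for pre-sharded weights.

    Hypothetical helper: the field names below mirror what ships with
    the microsoft/bloom-deepspeed-inference-* repos, but treat the
    exact schema as an assumption.
    """
    # Collect shard file names in a deterministic order.
    shard_names = sorted(p.name for p in Path(ckpt_dir).glob("*.pt"))
    config = {
        "type": "BLOOM",            # model family tag
        "base_dir": str(ckpt_dir),  # directory holding the shards
        "checkpoints": shard_names, # ordered shard file names
        "version": 1.0,
    }
    out_path = Path(ckpt_dir) / out_name
    out_path.write_text(json.dumps(config, indent=2))
    return out_path
```

The returned path can then be handed to whatever loader expects a checkpoint-list JSON.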
Thanks @mrwyattii!
Thanks @mrwyattii, can't wait!
@mayank31398 @TahaBinhuraib I finally found the time to fix #69 so that it works with int8. You no longer need to download the sharded checkpoint files separately; MII will handle this for you (but it will take a while, as the checkpoints are quite large). I just confirmed that it's working on my side, but if you have the opportunity to test it out, please do. The script I used:

```python
import mii

mii_configs = {
    "dtype": "int8",
    "tensor_parallel": 4,
    "port_number": 50950,
}

name = "microsoft/bloom-deepspeed-inference-int8"

mii.deploy(task="text-generation",
           model=name,
           deployment_name="bloom_deployment",
           model_path="/data/bloom-ckpts",
           mii_config=mii_configs)
```

You will probably want to change the `model_path` for your setup.
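For completeness, here is a minimal sketch of querying the deployment afterwards. The client calls (`mii.mii_query_handle`, `.query`) are the legacy MII client API; they are commented out because they require the `bloom_deployment` above to be running on a GPU machine, so only the request payload construction runs here, and the response attribute name is an assumption:

```python
# Batch of prompts in the shape the legacy MII text-generation client
# expects: a dict with a "query" key holding a list of strings.
payload = {"query": ["DeepSpeed is", "Bloom is a"]}

# Commented out: requires the running "bloom_deployment" from above.
# import mii
# generator = mii.mii_query_handle("bloom_deployment")
# result = generator.query(payload, do_sample=True, max_new_tokens=30)
# print(result.response)  # attribute name assumed
```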
New microsoft/bloom-deepspeed-inference-fp16 and microsoft/bloom-deepspeed-inference-int8 weights not working with DeepSpeed MII

@jeffra @RezaYazdaniAminabadi The list of models doesn't contain the new weights.