
New microsoft/bloom-deepspeed-inference-fp16 weights not working with DeepSpeed MII #50

Closed
mayank31398 opened this issue Aug 29, 2022 · 12 comments


@mayank31398

New microsoft/bloom-deepspeed-inference-fp16 and microsoft/bloom-deepspeed-inference-int8 weights not working with DeepSpeed MII

@jeffra @RezaYazdaniAminabadi

Traceback (most recent call last):
  File "scripts/bloom-inference-server/server.py", line 83, in <module>
    model = DSInferenceGRPCServer(args)
  File "/net/llm-shared-nfs/nfs/mayank/BigScience-Megatron-DeepSpeed/scripts/bloom-inference-server/ds_inference/grpc_server.py", line 36, in __init__
    mii.deploy(
  File "/net/llm-shared-nfs/nfs/yelkurdi/conda/miniconda3/envs/llmpt/lib/python3.8/site-packages/mii/deployment.py", line 70, in deploy
    mii.utils.check_if_task_and_model_is_valid(task, model)
  File "/net/llm-shared-nfs/nfs/yelkurdi/conda/miniconda3/envs/llmpt/lib/python3.8/site-packages/mii/utils.py", line 108, in check_if_task_and_model_is_valid
    assert (
AssertionError: text-generation only supports [.....]

The list of supported models doesn't contain the new weights.
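
For context, the failing check in mii/utils.py boils down to an allow-list lookup. A minimal sketch of that pattern (the list contents and exact structure here are illustrative, not MII's actual source):

# Illustrative sketch of the allow-list style check in mii/utils.py;
# the actual supported-model list is assembled elsewhere in MII.
SUPPORTED_TEXT_GENERATION_MODELS = [
    "bigscience/bloom",
    "gpt2",
    # ...the new microsoft/bloom-deepspeed-inference-* repos are missing,
    # so the assertion below fires for them.
]

def check_if_task_and_model_is_valid(task, model_name):
    if task == "text-generation":
        assert model_name in SUPPORTED_TEXT_GENERATION_MODELS, \
            f"text-generation only supports {SUPPORTED_TEXT_GENERATION_MODELS}"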

@mayank31398
Author

It seems there is a validation check in place that prevents the new weights from working with MII.

@mayank31398
Author

Any updates on this?
@jeffra @RezaYazdaniAminabadi

@cderinbogaz
Contributor

The same thing also happens with bigscience/bloom-350m for some reason.

I just ran the example in the README and got the
AssertionError: text-generation only supports [.....]
error.
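
For reproduction, this is roughly the README-style deployment call that triggers it (the deployment name is arbitrary):

import mii

# Minimal deployment call in the style of the MII README; the
# deployment name is arbitrary. This fails MII's model validity
# check for bigscience/bloom-350m as described above.
mii.deploy(task="text-generation",
           model="bigscience/bloom-350m",
           deployment_name="bloom350m_deployment")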

@mayank31398
Author

https://github.com/huggingface/transformers-bloom-inference/blob/abe365066fec6e03ce0ea2cc8136f2da1254e2ea/bloom-inference-server/ds_inference/grpc_server.py#L33
@cderinbogaz I hacked my way around it for now: I pass the downloaded model path and the checkpoint dict for the model I actually need, while setting model="bigscience/bloom".

I know this is not the most elegant way to do it :(
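
A rough sketch of that workaround, modeled on the linked grpc_server.py (the paths, file names, and the checkpoint_dict key and layout are placeholders/assumptions, not verified against this MII version):

import mii

# Sketch of the workaround: advertise a model name that passes MII's
# allow-list ("bigscience/bloom"), but point model_path and the
# checkpoint dict at the locally downloaded
# microsoft/bloom-deepspeed-inference-fp16 shards.
# All paths and file names below are placeholders.
mii.deploy(task="text-generation",
           model="bigscience/bloom",            # accepted by the validity check
           deployment_name="bloom_deployment",
           model_path="/data/bloom-deepspeed-inference-fp16",
           mii_config={
               "dtype": "fp16",
               "tensor_parallel": 8,
               "port_number": 50950,
               # assumed key: a DeepSpeed-style checkpoint description
               "checkpoint_dict": {
                   "type": "BLOOM",
                   "base_dir": "/data/bloom-deepspeed-inference-fp16",
                   "checkpoints": ["bloom-fp16_tp-0.pt"],  # placeholder list
                   "version": 1.0,
               },
           })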

@cderinbogaz
Contributor

Thanks for the response @mayank31398!
I think it's a neat solution :)

@mayank31398
Author

@mrwyattii I believe your commit yesterday fixed this?
Let me know.
I am watching this repo closely :)

@TahaBinhuraib
Contributor

    weight_quantizer.quantize(transpose(sd[0][prefix + 'self_attention.query_key_value.' + 'weight'])))
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/module_inject/replace_module.py", line 100, in copy
    dim=self.in_dim)[self.gpu_index].to(

This is the error I got today while trying int8 inference with bloom.

@mayank31398
Author

Hi @TahaBinhuraib, I think MII doesn't support int8 models.
Can you try vanilla DS-inference?

https://github.com/huggingface/transformers-bloom-inference/tree/main/bloom-inference-server
You can run it via the CLI or deploy a generation server as described in the instructions ^^.
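
If it helps, here is a minimal sketch of what vanilla DS-inference looks like with pre-sharded checkpoints (the checkpoints-json layout follows the BLOOM inference scripts; paths, file names, and the tensor-parallel degree are placeholders):

import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

# Sketch of vanilla DS-inference with pre-sharded checkpoints.
# The checkpoints JSON is expected to look like
# {"type": "BLOOM", "checkpoints": [...], "version": 1.0}.
config = AutoConfig.from_pretrained("bigscience/bloom")

# Build the model on the meta device so no full copy is materialized.
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)

model = deepspeed.init_inference(model,
                                 mp_size=4,          # tensor-parallel degree
                                 dtype=torch.int8,
                                 checkpoint="/data/checkpoints.json",
                                 replace_with_kernel_inject=True)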

@mrwyattii
Contributor

The fp16 Bloom weights are now supported. Int8 models are also supported, but currently the DeepSpeed sharded int8 weights for the Bloom model will throw an error. I'm working on a fix for this and automatic loading of the sharded weights (so you don't have to manually download the weights and define the checkpoint file list). Those changes will come in #69 and likely another PR.

@mayank31398
Author

Thanks @mrwyattii

@TahaBinhuraib
Contributor

Thanks @mrwyattii can't wait!

@mrwyattii
Contributor

@mayank31398 @TahaBinhuraib I finally found the time to fix #69 so that it works with int8. You no longer need to download the sharded checkpoint files separately and MII will handle this for you (but it will take a while as the checkpoints are quite large). I just confirmed that it's working on my side, but if you have the opportunity to test it out, please do. The script I used:

import mii

mii_configs = {
    "dtype": "int8",
    "tensor_parallel": 4,
    "port_number": 50950,
}
name = "microsoft/bloom-deepspeed-inference-int8"

mii.deploy(task='text-generation',
           model=name,
           deployment_name="bloom_deployment",
           model_path="/data/bloom-ckpts",
           mii_config=mii_configs)

You will probably want to change the model_path parameter if you run this on your local machine.
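
For completeness, the deployment above can then be queried through MII's query handle (the prompt and generation arguments here are just examples):

import mii

# Query the deployment started above; the prompt and generation
# parameters are just examples.
generator = mii.mii_query_handle("bloom_deployment")
result = generator.query({"query": ["DeepSpeed is"]},
                         do_sample=True,
                         max_new_tokens=64)
print(result)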
