
Error messages spilled from persistent deployment for every request #349

Closed
weiqisun opened this issue Dec 5, 2023 · 6 comments · Fixed by #350

Comments

@weiqisun (Contributor) commented Dec 5, 2023

Hi, in the latest release (0.1.2), the persistent server returns responses with more details, including prompt_length, generated_length, and finish_reason. This is awesome and super useful, thanks for the update! However, the persistent server now prints an error message for every request:

Traceback (most recent call last):
  File "/home/dyheal1/mambaforge/envs/test/lib/python3.10/site-packages/grpc/_server.py", line 552, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/home/dyheal1/mambaforge/envs/test/lib/python3.10/site-packages/mii/grpc_related/modelresponse_server.py", line 96, in GeneratorReply
    return task_methods.pack_response_to_proto(responses)
  File "/home/dyheal1/mambaforge/envs/test/lib/python3.10/site-packages/mii/grpc_related/task_methods.py", line 85, in pack_response_to_proto
    finish_reason=str(r.finish_reason.value),
AttributeError: 'NoneType' object has no attribute 'value'

I still get valid responses back with the correct generated_text, finish_reason, etc., but it would be great to eliminate this error message on the server side. As a hacky workaround for now, I changed line 85 of grpc_related/task_methods.py to finish_reason=str(r.finish_reason.value) if r.finish_reason is not None else "none".
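
In context, the change looks like this (the original line is the one shown in the traceback above; the edit just guards against a None finish_reason):

# mii/grpc_related/task_methods.py, line 85 -- original line, from the traceback:
finish_reason=str(r.finish_reason.value),
# workaround:
finish_reason=str(r.finish_reason.value) if r.finish_reason is not None else "none",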

Here is my setup. I started the persistent server by:

mii.serve(
    "meta-llama/Llama-2-70b-hf",
    deployment_name="mii-endpoint",
    max_length=4096,
    tensor_parallel=2,
)

And I call the server using:

client = mii.client("mii-endpoint")
response = client.generate("DeepSpeed is", max_new_tokens=50)
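
For reference, I read the new fields off the response roughly like this (assuming each returned response object exposes the fields mentioned above; I haven't checked the exact return type):

# Assumption: client.generate returns a list of response objects exposing
# generated_text, prompt_length, generated_length, and finish_reason.
for r in response:
    print(r.generated_text, r.prompt_length, r.generated_length, r.finish_reason)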

Each such request triggers the error message on the server side. I'm running this on 2 A100-80GB GPUs, and the relevant package versions are:

mii: 0.1.2
deepspeed: 0.12.4
torch: 2.0.1
CUDA: 11.8

Any help would be appreciated, and thanks again for the awesome tool you have built!

@thelongestusernameofall commented:

I'm running into the same issue.

@mrwyattii (Contributor) commented:

@weiqisun I have a fix in #350. If you would like to try that branch before we merge: pip install git+https://github.com/Microsoft/DeepSpeed-MII@mrwyattii/fix-return-error

@weiqisun (Contributor, Author) commented Dec 7, 2023

Thanks @mrwyattii! However, I'm still seeing this error message. I can confirm I installed the package from your branch, since the installed library file now contains the updated _invoke_async function.

@mrwyattii (Contributor) commented Dec 7, 2023

Hmm, I'm not able to reproduce this with the fix I have in #350. Could you try adding a print statement that shows the contents of responses? Please add

print(f"RANK {self.inference_pipeline.local_rank} RESPONSE:", [r.to_msg_dict() for r in responses])

just before the return statement here:
https://github.com/microsoft/DeepSpeed-MII/blob/5e12a38520b4cd9eb63a2790c3611c837f723885/mii/grpc_related/modelresponse_server.py#L96

You will want to modify this file on your local system: /home/dyheal1/mambaforge/envs/test/lib/python3.10/site-packages/mii/grpc_related/modelresponse_server.py
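
For orientation, the print lands roughly here (only the print and the return line come from this thread; the surrounding method is approximated from the traceback, so treat it as a sketch rather than the actual source):

# Sketch of GeneratorReply in mii/grpc_related/modelresponse_server.py,
# around line 96; `responses` is whatever the inference pipeline produced.
def GeneratorReply(self, request, context):
    ...
    print(f"RANK {self.inference_pipeline.local_rank} RESPONSE:",
          [r.to_msg_dict() for r in responses])  # <-- add this line
    return task_methods.pack_response_to_proto(responses)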

Share the output of that print statement. Thanks!

@weiqisun (Contributor, Author) commented Dec 7, 2023

Actually, never mind. With a clean setup from scratch, the error message is now gone! I'm not sure whether it was caused by a process that wasn't properly terminated. I previously had a server running overnight; I stopped it this morning before updating the mii package, and I still saw the same error message after the update. But then I realized there were two leftover mii processes after I terminated the server with client.terminate_server():

dyheal1   291962  0.9  0.0 41352944 474320 pts/8 Sl   11:26   0:10 /home/dyheal1/mambaforge/envs/test/bin/python -m mii.launch.multi_gpu_server --deployment-name mii-endpoint --load-balancer-port 50050 --restful-gateway-port 51080 --restful-gateway-procs 32 --load-balancer --model-config eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAibWV0YS1sbGFtYS9MbGFtYS0yLTdiLWhmIiwgInRva2VuaXplciI6ICJtZXRhLWxsYW1hL0xsYW1hLTItN2ItaGYiLCAidGFzayI6ICJ0ZXh0LWdlbmVyYXRpb24iLCAidGVuc29yX3BhcmFsbGVsIjogMiwgImluZmVyZW5jZV9lbmdpbmVfY29uZmlnIjogeyJ0ZW5zb3JfcGFyYWxsZWwiOiB7InRwX3NpemUiOiAyfSwgInN0YXRlX21hbmFnZXIiOiB7Im1heF90cmFja2VkX3NlcXVlbmNlcyI6IDIwNDgsICJtYXhfcmFnZ2VkX2JhdGNoX3NpemUiOiA3NjgsICJtYXhfcmFnZ2VkX3NlcXVlbmNlX2NvdW50IjogNTEyLCAibWF4X2NvbnRleHQiOiA4MTkyLCAibWVtb3J5X2NvbmZpZyI6IHsibW9kZSI6ICJyZXNlcnZlIiwgInNpemUiOiAxMDAwMDAwMDAwfSwgIm9mZmxvYWQiOiBmYWxzZX19LCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJ6bXFfcG9ydF9udW1iZXIiOiAyNTU1NSwgInJlcGxpY2FfbnVtIjogMSwgInJlcGxpY2FfY29uZmlncyI6IFt7Imhvc3RuYW1lIjogImxvY2FsaG9zdCIsICJ0ZW5zb3JfcGFyYWxsZWxfcG9ydHMiOiBbNTAwNTEsIDUwMDUyXSwgInRvcmNoX2Rpc3RfcG9ydCI6IDI5NTAwLCAiZ3B1X2luZGljZXMiOiBbMCwgMV0sICJ6bXFfcG9ydCI6IDI1NTU1fV0sICJtYXhfbGVuZ3RoIjogNDA5NiwgImFsbF9yYW5rX291dHB1dCI6IGZhbHNlLCAic3luY19kZWJ1ZyI6IGZhbHNlLCAicHJvZmlsZV9tb2RlbF90aW1lIjogZmFsc2V9
dyheal1   292647  1.1  0.0 41353024 474248 pts/8 Sl   11:30   0:10 /home/dyheal1/mambaforge/envs/test/bin/python -m mii.launch.multi_gpu_server --deployment-name mii-endpoint --load-balancer-port 50050 --restful-gateway-port 51080 --restful-gateway-procs 32 --load-balancer --model-config eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAibWV0YS1sbGFtYS9MbGFtYS0yLTdiLWhmIiwgInRva2VuaXplciI6ICJtZXRhLWxsYW1hL0xsYW1hLTItN2ItaGYiLCAidGFzayI6ICJ0ZXh0LWdlbmVyYXRpb24iLCAidGVuc29yX3BhcmFsbGVsIjogMiwgImluZmVyZW5jZV9lbmdpbmVfY29uZmlnIjogeyJ0ZW5zb3JfcGFyYWxsZWwiOiB7InRwX3NpemUiOiAyfSwgInN0YXRlX21hbmFnZXIiOiB7Im1heF90cmFja2VkX3NlcXVlbmNlcyI6IDIwNDgsICJtYXhfcmFnZ2VkX2JhdGNoX3NpemUiOiA3NjgsICJtYXhfcmFnZ2VkX3NlcXVlbmNlX2NvdW50IjogNTEyLCAibWF4X2NvbnRleHQiOiA4MTkyLCAibWVtb3J5X2NvbmZpZyI6IHsibW9kZSI6ICJyZXNlcnZlIiwgInNpemUiOiAxMDAwMDAwMDAwfSwgIm9mZmxvYWQiOiBmYWxzZX19LCAidG9yY2hfZGlzdF9wb3J0IjogMjk1MDAsICJ6bXFfcG9ydF9udW1iZXIiOiAyNTU1NSwgInJlcGxpY2FfbnVtIjogMSwgInJlcGxpY2FfY29uZmlncyI6IFt7Imhvc3RuYW1lIjogImxvY2FsaG9zdCIsICJ0ZW5zb3JfcGFyYWxsZWxfcG9ydHMiOiBbNTAwNTEsIDUwMDUyXSwgInRvcmNoX2Rpc3RfcG9ydCI6IDI5NTAwLCAiZ3B1X2luZGljZXMiOiBbMCwgMV0sICJ6bXFfcG9ydCI6IDI1NTU1fV0sICJtYXhfbGVuZ3RoIjogNDA5NiwgImFsbF9yYW5rX291dHB1dCI6IGZhbHNlLCAic3luY19kZWJ1ZyI6IGZhbHNlLCAicHJvZmlsZV9tb2RlbF90aW1lIjogZmFsc2V9

After manually killing these two processes, I started the server again and the error message was gone. Thanks for the fix!
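
For anyone else who hits this, here is a rough cleanup sketch for leftover launcher processes (it assumes a Linux host with pgrep available; this is plain process management, not an MII API):

# Find and terminate leftover MII launcher processes after
# client.terminate_server(). Standard library only; assumes Linux with pgrep.
import os
import signal
import subprocess

result = subprocess.run(
    ["pgrep", "-f", "mii.launch.multi_gpu_server"],
    capture_output=True, text=True,
)
for pid in result.stdout.split():
    os.kill(int(pid), signal.SIGTERM)  # polite terminate; escalate to SIGKILL if needed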

@mrwyattii (Contributor) commented:

Great to hear. Closing the issue, but please reopen (or create a new issue) if you see this behavior return. I will get this merged into the main branch and it will be part of the next MII release.
