
terminate_server only releases memory on one GPU when using tensor_parallel #265

Closed
baojunliu opened this issue Nov 7, 2023 · 2 comments
baojunliu commented Nov 7, 2023

I am trying to use two GPUs with tensor_parallel=2. It seems terminate_server only releases memory on one GPU; some worker processes are still running on the other. client.terminate_server() doesn't kill all of them. I can kill the leftover process manually, but how can I do it properly from the Python code?

import mii

client = mii.serve("mistralai/Mistral-7B-v0.1",
                   deployment_name="ray_scorer_deployment",
                   tensor_parallel=2)
response = client.generate("Deepspeed is", max_new_tokens=128)
client.terminate_server()

print(response.response)
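As a stopgap until the fix, the general idea of cleaning up all worker ranks can be sketched with plain stdlib process-group handling. This is not the MII API and says nothing about MII internals; the sleeping subprocess below is a stand-in for a worker that holds GPU memory:

```python
import os
import signal
import subprocess
import sys

# Simulate a lingering worker process (stand-in for a tensor-parallel
# rank that keeps GPU memory allocated after the server is "terminated").
worker = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    start_new_session=True,  # put it in its own process group
)

# Signal the entire process group, so every process in it is
# terminated, not just the one PID we happen to hold.
os.killpg(os.getpgid(worker.pid), signal.SIGTERM)
worker.wait(timeout=10)
```

On POSIX systems `os.killpg` delivers the signal to every member of the group, which is why launchers often start workers with `start_new_session=True` so they can be torn down together.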
@mrwyattii (Contributor) commented Nov 7, 2023

This was a bug that has been fixed in #262. Please update to the latest main (we will also cut a patch release with this and other bug fixes later this week).

@mrwyattii mrwyattii self-assigned this Nov 7, 2023
@mrwyattii (Contributor) commented

Closing, this was resolved in the latest release (v0.1.1).


2 participants