Can't pickle <class 'botocore.client.S3'> When Streaming Into More Than 1 GPU #25
The runai-model-streamer allows you to load a model across multiple GPUs from safetensors files (either S3 or the file system), as demonstrated by the streamer's integration into vLLM starting from version 0.6.6. In other words, there is no limitation on using vLLM with the Run:ai streamer as the optional loader. The streaming tool by itself provides an iterator over CPU PyTorch tensors stored in a safetensors file, in a concurrent and asynchronous manner.
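To illustrate the "concurrent and asynchronous iterator" idea described above, here is a minimal stdlib-only sketch of the prefetch pattern. This is not the real runai_model_streamer API; `FAKE_TENSORS`, `read_tensor`, and `stream_tensors` are hypothetical names, and the payloads stand in for tensors read from S3 or the file system.

```python
import concurrent.futures

# Hypothetical stand-ins for entries in a safetensors file.
FAKE_TENSORS = {f"layer.{i}.weight": list(range(i, i + 4)) for i in range(6)}

def read_tensor(name):
    # In a real streamer this would read bytes from S3 or the file system
    # into a CPU buffer; here we just return the stored payload.
    return name, FAKE_TENSORS[name]

def stream_tensors(names, workers=4):
    """Yield (name, tensor) pairs as reads complete, overlapping the I/O."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(read_tensor, n) for n in names]
        for fut in concurrent.futures.as_completed(futures):
            yield fut.result()

loaded = dict(stream_tensors(FAKE_TENSORS))
print(len(loaded))  # 6
```

The point of the design is that the consumer can start placing tensors on a GPU while later reads are still in flight, rather than waiting for the whole file.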
Here is what I encountered when trying to load into 2 GPUs on my EC2 instance through vLLM:
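For context on the error in the title: objects like `botocore.client.S3` hold live OS resources (sockets, thread locks) and cannot be pickled, which is what happens when worker processes are spawned for tensor parallelism across more than one GPU. A minimal stdlib-only sketch of the same failure mode, using a hypothetical `FakeS3Client` that holds a lock the way a real client holds connections:

```python
import pickle
import threading

class FakeS3Client:
    """Hypothetical stand-in for botocore.client.S3: holds a thread lock,
    much as a real client holds sockets and locks."""
    def __init__(self):
        self._lock = threading.Lock()

try:
    pickle.dumps(FakeS3Client())
    failed = False
except TypeError:
    # Raises "cannot pickle '_thread.lock' object" on CPython.
    failed = True

print(failed)  # True
```

This is why such clients are typically created inside each worker process rather than passed across the process boundary.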
Opened a new issue on vLLM as well:
@huaxuan250 We are working on better integration and on cloning only the relevant files.
The benchmark was done only on 1 GPU. https://www.run.ai/blog/run-ai-model-streamer-performance-benchmarks
I am wondering whether this streaming tool can load any number of safetensors files onto any number of GPUs?