
[LLamaSharp.Backend.Cuda12 v0.10.0] Unable to load the model onto multiple GPUs #617

Closed · ChengYen-Tang opened this issue Mar 19, 2024 · 3 comments
Labels: question (Further information is requested)

@ChengYen-Tang (Contributor)

In the CUDA12 backend v0.10.0, the model only loads onto a single GPU, resulting in "cudaMalloc failed: out of memory".
There is no such issue in CUDA12 v0.9.1.
OS: Ubuntu 22.04


@martindevans (Member)

There was a new "GPU Split Mode" parameter introduced in 0.10.0, see here.

I don't use multiple GPU inference myself, but I would guess you need to set that to something other than "None" when loading the model.
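
Something like the following is roughly what I mean. This is a minimal sketch from memory, not tested on a multi-GPU box; the property and enum names (SplitMode, GPUSplitMode.Layer) are assumptions about how the new parameter is exposed in 0.10.0, so check the linked docs for the exact spelling in your version.

```csharp
// Hedged sketch: SplitMode / GPUSplitMode.Layer are assumed names for the new
// "GPU Split Mode" parameter added in 0.10.0; adjust to what your version exposes.
using LLama;
using LLama.Common;
using LLama.Native;

var parameters = new ModelParams("path/to/model.gguf")
{
    GpuLayerCount = 99,             // offload as many layers as will fit
    SplitMode = GPUSplitMode.Layer  // split layers across all visible GPUs instead of one
};

using var weights = LLamaWeights.LoadFromFile(parameters);
```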

martindevans added the "question" label on Mar 19, 2024
@ChengYen-Tang (Contributor, Author)

Thanks, I will try it tomorrow.

@ChengYen-Tang (Contributor, Author)

Thank you, it worked.

ChengYen-Tang added a commit to ChengYen-Tang/LLamaSharp that referenced this issue Mar 20, 2024
AsakusaRinne added a commit that referenced this issue Mar 27, 2024
[LLama.KernelMemory] Fixed System.ArgumentException: EmbeddingMode must be true & #617