
[LLamaSharp.Backend.Cuda12 v0.10.0] Unable to load the model onto multiple GPUs #617

Closed · ChengYen-Tang opened this issue Mar 19, 2024 · 3 comments
Labels: question (Further information is requested)

@ChengYen-Tang (Contributor)

In the CUDA12 backend v0.10.0, the model only loads onto a single GPU, resulting in "cudaMalloc failed: out of memory".
There is no such issue in CUDA12 v0.9.1.
OS: Ubuntu 22.04


@martindevans (Member)

There was a new "GPU Split Mode" parameter introduced in 0.10.0, see here.

I don't use multiple GPU inference myself, but I would guess you need to set that to something other than "None" when loading the model.
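
Something like the following is roughly what I mean. This is a minimal sketch from memory, not tested on a multi-GPU box; the property and enum names (SplitMode, GPUSplitMode.Layer) are assumptions about how the new parameter is exposed in 0.10.0, so check the linked docs for the exact spelling in your version.

```csharp
// Hedged sketch: SplitMode / GPUSplitMode.Layer are assumed names for the new
// "GPU Split Mode" parameter added in 0.10.0; adjust to what your version exposes.
using LLama;
using LLama.Common;
using LLama.Native;

var parameters = new ModelParams("path/to/model.gguf")
{
    GpuLayerCount = 99,             // offload as many layers as will fit
    SplitMode = GPUSplitMode.Layer  // split layers across all visible GPUs instead of one
};

using var weights = LLamaWeights.LoadFromFile(parameters);
```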

martindevans added the "question" label on Mar 19, 2024
@ChengYen-Tang (Contributor, Author)

Thanks, I will try it tomorrow.

@ChengYen-Tang (Contributor, Author)

Thank you, it worked.

ChengYen-Tang added a commit to ChengYen-Tang/LLamaSharp that referenced this issue Mar 20, 2024
AsakusaRinne added a commit that referenced this issue Mar 27, 2024
[LLama.KernelMemory] Fixed System.ArgumentException: EmbeddingMode must be true & #617