llama_decode lock #595

martindevans · 2024-03-12T21:02:48Z

Added a lock object to `SafeLlamaModelHandle`. All calls to `llama_decode` (in `SafeLLamaContextHandle`) acquire this lock first.

This prevents two contexts from running inference on the same model at the same time, which seems to be unsafe in llama.cpp. We may need an even wider lock that prevents simultaneous inference on any two models. Testing required.


martindevans added 2 commits March 12, 2024 21:01

Added a lock object into SafeLlamaModelHandle which all calls to `llama_decode` (in the `SafeLLamaContextHandle`) lock first. This prevents two contexts from running inference on the same model at the same time, which seems to be unsafe in llama.cpp.

011d019

Modified the lock to be global over _all_ inferences. This seems to be necessary (at least with the CUDA backend).

7e1f472
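To illustrate the idea behind a lock that is global over all inferences, here is a minimal, language-agnostic sketch in Python. It is not the LLamaSharp code: `FakeContext`, `_native_decode`, `_GLOBAL_DECODE_LOCK` and `run_demo` are all hypothetical names, and the native `llama_decode` call is replaced by a placeholder that only counts how many threads are inside it at once.

```python
import threading
import time

# Hypothetical process-wide lock shared by every context, regardless of model.
_GLOBAL_DECODE_LOCK = threading.Lock()


class FakeContext:
    """Hypothetical stand-in for a context handle (not the LLamaSharp API)."""

    def __init__(self, stats):
        self.stats = stats  # shared counters used to observe concurrency

    def _native_decode(self):
        # Placeholder for the native decode call. It records how many
        # threads are inside it simultaneously.
        with self.stats["guard"]:
            self.stats["active"] += 1
            self.stats["max_active"] = max(
                self.stats["max_active"], self.stats["active"]
            )
        time.sleep(0.001)  # simulate some work
        with self.stats["guard"]:
            self.stats["active"] -= 1
            self.stats["calls"] += 1

    def decode(self):
        # Every decode acquires the global lock first, so no two contexts
        # are ever inside the native call at the same time.
        with _GLOBAL_DECODE_LOCK:
            self._native_decode()


def run_demo(num_contexts=4, calls_per_context=5):
    """Run several contexts on separate threads; return (max_active, calls)."""
    stats = {
        "guard": threading.Lock(),
        "active": 0,
        "max_active": 0,
        "calls": 0,
    }

    def worker(ctx):
        for _ in range(calls_per_context):
            ctx.decode()

    threads = [
        threading.Thread(target=worker, args=(FakeContext(stats),))
        for _ in range(num_contexts)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["max_active"], stats["calls"]
```

Calling `run_demo()` returns the highest number of threads observed inside the placeholder decode at once, together with the total number of calls. The trade-off of a single global lock is that inference is fully serialized across all contexts and models, which gives up parallelism in exchange for safety.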

martindevans merged commit ce4de7d into SciSharp:master Mar 13, 2024
3 checks passed

martindevans deleted the llama_decode_lock branch March 13, 2024 00:33

AsakusaRinne mentioned this pull request Mar 28, 2024

Cannot add a user message after another user message (Parameter message #585

Open
