Bug: llama-cli generates incoherent output with full GPU offload #9535
Labels
bug-unconfirmed
high severity
What happened?
Offloading 31 of the 33 layers of an 8B model produces correct results; with 32 layers offloaded, the response is incoherent. With 33 or more offloaded layers and seed 1, the instruction is ignored; with any other seed, no response is printed at all. This affects both conversational and normal modes. llama-server
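For reference, a minimal reproduction sketch of the three cases (the model file name and prompt below are placeholders, not taken from the report; -ngl sets the number of offloaded layers and -s the seed):

```sh
# 31 of 33 layers offloaded: correct output
./llama-cli -m model-8b.gguf -ngl 31 -s 1 -p "Summarize this sentence."

# 32 layers offloaded: incoherent output
./llama-cli -m model-8b.gguf -ngl 32 -s 1 -p "Summarize this sentence."

# 33 layers (full offload): instruction ignored with seed 1, no output with other seeds
./llama-cli -m model-8b.gguf -ngl 33 -s 1 -p "Summarize this sentence."
```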
functions without problems.
Name and Version
version: 3782 (8a30835)
built with clang version 20.0.0git (https://github.com/ROCm/llvm-project.git 487d0fd20dcbb6fbf926333d7b0b355788efb009) for x86_64-unknown-linux-gnu
What operating system are you seeing the problem on?
No response
Relevant log output