Avoid unnecessarily disabling CUDA graphs (ggml-org#7302)
As discussed in PR ggml-org#6766, CUDA graphs were being disabled in the presence of long prompts.
This fixes the issue by preventing the consecutive update counter from incrementing unnecessarily
for tokens in which CUDA graphs are disabled due to batch size > 1.
agray3 authored and teleprint-me committed May 17, 2024
1 parent 6fb91c1 commit dda1347
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion ggml-cuda.cu
@@ -2558,7 +2558,7 @@ GGML_CALL static enum ggml_status ggml_backend_cuda_graph_compute(ggml_backend_t
 }

 // Disable CUDA graphs (from the next token) if the use-case is demanding too many consecutive graph updates.
-if (cuda_graph_update_required) {
+if (use_cuda_graph && cuda_graph_update_required) {
     cuda_ctx->cuda_graph->number_consecutive_updates++;
 } else {
     cuda_ctx->cuda_graph->number_consecutive_updates = 0;
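
The effect of the one-line change can be illustrated with a small host-only simulation. This is a minimal sketch, not the code in ggml-cuda.cu: the `Eval` struct, the `graphs_get_disabled` helper, and the threshold `kMaxConsecutiveUpdates = 4` are hypothetical and chosen for illustration, and it assumes that `use_cuda_graph` is false exactly when the batch size is greater than 1, as the commit message describes. Only the names `use_cuda_graph`, `cuda_graph_update_required` and `number_consecutive_updates` come from the diff.

```cpp
#include <vector>

// Hypothetical description of one graph evaluation (one token or one prompt batch).
struct Eval {
    int  batch_size;                  // > 1 while a long prompt is being processed
    bool cuda_graph_update_required;  // whether the graph changed since the last evaluation
};

// Assumed threshold for illustration only; the real limit is not shown in the diff.
constexpr int kMaxConsecutiveUpdates = 4;

// Returns true if the consecutive-update counter reaches the threshold,
// i.e. CUDA graphs would be disabled for the following tokens.
bool graphs_get_disabled(const std::vector<Eval> & evals, bool apply_fix) {
    int number_consecutive_updates = 0;
    for (const Eval & e : evals) {
        // Assumption: graphs are not used for this evaluation when batch size > 1.
        const bool use_cuda_graph = (e.batch_size == 1);

        // Old condition: count every required update.
        // New condition: count it only if graphs are actually in use.
        const bool counts = apply_fix
            ? (use_cuda_graph && e.cuda_graph_update_required)
            : e.cuda_graph_update_required;

        if (counts) {
            number_consecutive_updates++;
        } else {
            number_consecutive_updates = 0;
        }
        if (number_consecutive_updates >= kMaxConsecutiveUpdates) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, four consecutive prompt batches (batch size 512, each flagged as needing an update) trip the counter with the old condition but leave it at zero with the new one. Four consecutive single-token evaluations that each need an update still disable graphs in both cases, which is the behavior the counter was meant to provide.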