As discussed in PR #6766, CUDA graphs were being disabled in the presence of long prompts. This fixes the issue by preventing the consecutive-update counter from incrementing unnecessarily for tokens in which CUDA graphs are disabled due to batch size > 1.
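
For illustration, here is a minimal sketch of the gating logic described above. All identifiers (`cuda_graph_state`, `consecutive_update_count`, `MAX_CONSECUTIVE_UPDATES`, `update_graph_heuristic`) are hypothetical stand-ins, not the actual llama.cpp names; the threshold value is likewise illustrative.

```cpp
// Hypothetical sketch: gate the consecutive-update counter on batch size.
struct cuda_graph_state {
    int  consecutive_update_count = 0;  // graph rebuilds observed in a row
    bool disabled                 = false;
};

constexpr int MAX_CONSECUTIVE_UPDATES = 4;  // illustrative threshold

// Called once per evaluated batch. Batches with size > 1 (prompt
// processing) bypass CUDA graphs for that batch, so they must not count
// toward the "too many consecutive updates" heuristic that disables
// graphs permanently.
void update_graph_heuristic(cuda_graph_state & st, int batch_size, bool graph_update_required) {
    if (batch_size > 1) {
        // Graphs are skipped for this batch; leave the counter untouched.
        // Previously, a long prompt (many batch > 1 evaluations) pushed the
        // counter past the threshold and disabled graphs for good.
        return;
    }
    if (graph_update_required) {
        if (++st.consecutive_update_count > MAX_CONSECUTIVE_UPDATES) {
            st.disabled = true;  // rebuild cost outweighs the graph benefit
        }
    } else {
        st.consecutive_update_count = 0;  // stable topology, reset the counter
    }
}
```

The key point is the early return: batches that bypass CUDA graphs anyway should neither increment nor reset the counter, so prompt processing no longer trips the heuristic that abandons graphs for the rest of the run.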