llama.cpp/ggml/include/ggml-rpc.h at 0d5a470223fc90b6b6807921d68011ff06ae7f9e

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-31 08:51:55 +00:00

Radoslav Gerganov 553a5c3a9f rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943)

RPC_CMD_SET_TENSOR always returns an empty response, and we send it 4
times per token. We can improve token generation (TG) speed if we don't
wait for this empty response.

The performance impact of this change depends on the network latency.
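
To make the latency dependence concrete, here is a rough back-of-the-envelope sketch. It is not code from llama.cpp: the function names are hypothetical, and it assumes each awaited empty response blocks the client for one full network round trip with nothing overlapping the wait. Only the figure of 4 RPC_CMD_SET_TENSOR calls per token comes from the commit message.

```cpp
// Hypothetical latency model; not code from llama.cpp.
// Assumption: every awaited empty response blocks the client for one full
// network round trip, and no other work overlaps with that wait.

// Milliseconds per token spent blocked on empty RPC_CMD_SET_TENSOR replies.
double blocked_ms_per_token(int set_tensor_calls, double rtt_ms, bool wait_for_response) {
    if (!wait_for_response) {
        return 0.0; // the client sends the command and moves on
    }
    return set_tensor_calls * rtt_ms;
}

// Estimated tokens per second, given the non-network time per token
// (compute_ms) and the time spent blocked on responses (blocked_ms).
double tokens_per_second(double compute_ms, double blocked_ms) {
    return 1000.0 / (compute_ms + blocked_ms);
}
```

Under these assumptions, 4 calls per token at a 1 ms round trip cost 4 ms per token while waiting. With an illustrative (not measured) 20 ms of other work per token, the model gives about 41.7 tokens/s when waiting and 50 tokens/s when not. On a low-latency link the gap narrows, which is consistent with the note that the impact depends on network latency.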

2025-04-25 10:08:08 +03:00

1.0 KiB
