In streaming mode, when the prompt exceeds the context length, the server returns an
HTTP 200 status code with a JSON error in the body. This is confusing and
inconsistent with other inference engines, which return an HTTP 4xx error in
this case.
This patch fixes the problem by making the server return HTTP 400 in such
cases.
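
For illustration, a minimal client-side sketch of what this enables, assuming an OpenAI-style streaming completions endpoint; the URL and payload fields below are hypothetical, not taken from this patch:

```python
# Sketch: with this change a client can branch on the HTTP status code
# instead of scanning the streamed body for an embedded JSON error object.
import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",   # hypothetical server URL
    json={"prompt": "...", "stream": True},   # hypothetical request body
    stream=True,
)

if resp.status_code != 200:
    # Prompt too long (or other request error) is now rejected up front.
    print("request rejected:", resp.status_code, resp.text)
else:
    # Normal case: consume the stream as usual.
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```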