llama.cpp/common at 11f0af5504252e453d57406a935480c909e3ff37

CS348Project/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-27 08:21:30 +00:00

Georgi Gerganov d00cbea63c server : host-memory prompt caching (#16391)

* minor : code style

* server : fix prompt similarity calculation

* server : initial host-memory prompt caching

* cont

* server : refactor

* cont

* cont : make the server task of the slot const

* cont : minor [no ci]

* server : cache prompts and checkpoints only for completion tasks

* server : improve prompt caching logic

* cont : fix check for number of cached prompts [no ci]

* server : improve caching logic, add -cram CLI arg

* server : print prompt mismatch info

* cont : better naming [no ci]

* server : improve prompt cache loading logic

* server : add option to debug the slot contents (#16482)

* server : add option to debug the slot contents

* Update tools/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

* server : add option to disable prompt cache

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

2025-10-09 18:54:51 +03:00

arg.cpp

server : host-memory prompt caching (#16391)

2025-10-09 18:54:51 +03:00

arg.h

common : remove common_has_curl() (#16351)

2025-09-30 17:39:44 +03:00

base64.hpp

llava : expose as a shared library for downstream projects (#3613)

2023-11-07 00:36:23 +03:00

build-info.cpp.in

cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)

2025-06-13 10:38:52 +02:00

chat-parser.cpp

refactor: centralize CoT parsing in backend for streaming mode (#16394)

2025-10-08 23:18:41 +03:00

chat-parser.h

model : Apertus model implementation (#15852)

2025-10-02 20:43:22 +03:00

chat.cpp

refactor: centralize CoT parsing in backend for streaming mode (#16394)

2025-10-08 23:18:41 +03:00

chat.h

server : host-memory prompt caching (#16391)

2025-10-09 18:54:51 +03:00

CMakeLists.txt

common: introduce http.h for httplib-based client (#16373)

2025-10-01 20:22:18 +03:00

common.cpp

llama : add --no-host to disable host buffers (#16310)

2025-10-06 19:55:53 +02:00

common.h

server : host-memory prompt caching (#16391)

2025-10-09 18:54:51 +03:00

console.cpp

console : utf-8 fix for windows stdin (#9690)

2024-09-30 11:23:42 +03:00

console.h

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

http.h

common: introduce http.h for httplib-based client (#16373)

2025-10-01 20:22:18 +03:00

json-partial.cpp

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

json-partial.h

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

json-schema-to-grammar.cpp

common : Fix corrupted memory error on json grammar initialization (#16038)

2025-09-17 11:08:02 +03:00

json-schema-to-grammar.h

sync : vendor (#13901)

2025-05-30 16:25:45 +03:00

llguidance.cpp

llguidance : set tokenizer slices to default (#13424)

2025-05-10 17:19:52 +02:00

log.cpp

Implement --log-colors with always/never/auto (#15792)

2025-09-05 19:43:59 +01:00

log.h

Implement --log-colors with always/never/auto (#15792)

2025-09-05 19:43:59 +01:00

ngram-cache.cpp

ggml : portability fixes for VS 2017 (#12150)

2025-03-04 18:53:26 +02:00

ngram-cache.h

llama : use LLAMA_TOKEN_NULL (#11062)

2025-01-06 10:52:15 +02:00

regex-partial.cpp

common: add partial regex support (#12808)

2025-05-14 19:50:57 +01:00

regex-partial.h

common: add partial regex support (#12808)

2025-05-14 19:50:57 +01:00

sampling.cpp

llama: print memory breakdown on exit (#15860)

2025-09-24 16:53:48 +02:00

sampling.h

sampling : optimize samplers by reusing bucket sort (#15665)

2025-08-31 20:41:02 +03:00

speculative.cpp

sampling : optimize samplers by reusing bucket sort (#15665)

2025-08-31 20:41:02 +03:00

speculative.h

server : implement universal assisted decoding (#12635)

2025-07-31 14:25:23 +02:00