9f2250ba72  Eric Curtin  2025-03-14 16:41:20 +00:00
Add CLI arg to llama-run to adjust the number of threads used (#12370)

We default to 4; sometimes we want to adjust this manually.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

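For illustration, the new option might be exercised like this (the exact flag spelling is an assumption based on other llama.cpp tools; check llama-run --help):

  # hypothetical invocation: use 8 threads instead of the default 4
  llama-run -t 8 llama3
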
e0dbec0bc6  Georgi Gerganov  2025-03-13 12:35:44 +02:00
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

* llama : refactor llama_context, llama_kv_cache, llm_build_context
* graph : don't mutate the KV cache during defrag
* context : reduce virtuals + remove test function
* context : move interface implementation to source file + factory
* graph : move KV cache build functions to llama_context impl
* graph : remove model reference from build_pooling
* graph : remove llama_model reference
* kv_cache : provide rope factors
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
* context : remove llama_context_i abstraction
* context : clean-up
* graph : clean-up
* llama : remove redundant keywords (struct, enum)
* model : adapt gemma3
* graph : restore same attention ops as on master
* llama : remove TODO + fix indent

c950a1f692  Eric Curtin  2025-03-03 12:44:56 +00:00
Adding UTF-8 support to llama.cpp (#12111)

For emojis, non-alpha characters, etc.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

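As a rough sketch of what UTF-8 awareness involves (not the actual patch): the lead byte of a UTF-8 sequence tells you how many bytes make up the code point, so input has to be consumed per code point rather than per byte.

  // Minimal sketch, not the llama.cpp implementation: byte length of a
  // UTF-8 sequence, derived from its lead byte.
  #include <cstdint>

  static int utf8_len(uint8_t lead) {
      if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: plain ASCII
      if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
      if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
      if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx: emoji live here
      return 1;                            // continuation/invalid byte: resync
  }
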
7ad0779f5d  Florent BENOIT  2025-02-23 17:15:51 +00:00
run: allow to customize prompt by env var LLAMA_PROMPT_PREFIX (#12041)

Signed-off-by: Florent Benoit <fbenoit@redhat.com>

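A minimal sketch of the pattern, assuming only the variable name from the commit title (the default shown is a placeholder, not the real one):

  // Sketch: fall back to a built-in prefix when LLAMA_PROMPT_PREFIX is unset.
  #include <cstdlib>
  #include <string>

  static std::string prompt_prefix() {
      const char * env = std::getenv("LLAMA_PROMPT_PREFIX");
      return env ? env : "> "; // placeholder default
  }

Usage would then look like:

  LLAMA_PROMPT_PREFIX="llm> " llama-run granite-code
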
f777a73e18  Eric Curtin  2025-02-23 13:14:32 +00:00
Some llama-run cleanups (#11973)

Use the consolidated open function call from the File class. Change
read_all to to_string(). Remove exclusive locking; the intent of that
lock is to avoid multiple processes writing to the same file, which is
not an issue for readers, although we may want to consider adding a
shared lock. Remove passing nullptr as a reference; references are
never supposed to be null. clang-format the code for consistent styling.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

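The shared lock mentioned above could look roughly like this on POSIX (a sketch of the idea, not part of the commit):

  // Sketch: readers take a shared lock (LOCK_SH); a writer would take LOCK_EX.
  #include <fcntl.h>
  #include <sys/file.h>
  #include <unistd.h>

  static bool read_with_shared_lock(const char * path) {
      int fd = open(path, O_RDONLY);
      if (fd < 0) return false;
      if (flock(fd, LOCK_SH) == 0) { // other readers remain unblocked
          // ... read the file ...
          flock(fd, LOCK_UN);
      }
      close(fd);
      return true;
  }
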
0d559580a0  Michael Engel  2025-02-20 10:35:11 +02:00
run : add --chat-template-file (#11961)

Relates to: https://github.com/ggml-org/llama.cpp/issues/11178
Added a --chat-template-file CLI option to llama-run. If specified, the
file is read and its content is passed to common_chat_templates_from_model,
overriding the model's chat template.

Signed-off-by: Michael Engel <mengel@redhat.com>

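For illustration (the template file name here is made up):

  llama-run --chat-template-file my-template.jinja granite-code
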
63e489c025  Olivier Chafik  2025-02-18 18:03:23 +00:00
tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)

* tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init return a unique_ptr to opaque type
* addressed clang-tidy lints in [test-]chat.*
* rm minja deps from util & common & move it to common/minja/
* add name & tool_call_id to common_chat_msg
* add common_chat_tool
* added json <-> tools, msgs conversions to chat.h
* fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens)
* fix deepseek r1 slow test (no longer <think> opening w/ new template)
* allow empty tools w/ auto + grammar
* fix & test server grammar & json_schema params w/ & w/o --jinja

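The "unique_ptr to an opaque type" idiom from the first bullet, in generic form (the deleter and alias names here are illustrative, not necessarily the real chat.h declarations):

  // Sketch: the header exposes only a forward declaration, so callers own
  // the object through unique_ptr without ever seeing its layout.
  #include <memory>

  struct common_chat_templates; // opaque; defined in the .cpp only

  struct common_chat_templates_deleter {
      void operator()(common_chat_templates *) const; // calls the real free
  };

  using common_chat_templates_ptr =
      std::unique_ptr<common_chat_templates, common_chat_templates_deleter>;
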
19d3c8293b  Eric Curtin  2025-02-09 10:34:49 +00:00
There's a better way of clearing lines (#11756)

Use the ANSI escape code for clearing a line.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

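The escape sequence in question looks like this (a sketch; the commit may use a slightly different variant), avoiding the need to pad the line with spaces:

  // Sketch: \r returns the cursor to column 0, ESC[2K erases the whole line.
  #include <cstdio>

  printf("\r\033[2K");
  fflush(stdout);
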
d2fe216fb2  Eric Curtin  2025-02-07 14:42:46 +00:00
Make logging more verbose (#11714)

Debugged an issue with a user who was on a read-only filesystem.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

9f4cc8f8d3  Olivier Chafik  2025-02-05 01:00:12 +00:00
sync: minja (#11641)

* sync: minja 182de30cda
  https://github.com/google/minja/pull/46
  https://github.com/google/minja/pull/45

84ec8a58f7  Eric Curtin  2025-02-02 15:14:48 +00:00
Name colors (#11573)

It's more descriptive, and using #define's lets us rely on compile-time
string concatenation.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

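The compile-time concatenation in question: adjacent string literals are merged by the compiler, which only works when the color names are preprocessor macros rather than runtime strings. A sketch (macro names are illustrative):

  // Sketch: named colors as macros, so literals can be glued together.
  #include <cstdio>

  #define COL_RED     "\033[31m"
  #define COL_DEFAULT "\033[0m"

  int main() {
      // the escape codes and "error:" become one literal at compile time
      printf(COL_RED "error:" COL_DEFAULT " something went wrong\n");
  }
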
ecef206ccb  Eric Curtin  2025-02-01 10:30:54 +00:00
Implement s3:// protocol (#11511)

For those who want to pull from S3.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

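For illustration (bucket and key are made up):

  llama-run s3://my-bucket/models/granite-code.gguf
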
f0d4b29edf  Eric Curtin  2025-01-29 11:23:10 +00:00
Parse https://ollama.com/library/ syntax (#11480)

People search for ollama models using the web UI; this change allows
one to copy the URL from the browser and have it be compatible with
llama-run.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

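So a URL pasted straight from the browser should now work:

  llama-run https://ollama.com/library/granite-code
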
7fee2889e6  Eric Curtin  2025-01-28 14:45:41 +00:00
Add github protocol pulling and http:// (#11465)

Added as pulling protocols to llama-run.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

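For illustration (the exact github:// path layout is best checked in the llama-run README; the http:// form mirrors the existing https:// support):

  llama-run http://example.com/some-model.gguf
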
2b8525d5c8  Michael Engel  2025-01-28 08:32:40 +00:00
Handle missing model in CLI parameters for llama-run (#11399)

The HTTP client in llama-run only prints an error when the download of
a resource fails. If the model name is missing from the CLI parameter
list, this caused the application to crash. To prevent this, a check
for the required model parameter has been added, and errors from
resource downloads are propagated to the caller.

Signed-off-by: Michael Engel <mengel@redhat.com>

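The shape of the fix, sketched with hypothetical names (opt.model_ and download_model are illustrative, not the real identifiers):

  // Sketch: validate required parameters up front, and return error codes
  // instead of only printing, so main() can exit cleanly.
  #include <cstdio>

  if (opt.model_.empty()) {
      fprintf(stderr, "error: no model specified\n");
      return 1;
  }
  if (int ret = download_model(opt); ret != 0) {
      return ret; // failure already reported by the callee
  }
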
a4417ddda9  Eric Curtin  2025-01-27 19:36:10 +01:00
Add new hf protocol for ollama (#11449)

https://huggingface.co/docs/hub/en/ollama

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

01f37edf1a  Eric Curtin  2025-01-24 09:39:24 +00:00
Update llama-run README.md (#11386)

For consistency.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

05f63cc9ee  Eric Curtin  2025-01-23 20:04:31 +00:00
Update documentation (#11373)

To show that -n, -ngl, and --ngl are acceptable.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

f7fb43cd0b  Eric Curtin  2025-01-23 16:16:18 +00:00
Add -ngl (#11372)

Most other llama.cpp CLI tools accept -ngl with a single dash.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

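For illustration (99 is just a common way of saying "offload everything"):

  llama-run -ngl 99 granite-code
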
f211d1dc10  Eric Curtin  2025-01-23 10:38:20 +00:00
Treat hf.co/ prefix the same as hf:// (#11350)

ollama uses hf.co/ to specify the Hugging Face prefix, just as RamaLama
uses hf://. Treat them similarly.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

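A sketch of the normalization this implies (illustrative, not the actual diff):

  // Sketch: rewrite the ollama-style prefix into the form already handled.
  #include <string>

  static void normalize_model_ref(std::string & ref) {
      const std::string hf_co = "hf.co/";
      if (ref.rfind(hf_co, 0) == 0) { // starts_with, pre-C++20
          ref = "hf://" + ref.substr(hf_co.size());
      }
  }
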
6171c9d258  Olivier Chafik  2025-01-21 13:18:51 +00:00
Add Jinja template support (#11016)

* Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626 (https://github.com/google/minja/pull/25)
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

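Per the last bullet, the feature is gated behind a --jinja flag and can be combined with the --chat-template-file override from the bullets above; roughly (model and template file names are placeholders, and the exact pairing of flags is an assumption):

  llama-server -m model.gguf --jinja
  llama-server -m model.gguf --jinja --chat-template-file my-template.jinja
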
2e2f8f093c  Eric Curtin  2025-01-21 09:32:35 +00:00
linenoise.cpp refactoring (#11301)

Mainly more RAII.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

9f7add1cde  Georgi Gerganov  2025-01-20 16:36:08 +02:00
examples : fix add_special conditions (#11311)

a1649cc13f  Eric Curtin  2025-01-18 14:42:31 +00:00
Adding linenoise.cpp to llama-run (#11252)

This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt history via
the up and down arrows:
https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

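For reference, classic linenoise usage looks like the following; the C++17 fork may expose a slightly different (more RAII-friendly) surface:

  // Sketch against the classic linenoise C API.
  #include "linenoise.h"

  char * line;
  while ((line = linenoise("> ")) != nullptr) {
      linenoiseHistoryAdd(line); // up/down arrows can now recall this entry
      // ... hand `line` to the model ...
      linenoiseFree(line);
  }
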
924518e2e5  Eric Curtin  2025-01-12 18:23:10 +00:00
Reset color before we exit (#11205)

We don't want colors to leak post termination of llama-run.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

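The reset is the SGR "all attributes off" sequence (a sketch; putting it on every exit path is the point of the commit):

  // Sketch: restore default terminal attributes so the shell prompt that
  // follows llama-run is not rendered in our last-used color.
  #include <cstdio>

  printf("\033[0m");
  fflush(stdout);
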
afa8a9ec9b  Georgi Gerganov  2025-01-12 11:32:42 +02:00
llama : add llama_vocab, functions -> methods, naming (#11110)

* llama : functions -> methods (#11110)
* llama : add struct llama_vocab to the API (#11156)
* hparams : move vocab params to llama_vocab (#11159)
* vocab : more pimpl (#11165)
* vocab : minor tokenization optimizations (#11160)
* lora : update API names (#11167)
* llama : update API names to use correct prefix (#11174)
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)

Co-authored-by: Diego Devesa <slarengh@gmail.com>

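The resulting call shape, using names from the bullets above (llama_model_get_vocab is assumed as the accessor):

  // Sketch: vocab data now lives behind a llama_vocab handle.
  const llama_vocab * vocab = llama_model_get_vocab(model);

  const int32_t n_tokens = llama_vocab_n_tokens(vocab);    // was llama_vocab_n_vocab
  const bool    add_bos  = llama_vocab_get_add_bos(vocab); // was llama_vocab_add_bos
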
1bf839b1e8  Eric Curtin  2025-01-08 18:47:05 +00:00
Enhance user input handling for llama-run (#11138)

The main motivation for this change is that it was not handling
ctrl-c/ctrl-d correctly. Modify `read_user_input` to handle EOF, the
"/bye" command, and empty input cases. Introduce a `get_user_input`
function to manage the user input loop and handle the different return
cases.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

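The described control flow, as a standalone sketch (not the actual functions):

  // Sketch: EOF (ctrl-d) and "/bye" end the session; blank lines re-prompt.
  #include <iostream>
  #include <string>

  while (true) {
      std::string input;
      if (!std::getline(std::cin, input)) break; // EOF: exit the loop
      if (input == "/bye") break;                // explicit quit command
      if (input.empty()) continue;               // re-prompt on empty input
      // ... generate a response to `input` ...
  }
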
dc7cef9f37  Eric Curtin  2025-01-06 23:45:28 +01:00
llama-run : fix context size (#11094)

Set `n_ctx` equal to `n_batch` in the `Opt` class. Now context size is
a more reasonable 2048.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

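In terms of the public API, the effect is roughly this (a sketch using the stock context-params struct):

  // Sketch: keep the context window and batch size in sync at 2048.
  llama_context_params ctx_params = llama_context_default_params();
  ctx_params.n_ctx   = 2048;
  ctx_params.n_batch = 2048; // n_ctx == n_batch, per the fix
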
47182dd03f  Georgi Gerganov  2025-01-06 10:55:18 +02:00
llama : update llama_model API names (#11063)

* llama : deprecate llama_free_model, add llama_model_free
* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`

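Call sites migrate like this (path and mparams assumed to be defined):

  // old, now deprecated:
  //   llama_model * model = llama_load_model_from_file(path, mparams);
  //   llama_free_model(model);

  // new names:
  llama_model * model = llama_model_load_from_file(path, mparams);
  llama_model_free(model);
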
6e1531aca5  Peter  2024-12-31 01:46:06 +01:00
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013)

In common/common.cpp:
* Convert the stat() call used to check whether a file exists to the standard library function std::filesystem::exists (error: unable to match to correct function signature)
* Add conditions to check whether PATH_MAX is already defined in the WIN32 environment (warning: it is already defined in MSYS2)

In examples/run/run.cpp:
* Add io.h header inclusion (error: cannot find function _get_osfhandle)
* Change initialisers for OVERLAPPED to empty struct (warning about uninitialised members)
* Add initialiser for hFile (warning: it may be uninitialised)
* Add a cast of the curl_off_t percentage value to long int in the generate_progress_prefix function (warning: curl_off_t is long long int)

In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (warning about the B_d variable possibly being used unassigned)

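The first fix is the portable way to test for file existence (a sketch of the swap, not the exact diff):

  // Sketch: std::filesystem::exists instead of a raw stat() call.
  #include <filesystem>
  #include <string>

  static bool file_exists(const std::string & path) {
      // old: struct stat st; return stat(path.c_str(), &st) == 0;
      return std::filesystem::exists(path);
  }
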
dab76c92cc  Eric Curtin  2024-12-23 01:21:40 +01:00
llama-run : include temperature option (#10899)

This commit updates the `examples/run/README.md` file to include a new
option for setting the temperature and updates the `run.cpp` file to
parse this option.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

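For illustration (the flag spelling is an assumption; check the updated README; 0.8 is an arbitrary value):

  llama-run --temp 0.8 granite-code
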
7909e8588d  Eric Curtin  2024-12-19 03:58:00 +01:00
llama-run : improve progress bar (#10821)

Set the default width to the width of the terminal. Also fixed a small
bug around the default n_gpu_layers value.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

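Querying the terminal width typically looks like this on POSIX (a sketch; a Windows path would use the console API instead):

  // Sketch: size the progress bar to the terminal, with a sane fallback.
  #include <sys/ioctl.h>
  #include <unistd.h>

  static int terminal_width() {
      winsize ws{};
      if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == 0 && ws.ws_col > 0) {
          return ws.ws_col;
      }
      return 80; // not a tty, or the query failed
  }
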
c27ac678dd  Eric Curtin  2024-12-13 19:34:25 +01:00
Opt class for positional argument handling (#10508)

Added support for positional arguments `model` and `prompt`. Added
functionality to download via strings like:

  llama-run llama3
  llama-run ollama://granite-code
  llama-run ollama://granite-code:8b
  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run https://example.com/some-file1.gguf
  llama-run some-file2.gguf
  llama-run file://some-file3.gguf

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

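The dispatch this implies is a prefix match on the model string; a sketch (helper and branch bodies are illustrative):

  // Sketch: route each scheme to the matching pull mechanism.
  #include <string>

  static bool starts_with(const std::string & s, const std::string & prefix) {
      return s.rfind(prefix, 0) == 0;
  }

  void resolve_model(const std::string & model) {
      if      (starts_with(model, "ollama://")) { /* ollama registry pull  */ }
      else if (starts_with(model, "hf://"))     { /* Hugging Face download */ }
      else if (starts_with(model, "https://"))  { /* plain HTTPS download  */ }
      else if (starts_with(model, "file://"))   { /* strip scheme, open    */ }
      else                                      { /* local path or alias   */ }
  }
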
7cc2d2c889  Diego Devesa  2024-11-29 21:54:58 +01:00
ggml : move AMX to the CPU backend (#10570)

* ggml : move AMX to the CPU backend

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

0cc63754b8  Eric Curtin  2024-11-25 22:56:24 +01:00
Introduce llama-run (#10291)

It's like simple-chat, but it uses smart pointers to avoid manual
memory cleanups. Fewer memory leaks in the code now. Avoid printing
multiple dots. Split code into smaller functions. Uses no exception
handling.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

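The smart-pointer approach pairs llama.cpp's C cleanup functions with unique_ptr deleters; a sketch using today's API names (this commit predates the later llama_model_free rename, see 47182dd03f above):

  // Sketch: ownership of C handles via unique_ptr + custom deleters,
  // so every exit path frees the model and context automatically.
  #include <memory>

  struct llama_model_deleter {
      void operator()(llama_model * m) const { llama_model_free(m); }
  };
  struct llama_context_deleter {
      void operator()(llama_context * c) const { llama_free(c); }
  };

  using llama_model_ptr   = std::unique_ptr<llama_model,   llama_model_deleter>;
  using llama_context_ptr = std::unique_ptr<llama_context, llama_context_deleter>;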