Aman Gupta 
							
						 
					 
					
						
						
							
						
						2e42be42bd 
					 
					
						
						
							
							compare-llama-bench: add option to plot (#14169)
						
						
						
						
						* compare llama-bench: add option to plot
* Address review comments: convert case + add type hints
* Add matplotlib to requirements
* fix tests
* Improve comment and fix assert condition for test
* Add back default test_name, add --plot_log_scale
* use log_scale regardless of x_values 
						
						
					 
					
						2025-06-14 10:34:20 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						ae92c1855b 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-06-10 18:39:33 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						b8e2194efc 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-06-10 09:21:56 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						f3a4b1659c 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-06-01 13:43:57 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						53f925074d 
					 
					
						
						
							
							sync : vendor (#13901)
						
						
						
						
						* sync : vendor
ggml-ci
* cont : fix httplib version
ggml-ci
* cont : fix lint
* cont : fix lint
* vendor : move to common folder /vendor
ggml-ci
* cont : fix lint
* cont : move httplib to /vendor + use json_fwd.hpp
ggml-ci
* cont : fix server build
ggml-ci
* cont : add missing headers
ggml-ci
* cont : header clean-up
ggml-ci 
						
						
					 
					
						2025-05-30 16:25:45 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						1c49c70d07 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
					 
					
						2025-05-27 18:05:33 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						a26c4cc11e 
					 
					
						
						
							
							scripts : add option to compare commits in Debug (#13806)
						
						
						
						
						* scripts : add option to compare commits in Debug
* cont : reuse existing CMAKE_OPTS 
						
						
					 
					
						2025-05-26 22:24:01 +03:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						f5cd27b71d 
					 
					
						
						
							
							server: streaming of tool calls and thoughts when --jinja is on (#12379)
						
						
						
						
						* add common_json w/ support for truncated json healing
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
						
						
					 
					
						2025-05-25 01:48:08 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d30cb5a7fa 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-05-19 13:29:56 +03:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						be1d4a13db 
					 
					
						
						
							
							scripts : fix compare-llama-bench.py show parameter (#13514)
						
						
						
						
					 
					
						2025-05-14 08:41:01 +02:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						bf79371120 
					 
					
						
						
							
							scripts : support arbitrary input file formats in compare-llama-bench.py (#13455)
						
						
						
						
					 
					
						2025-05-13 15:31:12 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						1e2809bc4b 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
					 
					
						2025-05-13 14:02:28 +03:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						09232370fc 
					 
					
						
						
							
							scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451)
						
						
						
						
					 
					
						2025-05-11 16:20:39 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d879433824 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-05-07 17:28:36 +03:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						1d36b3670b 
					 
					
						
						
							
							llama : move end-user examples to tools directory (#13249)
						
						
						
						
						* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
						
						
					 
					
						2025-05-02 20:27:13 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						b34443923c 
					 
					
						
						
							
							sync : ggml (#13268)
						
						
						
						
						* vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)
* vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW)
* review: remove src_x/y < 0 checks; add performance tests
* sync : ggml
ggml-ci
* vulkan : fix lint (#0)
---------
Co-authored-by: Acly <aclysia@gmail.com>
						
						
					 
					
						2025-05-02 20:54:30 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						b1dd4d08e8 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-05-01 20:15:34 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						8d33d740c3 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
					 
					
						2025-05-01 10:00:39 +03:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						19e899ce21 
					 
					
						
						
							
							scripts: n_depth for compare-llama-bench [no ci] (#13201)
						
						
						
						
					 
					
						2025-04-29 23:32:04 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						63b4911494 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-04-24 17:32:47 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						526739b879 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-04-14 09:26:15 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						47ba87d0a4 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
					 
					
						2025-04-11 00:17:47 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						eb420e1148 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-04-11 00:17:47 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						e4bf72d631 
					 
					
						
						
							
							scripts : fix sync-ggml-am.sh  
						
						
						
						
					 
					
						2025-04-11 00:17:47 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						a4e46e28f9 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-04-07 18:44:17 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						0114a32da0 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-03-31 15:07:32 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d3f1f0acfb 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-03-30 08:33:31 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						029c693fdc 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-03-27 10:09:29 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						771d84371c 
					 
					
						
						
							
							scripts : update sync + fix cmake merge  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-03-27 10:09:29 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						df0665a483 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-03-27 09:04:38 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						102ac1891d 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-03-07 14:49:44 +02:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						669912d9a5 
					 
					
						
						
							
							tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
						
						
						
						
						* sampler: turn lazy grammar trigger words to regexes
* add scripts/tool_bench.sh & .py
* constrain llama json output regardless of function name if matches at beginning
* update relaxed newline space rule in grammar tests
* support add_generation_prompt query parameter (useful for /apply_template)
* Update src/llama-grammar.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						
						
					 
					
						2025-03-05 13:05:13 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Bevenius 
							
						 
					 
					
						
						
							
						
						a057897ad4 
					 
					
						
						
							
							llama : add xcframework build script (#11996)
						
						
						
						
						* llama : add xcframework build script
This commit adds a script to build an XCFramework for Apple
iOS, macOS, visionOS, and tvOS platforms.
The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```
Refs: https://github.com/ggml-org/llama.cpp/issues/10747 
* examples : remove llama.cpp (source dir ref) from project.pbxproj
This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.
* ci : updated build.yml to use build-xcframework.sh
* ci : add xcframework build to github releases
This commit adds the ability to create a GitHub release with the
xcframework build artifact.
* scripts : add apple app validation scripts
This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.
The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.
* llama : remove Package.swift
This commit removes the Package.swift file, as we are now building an
XCFramework for the project.
* llama : remove Sources and spm-headers directories
* llama : use TargetConditionals.h for visionOS/tvOS 
						
						
					 
					
						2025-03-05 06:30:31 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						dfd6b2c0be 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-03-03 18:18:11 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						3d1cf3cf33 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-03-03 18:18:11 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						8371d44595 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-03-03 18:18:11 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						aede2074f6 
					 
					
						
						
							
							scripts : sync-ggml-am.sh fix  
						
						
						
						
					 
					
						2025-03-03 18:18:11 +02:00 
						 
				 
			
				
					
						
							
							
								MoonRide303 
							
						 
					 
					
						
						
							
						
						5137da7b8c 
					 
					
						
						
							
							scripts: corrected encoding when getting chat template (#11866) (#11907)
						
						
						
						
						Signed-off-by: MoonRide303 <moonride303@gmail.com>
						
						
					 
					
						2025-02-18 10:30:16 +01:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						6dde178248 
					 
					
						
						
							
							scripts: fix compare-llama-bench commit hash logic (#11891)
						
						
						
						
					 
					
						2025-02-15 20:23:22 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						68ff663a04 
					 
					
						
						
							
							repo : update links to new url (#11886)
						
						
						
						
						* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci 
						
						
					 
					
						2025-02-15 16:40:57 +02:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						c7f460ab88 
					 
					
						
						
							
							server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)
						
						
						
						
						* extract & return thoughts in reasoning_content field (unless --reasoning-format none) for DeepSeek R1 & Command R7B
* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template
* tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out
* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability
* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						
						
					 
					
						2025-02-13 10:05:16 +00:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						0fb77f821f 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
					 
					
						2025-02-12 21:46:02 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						8a59053f63 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
					 
					
						2025-02-06 21:23:03 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						7c9e0ca520 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
					 
					
						2025-02-04 12:59:21 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						8ec05832fa 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
					 
					
						2025-02-03 14:57:08 +02:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						8b576b6c55 
					 
					
						
						
							
							Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
						
						
						
						
						---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
						
						
					 
					
						2025-01-30 19:13:58 +00:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						815857791d 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
					 
					
						2025-01-29 11:25:29 +02:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						6171c9d258 
					 
					
						
						
							
							Add Jinja template support (#11016)
						
						
						
						
						* Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626 https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27 
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						
						
					 
					
						2025-01-21 13:18:51 +00:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						f26c874179 
					 
					
						
						
							
							scripts : restore hf.sh (#11288)
						
						
						
						
						ggml-ci 
						
						
					 
					
						2025-01-18 13:18:32 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						f11cfdfd7f 
					 
					
						
						
							
							ci : use -no-cnv in gguf-split tests (#11254)
						
						
						
						
						* ci : use -no-cnv in gguf-split tests
ggml-ci
* ci : use -no-cnv in requantize tests
ggml-ci
* scripts : fix [no ci] 
						
						
					 
					
						2025-01-15 18:28:35 +02:00