commit 291f2b6913 (tag: b5540)
Author: Đinh Trọng Huy
Date:   2025-05-30 11:56:02 +02:00

llama : add support for DistilBert (#13907)

* add distilbert
* small fixes
* add note for LLM_ARCH_DISTIL_BERT
* Use MODEL_ARCH.BERT for DistilBert

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>

commit 2c90da4c7e (tag: b5539)
Author: zhangkaihuo
Date:   2025-05-30 10:31:48 +02:00

llama : use llm_build_granite for minicpm (#13911)

commit ec9e0301fe (tag: b5538)
Author: Christian Kastner
Date:   2025-05-30 01:28:54 +02:00

cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890)

commit e83ba3e460 (tag: b5537)
Author: Sigbjørn Skjæret
Date:   2025-05-29 21:42:31 +02:00

llama : add support for jina-reranker-v2 (#13900)

commit 2b131621e6 (tag: gguf-v0.17.0)
Author: Sigbjørn Skjæret
Date:   2025-05-29 15:36:05 +02:00

gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method (#13561)

commit 54a2c7a8cd (tag: b5535)
Author: Yibo Cai
Date:   2025-05-29 14:39:20 +03:00

arm64: optimize q4_k_q8_k kernel with i8mm (#13886)

This PR improves the q4_k_q8_k GEMM kernel with the arm64 i8mm instruction.
Tested on Neoverse N2 with a Llama 3 8B Q4_K_M quantized model:
- 34% ~ 50% S_PP uplift for all batch sizes
- 12% ~ 37% S_TG uplift for batch size 4 and above
Perplexity does not change with this PR.

```
// tested on neoverse-n2
$ llama-batched-bench \
      -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf \
      --no-mmap -fa \
      -c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \
      -npl 1,2,4,8,16,32 \
      -t 64
---------------------------------------------------------------------
|    PP |     TG |    B |       S_PP t/s      |       S_TG t/s      |
|       |        |      | original |  this pr | original |  this pr |
|-------|--------|------|----------|----------|----------|----------|
|   128 |    128 |    1 |   110.12 |   147.83 |    24.36 |    24.28 |
|   128 |    128 |    2 |   121.16 |   172.42 |    46.36 |    47.93 |
|   128 |    128 |    4 |   120.15 |   169.75 |    74.68 |    84.00 |
|   128 |    128 |    8 |   130.97 |   196.81 |    91.04 |   114.74 |
|   128 |    128 |   16 |   131.01 |   196.88 |   101.43 |   135.79 |
|   128 |    128 |   32 |   130.85 |   196.51 |   106.97 |   147.29 |
---------------------------------------------------------------------
```

commit 21fcc21ad5 (tag: b5534)
Author: Christian Kastner
Date:   2025-05-29 12:50:25 +02:00

cmake: Factor out CPU architecture detection (#13883)

* cmake: Define function for querying architecture
  The tests and results match exactly those of ggml/src/CMakeLists.txt
* Switch arch detection over to new function

commit dd8ba93416 (tag: b5533)
Author: Vineel Abhinav
Date:   2025-05-29 12:18:43 +03:00

ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (#13882)

* F32-Mamba-Seq_Scan-SVE
* Fix formatting
* ggml : missing space

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit 66c92061f5 (tag: b5532)
Author: Georgi Gerganov
Date:   2025-05-29 12:17:16 +03:00

tests : remove json.hpp from a test (#13880)

ggml-ci

commit 5ca82fc1d7
Author: Sigbjørn Skjæret
Date:   2025-05-29 10:00:57 +02:00

convert : workaround for AutoConfig dummy labels (#13881)

commit 6385b843a8 (tag: b5530)
Author: Sigbjørn Skjæret
Date:   2025-05-29 08:15:01 +02:00

llama : add RobertaForSequenceClassification reranker support (#13875)

commit 1b8fb8152d (tag: b5529)
Author: Vineel Abhinav
Date:   2025-05-29 09:01:33 +03:00

ggml: aarch64: Implement SVE F32 kernels for vector functions (#13843)

* F32-Mamba-SVE
* F32-Mamba-SVE
* Resolve test errors-1
* Resolve test errors-2
* F32-vec-SVE
* F32-vec-SVE
* F32-vec-SVE

commit 53ae30640e
Author: Beinsezii
Date:   2025-05-28 23:50:20 +02:00

gguf-py : fix SafetensorRemote return on undefined size (< 0) (#13841)

commit 763d06edb7 (tag: b5527)
Author: Xuan-Son Nguyen
Date:   2025-05-28 22:35:31 +02:00

llama : fix KV shift for qwen2vl (#13870)

* llama : fix KV shift for qwen2vl
* add ref to the PR

commit 10961339b2 (tag: b5526)
Author: Xuan-Son Nguyen
Date:   2025-05-28 22:35:22 +02:00

mtmd : move helpers to dedicated library (⚠️ breaking change) (#13866)

* mtmd : move helpers to dedicated library
* fix server build
* rm leftover cmakelist code

commit d98f2a35fc
Author: bandoti
Date:   2025-05-28 15:46:47 -03:00

ci: disable LLAMA_CURL for Linux cross-builds (#13871)

commit e0e3aa231d (tag: b5524)
Author: Đinh Trọng Huy
Date:   2025-05-28 19:01:58 +02:00

llama : add support for BertForSequenceClassification reranker (#13858)

* convert: add support for BertForSequenceClassification
* add support for reranking using BertForSequenceClassification
* merge checks of eos and sep
* fix lint

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>

commit aa6dff05be
Author: Đinh Trọng Huy
Date:   2025-05-28 16:34:18 +02:00

convert: small addition to support LlamaModel (#13838)

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>

commit c962ae3382 (tag: b5522)
Author: Sky
Date:   2025-05-28 16:33:54 +02:00

server: fix remove 'image_url'/'input_audio' json-object effectlly for 'llama_params' in multimodal-model-mode (#13853)

[fix]: remove 'image_url'/'input_audio' effectlly for 'llama_params' in multimodal-model-mode

commit a3938fb53d
Author: Xuan-Son Nguyen
Date:   2025-05-28 16:12:35 +02:00

convert : fix qwen omni conversion (#13859)

* convert : fix qwen omni conversion
* fix typo

commit f7873fc698
Author: Alex Fanthome
Date:   2025-05-28 15:49:28 +02:00

tests : change umlaut test (#11600)

commit a68247439b (tag: b5519)
Author: Johannes Gäßler
Date:   2025-05-28 13:33:37 +02:00

CUDA: fix FA tg at long context for CC >= 8.9 (#13852)

commit 26b79b6cb3
Author: Xuan-Son Nguyen
Date:   2025-05-28 10:05:54 +02:00

convert : fix tensor naming conflict for llama 4 vision (#13836)

* convert : fix tensor naming conflict for llama 4 vision
* add comment

commit 1e8659e65a (tag: b5517)
Author: leo-pony
Date:   2025-05-28 11:54:20 +08:00

CANN: Add SOC TYPE printing in cmake configuration (#13837)

commit a3c30846e4 (tag: b5516)
Author: lhez
Date:   2025-05-27 12:56:08 -07:00

opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm (#13787)

* opencl: add `argsort`
* opencl: add `div`
* opencl: add `add_rows`
* opencl: add `sub`
* opencl: add `sigmoid`, both `f16` and `f32`
* opencl: add `group_norm`

commit 1701d4c54f (tag: b5515)
Author: lhez
Date:   2025-05-27 12:53:14 -07:00

opencl: mark mul_mat f32f32 as supporting non-contiguous tensors (#13790)

commit bef8176387 (tag: b5514)
Author: Jeff Bolz
Date:   2025-05-27 18:39:07 +02:00

vulkan: use timestamp queries for GGML_VULKAN_PERF (#13817)

Also change it to be controlled by an env var rather than a cmake flag.

commit 34b7c0439e (tag: b5513)
Author: Georgi Gerganov
Date:   2025-05-27 19:08:44 +03:00

cmake : add llama-cparams.cpp to build (#13832)

commit f3101a8cc6 (tag: b5512)
Author: Akarshan Biswas
Date:   2025-05-27 20:52:59 +05:30

SYCL: add gelu_erf kernel (#13749)

* SYCL: add gelu_erf kernel
* refactor code
* Use scope_op_debug_print

Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com>

commit 1c49c70d07
Author: Georgi Gerganov
Date:   2025-05-27 18:05:33 +03:00

sync : ggml

commit a8ea03d8ad (tag: b5510)
Author: Xuan-Son Nguyen
Date:   2025-05-27 15:53:55 +02:00

ggml : add ggml_repeat_4d (#13824)

commit 05f6ac6283 (tag: b5509)
Author: xctan
Date:   2025-05-27 16:21:36 +03:00

ggml : riscv: add xtheadvector support (#13720)

* ggml : riscv: add xtheadvector support
* ggml : clean up some macro usage

commit bc583e3c63 (tag: b5508)
Author: Xuan-Son Nguyen
Date:   2025-05-27 14:06:10 +02:00

mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#13784)

* mtmd : allow multiple modalities at the same time
* refactor mtmd tokenizer
* fix compile
* ok, missing SinusoidsPositionEmbedding
* first working version
* fix style
* more strict validate of n_embd
* refactor if..else to switch
* fix regression
* add test for 3B
* update docs
* fix tokenizing with add_special
* add more tests
* fix test case "huge"
* rm redundant code
* set_position_mrope_1d rm n_tokens

commit 72b090da2c
Author: bandoti
Date:   2025-05-27 08:52:40 -03:00

docs: remove link for llama-cli function calling (#13810)

commit 7fe03e7446 (tag: b5506)
Author: Christian Kastner
Date:   2025-05-27 13:18:39 +02:00

ggml-cpu: x86 feature detection is specific to x86 (#13811)

commit 952f3953c1 (tag: b5505)
Author: Diego Devesa
Date:   2025-05-27 13:05:18 +02:00

ggml : allow CUDA graphs when using pipeline parallelism (#13814)

commit 81713121ee (tag: b5504)
Author: Georgi Gerganov
Date:   2025-05-27 13:49:41 +03:00

kv-cells : track min/max used cells and per-sequence positions (#13808)

* kv-cells : track min/max used cells and per-sequence positions
* kv-cells : fix pos-modification updates for seq_pos
* kv-cells : add comments

ggml-ci

commit f9cd68398b (tag: b5503)
Author: Georgi Gerganov
Date:   2025-05-27 12:07:52 +03:00

sampling : make sure samplers return at least 1 token (#13822)

* sampling : min-p should always return at least one token
* sampling : same for typical sampling
* tests : sampling tests use min_keep == 0

ggml-ci

commit 4f81b33e32 (tag: b5502)
Author: Georgi Gerganov
Date:   2025-05-27 09:40:59 +03:00

llama : validate seq id batch input (#13809)

* llama : validate seq id batch input
* cont : fix the fix

ggml-ci

commit cdf94a1802 (tag: b5501)
Author: Olivier Chafik
Date:   2025-05-26 22:34:27 +01:00

server: --offline mode (#13804)

* server: --offline mode (env: LLAMA_OFFLINE)

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

commit a26c4cc11e
Author: Georgi Gerganov
Date:   2025-05-26 22:24:01 +03:00

scripts : add option to compare commits in Debug (#13806)

* scripts : add option to compare commits in Debug
* cont : reuse existing CMAKE_OPTS

commit 4265a87b59 (tag: b5499)
Author: Georgi Gerganov
Date:   2025-05-26 22:14:52 +03:00

cuda : avoid cuGetErrorString (#13791)

ggml-ci

commit 6f180b915c (tag: b5498)
Author: Akarshan Biswas
Date:   2025-05-26 21:10:36 +05:30

SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611)

* SYCL: Add non contiguous input support to norm kernel
* refactor and add RMS_NORM non contiguous input support
* restore subgroup reduction for multi-subgroup thread blocks in norm kernels
* Swap grid dims of nsamples and nrows
* Revert "Swap grid dims of nsamples and nrows"
  This reverts commit 43be2d657fec7f7fba54e2cd154106bc0fc45adf.
* restore not required changes
* address review comments: change it to more like SYCL
* Use a common function to calculate offset
* remove wrap around logic for handling broadcasts
* remove static from calculate_offset fn and use ceil_div

ggml-ci

commit 03f582ae8f (tag: b5497)
Author: Olivier Chafik
Date:   2025-05-26 16:03:57 +01:00

server: fix streaming crashes (#13786)

* add preludes to content on partial regex match
* allow all parsers to parse non-tool-call content
* tweak order of <|python_tag|> vs <function= parsing for functionary v3.1 format; still not ideal but hopefully less prone to crash

commit 88c125f2ac
Author: standby24x7
Date:   2025-05-26 16:55:24 +02:00

examples/training: Fix file name in README (#13803)

This patch fixes binary file names in README.md.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>

commit d74e94c1b3 (tag: b5495)
Author: Olivier Chafik
Date:   2025-05-26 14:56:49 +01:00

server: fix format of streamed tool call deltas (diff name, fix id location) (#13800)

* fix deltas of tool_call.function.name
* fix tool_call.id (was in tool_call.function.id!) + add function type
* add tool_call.type
* populate empty tool_call.function.arguments on first delta

commit f13847cfb5 (tag: b5494)
Author: Olivier Chafik
Date:   2025-05-26 14:16:37 +01:00

server: fix regression on streamed non-chat completion w/ stops (#13785)

* more forgiving message diffs: partial stop words aren't erased, full stops are
* Add (slow) server test for completion + stream + stop

commit 79c137f776 (tag: b5493)
Author: Georgi Gerganov
Date:   2025-05-26 14:03:54 +03:00

examples : allow extracting embeddings from decoder contexts (#13797)

ggml-ci

commit 22229314fc (tag: b5492)
Author: Georgi Gerganov
Date:   2025-05-26 12:57:50 +03:00

llama : clarify deprecation message (#13794)

commit 9012eb9b45
Author: Romain Biessy
Date:   2025-05-26 10:28:53 +02:00

sycl: Add more debug prints (#13640)