9b3d833189 | Georgi Gerganov | 2024-05-22 12:36:37 +03:00
cuda : fix compile warning (#7454)

95fb0aefab | Johannes Gäßler | 2024-05-22 10:24:29 +02:00
CUDA: remove incorrect precision check (#7454)

3e5faa8503 | Georgi Gerganov | 2024-05-22 11:01:35 +03:00
cuda : fix rope + add tests (#7452)
    * cuda : fix rope pos data
    * ggml : drop mode & 1 == 1 support for ggml_rope
    * ggml : support freq_factors for f16 rope (CPU)
    * tests : add rope tests using frequency factors
    ggml-ci

201cc11afa | liuwei-git | 2024-05-21 23:28:32 +03:00
llama : add phi3 128K model support (#7225)
    * add phi3 128k support in convert-hf-to-gguf
    * add phi3 128k support in cuda
    * address build warnings on llama.cpp
    * adjust index value in cuda long rope freq factors
    * add long rope support in ggml cpu backend
    * make freq factors only depend on ctx size
    * remove unused rope scaling type 'su' from gguf converter
    * fix lint warnings on convert-hf-to-gguf.py
    * set to the short freq factor when context size is smaller than trained context size
    * add one line of comments
    * metal : support rope freq_factors
    * ggml : update ggml_rope_ext API to support freq. factors
    * backends : add dev messages to support rope freq. factors
    * minor : style
    * tests : update to use new rope API
    * backends : fix pragma semicolons
    * minor : cleanup
    * llama : move rope factors from KV header to tensors
    * llama : remove tmp assert
    * cuda : fix compile warning
    * convert : read/write n_head_kv
    * llama : fix uninitialized tensors
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

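The long-rope selection rule named in the bullets above ("make freq factors only depend on ctx size", "set to the short freq factor when context size is smaller than trained context size") is simple enough to sketch. A minimal, hypothetical illustration; the names below are illustrative, not the actual llama.cpp symbols:

    #include <cstdint>

    // Hypothetical sketch: pick the "long" or "short" RoPE frequency-factor
    // tensor based only on the configured context size versus the context
    // size the model was trained with.
    const float * select_rope_freq_factors(
            uint32_t n_ctx,                    // context size requested at load time
            uint32_t n_ctx_train,              // training context size
            const float * freq_factors_long,   // factors for extended contexts
            const float * freq_factors_short)  // factors for the trained range
    {
        return n_ctx > n_ctx_train ? freq_factors_long : freq_factors_short;
    }
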
6369bf0433 | Georgi Gerganov | 2024-05-21 23:03:42 +03:00
metal : handle F16 inf values, fix FA partial offload (#7434)
    ggml-ci

e402de364b | Olivier Chafik | 2024-05-21 20:40:00 +01:00
grammars: fix resampling logic regression (#7424)

fcf6538ba6 | Johannes Gäßler | 2024-05-21 20:27:12 +03:00
CUDA: fix unused warning in mmq.cu (#7442)

c3f8d58356 | Georgi Gerganov | 2024-05-21 19:53:48 +03:00
tests : test-tokenizer-0.sh print more info (#7402)

11474e756d | Amir | 2024-05-21 17:13:12 +03:00
examples: cache hf model when --model not provided (#7353)

d8ee902227 | Johannes Gäßler | 2024-05-21 16:02:12 +02:00
CUDA: deduplicate mmq code (#7397)

d7e852c1bc | jaime-m-p | 2024-05-21 14:39:48 +02:00
Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425)
    * Update brute force test: add_special
    * Update brute force test: default values for add_bos_token and add_eos_token
    * Enable rtrim when pre-inserting BOS
      Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    * Revert "server : fix test regexes"

917dc8cfa6 | jaime-m-p | 2024-05-20 20:15:57 +02:00
Tokenizer SPM fixes for phi-3 and llama-spm (#7375)
    * Update brute force test: special tokens
    * Fix added tokens
      - Try to read 'added_tokens.json'.
      - Try to read 'tokenizer_config.json'.
      - Try to read 'tokenizer.json'.
    * Fix special tokens rtrim
      Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    * server : fix test regexes

fabf30b4c4 | Georgi Gerganov | 2024-05-21 02:35:28 +10:00
llama : remove Persimmon (#7408)
    * llama : remove Persimmon
    * requirements : remove

20385cebcc | Johannes Gäßler | 2024-05-20 18:15:38 +02:00
perplexity: update README FP16 results [no ci] (#7413)

db10f01310 | Radoslav Gerganov | 2024-05-20 16:36:55 +03:00
rpc : track allocated buffers (#7411)
    * rpc : track allocated buffers (ref: #7407)
    * rpc : pack rpc_tensor tightly

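A sketch of what "track allocated buffers" can look like on the server side, assuming the public ggml-backend allocation API; the rpc_server_state type and its methods are hypothetical, not the actual rpc code:

    #include <cstddef>
    #include <unordered_set>
    #include "ggml-backend.h"

    struct rpc_server_state {
        // every buffer this server has handed out and not yet freed
        std::unordered_set<ggml_backend_buffer_t> buffers;

        ggml_backend_buffer_t alloc_buffer(ggml_backend_buffer_type_t buft, size_t size) {
            ggml_backend_buffer_t buf = ggml_backend_buft_alloc_buffer(buft, size);
            if (buf != nullptr) {
                buffers.insert(buf); // remember it so stray handles can be rejected
            }
            return buf;
        }

        bool free_buffer(ggml_backend_buffer_t buf) {
            if (buffers.erase(buf) == 0) {
                return false; // unknown handle from the client: refuse to free
            }
            ggml_backend_buffer_free(buf);
            return true;
        }
    };
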
3bc10cb485 | Georgi Gerganov | 2024-05-20 22:10:03 +10:00
server : fix temperature + disable some tests (#7409)
    * server : fix temperature
    * server : disable tests relying on parallel determinism
    * ci : change server Debug -> RelWithDebInfo

6bf9b66fa3 | AidanBeltonS | 2024-05-20 16:38:23 +05:30
[SYCL] Update SYCL upscale operation (#7321)
    * Update SYCL upscale operation
    * Formatting
    * Remove messages

26cd4237bc | Bingan | 2024-05-20 11:55:34 +02:00
Update README.md (#7410)

213e90ed73 | Herman Semenov | 2024-05-20 10:33:21 +03:00
ggml-opencl, llama: using reserve() if count already known (#7272)

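The pattern behind this change, as a minimal generic sketch (not the actual ggml-opencl/llama code): when the element count is known before the fill loop, reserve() performs one allocation up front instead of repeated growth on push_back.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::vector<int32_t> collect_ids(size_t n_known) {
        std::vector<int32_t> ids;
        ids.reserve(n_known);           // one allocation; size() stays 0
        for (size_t i = 0; i < n_known; ++i) {
            ids.push_back((int32_t) i); // no reallocation inside the loop
        }
        return ids;
    }
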
65c58207ec | junchao-loongson | 2024-05-20 10:19:21 +03:00
ggml : add loongarch lsx and lasx support (#6454)
    * add loongarch lsx and lasx optimize code
    * Add loongarch compilation support to makefile
    * revert stb_image.h
    * opt bytes_from_nibbles_32 and sum_i16_pairs_float
    * fix undeclared
    * format code
    * update
    * update 2
    Co-authored-by: Jinyang He <hejinyang@loongson.cn>

1cc0155d04 | Georgi Gerganov | 2024-05-20 10:16:41 +03:00
server : tuning tests (#7388)
    * server : don't pass temperature as string
    * server : increase timeout
    * tests : fix the fix 0.8f -> 0.8
    * tests : set explicit temperature
    ggml-ci

e932094d58 | Georgi Gerganov | 2024-05-20 08:56:05 +03:00
server : return error on too large embedding input (#7389)

2789baf480 | Georgi Gerganov | 2024-05-20 08:55:09 +03:00
tests : fix --keep_split -> --keep-split (#7374)

33c8d50acc | Srihari-mcw | 2024-05-20 12:18:39 +10:00
Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (#7258)

d359f30921 | slaren | 2024-05-20 01:17:03 +02:00
llama : remove MPI backend (#7395)

1ea2a0036e | Fred Douglas | 2024-05-19 19:37:04 +03:00
quantize : fix --keep-split check (#7374)

f030ec1f7a | 0cc4m | 2024-05-19 17:19:53 +02:00
Vulkan Embedding Fix (#7360)
    * Fix empty Vulkan host buffers
      Add fp32 fp16 matmul shader
      Fix matmul shader alignment
    * Remove deprecated tensor->backend uses
    * Fix Vulkan validation errors on embedding models with no offloaded layers
    * Fix Vulkan llava segfault when not offloading layers

e4e6f67be6 | slaren | 2024-05-19 17:08:46 +02:00
ggml : fix another case of quants nans (#7387)

5ca49cbecd | Johannes Gäßler | 2024-05-19 16:46:13 +02:00
ggml: implement quantized KV cache for FA (#7372)

1b01f06db0 | Johannes Gäßler | 2024-05-19 16:26:02 +02:00
server: add test for token probs (#7347)

41858392e1 | Johannes Gäßler | 2024-05-19 17:06:33 +03:00
server: fix seed being reported back (#7382)

6aade19ee7 | Anas Ahouzi | 2024-05-19 22:46:46 +10:00
Add StableLM2 pre-tokenizer (#7349)
    * Add StableLM pre-tokenizer
    * Fix space
    * Fix trailing whitespace

ab33f7a338 | slaren | 2024-05-19 14:19:37 +02:00
cuda : clear error after buffer allocation failure (#7376)

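Context for this fix: the CUDA runtime records a per-thread "last error", and a failed cudaMalloc leaves that error set even when the caller recovers, so a later unrelated check can misreport failure. A hedged sketch of the pattern (illustrative, not the exact ggml-cuda code):

    #include <cstdio>
    #include <cuda_runtime.h>

    void * try_device_alloc(size_t size) {
        void * ptr = nullptr;
        cudaError_t err = cudaMalloc(&ptr, size);
        if (err != cudaSuccess) {
            // consume the recorded error so later CUDA calls are not
            // misattributed to this allocation failure
            (void) cudaGetLastError();
            fprintf(stderr, "alloc of %zu bytes failed: %s\n", size, cudaGetErrorString(err));
            return nullptr; // caller may fall back, e.g. to host memory
        }
        return ptr;
    }
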
e23b974f4c | Brian | 2024-05-19 20:51:03 +10:00
labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363)
    https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action
    recommends using the checkout action so the correct repo context is used
    when applying settings for PR labels, e.g.:
        steps:
        - uses: actions/checkout@v4 # Uploads repository content to the runner
          with:
            repository: "owner/repositoryName" # one of the available inputs; see https://github.com/actions/checkout#readme
        - uses: actions/labeler@v5
          with:
            configuration-path: 'path/to/the/uploaded/configuration/file'

854d365aba | Georgi Gerganov | 2024-05-19 11:01:01 +03:00
cmake : update android comments (#7341)

f5bf761747 | fraxy-v | 2024-05-19 00:44:42 +02:00
Capture CUDA logging output (#7298)
    * logging: output capture in cuda module
    * fix compile error
    * fix: vsnprintf terminates with 0, string use not correct
    * post review
    * Update llama.cpp
    Co-authored-by: slaren <slarengh@gmail.com>

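The "vsnprintf terminates with 0" bullet refers to a classic sizing detail: vsnprintf returns the formatted length excluding the terminating '\0', and the first call consumes its va_list. A generic sketch of the correct two-pass pattern (illustrative, not the commit's exact code):

    #include <cstdarg>
    #include <cstdio>
    #include <string>
    #include <vector>

    static std::string vformat(const char * fmt, va_list args) {
        va_list args_copy;
        va_copy(args_copy, args);                         // the first pass consumes 'args'
        const int len = vsnprintf(nullptr, 0, fmt, args); // measure only
        if (len < 0) {
            return "";                                    // encoding error
        }
        std::vector<char> buf((size_t) len + 1);          // +1 for the terminating '\0'
        vsnprintf(buf.data(), buf.size(), fmt, args_copy);
        va_end(args_copy);
        return std::string(buf.data(), (size_t) len);     // exclude the terminator
    }
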
059031b8c4 | Georgi Gerganov | 2024-05-18 18:55:54 +03:00
ci : re-enable sanitizer runs (#7358)
    * Revert "ci : temporary disable sanitizer builds (#6128)"
      This reverts commit 4f6d1337ca.

511182eabb | Georgi Gerganov | 2024-05-18 20:40:39 +10:00
android : use "ci-android" branch for CI (#7341)
    * android : use "ci-android" branch for CI
    * ggml : disable SIMD exp and silu for 32-bit ARM
    * android : do not fetch, use add_subdirectory instead
    * cmake : provide binary dir
    ggml-ci

133d99c599 | Johannes Gäßler | 2024-05-18 12:36:25 +02:00
CUDA: deduplicate FlashAttention code (#7352)

cb42c29427 | Johannes Gäßler | 2024-05-18 11:10:47 +02:00
server: correct --threads documentation [no ci] (#7362)

d233b507cd | Engininja2 | 2024-05-18 10:05:17 +02:00
cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263)

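Background: older ROCm releases lack a half2 overload of the warp shuffle intrinsics, so CUDA/HIP-generic code needs a small shim. A hedged sketch of the general shape, shuffling the 32-bit payload as an int and reinterpreting it back; the guard macro and helper name are illustrative, not the commit's exact code:

    #if defined(USE_HIP_SHFL_XOR_SHIM) // illustrative guard; real builds use their own HIP detection
    #include <hip/hip_fp16.h>

    static __device__ __forceinline__ half2 shfl_xor_half2(half2 var, int lane_mask, int width) {
        union { int i; half2 h; } pun; // half2 and int are both 32 bits wide
        pun.h = var;
        pun.i = __shfl_xor(pun.i, lane_mask, width); // the int overload exists on ROCm 5.5
        return pun.h;
    }
    #endif
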
0f98acfac6 | Steffen Röcker | 2024-05-18 11:04:55 +03:00
llama : add support for larger Granite Code Models (20B, 34B) (#7324)
    Tie the weights for ARCH_STARCODER to support the larger Granite code models.
    Partially addresses ggerganov/llama.cpp#7116; a few things still remain to fix.
    Currently requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`.

ca57e0f35e | strawberrymelonpanda | 2024-05-18 10:57:08 +03:00
perplexity : ndot progress and show stats with < 100 tasks (#7348)
    Fix a floating point error in the ndot progress printing and allow end
    stats on lower task counts for multiple-choice tasks.

c1b295eea5 | 0cc4m | 2024-05-18 08:10:58 +02:00
Update and fix Vulkan soft_max and argsort implementations (#7237)
    * Update and fix Vulkan softmax implementation
    * Update and fix Vulkan argsort implementation

de73196344 | Brian | 2024-05-18 16:04:23 +10:00
github-actions-labeler: initial commit (#7330)
    * github-actions-labeler: initial commit [no ci]
    * github actions: remove priority auto labeling [no ci]

b49a13dd2f | Georgi Gerganov | 2024-05-18 08:46:20 +03:00
convert : fix set_vocab_sentencepiece (#6866)
    * convert : fix set_vocab_sentencepiece
    * Update convert-hf-to-gguf.py

05834841dc | slaren | 2024-05-18 02:39:54 +02:00
ggml : fix quants nans when all the group weights are very close to zero (#7313)

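The failure mode here is generic to block-wise quantization: when every weight in a group is (almost) zero, the group's maximum magnitude is zero and deriving a scale from it leads to 0/0 NaNs that then propagate. A minimal sketch of the guard, not the actual ggml kernel:

    #include <math.h>

    // per-group scale for a symmetric 8-bit quantizer; illustrative only
    static float group_scale(const float * x, int n) {
        float amax = 0.0f;
        for (int i = 0; i < n; ++i) {
            amax = fmaxf(amax, fabsf(x[i]));
        }
        if (amax < 1e-30f) {
            return 0.0f; // effectively all-zero group: store zero quants
                         // and a zero scale instead of dividing by ~0
        }
        return amax / 127.0f; // normal path: q[i] = round(x[i] / scale)
    }
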
ef277de2ad | Engininja2 | 2024-05-18 02:39:25 +02:00
cmake : fix typo in AMDGPU_TARGETS (#7356)

b43272afa2 | jaime-m-p | 2024-05-18 01:09:13 +02:00
Unicode codepoint flags for custom regexes (#7245)
    * Replace CODEPOINT_TYPE_* with codepoint_flags
    * Update and bugfix brute force random test
    * Deterministic brute force random test
    * Unicode normalization NFD
    * Get rid of BOM

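The first bullet swaps a single per-codepoint category value for a set of independent flags, which is what lets custom regex-like character classes test combinations cheaply. A hypothetical sketch of the shape of such a structure; field names are illustrative, not the exact llama.cpp definition:

    #include <cstdint>

    struct codepoint_flags {
        uint16_t is_letter      : 1;
        uint16_t is_number      : 1;
        uint16_t is_separator   : 1; // whitespace-like separators
        uint16_t is_punctuation : 1;
        uint16_t is_symbol      : 1;
        uint16_t is_control     : 1;
        uint16_t is_accent_mark : 1;
        // a codepoint may set several flags at once; the old
        // CODEPOINT_TYPE_* enum could express only one category
    };
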
0fc1e820a9 | Johannes Gäßler | 2024-05-17 18:54:52 +02:00
CUDA: faster large batch FA without tensor cores (#7314)