a6a8f8d09c  Update docs/backend/SYCL.md
  Author: Neo Zhang Jianyu
  Date: 2024-09-17 16:25:43 +08:00
  Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>

8241151f16  set context default to avoid memory issue, update guide
  Author: arthw
  Date: 2024-09-14 09:01:05 +08:00

feff4aa846  server : add loading html page while model is loading (#9468)
  Author: Xuan Son Nguyen
  Date: 2024-09-13 14:23:11 +02:00
  * Adding loading page for '/' server requests
  * set content when model is loading
  * removed loading html file
  * updated cmakelist
  * updated makefile
  * cleaned up whitespace
  * cleanup for PR removed error
  * updated server test to handle 503 HTML
  * updated server test to handle 503 HTML
  * catch 503 before parsing json
  * revert test
  * account for both api and web browser requests
  * precommit corrections
  * eol fix
  * revert changes to pre-commit
  * removed print statement
  * made loading message more descriptive
  * also support .html files
  Co-authored-by: VJHack <flymyplane21@gmail.com>
  Co-authored-by: Vinesh Janarthanan <36610342+VJHack@users.noreply.github.com>

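While the model is still being loaded, llama-server now answers requests with HTTP 503, serving an HTML placeholder page to web browsers and a JSON error to API clients (hence "catch 503 before parsing json" above). Below is a minimal client-side sketch of how a caller might wait for the server to become ready; the address http://localhost:8080 and the printed messages are illustrative assumptions, not taken from the PR.

```python
# Minimal readiness-wait sketch. Assumes llama-server listens on localhost:8080;
# the response bodies shown here are illustrative, only the 503 status is relied on.
import json
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:8080", timeout_s: float = 120.0) -> bool:
    """Poll the server root until it stops answering 503 (model still loading)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/"):
                return True  # 2xx: model is loaded and the normal page is served
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise  # a different error; retrying will not help
            # A 503 may carry an HTML loading page (browser) or JSON (API client),
            # so check the Content-Type before trying to parse JSON.
            body = err.read()
            if "application/json" in err.headers.get("Content-Type", ""):
                print("still loading:", json.loads(body))
            else:
                print("still loading (HTML placeholder page served)")
        time.sleep(1.0)
    return False


if __name__ == "__main__":
    print("server ready:", wait_until_ready())
```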
						 
				 
			
				
					
						
							
							
0abc6a2c25  llama : llama_perf + option to disable timings during decode (#9355)
  Author: Georgi Gerganov
  Date: 2024-09-13 09:53:38 +03:00
  * llama : llama_perf + option to disable timings during decode (ggml-ci)
  * common : add llama_arg
  * Update src/llama.cpp
  * perf : separate functions in the API (ggml-ci)
  * perf : safer pointer handling + naming update (ggml-ci)
  * minor : better local var name
  * perf : abort on invalid sampler pointer (ggml-ci)
  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

bd35cb0ae3  feat: remove a sampler from a chain (#9445)
  Author: Gilad S.
  Date: 2024-09-13 03:54:49 +02:00
  * feat: remove a sampler from a chain
  * fix: return removed sampler
  * fix: safer casting

78203641fe  server : Add option to return token pieces in /tokenize endpoint (#9108)
  Author: Mathijs Henquet
  Date: 2024-09-12 22:30:11 +02:00
  * server : added with_pieces functionality to /tokenize endpoint
  * server : Add tokenize with pieces tests to server.feature
  * Handle case if tokenizer splits along utf8 continuation bytes
  * Add example of token splitting
  * Remove trailing ws
  * Fix trailing ws
  * Maybe fix ci
  * maybe this fix windows ci?
  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

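Per the notes above, the /tokenize endpoint gained a with_pieces option so the response can carry, for each token, the text piece alongside the id; pieces that are not valid UTF-8 (for example when the tokenizer splits inside a multi-byte character) need special handling on the client side. A small usage sketch follows; the request field name comes from the PR description, while the default server address and the exact response shape are assumptions.

```python
# Hedged usage sketch for /tokenize with "with_pieces". The server address and
# the response structure are assumptions; only the field name comes from the PR notes.
import json
import urllib.request


def tokenize(text: str, base_url: str = "http://localhost:8080") -> dict:
    """Ask llama-server to tokenize `text`, requesting token pieces as well."""
    payload = json.dumps({"content": text, "with_pieces": True}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/tokenize",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Expect each returned token to include its id and, with with_pieces enabled,
    # the corresponding piece (non-UTF-8 pieces may be encoded as raw byte values).
    print(tokenize("Hello world"))
```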
						 
				 
			
				
					
						
							
							
e6b7801bd1  cann: Add host buffer type for Ascend NPU (#9406)
  Author: Dou Xinpeng
  Date: 2024-09-12 19:46:43 +08:00
  * feat: Add host buffer type for Ascend NPU (CANN backend)
  * fix some checking errors
  * Add a few comments

e665744317  llava : fix the script error in MobileVLM README (#9054)
  Author: fengerhu1
  Date: 2024-09-12 14:34:22 +03:00
  Signed-off-by: Erhu Feng <2748250768@qq.com>

d4c3c10fad  lora : raise error if lm_head is ignored (#9103)
  Author: Xuan Son Nguyen
  Date: 2024-09-12 14:33:57 +03:00
  * lora : raise error if lm_head is ignored
  * fix style
  * clarify comment

2a825116b6  cmake : fix for builds without GGML_CDEF_PUBLIC (#9338)
  Author: Michael Podvitskiy
  Date: 2024-09-12 14:30:01 +03:00
  * `GGML_TARGET_DEFINES-NOTFOUND` fix for builds without `GGML_CDEF_PUBLIC`
  * Update CMakeLists.txt, spaces fix

4dc4f5f14a  ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329)
  Author: Huang Qi
  Date: 2024-09-12 14:28:43 +03:00

c837981bba  py : add Phi-1.5/Phi-2 tokenizer (#9361)
  Author: daminho
  Date: 2024-09-12 14:28:20 +03:00
  * add phi2 tokenizer
  * add phi name to convert_hf_to_gguf_update.py
  * make tokenizer_pre consistent; llama.cpp work

3c26a1644d  ci : bump actions/checkout to v4 (#9377)
  Author: Trivikram Kamat
  Date: 2024-09-12 14:27:45 +03:00

ff76e18516  cmake : fixed the order of linking libraries for llama-quantize (#9450)
  Author: Michael Podvitskiy
  Date: 2024-09-12 14:27:14 +03:00

39f852f440  py : add special tokens in hf_converter for RWKV v6 (#9428)
  Author: Molly Sophia
  Date: 2024-09-12 14:25:16 +03:00
  Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

2b00fa7997  riscv : modify Makefile and add a RISCV_VECT to print log info (#9442)
  Author: Ahmad Tameem
  Date: 2024-09-12 14:24:31 +03:00
  - Added ggml_cpu_has_riscv_v() in GGML to print system info in log
  - Modified Makefile to only use flag when cross compiling for RISC-V

d6a04f872d  ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)
  Author: Georgi Gerganov
  Date: 2024-09-12 14:23:49 +03:00
  * ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (ggml-ci)
  * ggml : add ggml-impl.h to backends
  * ggml : fix compiler warnings (ggml-ci)
  * ggml : add assert upon adding nodes

c9c8575a1a  enhance run script to be easy to change the parameters (#9448)
  Author: Neo Zhang Jianyu
  Date: 2024-09-12 17:44:17 +08:00
  Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>

df4b7945ae  cann: Fix error when running a non-exist op (#9424)
  Author: Xinpeng Dou
  Date: 2024-09-12 09:02:35 +08:00

449ccfb6f5  Add Jais to list of supported models (#9439)
  Author: Faisal Zaghloul
  Date: 2024-09-12 02:29:53 +02:00
  Co-authored-by: fmz <quic_fzaghlou@quic.com>

1b28061400  llama : skip token bounds check when evaluating embeddings (#9437)
  Author: slaren
  Date: 2024-09-11 17:52:13 +02:00

8db003a19d  py : support converting local models (#7547)
  Author: Pavel Zloi
  Date: 2024-09-11 15:29:51 +03:00
  * Support of converting local models added to convert-hf-to-gguf-update.py
  * Description fixed
  * shutil added to imports

0996c5597f  llava : correct args for minicpmv-cli (#9429)
  Author: Xuan Son Nguyen
  Date: 2024-09-11 12:59:13 +02:00

5bb2c5dbd2  files : remove accidentally added lora_test submodule (#9430)
  Author: Xuan Son Nguyen
  Date: 2024-09-11 13:02:09 +03:00

67155ab7f5  feat: Implements retrying logic for downloading models using --model-url flag (#9255)
  Author: Farbod Bijary
  Date: 2024-09-11 11:22:37 +02:00
  * feat: Implements retrying logic for downloading models using --model-url flag
  * Update common/common.cpp
  * Update common/common.cpp
  * apply comments
  * implements a retry function to avoid duplication
  * fix editorconfig
  * change function name
  Co-authored-by: farbod <farbod.bjary82@gmail.com>
  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
  Co-authored-by: slaren <slarengh@gmail.com>
  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

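The retry behaviour itself lives in the C++ download helper in common/common.cpp; the sketch below only illustrates the retry-with-delay idea in Python, with made-up function names and retry parameters, and is not a transcription of the actual change.

```python
# Illustrative retry-with-delay sketch (hypothetical names and parameters;
# the real logic is implemented in C++ in common/common.cpp).
import time
import urllib.error
import urllib.request


def download_with_retries(url: str, dest: str, attempts: int = 3, delay_s: float = 2.0) -> bool:
    """Try to fetch `url` into `dest`, retrying a few times on transient failures."""
    for attempt in range(1, attempts + 1):
        try:
            urllib.request.urlretrieve(url, dest)
            return True
        except (urllib.error.URLError, OSError) as err:
            print(f"download attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts:
                time.sleep(delay_s)
    return False
```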
						 
				 
			
				
					
						
							
							
5af118efda  CUDA: fix --split-mode row race condition (#9413)
  Author: Johannes Gäßler
  Date: 2024-09-11 10:22:40 +02:00

d2b496bff4  batched-bench : remove unused code (#9305)
  Author: Georgi Gerganov
  Date: 2024-09-11 10:03:54 +03:00

b34e023480  musa: remove Clang builtins mapping (#9421)
  Author: R0CKSTAR
  Date: 2024-09-11 03:46:55 +02:00
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

51b6038636  sycl : update support conditions (#9394)
  Author: Alberto Cabrera Pérez
  Date: 2024-09-11 08:53:42 +08:00
  * sycl : update support condition to im2col
  * Added TODO to remind supporting FP32 im2col
  Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

cb9c933eb2  flake.lock: Update (#9360)
  Author: Georgi Gerganov
  Date: 2024-09-10 15:46:59 -07:00
  Flake lock file updates:
  • Updated input 'flake-parts':
      'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
    → 'github:hercules-ci/flake-parts/567b938d64d4b4112ee253b9274472dc3a346eb6?narHash=sha256-%2Bebgonl3NbiKD2UD0x4BszCZQ6sTfL4xioaM49o5B3Y%3D' (2024-09-01)
  • Updated input 'flake-parts/nixpkgs-lib':
      'a5d394176e356624c120

6cd4e03444  arg : bring back missing ifdef (#9411)
  Author: Xuan Son Nguyen
  Date: 2024-09-10 22:41:29 +02:00
  * arg : bring back missing ifdef
  * replace with llama_supports_gpu_offload

8d300bd35f  enable --special arg for llama-server (#9419)
  Author: matteo
  Date: 2024-09-10 22:40:59 +02:00
  Co-authored-by: matteo serva <matteo.serva@gmail.com>

49006c67b4  llama : move random seed generation to the samplers (#9398)
  Author: slaren
  Date: 2024-09-10 18:04:25 +02:00
  * llama_sampler_penalties : clamp penalty_last_n to zero

00ba2ff781  metal : fix compile warning with GGML_METAL_NDEBUG (#0)
  Author: Georgi Gerganov
  Date: 2024-09-10 10:17:43 +03:00

83008b7cfe  llama : update llm_build_copy_mask_state comment [no ci] (#9385)
  Author: Daniel Bevenius
  Date: 2024-09-10 10:03:21 +03:00
  This commit updates the comment in the copy_mask_state function, which seems to contain
  a typo or be outdated, changing the variable n_rs to n_kv. The intent of the comment is
  to copy the states that are not going to be used in the upcoming processing, i.e. the
  token states from n_seqs up to the number of possible token states n_kv.

0b4ac75772  RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)
  Author: Molly Sophia
  Date: 2024-09-10 10:02:30 +03:00
  Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

fb3f249815  make : do not run llama-gen-docs when building (#9399)
  Author: slaren
  Date: 2024-09-10 09:23:33 +03:00

bfe76d4a17  common : move arg parser code to arg.cpp (#9388)
  Author: Xuan Son Nguyen
  Date: 2024-09-09 23:36:09 +02:00
  * common : move arg parser to arg.cpp
  * better categorize args
  * add cmake
  * missing climits
  * missing cstdarg
  * common : more explicit includes
  * fix build
  * refactor gpt_params_parse
  * update server readme
  * fix test
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

293bebe077  rpc : fix segfault with nkvo (#9389)
  Author: Radoslav Gerganov
  Date: 2024-09-09 18:40:10 +03:00
  * rpc : fix nkvo
  * rpc : buf_size must not be static (ref: #9337)
  Co-authored-by: slaren <slarengh@gmail.com>

5fac4d5764  ggml : vector length agnostic SVE support (#9290)
  Author: Prashant Vithule
  Date: 2024-09-09 18:37:18 +03:00
  * Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
  * Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
  * Removed WhiteSpaces
  * ggml : style changes + fix 512-bit nb loop check
    - fix local scope in switch cases
    - consistent predicate names
    - empty lines when necessary
    - opening braces, spaces
    - const-correctness
    - add asserts
  * Update ggml/src/ggml-quants.c
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

5fb5e24811  llama : minor sampling refactor (2) (#9386)
  Author: slaren
  Date: 2024-09-09 17:10:46 +02:00

38ca6f644b  readme : update hot topics
  Author: Georgi Gerganov
  Date: 2024-09-09 15:51:37 +03:00

8e6e2fbe14  CUDA: fix variable name conflict for Windows build (#9382)
  Author: Johannes Gäßler
  Date: 2024-09-09 14:22:53 +02:00

5ed087573e  readme : add LLMUnity to UI projects (#9381)
  Author: Antonis Makropoulos
  Date: 2024-09-09 14:21:38 +03:00
  * add LLMUnity to UI projects
  * add newline to examples/rpc/README.md to fix editorconfig-checker unit test

54f376d0b9  rpc : update README [no ci] (#9320)
  Author: Radoslav Gerganov
  Date: 2024-09-09 11:04:39 +03:00
  Update README with instructions how to offload model layers to both local and remote devices

b2e89a3274  Arm AArch64: Documentation updates (#9321)
  Author: Dan Johansson
  Date: 2024-09-09 10:02:45 +03:00
  * Arm AArch64: Documentation updates
  * Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels
  * Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats
  * Add newline to the end of docs/build.md

daa9623ab0  Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118)
  Author: Markus Tavenrath
  Date: 2024-09-08 21:43:48 +02:00
  * Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.
  * fix compile issues
  * Fix issues where the last submit wasn't executed or handled properly.
  * remove trailing whitespace
  * Repair GGML_VULKAN_CHECK_RESULTS
  * Increase submit counter only if actual work has been submitted and increase submit count to 100.
  * Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.

e079bffb66  cuda : fix FA Q src index (1 -> 0) (#9374)
  Author: Georgi Gerganov
  Date: 2024-09-08 22:01:02 +03:00

3f7ccfd649  common : bring back missing args, add env var duplication check (#9375)
  Author: Xuan Son Nguyen
  Date: 2024-09-08 18:08:55 +02:00
  * common : bring back missing args
  * move duplication check to test-arg-parser
  * add check for duplicated env var
  * correct default values

a249843d89  common : restore --n-gpu-layers (#9371)
  Author: slaren
  Date: 2024-09-08 16:44:42 +02:00