M. Yusuf Sarıgöz
af1c9966c8
gguf : start write tensor info
2023-07-27 10:32:31 +03:00
							
M. Yusuf Sarıgöz
8332d26123
refactor: reduce code duplication and better API
2023-07-27 09:48:08 +03:00
							
Georgi Gerganov
d8491fc7e3
gguf : add comments
2023-07-26 23:00:24 +03:00
							
Georgi Gerganov
5628ec7163
gguf : read / write sample models
2023-07-26 22:40:45 +03:00
							
Georgi Gerganov
e46870f5af
gguf : gguf.c is now part of ggml.c
2023-07-26 18:55:32 +03:00
							
Georgi Gerganov
d313c0fa33
gguf : simplify gguf_get_val
2023-07-26 18:53:57 +03:00
							
Georgi Gerganov
cb871fa022
gguf : do not support passing existing ggml_context to gguf_init
2023-07-26 18:48:52 +03:00
							
Georgi Gerganov
860c9c63ce
gguf : add gguf_get_tensor_name()
2023-07-26 18:21:14 +03:00
							
Georgi Gerganov
78b226a959
gguf : initial model loading - not tested
2023-07-26 18:21:14 +03:00
							
Georgi Gerganov
d91b985d2d
gguf : read tensor info
2023-07-26 18:21:13 +03:00
							
Georgi Gerganov
8d6acfec12
gguf : read header + meta data
2023-07-26 18:21:13 +03:00
							
Georgi Gerganov
6873148771
gguf : first API pass
2023-07-26 18:21:13 +03:00
							
Georgi Gerganov
7e82d25f40
ci : disable CI temporarily to not waste energy
2023-07-26 18:21:13 +03:00
							
M. Yusuf Sarıgöz
bae6b125f6
wip : implement GGUF (#2397)
* Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
* WIP: python class to write GGUF, incomplete C API for reading
---------
Co-authored-by: Kawrakow <48489457+ikawrakow@users.noreply.github.com>
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-26 18:21:13 +03:00
							
Georgi Gerganov
4d698495ea
gguf : init
2023-07-26 18:21:12 +03:00
							
slaren
5488fb789e
ggml : allocate graphs in a context (#2392)
* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-26 15:56:53 +02:00
							
Kawrakow
eb542d3932
Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 18:35:53 +03:00
							
slaren
07aaa0f63f
ggml : fix ggml_flash_attn to use op_params (#2387)
* ggml : fix ggml_flash_attn to use op_params
2023-07-25 16:20:12 +02:00
							
ldwang
fce48caf9a
convert.py : support bpe tokenizer (#2228)
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-07-25 16:22:09 +03:00
							
Jiahao Li
875086bdb9
ggml : relax contiguous constraints in activation function (#2371)
2023-07-25 15:58:32 +03:00
							
slaren
da1889834a
ggml : improve graph build time via hash table lookup (#2329)
* improve graph build time
* ggml_tensor : use 1 bit per flag
* use a hash table instead
2023-07-25 15:32:20 +03:00
							
Hesen Peng
82552b7f54
build : fix line breaking error in build-info.sh (#2349)
* fix line breaking
* build number line break removal
2023-07-25 15:24:09 +03:00
							
Xiao-Yong Jin
0c06204fb3
main : add --in-prefix-bos to prefix BOS to user inputs; keep EOS (#2304)
* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
The BOS precedes the string specified by `--in-prefix`.
Model generated EOS is now kept in the context.
It provides a way to strictly follow the prompt format used in
Llama-2-chat.
The EOS handling also benefits some existing finetunes that use
EOS to mark the end of turn.
* examples/common: move input_prefix_bos to other bools
2023-07-25 15:19:11 +03:00
							
Eve
1fed755b1f
ci : add non-AVX scalar build/test (#2356)
* noavx build and test
* we don't need to remove f16c in windows
2023-07-25 15:16:13 +03:00
							
katsu560
be2301bcda
k_quants : add AVX support to dot functions with QK_K as 64 (#2339)
* add AVX to ggml_vec_dot_q2_K_q8_K()
* add AVX to ggml_vec_dot_q3_K_q8_K()
* add AVX to ggml_vec_dot_q4_K_q8_K()
* add AVX to ggml_vec_dot_q5_K_q8_K()
* add AVX to ggml_vec_dot_q6_K_q8_K()
* refactor AVX code in ggml_vec_dot_q6_K_q8_K()
2023-07-25 15:13:41 +03:00
							
Shouzheng Liu
1aa18ef994
metal : concurrently dispatch commands (#2358)
* metal: concurrently dispatch commands
Function `ggml_metal_graph_find_concurrency` will run and write
commands that can be issued concurrently to metal context `concur_list`
array, when `ggml_metal_graph_compute` is called for the first time.
* metal: don't call find_concurrency automatically.
* metal : code style changes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-25 15:00:19 +03:00
							
Kawrakow
9a08eaf3c4
Another speed gain for Q4_0 and Q4_1 on Metal (#2375)
* Another speed gain for Q4_0 and Q4_1 on Metal
* Have N_DST, etc., be template parameters
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 13:48:29 +03:00
							
Kawrakow
129d844c87
Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359)
* Fix Q4_K and Q5_K for QK_K = 64
* Very slightly better Q5_K bit fiddling
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 13:48:04 +03:00
							
slaren
d5512b782b
server: add rms_norm_eps parameter (#2380)
2023-07-25 12:36:17 +03:00
							
Henri Vasserman
c798308e3a
[Server] Escape HTML in webchat (#2368)
* escape HTML in webchat
* add amp
2023-07-25 10:27:34 +03:00
							
slaren
41c674161f
make rms_norm_eps a parameter (#2374)
* make rms_norm_eps a parameter
* add rms_norm_eps to command line
* fix baby llama, test-grad0
* use scientific notation for eps param in the help
ggml-ci
2023-07-24 17:57:12 +02:00
							
Aarni Koskela
b3f138d058
Chat UI extras (#2366)
* makefile: correct deps for server
* server: tighten settings layout a little
* server: expose all currently configured generation params in UI
* server: expose remaining generation params, for the adventurous
* server: embetter mirostat fields
2023-07-24 17:54:22 +03:00
							
Georgi Gerganov
5b2b2dc6ae
ggml : sync (unary ops refactor, static-correctness) (#2370)
* ggml : sync (unary ops, tests)
ggml-ci
* tests : remove unnecessary funcs
2023-07-24 14:46:21 +03:00
							
Kawrakow
42f70cb2f6
Fix scalar version of Q5_K when QK_K = 64 (#2362)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-24 12:55:02 +03:00
							
Evan Jones
84e09a7d8b
llama : add grammar-based sampling (#1773)
* llama, main : constrain sampling to grammar
* allow loading grammar from file
* fix whitespace errors
* handle & print parser errors
* add comments to grammar syntax and allow newlines where unambiguous
* add missing include
* support alternates in root rule
* fix bugs with empty token and EOS
* adjust JSON grammar
* remove swp file
* rewrite ternary expressions
Co-authored-by: Henri Vasserman <henv@hot.ee>
* use struct for grammar elements and add Unicode support
* add unicode escapes
* add inverse char ranges
* only sample full tokens (no peeking or truncation)
* llama : minor style changes
blindly applied in online editor - hopefully I didn't break something
* update help text
* add warning message if EOS is disabled
---------
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-23 23:58:10 -04:00
							
Kawrakow
2f9cf974a0
Some more Q4_K and Q5_K speedup on CUDA (#2346)
* Faster Q5_K on CUDA
* Small Q5_K improvement on older GPUs
* Speed up Q4_K on CUDA
GTX1660: 29.5 ms/t -> 25.6 ms/t
RTX4080: 8.40 ms/t -> 8.25 ms/t
* Speed up Q4_K on CUDA
GTX1660: 36.7 ms/t -> 35.6 ms/t
RTX4080:  9.8 ms/t ->  9.5 ms/t
* Address PR comments
* Add some comments to satisfy PR reviewer
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-24 00:19:47 +03:00
							
IgnacioFDM
4f06592cc6
Add gqa parameter support to the server (#2351)
* Add gqa parameter support to the server
* Change help from stderr to stdout
2023-07-23 23:31:17 +03:00
							
Johannes Gäßler
70d26ac388
Fix __dp4a documentation (#2348)
2023-07-23 17:49:06 +02:00
							
wzy
57921ca6db
common : n_threads == -1 uses std::thread::hardware_concurrency() (#2347)
* Fix #2345, fix incorrect n_threads
* Update examples/common.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-23 16:33:02 +03:00
							
slaren
3602ac4255
fix n_tasks (#2342)
ggml-ci
2023-07-23 15:19:39 +02:00
							
slaren
95a6c595e7
ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)
* ggml: move op parameters from tensors to ggml_tensor::op_params
* alibi: use memcpy for float params
* remove `src[1] = NULL` in ops
2023-07-23 14:36:02 +02:00
							
Georgi Gerganov
e76d630df1
llama : grouped-query attention + LLaMAv2 70B support (#2276)
* CUDA: GQA implementation
* llama : support for GQA and LLaMAv2 70B
ggml-ci
* py : fix hparams parsing (if-else blocks)
ggml-ci
* py : oh boy ..
ggml-ci
* help : fix gqa value for 70B
ggml-ci
---------
Co-authored-by: JohannesGaessler <johannesg@5d6.de>
2023-07-23 15:09:47 +03:00
							
maddes8cht
1d0824b247
llama : print help to stdout (#2338)
2023-07-23 14:59:48 +03:00
							
wzy
bc3ec2cdc9
flake : support nix build '.#opencl' (#2337)
2023-07-23 14:57:02 +03:00
							
Christian Demsar
a940458e48
llama : print max tensor size to stderr (#2336)
2023-07-23 14:56:34 +03:00
							
Jose Maldonado
91171b8072
make : fix CLBLAST compile support in FreeBSD (#2331)
* Fix Makefile for CLBLAST compile support and instructions for compiling llama.cpp on FreeBSD
* More general use-case for CLBLAST support (Linux and FreeBSD)
2023-07-23 14:52:08 +03:00
							
AustinMroz
355c80f49e
examples : simplify vim plugin (#2327)
Uses builtin json_encode and json_decode functions to simplify escaping
Removes the need for temp files
2023-07-23 14:16:48 +03:00
							
Jiahao Li
83a00ce69b
metal : support bcast add & dup & cont op (#2323)
2023-07-23 14:00:37 +03:00
							
Kawrakow
d2a43664f9
Speed up Q4_K (#2322)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-23 08:49:20 +03:00
							
Johannes Gäßler
b9b7d94fc1
CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313)
2023-07-22 21:27:34 +02:00