M. Yusuf Sarıgöz 
							
						 
					 
					
						
						
							
						
						ea5f9ad2ca 
					 
					
						
						
							
							gguf : fix writing gguf arrays  
						
						
						
						
							
						
					 
					
						2023-07-29 12:25:43 +03:00 
						 
				 
			
				
					
						
							
							
								M. Yusuf Sarıgöz 
							
						 
					 
					
						
						
							
						
						d54f53ca51 
					 
					
						
						
							
							gguf : add tokenization constants  
						
						
						
						
							
						
					 
					
						2023-07-29 12:04:45 +03:00 
						 
				 
			
				
					
						
							
							
								M. Yusuf Sarıgöz 
							
						 
					 
					
						
						
							
						
						06f423a8e1 
					 
					
						
						
							
							gguf : write sample tensors to read  
						
						
						
						
							
						
					 
					
						2023-07-29 10:26:26 +03:00 
						 
				 
			
				
					
						
							
							
								M. Yusuf Sarıgöz 
							
						 
					 
					
						
						
							
						
						08dc8fd884 
					 
					
						
						
							
							gguf : do not hardcode tensor names to read  
						
						
						
						
							
						
					 
					
						2023-07-29 10:24:46 +03:00 
						 
				 
			
				
					
						
							
							
								M. Yusuf Sarıgöz 
							
						 
					 
					
						
						
							
						
						9475cdb7a3 
					 
					
						
						
							
							Merge branch 'gguf-write-tokenization' into gguf  
						
						
						
						
							
						
					 
					
						2023-07-29 00:36:35 +03:00 
						 
				 
			
				
					
						
							
							
								M. Yusuf Sarıgöz 
							
						 
					 
					
						
						
							
						
						1495735aac 
					 
					
						
						
							
							gguf : fix writing tensors  
						
						
						
						
							
						
					 
					
						2023-07-29 00:26:22 +03:00 
						 
				 
			
				
					
						
							
							
								klosax 
							
						 
					 
					
						
						
							
						
						3492f848d7 
					 
					
						
						
							
							gguf : add gguf_find_key ( #2438 )  
						
						
						
						
						* gguf.cpp : find key example
* ggml.h : add gguf_find_key
* ggml.c : add gguf_find_key 
						
						
							
						
					 
					
						2023-07-28 23:45:24 +03:00 
						 
				 
			
				
					
						
							
							
								M. Yusuf Sarıgöz 
							
						 
					 
					
						
						
							
						
						11ef380c2a 
					 
					
						
						
							
							GGUF : write tensor ( #2426 )  
						
						
						
						
						* WIP: Write tensor
* GGUF : Support writing tensors in Python
* refactor : rm unused import and upd todos
* fix : fix errors upd writing example
* rm example.gguf
* gitignore *.gguf
* undo formatting 
						
						
							
						
					 
					
						2023-07-28 11:34:16 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d2bb3ac10b 
					 
					
						
						
							
							convert.py : remove GGML vocab + other obsolete stuff  
						
						
						
						
							
						
					 
					
						2023-07-27 16:36:35 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						68f53485e4 
					 
					
						
						
							
							convert.py : start a new simplified implementation by removing old stuff  
						
						
						
						
							
						
					 
					
						2023-07-27 15:56:53 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						158be8f7f4 
					 
					
						
						
							
							gguf.py : some code style changes  
						
						
						
						
							
						
					 
					
						2023-07-27 15:37:06 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d2b6ca13ad 
					 
					
						
						
							
							gguf : add array support  
						
						
						
						
							
						
					 
					
						2023-07-27 14:53:07 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d89533dff6 
					 
					
						
						
							
							gguf : expose the gguf_type enum through the API for now  
						
						
						
						
							
						
					 
					
						2023-07-27 11:10:34 +03:00 
						 
				 
			
				
					
						
							
							
								M. Yusuf Sarıgöz 
							
						 
					 
					
						
						
							
						
						c85d3178b3 
					 
					
						
						
							
							refactor : reduce code duplication and better API ( #2415 )  
						
						
						
						
							
						
					 
					
						2023-07-27 10:29:29 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d8491fc7e3 
					 
					
						
						
							
							gguf : add comments  
						
						
						
						
							
						
					 
					
						2023-07-26 23:00:24 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						5628ec7163 
					 
					
						
						
							
							gguf : read / write sample models  
						
						
						
						
							
						
					 
					
						2023-07-26 22:40:45 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						e46870f5af 
					 
					
						
						
							
							gguf : gguf.c is now part of ggml.c  
						
						
						
						
							
						
					 
					
						2023-07-26 18:55:32 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d313c0fa33 
					 
					
						
						
							
							gguf : simplify gguf_get_val  
						
						
						
						
							
						
					 
					
						2023-07-26 18:53:57 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						cb871fa022 
					 
					
						
						
							
							gguf : do not support passing existing ggml_context to gguf_init  
						
						
						
						
							
						
					 
					
						2023-07-26 18:48:52 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						860c9c63ce 
					 
					
						
						
							
							gguf : add gguf_get_tensor_name()  
						
						
						
						
							
						
					 
					
						2023-07-26 18:21:14 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						78b226a959 
					 
					
						
						
							
							gguf : initial model loading - not tested  
						
						
						
						
							
						
					 
					
						2023-07-26 18:21:14 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d91b985d2d 
					 
					
						
						
							
							gguf : read tensor info  
						
						
						
						
							
						
					 
					
						2023-07-26 18:21:13 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						8d6acfec12 
					 
					
						
						
							
							gguf : read header + meta data  
						
						
						
						
							
						
					 
					
						2023-07-26 18:21:13 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						6873148771 
					 
					
						
						
							
							gguf : first API pass  
						
						
						
						
							
						
					 
					
						2023-07-26 18:21:13 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						7e82d25f40 
					 
					
						
						
							
							ci : disable CI temporarily to not waste energy  
						
						
						
						
							
						
					 
					
						2023-07-26 18:21:13 +03:00 
						 
				 
			
				
					
						
							
							
								M. Yusuf Sarıgöz 
							
						 
					 
					
						
						
							
						
						bae6b125f6 
					 
					
						
						
							
							wip : implement GGUF ( #2397 )  
						
						
						
						
						* Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
* WIP: Python class to write GGUF, incomplete C API for reading
---------
Co-authored-by: Kawrakow <48489457+ikawrakow@users.noreply.github.com>
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
						
						
							
						
					 
					
						2023-07-26 18:21:13 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						4d698495ea 
					 
					
						
						
							
							gguf : init  
						
						
						
						
							
						
					 
					
						2023-07-26 18:21:12 +03:00 
						 
				 
			
				
					
						
							
							
								slaren 
							
						 
					 
					
						
						
							
						
						5488fb789e 
					 
					
						
						
							
							ggml : allocate graphs in a context ( #2392 )  
						
						
						
						
						* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						
						
							
 
						
					 
					
						2023-07-26 15:56:53 +02:00 
						 
				 
			
				
					
						
							
							
								Kawrakow 
							
						 
					 
					
						
						
							
						
						eb542d3932 
					 
					
						
						
							
							Add LLAMA_DEFAULT_RMS_EPS so we can change the default ( #2384 )  
						
						
						
						
						Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
						
						
							
 
						
					 
					
						2023-07-25 18:35:53 +03:00 
						 
				 
			
				
					
						
							
							
								slaren 
							
						 
					 
					
						
						
							
						
						07aaa0f63f 
					 
					
						
						
							
							ggml : fix ggml_flash_attn to use op_params ( #2387 )  
						
						
						
						
						* ggml : fix ggml_flash_attn to use op_params 
						
						
							
 
						
					 
					
						2023-07-25 16:20:12 +02:00 
						 
				 
			
				
					
						
							
							
								ldwang 
							
						 
					 
					
						
						
							
						
						fce48caf9a 
					 
					
						
						
							
							convert.py : support bpe tokenizer ( #2228 )  
						
						
						
						
						* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
						
						
							
						
					 
					
						2023-07-25 16:22:09 +03:00 
						 
				 
			
				
					
						
							
							
								Jiahao Li 
							
						 
					 
					
						
						
							
						
						875086bdb9 
					 
					
						
						
							
							ggml : relax contiguous constraints in activation function ( #2371 )  
						
						
						
						
							
 
						
					 
					
						2023-07-25 15:58:32 +03:00 
						 
				 
			
				
					
						
							
							
								slaren 
							
						 
					 
					
						
						
							
						
						da1889834a 
					 
					
						
						
							
							ggml : improve graph build time via hash table lookup ( #2329 )  
						
						
						
						
						* improve graph build time
* ggml_tensor : use 1 bit per flag
* use a hash table instead 
						
						
							
 
						
					 
					
						2023-07-25 15:32:20 +03:00 
						 
				 
			
				
					
						
							
							
								Hesen Peng 
							
						 
					 
					
						
						
							
						
						82552b7f54 
					 
					
						
						
							
							build : fix line breaking error in build-info.sh ( #2349 )  
						
						
						
						
						* fix line breaking
* build number line break removal 
						
						
							
						
					 
					
						2023-07-25 15:24:09 +03:00 
						 
				 
			
				
					
						
							
							
								Xiao-Yong Jin 
							
						 
					 
					
						
						
							
						
						0c06204fb3 
					 
					
						
						
							
							main : add --in-prefix-bos to prefix BOS to user inputs; keep EOS ( #2304 )  
						
						
						
						
						* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
The BOS precedes the string specified by `--in-prefix`.
Model-generated EOS is now kept in the context.
This provides a way to strictly follow the prompt format used in
Llama-2-chat.
The EOS handling also benefits some existing fine-tunes that use
EOS to mark the end of a turn.
* examples/common: move input_prefix_bos to other bools 
						
						
							
 
						
					 
					
						2023-07-25 15:19:11 +03:00 
						 
				 
			
				
					
						
							
							
								Eve 
							
						 
					 
					
						
						
							
						
						1fed755b1f 
					 
					
						
						
							
							ci : add non-AVX scalar build/test ( #2356 )  
						
						
						
						
						* noavx build and test
* we don't need to remove f16c in windows 
						
						
							
 
						
					 
					
						2023-07-25 15:16:13 +03:00 
						 
				 
			
				
					
						
							
							
								katsu560 
							
						 
					 
					
						
						
							
						
						be2301bcda 
					 
					
						
						
							
							k_quants : add AVX support to dot functions with QK_K as 64 ( #2339 )  
						
						
						
						
						* add AVX to ggml_vec_dot_q2_K_q8_K()
* add AVX to ggml_vec_dot_q3_K_q8_K()
* add AVX to ggml_vec_dot_q4_K_q8_K()
* add AVX to ggml_vec_dot_q5_K_q8_K()
* add AVX to ggml_vec_dot_q6_K_q8_K()
* refactor AVX code in ggml_vec_dot_q6_K_q8_K() 
						
						
							
 
						
					 
					
						2023-07-25 15:13:41 +03:00 
						 
				 
			
				
					
						
							
							
								Shouzheng Liu 
							
						 
					 
					
						
						
							
						
						1aa18ef994 
					 
					
						
						
							
							metal : concurrently dispatch commands ( #2358 )  
						
						
						
						
						* metal: concurrently dispatch commands
Function `ggml_metal_graph_find_concurrency` will run and write
commands that can be issued concurrently to metal context `concur_list`
array, when `ggml_metal_graph_compute` is called for the first time.
* metal: don't call find_concurrency automatically.
* metal : code style changes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						
						
							
 
						
					 
					
						2023-07-25 15:00:19 +03:00 
						 
				 
			
				
					
						
							
							
								Kawrakow 
							
						 
					 
					
						
						
							
						
						9a08eaf3c4 
					 
					
						
						
							
							Another speed gain for Q4_0 and Q4_1 on Metal ( #2375 )  
						
						
						
						
						* Another speed gain for Q4_0 and Q4_1 on Metal
* Have N_DST, etc., be template parameters
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
						
						
							
						
					 
					
						2023-07-25 13:48:29 +03:00 
						 
				 
			
				
					
						
							
							
								Kawrakow 
							
						 
					 
					
						
						
							
						
						129d844c87 
					 
					
						
						
							
							Fix Q4_K and Q5_K for QK_K = 64 on CUDA ( #2359 )  
						
						
						
						
						* Fix Q4_K and Q5_K for QK_K = 64
* Very slightly better Q5_K bit fiddling
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
						
						
							
 
						
					 
					
						2023-07-25 13:48:04 +03:00 
						 
				 
			
				
					
						
							
							
								slaren 
							
						 
					 
					
						
						
							
						
						d5512b782b 
					 
					
						
						
							
							server: add rms_norm_eps parameter ( #2380 )  
						
						
						
						
							
 
						
					 
					
						2023-07-25 12:36:17 +03:00 
						 
				 
			
				
					
						
							
							
								Henri Vasserman 
							
						 
					 
					
						
						
							
						
						c798308e3a 
					 
					
						
						
							
							[Server] Escape HTML in webchat ( #2368 )  
						
						
						
						
						* escape HTML in webchat
* add amp 
						
						
							
 
						
					 
					
						2023-07-25 10:27:34 +03:00 
						 
				 
			
				
					
						
							
							
								slaren 
							
						 
					 
					
						
						
							
						
						41c674161f 
					 
					
						
						
							
							make rms_norm_eps a parameter ( #2374 )  
						
						
						
						
						* make rms_norm_eps a parameter
* add rms_norm_eps to command line
* fix baby llama, test-grad0
* use scientific notation for eps param in the help
ggml-ci 
						
						
							
 
						
					 
					
						2023-07-24 17:57:12 +02:00 
						 
				 
			
				
					
						
							
							
								Aarni Koskela 
							
						 
					 
					
						
						
							
						
						b3f138d058 
					 
					
						
						
							
							Chat UI extras ( #2366 )  
						
						
						
						
						* makefile: correct deps for server
* server: tighten settings layout a little
* server: expose all currently configured generation params in UI
* server: expose remaining generation params, for the adventurous
* server: embetter mirostat fields 
						
						
							
 
						
					 
					
						2023-07-24 17:54:22 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						5b2b2dc6ae 
					 
					
						
						
							
							ggml : sync (unary ops refactor, static-correctness) ( #2370 )  
						
						
						
						
						* ggml : sync (unary ops, tests)
ggml-ci
* tests : remove unnecessary funcs 
						
						
							
 
						
					 
					
						2023-07-24 14:46:21 +03:00 
						 
				 
			
				
					
						
							
							
								Kawrakow 
							
						 
					 
					
						
						
							
						
						42f70cb2f6 
					 
					
						
						
							
							Fix scalar version of Q5_K when QK_K = 64 ( #2362 )  
						
						
						
						
						Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
						
						
							
 
						
					 
					
						2023-07-24 12:55:02 +03:00 
						 
				 
			
				
					
						
							
							
								Evan Jones 
							
						 
					 
					
						
						
							
						
						84e09a7d8b 
					 
					
						
						
							
							llama : add grammar-based sampling ( #1773 )  
						
						
						
						
						* llama, main : constrain sampling to grammar
* allow loading grammar from file
* fix whitespace errors
* handle & print parser errors
* add comments to grammar syntax and allow newlines where unambiguous
* add missing include
* support alternates in root rule
* fix bugs with empty token and EOS
* adjust JSON grammar
* remove swp file
* rewrite ternary expressions
Co-authored-by: Henri Vasserman <henv@hot.ee>
* use struct for grammar elements and add Unicode support
* add unicode escapes
* add inverse char ranges
* only sample full tokens (no peeking or truncation)
* llama : minor style changes
blindly applied in online editor - hopefully I didn't break something
* update help text
* add warning message if EOS is disabled
---------
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						
						
							
 
						
					 
					
						2023-07-23 23:58:10 -04:00 
						 
				 
			
				
					
						
							
							
								Kawrakow 
							
						 
					 
					
						
						
							
						
						2f9cf974a0 
					 
					
						
						
							
							Some more Q4_K and Q5_K speedup on CUDA ( #2346 )  
						
						
						
						
						* Faster Q5_K on CUDA
* Small Q5_K improvement on older GPUs
* Speed up Q4_K on CUDA
GTX1660: 29.5 ms/t -> 25.6 ms/t
RTX4080: 8.40 ms/t -> 8.25 ms/t
* Speed up Q4_K on CUDA
GTX1660: 36.7 ms/t -> 35.6 ms/t
RTX4080:  9.8 ms/t ->  9.5 ms/t
* Address PR comments
* Add some comments to satisfy PR reviewer
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
						
						
							
 
						
					 
					
						2023-07-24 00:19:47 +03:00 
						 
				 
			
				
					
						
							
							
								IgnacioFDM 
							
						 
					 
					
						
						
							
						
						4f06592cc6 
					 
					
						
						
							
							Add gqa parameter support to the server ( #2351 )  
						
						
						
						
						* Add gqa parameter support to the server
* Change help from stderr to stdout 
						
						
							
 
						
					 
					
						2023-07-23 23:31:17 +03:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						70d26ac388 
					 
					
						
						
							
							Fix __dp4a documentation ( #2348 )  
						
						
						
						
							
						
					 
					
						2023-07-23 17:49:06 +02:00