eric8607242 | ee1b497c98 | 2023-07-28 21:10:05 +03:00 | master-ee1b497
llama : support more diverse tokenizers? (#2420)
    * supporting more diverse tokenizers
    * Update llama.cpp
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Georgi Gerganov | d73b8d48b4 | 2023-07-28 21:05:08 +03:00
examples : fix whitespace

nhamanasu | 34ae1caf7f | 2023-07-28 21:02:10 +03:00
examples : server chat mode with llama2 (#2400)
    * add: server chat mode with llama2
    * fix: remove the unnecessary last \n

Weird Constructor | d91f3f0c55 | 2023-07-28 11:44:43 +03:00
readme : fix the description of the Tail free sampling (TFS) method (#2431)

Rand Xie | 65cdf34bdc | 2023-07-28 11:42:53 +03:00
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433)

niansa/tuxifan | edcc7ae7d2 | 2023-07-28 03:14:11 +02:00
Obtaining LLaMA 2 instructions (#2308)
    * Obtaining LLaMA 2 instructions
    * Removed sharing warning for LLaMA 2
    * Linked TheBloke's GGML repos
    * Add LLaMA 2 to list of supported models
    * Added LLaMA 2 usage instructions
    * Added links to LLaMA 2 70B models

mj-shifu | 7c529cede6 | 2023-07-27 14:39:17 -06:00
convert.py : Update to support 70B HF format model files (#2427)
    * convert.py : fix llama 2 70b conversion from Huggingface

Georgi Gerganov | 1a941869cb | 2023-07-27 11:00:54 +03:00 | master-1a94186
metal : disable graph concurrency optimization due to bug (#2413)

slaren | b5472ea0ad | 2023-07-26 23:57:23 +02:00 | master-b5472ea
ggml : fix assert in ggml_set_unary_op (#2410)

Cebtenzzre | 6df1f5940f | 2023-07-26 21:00:04 +03:00 | master-6df1f59
make : build with -Wmissing-prototypes (#2394)

slaren | 5488fb789e | 2023-07-26 15:56:53 +02:00 | master-5488fb7
ggml : allocate graphs in a context (#2392)
    * ggml : graph allocation in contexts
    * allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
    * llama.cpp : allocate graph in the context
    * add GGML_PAD
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Kawrakow | eb542d3932 | 2023-07-25 18:35:53 +03:00 | master-eb542d3
Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384)
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

slaren | 07aaa0f63f | 2023-07-25 16:20:12 +02:00 | master-07aaa0f
ggml : fix ggml_flash_attn to use op_params (#2387)
    * ggml : fix ggml_flash_attn to use op_params

ldwang | fce48caf9a | 2023-07-25 16:22:09 +03:00
convert.py : support bpe tokenizer (#2228)
    * support bpe tokenizer in convert
      Signed-off-by: ldwang <ftgreat@gmail.com>
    * support bpe tokenizer in convert
      Signed-off-by: ldwang <ftgreat@gmail.com>
    * support bpe tokenizer in convert, fix
      Signed-off-by: ldwang <ftgreat@gmail.com>
    Signed-off-by: ldwang <ftgreat@gmail.com>
    Co-authored-by: ldwang <ftgreat@gmail.com>

Jiahao Li | 875086bdb9 | 2023-07-25 15:58:32 +03:00 | master-875086b
ggml : relax contiguous constraints in activation function (#2371)

slaren | da1889834a | 2023-07-25 15:32:20 +03:00 | master-da18898
ggml : improve graph build time via hash table lookup (#2329)
    * improve graph build time
    * ggml_tensor : use 1 bit per flag
    * use a hash table instead

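Note: a minimal illustrative sketch of the visited-node bookkeeping this commit speeds up. The actual change replaces a linear scan in ggml's graph-building code with a hash table keyed by tensor pointers; the node type and function below are hypothetical stand-ins, not ggml code.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical node type standing in for ggml_tensor.
struct node { std::vector<node *> srcs; };

// Depth-first graph build: the "have we visited this tensor already?" check is
// O(1) with a hash set instead of a linear scan over the nodes collected so far.
static void build_order(node * n, std::unordered_set<node *> & visited,
                        std::vector<node *> & order) {
    if (n == nullptr || !visited.insert(n).second) {
        return; // null input or already visited
    }
    for (node * src : n->srcs) {
        build_order(src, visited, order); // inputs first
    }
    order.push_back(n); // a node is emitted after all of its inputs
}
```
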
Hesen Peng | 82552b7f54 | 2023-07-25 15:24:09 +03:00
build : fix line breaking error in build-info.sh (#2349)
    * fix line breaking
    * build number line break removal

Xiao-Yong Jin | 0c06204fb3 | 2023-07-25 15:19:11 +03:00 | master-0c06204
main : add --in-prefix-bos to prefix BOS to user inputs; keep EOS (#2304)
    * add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
      The BOS precedes the string specified by `--in-prefix`.
      Model-generated EOS is now kept in the context.
      This provides a way to strictly follow the prompt format used in Llama-2-chat.
      The EOS handling also benefits some existing finetunes that use EOS to mark the end of turn.
    * examples/common: move input_prefix_bos to other bools

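Note: a self-contained sketch of the turn-assembly rule described above, with hypothetical helper and token names (this is not the actual examples/main.cpp code): when the flag is enabled, BOS is emitted before the --in-prefix text and the user input, matching the Llama-2-chat prompt format.

```cpp
#include <vector>

constexpr int TOKEN_BOS = 1; // illustrative id; the real value comes from the model vocab

// Assemble one user turn: optional BOS, then the tokenized --in-prefix text,
// then the tokenized user input. Model-generated EOS stays in the context elsewhere.
std::vector<int> build_user_turn(const std::vector<int> & prefix_tokens,
                                 const std::vector<int> & input_tokens,
                                 bool in_prefix_bos) {
    std::vector<int> out;
    if (in_prefix_bos) {
        out.push_back(TOKEN_BOS); // BOS precedes the --in-prefix string
    }
    out.insert(out.end(), prefix_tokens.begin(), prefix_tokens.end());
    out.insert(out.end(), input_tokens.begin(), input_tokens.end());
    return out;
}
```
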
Eve | 1fed755b1f | 2023-07-25 15:16:13 +03:00 | master-1fed755
ci : add non-AVX scalar build/test (#2356)
    * noavx build and test
    * we don't need to remove f16c in windows

katsu560 | be2301bcda | 2023-07-25 15:13:41 +03:00 | master-be2301b
k_quants : add AVX support to dot functions with QK_K as 64 (#2339)
    * add AVX to ggml_vec_dot_q2_K_q8_K()
    * add AVX to ggml_vec_dot_q3_K_q8_K()
    * add AVX to ggml_vec_dot_q4_K_q8_K()
    * add AVX to ggml_vec_dot_q5_K_q8_K()
    * add AVX to ggml_vec_dot_q6_K_q8_K()
    * refactor AVX code in ggml_vec_dot_q6_K_q8_K()

Shouzheng Liu | 1aa18ef994 | 2023-07-25 15:00:19 +03:00 | master-1aa18ef
metal : concurrently dispatch commands (#2358)
    * metal: concurrently dispatch commands
      Function `ggml_metal_graph_find_concurrency` will run and write commands that can be
      issued concurrently to metal context `concur_list` array, when `ggml_metal_graph_compute`
      is called for the first time.
    * metal: don't call find_concurrency automatically.
    * metal : code style changes
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Kawrakow | 9a08eaf3c4 | 2023-07-25 13:48:29 +03:00
Another speed gain for Q4_0 and Q4_1 on Metal (#2375)
    * Another speed gain for Q4_0 and Q4_1 on Metal
    * Have N_DST, etc., be template parameters
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Kawrakow | 129d844c87 | 2023-07-25 13:48:04 +03:00 | master-129d844
Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359)
    * Fix Q4_K and Q5_K for QK_K = 64
    * Very slightly better Q5_K bit fiddling
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

slaren | d5512b782b | 2023-07-25 12:36:17 +03:00 | master-d5512b7
server: add rms_norm_eps parameter (#2380)

Henri Vasserman | c798308e3a | 2023-07-25 10:27:34 +03:00 | master-c798308
[Server] Escape HTML in webchat (#2368)
    * escape HTML in webchat
    * add amp

slaren | 41c674161f | 2023-07-24 17:57:12 +02:00 | master-41c6741
make rms_norm_eps a parameter (#2374)
    * make rms_norm_eps a parameter
    * add rms_norm_eps to command line
    * fix baby llama, test-grad0
    * use scientific notation for eps param in the help
      ggml-ci

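Note: for context, a plain reference sketch of RMS normalization showing where the epsilon made configurable by this commit (and by the LLAMA_DEFAULT_RMS_EPS commit above) enters the computation. This is not the ggml kernel; the per-channel weight multiplication that follows in the model is omitted, and a non-empty input is assumed.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// y[i] = x[i] / sqrt(mean(x^2) + eps); eps keeps the division numerically stable
// and is the value exposed as rms_norm_eps.
std::vector<float> rms_norm(const std::vector<float> & x, float eps) {
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += (double) v * v;
    }
    const float scale = 1.0f / std::sqrt((float) (sum_sq / x.size()) + eps);

    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale;
    }
    return y;
}
```
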
Aarni Koskela | b3f138d058 | 2023-07-24 17:54:22 +03:00 | master-b3f138d
Chat UI extras (#2366)
    * makefile: correct deps for server
    * server: tighten settings layout a little
    * server: expose all currently configured generation params in UI
    * server: expose remaining generation params, for the adventurous
    * server: embetter mirostat fields

Georgi Gerganov | 5b2b2dc6ae | 2023-07-24 14:46:21 +03:00 | master-5b2b2dc
ggml : sync (unary ops refactor, static-correctness) (#2370)
    * ggml : sync (unary ops, tests)
      ggml-ci
    * tests : remove unnecessary funcs

Kawrakow | 42f70cb2f6 | 2023-07-24 12:55:02 +03:00 | master-42f70cb
Fix scalar version of Q5_K when QK_K = 64 (#2362)
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Evan Jones | 84e09a7d8b | 2023-07-23 23:58:10 -04:00 | master-84e09a7
llama : add grammar-based sampling (#1773)
    * llama, main : constrain sampling to grammar
    * allow loading grammar from file
    * fix whitespace errors
    * handle & print parser errors
    * add comments to grammar syntax and allow newlines where unambiguous
    * add missing include
    * support alternates in root rule
    * fix bugs with empty token and EOS
    * adjust JSON grammar
    * remove swp file
    * rewrite ternary expressions
      Co-authored-by: Henri Vasserman <henv@hot.ee>
    * use struct for grammar elements and add Unicode support
    * add unicode escapes
    * add inverse char ranges
    * only sample full tokens (no peeking or truncation)
    * llama : minor style changes
      blindly applied in online editor - hopefully I didn't break something
    * update help text
    * add warning message if EOS is disabled
    Co-authored-by: Henri Vasserman <henv@hot.ee>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Kawrakow | 2f9cf974a0 | 2023-07-24 00:19:47 +03:00 | master-2f9cf97
Some more Q4_K and Q5_K speedup on CUDA (#2346)
    * Faster Q5_K on CUDA
    * Small Q5_K improvement on older GPUs
    * Speed up Q4_K on CUDA
      GTX1660: 29.5 ms/t -> 25.6 ms/t
      RTX4080: 8.40 ms/t -> 8.25 ms/t
    * Speed up Q4_K on CUDA
      GTX1660: 36.7 ms/t -> 35.6 ms/t
      RTX4080:  9.8 ms/t ->  9.5 ms/t
    * Address PR comments
    * Add some comments to satisfy PR reviewer
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

IgnacioFDM | 4f06592cc6 | 2023-07-23 23:31:17 +03:00 | master-4f06592
Add gqa parameter support to the server (#2351)
    * Add gqa parameter support to the server
    * Change help from stderr to stdout

Johannes Gäßler | 70d26ac388 | 2023-07-23 17:49:06 +02:00
Fix __dp4a documentation (#2348)

wzy | 57921ca6db | 2023-07-23 16:33:02 +03:00 | master-57921ca
common : n_threads == -1 uses std::thread::hardware_concurrency() (#2347)
    * Fix #2345, fix incorrect n_threads
    * Update examples/common.cpp
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

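Note: a small sketch of the fallback behaviour this commit describes. The function name and fallback value are hypothetical; the real change lives in examples/common.cpp.

```cpp
#include <thread>

// A thread count of -1 (or any non-positive value in this sketch) falls back to
// the number of hardware threads reported by the standard library.
int resolve_n_threads(int n_threads) {
    if (n_threads <= 0) {
        const unsigned hw = std::thread::hardware_concurrency(); // may be 0 if unknown
        return hw > 0 ? (int) hw : 4; // arbitrary fallback when detection fails
    }
    return n_threads;
}
```
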
slaren | 3602ac4255 | 2023-07-23 15:19:39 +02:00 | master-3602ac4
fix n_tasks (#2342)
    ggml-ci

slaren | 95a6c595e7 | 2023-07-23 14:36:02 +02:00 | master-95a6c59
ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)
    * ggml: move op parameters from tensors to ggml_tensor::op_params
    * alibi: use memcpy for float params
    * remove `src[1] = NULL` in ops

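Note: a minimal sketch of the storage pattern this commit adopts. The struct below is a stand-in, not the real ggml_tensor layout: small per-op scalars are kept in a fixed int32 array on the tensor instead of in extra parameter tensors, and float values are copied bit-for-bit with memcpy (as the "alibi" bullet mentions).

```cpp
#include <cstdint>
#include <cstring>

struct tensor_sketch {
    int32_t op_params[8]; // illustrative size, not the real field
};

// Store a float parameter without any int/float conversion.
void set_op_param_f32(tensor_sketch & t, int idx, float value) {
    std::memcpy(&t.op_params[idx], &value, sizeof(float));
}

// Read it back the same way.
float get_op_param_f32(const tensor_sketch & t, int idx) {
    float value;
    std::memcpy(&value, &t.op_params[idx], sizeof(float));
    return value;
}
```
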
Georgi Gerganov | e76d630df1 | 2023-07-23 15:09:47 +03:00 | master-e76d630
llama : grouped-query attention + LLaMAv2 70B support (#2276)
    * CUDA: GQA implementation
    * llama : support for GQA and LLaMAv2 70B
      ggml-ci
    * py : fix hparams parsing (if-else blocks)
      ggml-ci
    * py : oh boy ..
      ggml-ci
    * help : fix gqa value for 70B
      ggml-ci
    Co-authored-by: JohannesGaessler <johannesg@5d6.de>

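Note: a short sketch of the dimension bookkeeping behind grouped-query attention, which this commit and the n_embd_gqa commit above rely on. The numbers in the comments are the published LLaMA-2 70B values; the struct and function names are illustrative, not llama.cpp identifiers.

```cpp
// With GQA, keys/values use fewer heads than queries, so K/V rows are narrower
// than n_embd and the KV cache shrinks accordingly.
struct gqa_dims {
    int n_embd;    // query embedding width, e.g. 8192 for LLaMA-2 70B
    int n_head;    // number of query heads, e.g. 64
    int n_head_kv; // number of key/value heads, e.g. 8 (gqa = 8)
};

int n_embd_gqa(const gqa_dims & d) {
    const int head_dim = d.n_embd / d.n_head; // per-head size, e.g. 128
    return head_dim * d.n_head_kv;            // K/V width, e.g. 1024 instead of 8192
}
```
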
maddes8cht | 1d0824b247 | 2023-07-23 14:59:48 +03:00 | master-1d0824b
llama : print help to stdout (#2338)

wzy | bc3ec2cdc9 | 2023-07-23 14:57:02 +03:00
flake : support nix build '.#opencl' (#2337)

Christian Demsar | a940458e48 | 2023-07-23 14:56:34 +03:00 | master-a940458
llama : print max tensor size to stderr (#2336)

Jose Maldonado | 91171b8072 | 2023-07-23 14:52:08 +03:00 | master-91171b8
make : fix CLBLAST compile support in FreeBSD (#2331)
    * Fix Makefile for CLBLAST compile support and instructions for compiling llama.cpp on FreeBSD
    * More general use-case for CLBLAST support (Linux and FreeBSD)

AustinMroz | 355c80f49e | 2023-07-23 14:16:48 +03:00
examples : simplify vim plugin (#2327)
    Uses builtin json_encode and json_decode functions to simplify escaping
    Removes the need for temp files

Jiahao Li | 83a00ce69b | 2023-07-23 14:00:37 +03:00
metal : support bcast add & dup & cont op (#2323)

Kawrakow | d2a43664f9 | 2023-07-23 08:49:20 +03:00 | master-d2a4366
Speed up Q4_K (#2322)
    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Johannes Gäßler | b9b7d94fc1 | 2023-07-22 21:27:34 +02:00 | master-b9b7d94
CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313)

Georgi Gerganov | b47b8a9cfe | 2023-07-22 21:17:57 +03:00 | master-b47b8a9
llama : optimize memory buffers (#2325)

klosax | b5fe67f8c6 | 2023-07-22 14:21:24 +02:00 | master-b5fe67f
Perplexity: Compute scores correlated to HellaSwag (#2312)
    * Add parameter --perplexity-lines to perplexity.cpp

whoreson | 24baa54ac1 | 2023-07-22 13:34:51 +03:00
examples : basic VIM plugin
    VIM plugin for server exe

Georgi Gerganov | dd6c67d3cb | 2023-07-22 12:00:56 +03:00
ci : fix args

Georgi Gerganov | 5d500e8ccf | 2023-07-22 11:48:22 +03:00
ci : add 7B CUDA tests (#2319)
    * ci : add 7B CUDA tests
      ggml-ci
    * ci : add Q2_K to the tests
    * ci : bump CUDA ppl chunks
      ggml-ci
    * ci : increase CUDA TG len + add --ignore-eos
    * ci : reduce CUDA ppl chunks down to 4 to save time