Kyle Mistele · 39baaf55a1 · 2024-01-28 09:55:31 +02:00
docker : add server-first container images (#5157)
* feat: add Dockerfiles for each platform that use ./server instead of ./main
* feat: update .github/workflows/docker.yml to build server-first docker containers
* doc: add information about running the server with Docker to README.md
* doc: add information about running with docker to the server README
* doc: update n-gpu-layers to show correct GPU usage
* fix(doc): update container tag from `server` to `server-cuda` in the README example for running the server container with CUDA
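
For context, a minimal sketch of running the server image this commit introduces; the registry path, tag, model path, and port are assumptions drawn from the commit's own bullets, not verified commands:

```bash
# Sketch: serve a local GGUF model with the server-first image from this change.
# The image tag (:server) and paths are illustrative assumptions.
docker run -p 8080:8080 -v /path/to/models:/models \
  ghcr.io/ggerganov/llama.cpp:server \
  -m /models/model.gguf --host 0.0.0.0 --port 8080
```

The `server-cuda` tag mentioned in the last bullet would be the GPU variant, run with Docker's `--gpus` flag and an appropriate n-gpu-layers setting.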

Georgi Gerganov · aad0b01d73 · 2024-01-26 10:52:33 +02:00
readme : update hot topics

XiaotaoChen · fe54033b69 · 2024-01-25 22:14:32 +02:00
readme : add MobileVLM 1.7B/3B to the supported models list (#5107)
Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>

adel boussaken · 48e2b13372 · 2024-01-20 03:05:43 -05:00
Add a dart/flutter binding to README.md (#4882)

iohub · 18adb4e9bb · 2024-01-09 18:45:54 +02:00
readme : add 3rd party collama reference to UI list (#4840)
Add a VSCode extension for llama.cpp reference to the UI list

Georgi Gerganov · a9a8c5de3d · 2024-01-08 20:25:17 +02:00
readme : add link to SOTA models

Lars Grammel · b7e7982953 · 2024-01-07 22:24:11 +02:00
readme : add lgrammel/modelfusion JS/TS client for llama.cpp (#4814)

automaticcat · 24a447e20a · 2023-12-30 10:07:48 +02:00
ggml : add ggml_cpu_has_avx_vnni() (#4589)
* feat: add avx_vnni based on Intel documents
* ggml: add avx vnni based on Intel document
* llama: add avx vnni information display
* docs: add more details about using oneMKL and oneAPI for Intel processors
* Update ggml.c: fix indentation
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

manikbhandari · ea5497df5d · 2023-12-28 15:03:57 +01:00
gpt2 : Add gpt2 architecture integration (#4555)

Paul Tsochantaris · a206137f92 · 2023-12-25 18:09:53 +02:00
Adding Emeltal reference to UI list (#4629)

Shintarou Okada · 753be377b6 · 2023-12-24 15:35:49 +02:00
llama : add PLaMo model (#3557)
* add plamo mock
* add tensor loading
* plamo convert
* update norm
* able to compile
* fix norm_rms_eps hparam
* runnable
* use inp_pos
* seems ok
* update kqv code
* remove develop code
* update README
* shuffle attn_q.weight and attn_output.weight for broadcasting
* remove plamo_llm_build_kqv and use llm_build_kqv
* fix style
* llama : remove obsolete KQ_scale
* plamo : fix tensor names for correct GPU offload
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

FantasyGmm · a55876955b · 2023-12-22 17:11:12 +02:00
cuda : fix jetson compile error (#4560)
* fix old jetson compile error
* Update Makefile
* update jetson detection and cuda version detection
* update cuda macro define
* update makefile and cuda, fix some issues
* Update README.md
* Update Makefile
* Update README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Michael Kesper · 28cb35a0ec · 2023-12-22 10:03:25 +02:00
make : add LLAMA_HIP_UMA option (#4587)
NB: setting LLAMA_HIP_UMA=1 (or any value) adds -DGGML_HIP_UMA to MK_CPPFLAGS.
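
A sketch of how this option might combine with the hipBLAS build toggle of that era; LLAMA_HIPBLAS is an assumption from the contemporaneous Makefile, not part of this commit:

```bash
# Sketch: ROCm build with UMA enabled. Any value for LLAMA_HIP_UMA works,
# since the Makefile only checks whether the variable is defined before
# appending -DGGML_HIP_UMA to MK_CPPFLAGS.
make clean
LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 make -j
```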

Deins · 2bb98279c5 · 2023-12-22 08:49:54 +02:00
readme : add zig bindings (#4581)

Erik Garrison · 0f630fbc92 · 2023-12-21 21:45:32 +02:00
cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449)
* AMD ROCm: handle UMA memory VRAM expansions
This resolves #2797 by allowing ROCm AMD GPU users with a UMA to dynamically expand the VRAM allocated to the GPU. Without this, AMD ROCm users with shared CPU/GPU memory are usually stuck with the BIOS-set (or fixed) framebuffer VRAM, making it impossible to load more than 1-2 layers. Note that the model is duplicated in RAM because it is loaded once for the CPU and then copied into a second set of allocations managed by the HIP UMA system; we can fix this later.
* clarify build process for ROCm on Linux with cmake
* avoid using deprecated ROCm hipMallocHost
* keep simplifying the change required for UMA
* cmake: enable UMA-compatible allocation when LLAMA_HIP_UMA=ON
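
A sketch of the CMake route from the last bullet; the ROCm clang paths are illustrative and depend on the local installation:

```bash
# Sketch: UMA-compatible allocation is enabled only when LLAMA_HIP_UMA=ON.
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
  cmake -B build -DLLAMA_HIPBLAS=ON -DLLAMA_HIP_UMA=ON
cmake --build build
```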

Georgi Gerganov · c083718c89 · 2023-12-21 19:27:14 +02:00
readme : update coding guidelines

Georgi Gerganov · b1306c4394 · 2023-12-17 20:16:23 +02:00
readme : update hot topics

BarfingLemurs · 0353a18401 · 2023-12-14 09:38:49 +02:00
readme : update supported model list (#4457)

Georgi Gerganov · 113f9942fc · 2023-12-13 14:05:38 +02:00
readme : update hot topics

Georgi Gerganov · bcc0eb4591 · 2023-12-07 13:03:17 +02:00
llama : per-layer KV cache + quantum K cache (#4309)
* per-layer KV
* remove unnecessary copies
* less code duplication, offload k and v separately
* llama : offload KV cache per-layer
* llama : offload K shift tensors
* llama : offload for rest of the model arches
* llama : enable offload debug temporarily
* llama : keep the KV related layers on the device
* llama : remove mirrors, perform Device -> Host when partial offload
* common : add command-line arg to disable KV cache offloading
* llama : update session save/load
* llama : support quantum K cache (#4312)
* metal : add F32 -> Q8_0 copy kernel
* cuda : add F32 -> Q8_0 copy kernel
* cuda : use mmv kernel for quantum cache ops
* llama : pass KV cache type through API
* llama : fix build
* metal : add F32 -> Q4_0 copy kernel
* metal : add F32 -> Q4_1 copy kernel
* cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels
* llama-bench : support type_k/type_v
* metal : use mm kernel only for quantum KV cache
* llama : remove memory_f16 and kv_f16 flags
* readme : add API change notice
Co-authored-by: slaren <slarengh@gmail.com>
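
Two sketches of how the new options might surface on the command line; the flag spellings (-ctk for llama-bench's type_k, --no-kv-offload for the offload toggle) are assumptions, and the model path is a placeholder:

```bash
# Sketch: benchmark with a quantized (q8_0) K cache via llama-bench.
./llama-bench -m models/7B/ggml-model-q4_0.gguf -ctk q8_0

# Sketch: keep the KV cache on the host instead of offloading it.
./main -m models/7B/ggml-model-q4_0.gguf --no-kv-offload -p "Hello"
```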

vodkaslime · 524907aa76 · 2023-11-30 23:49:21 +02:00
readme : fix (#4135)
* fix: readme
* chore: resolve comments
* chore: resolve comments

Dawid Wysocki · 74daabae69 · 2023-11-30 23:43:32 +02:00
readme : fix typo (#4253)
llama.cpp uses GitHub Actions, not Gitlab Actions.

Peter Sugihara · 4fea3420ee · 2023-11-29 09:16:34 +02:00
readme : add FreeChat (#4248)

Kasumi · 0dab8cd7cc · 2023-11-27 19:39:42 +02:00
readme : add Amica to UI list (#4230)

Georgi Gerganov · 9656026b53 · 2023-11-26 20:42:51 +02:00
readme : update hot topics

Georgi Gerganov · 04814e718e · 2023-11-25 12:02:13 +02:00
readme : update hot topics

Aaryaman Vasishta · b35f3d0def · 2023-11-24 09:52:39 +02:00
readme : use PATH for Windows ROCm (#4195)
* Update README.md to use PATH for Windows ROCm
* Update README.md
* Update README.md
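
A sketch of the PATH-based approach, assuming a Git Bash session on Windows with HIP_PATH set by the ROCm installer; the generator and compiler flags are illustrative:

```bash
# Sketch: find the ROCm clang toolchain via PATH instead of hard-coding
# its install directory.
export PATH="$HIP_PATH/bin:$PATH"
cmake -G Ninja -DLLAMA_HIPBLAS=ON \
  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ..
cmake --build .
```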

Georgi Gerganov · d103d935c0 · 2023-11-23 13:51:22 +02:00
readme : update hot topics

Aaryaman Vasishta · dfc7cd48b1 · 2023-11-20 17:02:46 +02:00
readme : update ROCm Windows instructions (#4122)
* Update README.md
* Update README.md
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

Galunid · 36eed0c42c · 2023-11-14 11:17:12 +01:00
stablelm : StableLM support (#3586)
* Add support for stablelm-3b-4e1t
* Supports GPU offloading of (n-1) layers

Georgi Gerganov · c049b37d7b · 2023-11-13 14:18:08 +02:00
readme : update hot topics

Richard Kiss · 532dd74e38 · 2023-11-11 23:04:58 -07:00
Fix some documentation typos/grammar mistakes (#4032)
* typos
* Update examples/parallel/README.md
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>

Georgi Gerganov · 224e7d5b14 · 2023-11-02 20:44:12 +02:00
readme : add notice about #3912

Ian Scrivener · 5a42a5f8e8 · 2023-10-22 21:16:43 +03:00
readme : remove unsupported node.js library (#3703)
- https://github.com/Atome-FE/llama-node is quite out of date
- doesn't support recent/current llama.cpp functionality

Georgi Gerganov · d1031cf49c · 2023-10-20 21:07:23 +03:00
sampling : refactor init to use llama_sampling_params (#3696)
* sampling : refactor init to use llama_sampling_params
* llama : combine repetition, frequency and presence penalties in 1 call
* examples : remove embd-input and gptneox-wip
* sampling : rename penalty params + reduce size of "prev" vector
* sampling : add llama_sampling_print helper
* sampling : hide prev behind API and apply #3661

Georgi Gerganov · 004797f6ac · 2023-10-18 21:44:43 +03:00
readme : update hot topics

BarfingLemurs · 8402566a7c · 2023-10-17 21:13:21 +03:00
readme : update hot-topics & models, detail windows release in usage (#3615)
* Update README.md
* Update README.md
* Update README.md
* move "Running on Windows" section below "Prepare data and run"
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ldwang · 5fe268a4d9 · 2023-10-17 18:52:33 +03:00
readme : add Aquila2 links (#3610)
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>

Ian Scrivener · f3040beaab · 2023-10-12 14:10:50 +03:00
typo : it is --n-gpu-layers not --gpu-layers (#3592)
fixed a typo in the macOS Metal run docs
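
For clarity, the corrected flag in a macOS Metal run; the model path and layer count are placeholders:

```bash
# Sketch: offload layers to the GPU with the correctly spelled flag.
./main -m models/7B/ggml-model-q4_0.gguf --n-gpu-layers 1 -p "Hello"
```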

Galunid · 9f6ede19f3 · 2023-10-10 19:02:49 -04:00
Add MPT model to supported models in README.md (#3574)

Xingchen Song(宋星辰) · c5b49360d0 · 2023-10-10 19:28:50 +03:00
readme : add bloom (#3570)

BarfingLemurs · 1faaae8c2b · 2023-10-06 22:13:36 +03:00
readme : update models, cuda + ppl instructions (#3510)

Georgi Gerganov · beabc8cfb0 · 2023-10-04 16:50:44 +03:00
readme : add project status link

slaren · 40e07a60f9 · 2023-09-29 18:42:32 +02:00
llama.cpp : add documentation about rope_freq_base and scale values (#3401)
* llama.cpp : add documentation about rope_freq_base and scale values
* add notice to hot topics
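
A sketch of the documented parameters in use; the flag spellings are assumed from the main example's long options of that era, and the values are illustrative, not recommendations:

```bash
# Sketch: run a model with linear RoPE scaling for a doubled context.
./main -m models/7B/ggml-model-q4_0.gguf -c 4096 --rope-freq-scale 0.5
```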

BarfingLemurs · 0a4a4a0982 · 2023-09-29 15:50:35 +03:00
readme : update hot topics + model links (#3399)

Andrew Duffy · 569550df20 · 2023-09-29 14:15:57 +03:00
readme : add link to grammars app (#3388)
* Add link to grammars app per @ggerganov's suggestion
Adds a sentence in the Grammars section of the README pointing to the grammar app, per https://github.com/ggerganov/llama.cpp/discussions/2494#discussioncomment-7138211
* Update README.md

Pierre Alexandre SCHEMBRI · 4aea3b846e · 2023-09-28 15:13:37 +03:00
readme : add Mistral AI release 0.1 (#3362)

BarfingLemurs · ffe88a36a9 · 2023-09-27 18:30:36 +03:00
readme : add some recent perplexity and bpw measurements to READMEs, link for k-quants (#3340)
* Update README.md
* Update README.md
* Update README.md with k-quants bpw measurements

2f38b454 · 1726f9626f · 2023-09-25 20:24:52 +02:00
docs: Fix typo CLBlast_DIR var. (#3330)
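
A sketch of the corrected variable in use; the install prefix is a placeholder:

```bash
# Sketch: point CMake at a local CLBlast install via CLBlast_DIR.
cmake .. -DLLAMA_CLBLAST=ON \
  -DCLBlast_DIR=/some/path/CLBlast/lib/cmake/CLBlast
```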

Lee Drake · bc9d3e3971 · 2023-09-21 21:00:24 +02:00
Update README.md (#3289)
* Update README.md
* Update README.md
Co-authored-by: slaren <slarengh@gmail.com>