Commit Graph

4799 Commits

Oleksandr Kuvshynov
e4376270d9 llama.cpp: fix warning message (#11839)
There was a typo-like error that printed the same number twice when a
request was received with n_predict greater than the server-side configuration.

Before the fix:
```
slot launch_slot_: id  0 | task 0 | n_predict = 4096 exceeds server configuration, setting to 4096
```

After the fix:
```
slot launch_slot_: id  0 | task 0 | n_predict = 8192 exceeds server configuration, setting to 4096
```
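
A minimal sketch of the pattern behind the fix, with illustrative names rather than the actual server code: the buggy warning passed the already-clamped value to both format placeholders.
```
#include <algorithm>
#include <cstdio>

// hypothetical stand-in for the slot-launch path, not the real server code
static void launch_slot_sketch(int requested_n_predict, int server_max) {
    if (requested_n_predict > server_max) {
        const int clamped = std::min(requested_n_predict, server_max);
        // before the fix: `clamped` was passed for both %d placeholders,
        // so the same number appeared twice in the warning;
        // after the fix, the requested value is printed first:
        printf("n_predict = %d exceeds server configuration, setting to %d\n",
               requested_n_predict, clamped);
    }
}

int main() { launch_slot_sketch(8192, 4096); }
```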
b4704
2025-02-13 08:25:34 +02:00
Daniel Bevenius
3e69319772 llama : update llama_decode_internal ref [no ci] (#11840)
This commit updates the comment in llama_kv_cache.h to reflect the
change of the function name from llama_decode_internal to
llama_decode_impl.
2025-02-13 08:07:51 +02:00
Diego Devesa
a394039db0 ggml-cpu : add chunking support to mul_mat_id (#11666)
* ggml-cpu : add chunking support to mul_mat_id

* allocate chunk counter in wdata
parallelize src1 quantization by column to allow parallelization even when there is only one row (see the sketch after this list)

* disable for arm

* cleanup

* better way to disable for arm

* fix uninitialized counter when using 1 thread only

* revert test-backend-ops changes
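
A hedged sketch of the pattern the first two bullets describe, with illustrative names rather than the actual ggml-cpu internals: worker threads claim chunk indices from a shared atomic counter, which must be initialized even when only one thread runs.
```
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int n_chunks  = 64;
    const int n_threads = 4;
    // in ggml-cpu this counter lives in the shared wdata buffer; leaving it
    // uninitialized broke the single-thread case, per the bullet above
    std::atomic<int> current_chunk{0};

    // each thread claims the next unprocessed chunk until none remain
    auto worker = [&]() {
        for (int c = current_chunk.fetch_add(1); c < n_chunks;
             c = current_chunk.fetch_add(1)) {
            // ... quantize/process the columns belonging to chunk c ...
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; i++) threads.emplace_back(worker);
    for (auto & t : threads) t.join();
    printf("processed %d chunks with %d threads\n", n_chunks, n_threads);
}
```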
b4702
2025-02-13 01:02:38 +01:00
Xuan-Son Nguyen
be3bbd6215 ggml : x2 speed for WASM by optimizing SIMD (#11453)
* ggml : x2 speed for WASM by optimizing SIMD

* fix bad merging

* rm trailing spaces

* rm redundant clamp

* better quantize_row_q8_K

Co-authored-by: camel-cdr <camel-cdr@protonmail.com>

* remove memset that causes buffer overflow
Co-authored-by: camel-cdr <camel-cdr@protonmail.com>

---------

Co-authored-by: camel-cdr <camel-cdr@protonmail.com>
2025-02-13 00:33:45 +01:00
Woof Dog
31afcbee0e server : (webui) Give copy button back to all message bubbles (#11814)
* All messages get the copy button

* Update index.html.gz
2025-02-12 23:47:11 +01:00
uvos
5c4284d57b HIP: Remove GCN from list of devices that avoid MMQ (#11831) b4699 2025-02-12 22:25:28 +01:00
JC
bfd11a2344 Fix: Compile failure due to Microsoft STL breaking change (#11836) b4698 2025-02-12 21:36:11 +01:00
Georgi Gerganov
0fb77f821f sync : ggml 2025-02-12 21:46:02 +02:00
uvos
e598697d63 HIP: Switch to std::vector in rocblas version check (#11820) b4696 2025-02-12 17:25:03 +01:00
Georgi Gerganov
fbe6a07256 context : rename to llama_context_kv_self 2025-02-12 17:16:44 +02:00
Georgi Gerganov
6ee86e5e0f graph : restore ubatch in build_cb
ggml-ci
2025-02-12 16:29:15 +02:00
bandoti
fef0cbeadf cleanup: fix compile warnings associated with gnu_printf (#11811) b4695 2025-02-12 10:06:53 -04:00
Richard
748ee9fe93 ggml : fix multi-threaded clamp_f32 (#11824)
* Bug fix for clamp_f32

When using tensors larger than 1-D, the clamp operation does not work because the kernel returns early whenever ith is not 0 (see the sketch after these notes).

* Bug fix for clamp_f32

* Bug fix for clamp_f32
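
An illustrative sketch of the bug and the fix, not the actual ggml source: with the early return, any thread other than 0 did no work, so for tensors with more than one row most of the data was never clamped.
```
#include <cstdio>

// ith = thread index, nth = thread count, mirroring ggml's compute params
static void clamp_f32_sketch(float * data, int nrows, int ncols,
                             float min, float max, int ith, int nth) {
    // buggy version: `if (ith != 0) return;` left the rows assigned to the
    // other threads untouched; the fix stripes rows across all threads:
    for (int r = ith; r < nrows; r += nth) {
        for (int c = 0; c < ncols; c++) {
            const float v = data[r * ncols + c];
            data[r * ncols + c] = v < min ? min : (v > max ? max : v);
        }
    }
}

int main() {
    float d[6] = {-2, -1, 0, 1, 2, 3};
    clamp_f32_sketch(d, 2, 3, -1.0f, 1.0f, /*ith=*/0, /*nth=*/1);
    for (float v : d) printf("%g ", v); // -1 -1 0 1 1 1
}
```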
b4694
2025-02-12 15:57:33 +02:00
Georgi Gerganov
f63aeecce6 llama : models now build their graphs using llama_graph_i
ggml-ci
2025-02-12 15:08:40 +02:00
Weizhao Ouyang
198b1ec611 ggml-cpu: Fix duplicate MATMUL_INT8 (#11817)
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
2025-02-12 13:22:58 +01:00
Johannes Gäßler
c3d6af7cd2 CUDA: fix CUDART_VERSION checks (#11821) b4692 2025-02-12 13:16:39 +01:00
Georgi Gerganov
0ab50f1bbb context : prepare llama_model graph build
ggml-ci
2025-02-12 14:09:55 +02:00
Georgi Gerganov
e633dc171a context : introduce llama_graph_i
ggml-ci
2025-02-12 13:49:44 +02:00
Georgi Gerganov
5eae8e5183 context : move build_rope_factors to base class
ggml-ci
2025-02-12 13:32:02 +02:00
Georgi Gerganov
d146a14f77 context : minor naming fix 2025-02-12 12:41:36 +02:00
Georgi Gerganov
8da7f612b7 context : improve llama_context encapsulation
ggml-ci
2025-02-12 12:15:04 +02:00
Georgi Gerganov
b52b79b048 context : move encode/decode to llama-context.cpp 2025-02-12 11:23:38 +02:00
Daniel Bevenius
369be5598a llama : fix typo in llama-grammar.h [no ci] (#11816) 2025-02-12 09:40:01 +02:00
lhez
4078c77f98 docs: add OpenCL (#11697) 2025-02-11 15:04:13 -07:00
Georgi Gerganov
02ef4be975 context : initial abstraction
ggml-ci
2025-02-11 22:27:21 +02:00
Sheldon Robinson
90e4dba461 Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (#11803)
* Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx

* Fix #11802: PR #11803 - keep RegQueryValueExA, remove TEXT macro, description needs to be ANSI string
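
A hedged sketch of the final fix, with an illustrative value name: under UNICODE builds TEXT("...") expands to a wide string literal, which does not match the explicitly-ANSI RegQueryValueExA, so the name is passed as a plain char string.
```
#ifdef _WIN32
#include <windows.h>
#include <string>

// illustrative only; the registry value actually queried may differ
static std::string read_description_sketch(HKEY key) {
    char  buf[256] = {};
    DWORD len = sizeof(buf);
    // before: RegQueryValueEx(key, TEXT("Description"), ...) mixed the
    // TEXT() macro with the A variant; the fix keeps RegQueryValueExA and
    // passes the value name as an ANSI string:
    if (RegQueryValueExA(key, "Description", nullptr, nullptr,
                         (LPBYTE) buf, &len) == ERROR_SUCCESS) {
        return std::string(buf);
    }
    return {};
}
#endif
```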
b4689
2025-02-11 16:55:45 +01:00
Daniel Bevenius
a18f481f99 server : use common_token_to_piece instead of common_detokenize (#11740)
* server : use common_token_to_piece instead of common_detokenize

This commit replaces the call to common_detokenize with
common_token_to_piece in the populate_token_probs function.

The motivation for this change is to avoid an issue where
common_detokenize would remove the word-boundary character for tokens,
which caused a regression in the server-generated token probabilities.

Resolves: https://github.com/ggerganov/llama.cpp/issues/11728

* squash! server : use common_token_to_piece instead of common_detokenize

Use common_token_to_piece for post_sampling_probs as well.
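
A hedged sketch of the substitution, assuming the helper signatures in llama.cpp's common/common.h (they may differ slightly between versions):
```
#include "common.h" // llama.cpp common helpers

std::string token_text_sketch(llama_context * ctx, llama_token tok) {
    // before: common_detokenize(ctx, {tok}) could strip the token's leading
    // word-boundary character, skewing the reported probabilities;
    // after: converting the single token to its piece preserves it
    return common_token_to_piece(ctx, tok, /*special=*/true);
}
```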
b4688
2025-02-11 14:06:45 +01:00
Johannes Gäßler
b9ab0a4d0b CUDA: use arch list for compatibility check (#11775)
* CUDA: use arch list for feature availability check

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-11 00:17:22 +01:00
Maxim Evtush
7b891bdc86 fix: typos in documentation files (#11791)
* Update ggml.c

* Update arg.cpp

* Update speculative.h
b4686
2025-02-10 23:21:31 +01:00
jason_w
81732619fd docs: utilize the forward slash (/) as the path separator for Unix-like systems (#11770) 2025-02-10 23:17:48 +01:00
Xuan-Son Nguyen
507f9174fe server : (webui) introduce conversation branching + idb storage (#11792)
* server : (webui) introduce conversation branching + idb storage

* mark old conv as "migrated" instead of deleting them

* improve migration

* add more comments

* more clarification
2025-02-10 21:23:17 +01:00
Wilken Gottwalt
19b392d58d llama-mmap: fix missing include (#11796)
Technically the fixed-width types come only from the iostream and
cstdint/stdint.h headers; the memory and vector headers are not required
to provide them. In GCC 15 the headers are cleaned up, so you must
include the proper header, cstdint.

```
src/llama-mmap.h:26:5: error: ‘uint32_t’ does not name a type
   26 |     uint32_t read_u32() const;
      |     ^~~~~~~~
```
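
A minimal sketch of the failure mode and the fix: with GCC 15's cleaned-up headers, fixed-width integer types must be included explicitly.
```
#include <cstdint>  // the fix: provides uint32_t directly
#include <memory>   // no longer guaranteed to pull in <cstdint> on GCC 15
#include <vector>

// illustrative stand-in for the declaration in src/llama-mmap.h
struct file_sketch {
    uint32_t read_u32() const; // fails to compile without <cstdint>
};

int main() {}
```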
b4683
2025-02-10 20:58:18 +02:00
Xuan-Son Nguyen
0893e0114e server : correct signal handler (#11795) b4682 2025-02-10 18:03:28 +01:00
Georgi Gerganov
2cd8a903c8 context : make output functions members
ggml-ci
2025-02-10 17:01:27 +02:00
Georgi Gerganov
d1d8d53008 bman : remove ubatch member
ggml-ci
2025-02-10 16:50:14 +02:00
Georgi Gerganov
ef358ee78f context : add decode/encode
ggml-ci
2025-02-10 16:14:13 +02:00
Georgi Gerganov
879ba82777 server : increase context size for the tests
ggml-ci
2025-02-10 15:00:02 +02:00
Georgi Gerganov
f9971ef2e1 llama : dedup reserve code 2025-02-10 14:59:51 +02:00
Georgi Gerganov
972f91c7d7 Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-10 14:45:54 +02:00
Olivier Chafik
d7b31a9d84 sync: minja (a72057e519) (#11774) b4681 2025-02-10 09:34:09 +00:00
pascal-lc
9ac3457b39 Update README.md [no ci] (#11781)
typo: `\` -> `/`
Use `/` as the path separator on UNIX-like systems.
2025-02-10 09:05:57 +01:00
Danny Milosavljevic
c2a67efe38 vulkan: Make Vulkan optional at runtime (#11493). (#11494)
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
b4679
2025-02-10 07:17:21 +01:00
Wagner Bruna
b044a0fe3c vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (#11592) b4678 2025-02-10 07:08:22 +01:00
Eric Curtin
19d3c8293b There's a better way of clearing lines (#11756)
Use the ANSI escape code for clearing a line.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
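
A minimal sketch of the technique: the CSI sequence ESC[2K erases the entire current line and \r returns the cursor to column 0, instead of overwriting old text with spaces.
```
#include <cstdio>

// clear the current terminal line using an ANSI escape code
static void clear_line() {
    printf("\033[2K\r");
    fflush(stdout);
}

int main() {
    printf("downloading 42%%");
    clear_line();
    printf("done\n");
}
```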
b4677
2025-02-09 10:34:49 +00:00
Jeff Bolz
98f6b0fd1e vulkan: account for lookup tables when checking shared memory size (#11502) b4676 2025-02-09 08:43:51 +01:00
Xuan-Son Nguyen
55ac8c7791 server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759)
* redo Settings modal UI

* add python code interpreter

* fix auto scroll

* build

* fix overflow for long output lines

* bring back sticky copy button

* adapt layout on mobile view

* fix multiple lines output and color scheme

* handle python exception

* better state management

* add webworker

* add headers

* format code

* speed up by loading pyodide on page load

* (small tweak) add a small animation to make it feel like Claude
b4675
2025-02-08 21:54:50 +01:00
Woof Dog
e6e6583199 server : (webui) increase edit textarea size (#11763) 2025-02-08 20:09:55 +01:00
Georgi Gerganov
aaa5505307 server : minor log updates (#11760)
ggml-ci
2025-02-08 18:08:43 +02:00
Georgi Gerganov
bdcf8b6a56 cont : fix mmap flag print (#11699) 2025-02-08 16:49:38 +02:00
Karol Kontny
4d3465c5ae ggml: Fix data race in ggml threadpool (#11736)
After the barrier in the last iteration executes, the loop termination
condition is still evaluated. By then the main thread may already have
destroyed the cgraph object and its nodes, so another thread accesses
memory that is already gone. Trouble can also happen when n_nodes == 0
or abort is called, though I'm not sure the former situation is possible.

The last synchronization should be done after the loop to ensure the
cgraph/cplan won't be accessed after the main thread exits from the
function.
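
A hedged sketch of the race and the fix, using toy stand-ins rather than the actual ggml threadpool code: the loop condition reads the cgraph after every per-node barrier, so without one final barrier after the loop a worker can dereference the graph after the main thread has torn it down.
```
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

struct cgraph_sketch { int n_nodes; }; // toy stand-in for ggml_cgraph

// simple reusable spin barrier, enough for the sketch
struct spin_barrier {
    std::atomic<int> count{0};
    std::atomic<int> phase{0};
    const int nth;
    explicit spin_barrier(int n) : nth(n) {}
    void wait() {
        const int p = phase.load();
        if (count.fetch_add(1) + 1 == nth) {
            count.store(0);
            phase.fetch_add(1); // release everyone
        } else {
            while (phase.load() == p) std::this_thread::yield();
        }
    }
};

static void worker(const cgraph_sketch * g, spin_barrier * b) {
    for (int i = 0; i < g->n_nodes; i++) { // condition reads *g every pass
        /* ... compute node i ... */
        b->wait(); // per-node sync
    }
    b->wait(); // the fix: one last sync after the loop, so no thread can
               // still be evaluating the loop condition on a dead graph
}

int main() {
    const int n_threads = 4;
    auto * g = new cgraph_sketch{8};
    spin_barrier b(n_threads);
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; i++) threads.emplace_back(worker, g, &b);
    for (auto & t : threads) t.join();
    delete g; // in the real code, the final barrier is what guarantees
              // no worker still touches the graph at this point
    printf("done\n");
}
```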
b4671
2025-02-08 15:30:53 +01:00