Commit Graph

1214 Commits

Author SHA1 Message Date
Georgi Gerganov
035d511457 llama : minor API updates 2023-08-18 17:10:20 +03:00
Georgi Gerganov
2d6c2c757c llama : remove C++ API + reorganize common source in /common dir 2023-08-18 16:22:48 +03:00
Georgi Gerganov
38016ed9ec Merge branch 'master' into gguf 2023-08-18 15:21:48 +03:00
Georgi Gerganov
660ca9bbca llama : re-order functions 2023-08-18 14:56:36 +03:00
slaren
097e121e2f llama : add benchmark example (#2626)
* llama : add benchmark example

* add to examples CMakeLists.txt

* fix msvc build

* add missing include

* add Bessel's correction to stdev calculation

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* improve markdown formatting

* add missing include

* print warning if NDEBUG is not defined

* remove n_prompt and n_gen from the matrix, use each value separately instead

* better checks for non-optimized builds

* llama.cpp : fix MEM_REQ_SCRATCH0 reusing the value of n_ctx of the first call

* fix json formatting

* add sql output

* add basic cpu and gpu info (linux/cuda only)

* markdown: also show values that differ from the default

* markdown: add build id

* cleanup

* improve formatting

* formatting

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
master-097e121
2023-08-18 12:44:58 +02:00
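The "Bessel's correction to stdev calculation" item above refers to dividing the squared deviations by n - 1 instead of n, which makes the sample standard deviation an unbiased-variance estimate for benchmark runs. A minimal sketch of the idea (illustrative, not the actual llama-bench code):

```python
import math

def stdev_sample(xs):
    # Sample standard deviation with Bessel's correction: divide by (n - 1).
    n = len(xs)
    if n < 2:
        return 0.0
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # Bessel's correction
    return math.sqrt(var)
```

For the run times [1.0, 2.0, 3.0] this yields 1.0, whereas dividing by n would give about 0.816.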
mdrokz
eaf98c2649 readme : add link to Rust bindings (#2656) 2023-08-18 13:17:58 +03:00
Georgi Gerganov
e9b12c332e perplexity : more meaningful ETA number - 2 decimal points master-e9b12c3 2023-08-18 12:48:55 +03:00
Georgi Gerganov
dea5be61d7 editorconfig : fix whitespaces 2023-08-18 12:42:38 +03:00
Georgi Gerganov
e35f8c744e tests : update vocab file with new magic 2023-08-18 12:39:22 +03:00
Georgi Gerganov
856afff746 Merge branch 'master' into gguf 2023-08-18 12:38:05 +03:00
Georgi Gerganov
aa3efe87c8 llama : print number of tensors per type + print arch + style 2023-08-18 10:36:45 +03:00
klosax
b275de745d llama.cpp : get special token kv and linefeed token id 2023-08-18 03:34:30 +02:00
Evan Jones
604b8bdfa6 Fix unicode in grammars (fixes #2501) (#2553)
* Fix unicode in grammars (fixes #2501)

* add more comments

* fix test-llama-grammar
master-604b8bd
2023-08-17 19:54:44 -04:00
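The unicode grammar fix above hinges on matching grammar character ranges against full Unicode code points rather than raw UTF-8 bytes. A minimal UTF-8 decoding sketch of that idea (illustrative only; the actual fix lives in llama.cpp's grammar code):

```python
def decode_utf8_cp(b, i=0):
    # Decode one UTF-8 code point from bytes `b` starting at index `i`.
    # Returns (code_point, next_index).
    first = b[i]
    if first < 0x80:            # 1-byte ASCII
        return first, i + 1
    elif first < 0xE0:          # 2-byte sequence
        n = 2
    elif first < 0xF0:          # 3-byte sequence
        n = 3
    else:                       # 4-byte sequence
        n = 4
    masks = {2: 0x1F, 3: 0x0F, 4: 0x07}
    cp = first & masks[n]
    for j in range(1, n):       # fold in the continuation bytes
        cp = (cp << 6) | (b[i + j] & 0x3F)
    return cp, i + n
```

Grammar ranges like `[α-ω]` can then be tested against the decoded code point instead of byte-by-byte.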
staviq
10151bee2e server : support for saving templates in browser LocalStorage (#2486)
* support for templates in browser LocalStorage

* sync accepted #2409 fix from upstream

* convert autosave invocation to useEffect

* Apply suggestions from code review

Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>

* Regen index.html.cpp, suggested from code review

---------

Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>
master-10151be
2023-08-18 07:34:01 +08:00
klosax
306070c896 llama.cpp : print kv general.name 2023-08-18 01:06:27 +02:00
Johannes Gäßler
0992a7b8b1 README: fix LLAMA_CUDA_MMV_Y documentation (#2647) 2023-08-17 23:57:59 +02:00
klosax
d9e6890a51 test-tokenizer-0.cpp : fix warning 2023-08-17 23:34:21 +02:00
klosax
147a99bd3a gguf.py : reverse GGUF_MAGIC 2023-08-17 23:24:04 +02:00
klosax
c20ae49b59 ggml.h : reverse GGUF_MAGIC 2023-08-17 23:23:17 +02:00
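The two "reverse GGUF_MAGIC" commits concern the byte order of the file magic: a four-byte tag such as b"GGUF" corresponds to different u32 constants depending on whether it is read little- or big-endian, so the integer constant in ggml.h and the bytes written by gguf.py must agree. An illustration (the byte values are from the "GGUF" ASCII tag, not necessarily the constant used at this exact point in the branch):

```python
import struct

data = b"GGUF"  # first four bytes of a GGUF file
le = struct.unpack("<I", data)[0]  # little-endian read -> 0x46554747
be = struct.unpack(">I", data)[0]  # big-endian read    -> 0x47475546
```

Reversing the magic on one side flips which of these two constants matches the on-disk bytes.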
Henri Vasserman
6ddeefad9b [Zig] Fixing Zig build and improvements (#2554)
* Fix zig after console.o was split

* Better include and flag management

* Change LTO to option
2023-08-17 23:11:18 +03:00
klosax
3c1b7217a9 convert-llama-7b-pth-to-gguf.py : fixes 2023-08-17 21:44:34 +02:00
klosax
9e2d4dd48e convert-llama-hf-to-gguf.py : fixes 2023-08-17 21:43:48 +02:00
klosax
640ddc4259 gguf.py : gptneox mapping 2023-08-17 21:43:10 +02:00
klosax
b668cd3296 convert-gptneox-hf-to-gguf.py : fixes 2023-08-17 21:42:26 +02:00
M. Yusuf Sarıgöz
fc3a523211 gguf.py : write tensors in a single pass (#2644)
* gguf : single pass for writing tensors + refactoring writer

* gguf : single pass for writing tensors + refactoring writer

* gguf : single pass for writing tensors + refactoring writer

* gguf : style fixes in simple conversion script

* gguf : refactor gptneox conversion script

* gguf : rename h5 to hf (for HuggingFace)

* gguf : refactor pth to gguf conversion script

* gguf : rm file_type key and method

* gguf.py : fix vertical alignment

* gguf.py : indentation

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-17 21:57:39 +03:00
Georgi Gerganov
5484737d58 llama : fix tensor name grepping during quantization
ggml-ci
2023-08-17 21:40:51 +03:00
Georgi Gerganov
57eaadb853 llama : throw error if gguf fails to init from file
ggml-ci
2023-08-17 21:32:14 +03:00
klosax
b3cc182990 llama.cpp : typo 2023-08-17 20:27:50 +02:00
Georgi Gerganov
acaa98234a convert.py : fix HF tensor permuting / unpacking
ggml-ci
2023-08-17 21:06:45 +03:00
klosax
78e1e57862 quantize-stats.cpp : .bin --> .gguf 2023-08-17 19:18:24 +02:00
klosax
fb11dd3f92 common.h : .bin --> .gguf 2023-08-17 19:16:35 +02:00
Georgi Gerganov
e72c8c2124 ggml : fix bug in gguf_set_kv
ggml-ci
2023-08-17 20:13:48 +03:00
Georgi Gerganov
899f9a5350 llama : fix lambda capture
ggml-ci
2023-08-17 19:49:45 +03:00
Georgi Gerganov
93f285bdf1 gptneox : move as a WIP example 2023-08-17 19:49:45 +03:00
Georgi Gerganov
81a2c2a6f4 llama : fix llama_model_loader memory leak 2023-08-17 19:49:02 +03:00
Georgi Gerganov
dd9e2fc988 ci : update ".bin" to ".gguf" extension
ggml-ci
2023-08-17 19:32:14 +03:00
Georgi Gerganov
c3b739374e editorconfig : ignore models folder
ggml-ci
2023-08-17 19:17:25 +03:00
Georgi Gerganov
6d66ef96eb Merge branch 'master' into gguf 2023-08-17 19:04:59 +03:00
Georgi Gerganov
11bf4366c2 llama : sync with recent PRs on master 2023-08-17 19:03:15 +03:00
Georgi Gerganov
8ace03ad3d convert.py : better always have n_head_kv and default it to n_head 2023-08-17 18:47:06 +03:00
klosax
d646c4efce convert.py : n_head_kv optional and .gguf file extension 2023-08-17 17:20:36 +02:00
Georgi Gerganov
dd016cc246 Revert "ci : disable CI temporary to not waste energy"
This reverts commit 7e82d25f40.
2023-08-17 17:23:16 +03:00
Georgi Gerganov
2ddd9681d6 convert.py : update to support GGUF output 2023-08-17 17:22:43 +03:00
Georgi Gerganov
e0429d38e4 convert-new.py : output gguf (#2635)
* convert-new.py : output gguf (WIP)

* convert-new.py : add gguf key-value pairs

* llama : add hparams.ctx_train + no longer print ftype

* convert-new.py : minor fixes

* convert-new.py : vocab-only option should work now

* llama : fix tokenizer to use llama_char_to_byte

* tests : add new ggml-vocab-llama.gguf

* convert-new.py : tensor name mapping

* convert-new.py : add map for skipping tensor serialization

* convert-new.py : convert script now works

* gguf.py : pick some of the refactoring from #2644

* convert-new.py : minor fixes
2023-08-17 17:19:52 +03:00
Kerfuffle
8dae7ce684 Add --cfg-negative-prompt-file option for examples (#2591)
Add --cfg-negative-prompt-file option for examples
master-8dae7ce
2023-08-17 07:29:44 -06:00
klosax
d6fd53afd6 llama.cpp : use ggml_elements() 2023-08-17 15:24:35 +02:00
klosax
5a0a2c5685 llama.cpp : print actual model size 2023-08-17 15:18:16 +02:00
Georgi Gerganov
a73ccf1aa3 llama : replace (permute + reshape + view_1d) with (view_3d) (#2538)
ggml-ci
master-a73ccf1
2023-08-17 10:47:09 +03:00
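The (permute + reshape + view_1d) → (view_3d) change above replaces a chain of tensor ops with a single strided 3D view over the same underlying buffer. A rough model of strided 3D indexing over a flat buffer, assuming row-major layout (illustrative; this is not ggml's API):

```python
def view_3d(buf, ne0, ne1, ne2, stride1=None, stride2=None):
    # Interpret the flat sequence `buf` as a [ne2][ne1][ne0] tensor via
    # strides, without copying data (modeled here as an index function).
    s1 = stride1 if stride1 is not None else ne0
    s2 = stride2 if stride2 is not None else ne0 * ne1

    def at(i0, i1, i2):
        return buf[i2 * s2 + i1 * s1 + i0]

    return at

buf = list(range(24))
v = view_3d(buf, 4, 3, 2)   # element (i0, i1, i2) -> buf[i2*12 + i1*4 + i0]
```

Because only the strides change, no data is moved; that is what lets one view replace the permute/reshape chain.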
drbh
7cf54e1f74 tests : adds simple llama grammar tests (#2618)
* adds simple llama grammar tests

* fix lint and add Makefile

* 0 terminate code_points

* avoid dangling pointers in candidate cleanup

* cleanup grammar at end of test
master-7cf54e1
2023-08-17 10:41:01 +03:00
Shouzheng Liu
a872a2b28e ggml-alloc : fix discrepancy between measure & eval (#2639)
The GGML memory allocator consistently places a tensor within the
optimal-fit memory block, which is the smallest block capable of
accommodating the tensor's size. During the measurement phase, the final
block is generously sized, ensuring it never qualifies as the
optimal-fit block as long as there exists another block capable of
accommodating the tensor. Nevertheless, in the evaluation phase, the
last block is constrained in size and could potentially qualify as the
optimal-fit block. Consequently, there exists the possibility of a
tensor being allocated to a different region during evaluation, leading
to more memory fragmentation in our scratch buffer.

This recent commit guarantees uniform behavior of the allocator across
both the measurement and evaluation phases, eliminating discrepancies
between the two.
master-a872a2b
2023-08-17 10:35:53 +03:00
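The allocator behavior described in that commit message can be sketched as a best-fit search over free blocks. In this toy model (not ggml-alloc's actual code), the generously sized last block used during measurement never wins the best-fit comparison, while the right-sized last block used during evaluation can, so the same tensor lands in different blocks across the two phases:

```python
def best_fit(blocks, size):
    # Return the index of the smallest free block that can hold `size`,
    # or None if nothing fits.
    best = None
    for i, cap in enumerate(blocks):
        if cap >= size and (best is None or cap < blocks[best]):
            best = i
    return best

# Measurement phase: the last block is made huge, so it only wins
# best-fit when no other block fits.
measure_blocks = [64, 32, 1 << 30]
# Evaluation phase: the last block is sized to what measurement recorded.
eval_blocks = [64, 32, 48]

# The same 40-byte tensor is placed differently in the two phases:
best_fit(measure_blocks, 40)  # -> 0 (the 64-byte block)
best_fit(eval_blocks, 40)     # -> 2 (the 48-byte last block)
```

Making the last block behave identically in both phases removes this divergence, which is the uniformity the commit guarantees.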