Commit Graph

1214 Commits

Author SHA1 Message Date
Georgi Gerganov
035d511457 llama : minor API updates 2023-08-18 17:10:20 +03:00
Georgi Gerganov
2d6c2c757c llama : remove C++ API + reorganize common source in /common dir 2023-08-18 16:22:48 +03:00
Georgi Gerganov
38016ed9ec Merge branch 'master' into gguf 2023-08-18 15:21:48 +03:00
Georgi Gerganov
660ca9bbca llama : re-order functions 2023-08-18 14:56:36 +03:00
slaren
097e121e2f llama : add benchmark example (#2626)
* llama : add benchmark example

* add to examples CMakeLists.txt

* fix msvc build

* add missing include

* add Bessel's correction to stdev calculation

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* improve markdown formatting

* add missing include

* print warning if NDEBUG is not defined

* remove n_prompt and n_gen from the matrix, use each value separately instead

* better checks for non-optimized builds

* llama.cpp : fix MEM_REQ_SCRATCH0 reusing the value of n_ctx of the first call

* fix json formatting

* add sql output

* add basic cpu and gpu info (linux/cuda only)

* markdown: also show values that differ from the default

* markdown: add build id

* cleanup

* improve formatting

* formatting

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
master-097e121
2023-08-18 12:44:58 +02:00
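The "Bessel's correction to stdev calculation" item above refers to dividing the squared deviations by n - 1 instead of n, which makes the sample standard deviation an unbiased-variance estimate for benchmark runs. A minimal sketch of the idea (illustrative, not the actual llama-bench code):

```python
import math

def stdev_sample(xs):
    # Sample standard deviation with Bessel's correction: divide by (n - 1).
    n = len(xs)
    if n < 2:
        return 0.0
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # Bessel's correction
    return math.sqrt(var)
```

For the run times [1.0, 2.0, 3.0] this yields 1.0, whereas dividing by n would give about 0.816.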
mdrokz
eaf98c2649 readme : add link to Rust bindings (#2656) 2023-08-18 13:17:58 +03:00
Georgi Gerganov
e9b12c332e perplexity : more meaningful ETA number - 2 decimal points master-e9b12c3 2023-08-18 12:48:55 +03:00
Georgi Gerganov
dea5be61d7 editorconfig : fix whitespaces 2023-08-18 12:42:38 +03:00
Georgi Gerganov
e35f8c744e tests : update vocab file with new magic 2023-08-18 12:39:22 +03:00
Georgi Gerganov
856afff746 Merge branch 'master' into gguf 2023-08-18 12:38:05 +03:00
Georgi Gerganov
aa3efe87c8 llama : print number of tensors per type + print arch + style 2023-08-18 10:36:45 +03:00
klosax
b275de745d llama.cpp : get special token kv and linefeed token id 2023-08-18 03:34:30 +02:00
Evan Jones
604b8bdfa6 Fix unicode in grammars (fixes #2501) (#2553)
* Fix unicode in grammars (fixes #2501)

* add more comments

* fix test-llama-grammar
master-604b8bd
2023-08-17 19:54:44 -04:00
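The unicode grammar fix above hinges on matching grammar character ranges against full Unicode code points rather than raw UTF-8 bytes. A minimal UTF-8 decoding sketch of that idea (illustrative only; the actual fix lives in llama.cpp's grammar code):

```python
def decode_utf8_cp(b, i=0):
    # Decode one UTF-8 code point from bytes `b` starting at index `i`.
    # Returns (code_point, next_index).
    first = b[i]
    if first < 0x80:            # 1-byte ASCII
        return first, i + 1
    elif first < 0xE0:          # 2-byte sequence
        n = 2
    elif first < 0xF0:          # 3-byte sequence
        n = 3
    else:                       # 4-byte sequence
        n = 4
    masks = {2: 0x1F, 3: 0x0F, 4: 0x07}
    cp = first & masks[n]
    for j in range(1, n):       # fold in the continuation bytes
        cp = (cp << 6) | (b[i + j] & 0x3F)
    return cp, i + n
```

Grammar ranges like `[α-ω]` can then be tested against the decoded code point instead of byte-by-byte.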
staviq
10151bee2e server : support for saving templates in browser LocalStorage (#2486)
* support for templates in browser LocalStorage

* sync accepted #2409 fix from upstream

* convert autosave invocation to useEffect

* Apply suggestions from code review

Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>

* Regen index.html.cpp, suggested from code review

---------

Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>
master-10151be
2023-08-18 07:34:01 +08:00
klosax
306070c896 llama.cpp : print kv general.name 2023-08-18 01:06:27 +02:00
Johannes Gäßler
0992a7b8b1 README: fix LLAMA_CUDA_MMV_Y documentation (#2647) 2023-08-17 23:57:59 +02:00
klosax
d9e6890a51 test-tokenizer-0.cpp : fix warning 2023-08-17 23:34:21 +02:00
klosax
147a99bd3a gguf.py : reverse GGUF_MAGIC 2023-08-17 23:24:04 +02:00
klosax
c20ae49b59 ggml.h : reverse GGUF_MAGIC 2023-08-17 23:23:17 +02:00
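The two "reverse GGUF_MAGIC" commits concern the byte order of the file magic: a four-byte tag such as b"GGUF" corresponds to different u32 constants depending on whether it is read little- or big-endian, so the integer constant in ggml.h and the bytes written by gguf.py must agree. An illustration (the byte values are from the "GGUF" ASCII tag, not necessarily the constant used at this exact point in the branch):

```python
import struct

data = b"GGUF"  # first four bytes of a GGUF file
le = struct.unpack("<I", data)[0]  # little-endian read -> 0x46554747
be = struct.unpack(">I", data)[0]  # big-endian read    -> 0x47475546
```

Reversing the magic on one side flips which of these two constants matches the on-disk bytes.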
Henri Vasserman
6ddeefad9b [Zig] Fixing Zig build and improvements (#2554)
* Fix zig after console.o was split

* Better include and flag management

* Change LTO to option
2023-08-17 23:11:18 +03:00
klosax
3c1b7217a9 convert-llama-7b-pth-to-gguf.py : fixes 2023-08-17 21:44:34 +02:00
klosax
9e2d4dd48e convert-llama-hf-to-gguf.py : fixes 2023-08-17 21:43:48 +02:00
klosax
640ddc4259 gguf.py : gptneox mapping 2023-08-17 21:43:10 +02:00
klosax
b668cd3296 convert-gptneox-hf-to-gguf.py : fixes 2023-08-17 21:42:26 +02:00
M. Yusuf Sarıgöz
fc3a523211 gguf.py : write tensors in a single pass (#2644)
* gguf : single pass for writing tensors + refactoring writer

* gguf : single pass for writing tensors + refactoring writer

* gguf : single pass for writing tensors + refactoring writer

* gguf : style fixes in simple conversion script

* gguf : refactor gptneox conversion script

* gguf : rename h5 to hf (for HuggingFace)

* gguf : refactor pth to gguf conversion script

* gguf : rm file_type key and method

* gguf.py : fix vertical alignment

* gguf.py : indentation

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-17 21:57:39 +03:00
Georgi Gerganov
5484737d58 llama : fix tensor name grepping during quantization
ggml-ci
2023-08-17 21:40:51 +03:00
Georgi Gerganov
57eaadb853 llama : throw error if gguf fails to init from file
ggml-ci
2023-08-17 21:32:14 +03:00
klosax
b3cc182990 llama.cpp : typo 2023-08-17 20:27:50 +02:00
Georgi Gerganov
acaa98234a convert.py : fix HF tensor permuting / unpacking
ggml-ci
2023-08-17 21:06:45 +03:00
klosax
78e1e57862 quantize-stats.cpp : .bin --> .gguf 2023-08-17 19:18:24 +02:00
klosax
fb11dd3f92 common.h : .bin --> .gguf 2023-08-17 19:16:35 +02:00
Georgi Gerganov
e72c8c2124 ggml : fix bug in gguf_set_kv
ggml-ci
2023-08-17 20:13:48 +03:00
Georgi Gerganov
899f9a5350 llama : fix lambda capture
ggml-ci
2023-08-17 19:49:45 +03:00
Georgi Gerganov
93f285bdf1 gptneox : move as a WIP example 2023-08-17 19:49:45 +03:00
Georgi Gerganov
81a2c2a6f4 llama : fix llama_model_loader memory leak 2023-08-17 19:49:02 +03:00
Georgi Gerganov
dd9e2fc988 ci : update ".bin" to ".gguf" extension
ggml-ci
2023-08-17 19:32:14 +03:00
Georgi Gerganov
c3b739374e editorconfig : ignore models folder
ggml-ci
2023-08-17 19:17:25 +03:00
Georgi Gerganov
6d66ef96eb Merge branch 'master' into gguf 2023-08-17 19:04:59 +03:00
Georgi Gerganov
11bf4366c2 llama : sync with recent PRs on master 2023-08-17 19:03:15 +03:00
Georgi Gerganov
8ace03ad3d convert.py : better always have n_head_kv and default it to n_head 2023-08-17 18:47:06 +03:00
klosax
d646c4efce convert.py : n_head_kv optional and .gguf file extension 2023-08-17 17:20:36 +02:00
Georgi Gerganov
dd016cc246 Revert "ci : disable CI temporary to not waste energy"
This reverts commit 7e82d25f40.
2023-08-17 17:23:16 +03:00
Georgi Gerganov
2ddd9681d6 convert.py : update to support GGUF output 2023-08-17 17:22:43 +03:00
Georgi Gerganov
e0429d38e4 convert-new.py : output gguf (#2635)
* convert-new.py : output gguf (WIP)

* convert-new.py : add gguf key-value pairs

* llama : add hparams.ctx_train + no longer print ftype

* convert-new.py : minor fixes

* convert-new.py : vocab-only option should work now

* llama : fix tokenizer to use llama_char_to_byte

* tests : add new ggml-vocab-llama.gguf

* convert-new.py : tensor name mapping

* convert-new.py : add map for skipping tensor serialization

* convert-new.py : convert script now works

* gguf.py : pick some of the refactoring from #2644

* convert-new.py : minor fixes
2023-08-17 17:19:52 +03:00
Kerfuffle
8dae7ce684 Add --cfg-negative-prompt-file option for examples (#2591)
Add --cfg-negative-prompt-file option for examples
master-8dae7ce
2023-08-17 07:29:44 -06:00
klosax
d6fd53afd6 llama.cpp : use ggml_elements() 2023-08-17 15:24:35 +02:00
klosax
5a0a2c5685 llama.cpp : print actual model size 2023-08-17 15:18:16 +02:00
Georgi Gerganov
a73ccf1aa3 llama : replace (permute + reshape + view_1d) with (view_3d) (#2538)
ggml-ci
master-a73ccf1
2023-08-17 10:47:09 +03:00
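The (permute + reshape + view_1d) → (view_3d) change above replaces a chain of tensor ops with a single strided 3D view over the same underlying buffer. A rough model of strided 3D indexing over a flat buffer, assuming row-major layout (illustrative; this is not ggml's API):

```python
def view_3d(buf, ne0, ne1, ne2, stride1=None, stride2=None):
    # Interpret the flat sequence `buf` as a [ne2][ne1][ne0] tensor via
    # strides, without copying data (modeled here as an index function).
    s1 = stride1 if stride1 is not None else ne0
    s2 = stride2 if stride2 is not None else ne0 * ne1

    def at(i0, i1, i2):
        return buf[i2 * s2 + i1 * s1 + i0]

    return at

buf = list(range(24))
v = view_3d(buf, 4, 3, 2)   # element (i0, i1, i2) -> buf[i2*12 + i1*4 + i0]
```

Because only the strides change, no data is moved; that is what lets one view replace the permute/reshape chain.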
drbh
7cf54e1f74 tests : adds simple llama grammar tests (#2618)
* adds simple llama grammar tests

* fix lint and add Makefile

* 0 terminate code_points

* avoid dangling pointers in candidate cleanup

* cleanup grammar at end of test
master-7cf54e1
2023-08-17 10:41:01 +03:00
Shouzheng Liu
a872a2b28e ggml-alloc : fix discrepancy between measure & eval (#2639)
The GGML memory allocator consistently places a tensor within the
optimal-fit memory block, which is the smallest block capable of
accommodating the tensor's size. During the measurement phase, the final
block is generously sized, ensuring it never qualifies as the
optimal-fit block as long as there exists another block capable of
accommodating the tensor. Nevertheless, in the evaluation phase, the
last block is constrained in size and could potentially qualify as the
optimal-fit block. Consequently, there exists the possibility of a
tensor being allocated to a different region during evaluation, leading
to more memory fragmentation in our scratch buffer.

This recent commit guarantees uniform behavior of the allocator across
both the measurement and evaluation phases, eliminating discrepancies
between the two.
master-a872a2b
2023-08-17 10:35:53 +03:00
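The allocator behavior described in that commit message can be sketched as a best-fit search over free blocks. In this toy model (not ggml-alloc's actual code), the generously sized last block used during measurement never wins the best-fit comparison, while the right-sized last block used during evaluation can, so the same tensor lands in different blocks across the two phases:

```python
def best_fit(blocks, size):
    # Return the index of the smallest free block that can hold `size`,
    # or None if nothing fits.
    best = None
    for i, cap in enumerate(blocks):
        if cap >= size and (best is None or cap < blocks[best]):
            best = i
    return best

# Measurement phase: the last block is made huge, so it only wins
# best-fit when no other block fits.
measure_blocks = [64, 32, 1 << 30]
# Evaluation phase: the last block is sized to what measurement recorded.
eval_blocks = [64, 32, 48]

# The same 40-byte tensor is placed differently in the two phases:
best_fit(measure_blocks, 40)  # -> 0 (the 64-byte block)
best_fit(eval_blocks, 40)     # -> 2 (the 48-byte last block)
```

Making the last block behave identically in both phases removes this divergence, which is the uniformity the commit guarantees.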