mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-31 08:51:55 +00:00

Files

goerch b08e75baea Fixing the last deviations from sentencepiece indicated by test-tokenizer-1 (#3170)

* Fix for #2721

* Reenable tokenizer test for LLaMa

* Add `console.cpp` dependency

* Fix dependency to `common`

* Fixing wrong fix.

* Make console usage platform specific

Work on compiler warnings.

* Adapting makefile

* Remove trailing whitespace

* Adapting the other parts of the makefile

* Fix typo.

* Fixing the last deviations from sentencepiece indicated by test-tokenizer-1

* Simplify logic

* Add missing change...

* Fix ugly compiler warning

* llama_tokenize should accept strings containing NUL now

* Adding huichen's test case

2023-09-16 13:41:33 +02:00

CMakeLists.txt

cmake : install targets (#2256)

2023-07-19 10:01:11 +03:00

convert-train-checkpoint-to-gguf.py

scripts: Use local gguf package when running from repo (#2927)

2023-08-31 16:49:24 -06:00

README.md

train : mem usage and other improvements (#2439)

2023-08-28 22:51:47 +03:00

train-text-from-scratch.cpp

Fixing the last deviations from sentencepiece indicated by test-tokenizer-1 (#3170)

2023-09-16 13:41:33 +02:00

README.md

train-text-from-scratch

Basic usage instructions:

# get training data
wget https://raw.githubusercontent.com/brunoklein99/deep-learning-notes/master/shakespeare.txt

# train
./bin/train-text-from-scratch \
        --vocab-model ../models/ggml-vocab-llama.gguf \
        --ctx 64 --embd 256 --head 8 --layer 16 \
        --checkpoint-in  chk-shakespeare-256x16.gguf \
        --checkpoint-out chk-shakespeare-256x16.gguf \
        --model-out ggml-shakespeare-256x16-f32.gguf \
        --train-data "shakespeare.txt" \
        -t 6 -b 16 --seed 1 --adam-iter 256 \
        --no-checkpointing

# predict
./bin/main -m ggml-shakespeare-256x16-f32.gguf
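The file names in the commands above appear to encode the embedding size and layer count (`256x16` for `--embd 256 --layer 16`). That naming is only inferred from the example, not a documented requirement. A minimal sketch, assuming that convention, derives the names from two variables so they stay in sync when the model size changes. The variables `EMBD`, `LAYER`, `RUN`, `CHECKPOINT`, `MODEL_OUT` and `TRAIN_CMD` are hypothetical helpers and not part of llama.cpp. The script only prints the training command; it does not run it.

```shell
# Hypothetical helper variables (not part of llama.cpp).
EMBD=256     # value passed to --embd
LAYER=16     # value passed to --layer

# Assumed naming convention, inferred from the example file names above.
RUN="shakespeare-${EMBD}x${LAYER}"
CHECKPOINT="chk-${RUN}.gguf"
MODEL_OUT="ggml-${RUN}-f32.gguf"

# Build the training command with the same flags as the example above.
TRAIN_CMD="./bin/train-text-from-scratch \
--vocab-model ../models/ggml-vocab-llama.gguf \
--ctx 64 --embd ${EMBD} --head 8 --layer ${LAYER} \
--checkpoint-in ${CHECKPOINT} \
--checkpoint-out ${CHECKPOINT} \
--model-out ${MODEL_OUT} \
--train-data shakespeare.txt \
-t 6 -b 16 --seed 1 --adam-iter 256 \
--no-checkpointing"

# Dry run: show the command so it can be reviewed before executing it.
echo "${TRAIN_CMD}"
```

Changing `EMBD` or `LAYER` then updates the checkpoint and model file names together, which avoids loading a checkpoint that was written for a different model size.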