mirror of
				https://github.com/ggml-org/llama.cpp.git
				synced 2025-10-31 08:51:55 +00:00 
			
		
		
		
	 57684331fc
			
		
	
	57684331fc
	
	
	
		
			
			* Make tokenizer.cpp CLI tool nicer.
Before this commit, tokenize was a simple CLI tool like this:
  tokenize MODEL_FILENAME PROMPT [--ids]
This simple tool loads the model, takes the prompt, and shows the tokens
llama.cpp is interpreting.
This changeset makes the tokenize more sophisticated, and more useful
for debugging and troubleshooting:
  tokenize [-m, --model MODEL_FILENAME]
           [--ids]
           [--stdin]
           [--prompt]
           [-f, --file]
           [--no-bos]
           [--log-disable]
It also behaves nicer on Windows now, interpreting and rendering Unicode
from command line arguments and pipes no matter what code page the user
has set on their terminal.
* style fix: strlen(str) == 0 --> *str == 0
* Simplify tokenize.cpp; by getting rid of handling positional style arguments.
It must now be invoked with long --model, --prompt etc. arguments only.
Shortens the code.
* tokenize.cpp: iostream header no longer required
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: brian khuu <mofosyne@gmail.com>
		
	
		
			
				
	
	
	
		
			13 KiB
		
	
	
	
	
	
	
	
			
		
		
	
	
			13 KiB