mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-11-02 09:12:03 +00:00)
* (WIP) Implement stochastic speculative decoding
* sample from residual distribution on draft accept failure
* fix #5657: force greedy sampling with probs when temp is 0
* remove p_accept parameter
* fix style
* remove unused variables
* add srand() in speculative.cpp
* replace use of rand() with mt19937 sampling
* fixes based on review (@JohannesGaessler)
* fix r random generation
* randomly select next sequence to verify + fix bug in memory freeing
* fix bug in active_seqs sync
* fix uniform int distribution initialization
* remove warnings from comparison between int and size_t
* check grammar in `llama_sample_probability_distribution_impl`
* remove malloc code by utilizing vectors
* add PR link to README
# llama.cpp/examples/speculative

Demonstration of speculative decoding and tree-based speculative decoding techniques

More info:

- https://github.com/ggerganov/llama.cpp/pull/2926
- https://github.com/ggerganov/llama.cpp/pull/3624
- https://github.com/ggerganov/llama.cpp/pull/5625