mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-10-30 08:42:00 +00:00)

Commit 68ee98ae18
In streaming mode, when the prompt exceeds the context length, the server returns an HTTP 200 status code with a JSON error in the body. This is confusing and inconsistent with all other inference engines, which return an HTTP 4xx error in this case. This patch fixes the problem by making the server return HTTP 400 in such cases.
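
A minimal sketch of the idea behind the fix, assuming a cpp-httplib style server (the HTTP library llama.cpp's server is built on). The endpoint name, the `n_ctx` value, and the `count_tokens()` stub are illustrative assumptions, not the actual server code; the point is that the length check happens and the 400 status is set before any streaming response begins:

```cpp
// Sketch only: validate the prompt length up front and fail with HTTP 400,
// instead of opening the stream and sending HTTP 200 with a JSON error body.
#include <string>
#include "httplib.h" // cpp-httplib

// Placeholder tokenizer (assumption): roughly 1 token per 4 characters.
static size_t count_tokens(const std::string & prompt) {
    return prompt.size() / 4;
}

int main() {
    httplib::Server svr;
    const size_t n_ctx = 4096; // assumed context size

    svr.Post("/completion", [n_ctx](const httplib::Request & req, httplib::Response & res) {
        // The fix: reject an over-long prompt *before* any tokens are streamed,
        // so the client sees an HTTP 400 instead of a 200 with an error payload.
        if (count_tokens(req.body) > n_ctx) {
            res.status = 400;
            res.set_content(
                R"({"error":{"code":400,"message":"the request exceeds the available context size"}})",
                "application/json");
            return;
        }
        // ... the normal (streaming) completion path would go here ...
        res.set_content("{}", "application/json");
    });

    svr.listen("127.0.0.1", 8080);
}
```

Failing fast here also matches what clients expect: an HTTP status check suffices to detect the error, rather than having to parse the first streamed chunk for an error object.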
		
			
				
	
	
	
		