	update straggling refs
.github/workflows/build.yml (2 lines changed)
@@ -240,7 +240,7 @@ jobs:
           echo "Fetch llama2c model"
           wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories260K/stories260K.bin
           ./bin/convert-llama2c-to-ggml --copy-vocab-from-model ./tok512.bin --llama2c-model stories260K.bin --llama2c-output-model stories260K.gguf
-          ./bin/main -m stories260K.gguf -p "One day, Lily met a Shoggoth" -n 500 -c 256
+          ./bin/llama -m stories260K.gguf -p "One day, Lily met a Shoggoth" -n 500 -c 256
 
       - name: Determine tag name
         id: tag
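The commit title says it updates straggling refs, i.e. leftover references to the old `./bin/main` binary name, as in the CI step above. A hedged, generic way to look for any remaining stragglers (a plain `grep`, not a script that ships with the repo) is:

```shell
# Generic sketch: search the tree for leftover mentions of the old binary name,
# skipping git metadata and build output. Not part of the repository's tooling.
grep -rnF --exclude-dir=.git --exclude-dir=build './bin/main' .
```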
@@ -77,7 +77,7 @@ It has the similar design of other llama.cpp BLAS-based paths such as *OpenBLAS,
 *Notes:*
 
 - **Memory**
-  - The device memory is a limitation when running a large model. The loaded model size, *`llm_load_tensors: buffer_size`*, is displayed in the log when running `./bin/main`.
+  - The device memory is a limitation when running a large model. The loaded model size, *`llm_load_tensors: buffer_size`*, is displayed in the log when running `./bin/llama`.
 
   - Please make sure the GPU shared memory from the host is large enough to account for the model's size. For e.g. the *llama-2-7b.Q4_0* requires at least 8.0GB for integrated GPU and 4.0GB for discrete GPU.
 
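Since the note above points at the `llm_load_tensors: buffer_size` log line as the place where the loaded model size shows up, a quick way to surface just that figure when sizing GPU memory is to filter the run's output. This is a minimal sketch, and the model path is only a placeholder:

```shell
# Minimal sketch: run a short generation and keep only the model-size log line.
# The model path is a placeholder; 2>&1 merges stderr so the log line is captured
# whichever stream it is written to.
./bin/llama -m models/llama-2-7b.Q4_0.gguf -p "hello" -n 16 2>&1 | grep "llm_load_tensors"
```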
@@ -27,10 +27,8 @@ To mitigate it, you can increase values in `n_predict`, `kv_size`.
 
 ```shell
 cd ../../..
-mkdir build
-cd build
-cmake -DLLAMA_CURL=ON ../
-cmake --build . --target llama-server
+cmake -B build -DLLAMA_CURL=ON
+cmake --build build --target llama-server
 ```
 
 2. Start the test: `./tests.sh`
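The replacement commands rely on CMake's `-B` option (available since CMake 3.13), which creates and selects the build directory in one step, so the separate `mkdir build`/`cd build` is no longer needed. A minimal sketch of the updated flow end to end, assuming you start in the server tests directory as the `cd ../../..` in the snippet implies:

```shell
# Sketch of the updated flow: configure and build out-of-tree, then run the tests.
cd ../../..                            # back to the repository root
cmake -B build -DLLAMA_CURL=ON         # configure into ./build, no mkdir/cd needed
cmake --build build --target llama-server
cd -                                   # return to the tests directory
./tests.sh
```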