	update straggling refs
.github/workflows/build.yml (2 lines changed)
@@ -240,7 +240,7 @@ jobs:
           echo "Fetch llama2c model"
           wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories260K/stories260K.bin
           ./bin/convert-llama2c-to-ggml --copy-vocab-from-model ./tok512.bin --llama2c-model stories260K.bin --llama2c-output-model stories260K.gguf
-          ./bin/main -m stories260K.gguf -p "One day, Lily met a Shoggoth" -n 500 -c 256
+          ./bin/llama -m stories260K.gguf -p "One day, Lily met a Shoggoth" -n 500 -c 256
 
       - name: Determine tag name
         id: tag
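The commit title says it updates straggling refs, i.e. leftover references to the old `./bin/main` binary name, as in the CI step above. A hedged, generic way to look for any remaining stragglers (a plain `grep`, not a script that ships with the repo) is:

```shell
# Generic sketch: search the tree for leftover mentions of the old binary name,
# skipping git metadata and build output. Not part of the repository's tooling.
grep -rnF --exclude-dir=.git --exclude-dir=build './bin/main' .
```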
@@ -77,7 +77,7 @@ It has the similar design of other llama.cpp BLAS-based paths such as *OpenBLAS,
 *Notes:*
 
 - **Memory**
-  - The device memory is a limitation when running a large model. The loaded model size, *`llm_load_tensors: buffer_size`*, is displayed in the log when running `./bin/main`.
+  - The device memory is a limitation when running a large model. The loaded model size, *`llm_load_tensors: buffer_size`*, is displayed in the log when running `./bin/llama`.
 
   - Please make sure the GPU shared memory from the host is large enough to account for the model's size. For e.g. the *llama-2-7b.Q4_0* requires at least 8.0GB for integrated GPU and 4.0GB for discrete GPU.
 
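Since the note above points at the `llm_load_tensors: buffer_size` log line as the place where the loaded model size shows up, a quick way to surface just that figure when sizing GPU memory is to filter the run's output. This is a minimal sketch, and the model path is only a placeholder:

```shell
# Minimal sketch: run a short generation and keep only the model-size log line.
# The model path is a placeholder; 2>&1 merges stderr so the log line is captured
# whichever stream it is written to.
./bin/llama -m models/llama-2-7b.Q4_0.gguf -p "hello" -n 16 2>&1 | grep "llm_load_tensors"
```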
@@ -27,10 +27,8 @@ To mitigate it, you can increase values in `n_predict`, `kv_size`.
 
 ```shell
 cd ../../..
-mkdir build
-cd build
-cmake -DLLAMA_CURL=ON ../
-cmake --build . --target llama-server
+cmake -B build -DLLAMA_CURL=ON
+cmake --build build --target llama-server
 ```
 
 2. Start the test: `./tests.sh`
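The replacement commands rely on CMake's `-B` option (available since CMake 3.13), which creates and selects the build directory in one step, so the separate `mkdir build`/`cd build` is no longer needed. A minimal sketch of the updated flow end to end, assuming you start in the server tests directory as the `cd ../../..` in the snippet implies:

```shell
# Sketch of the updated flow: configure and build out-of-tree, then run the tests.
cd ../../..                            # back to the repository root
cmake -B build -DLLAMA_CURL=ON         # configure into ./build, no mkdir/cd needed
cmake --build build --target llama-server
cd -                                   # return to the tests directory
./tests.sh
```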