mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-27 08:21:30 +00:00

Eric Curtin 7b717fb4b2 Rewrite llama-run to use llama-server

llama-run works fine, but falls well behind llama-server functionality.
Integrate llama-server with llama-run.

Signed-off-by: Eric Curtin <ericcurtin17@gmail.com>

2025-09-05 17:22:36 +01:00

1.3 KiB

llama.cpp/example/run

This example demonstrates minimal usage of llama.cpp for running models.

llama-run -hf llama.cpp/example/run

Usage: llama-run [server-options]

This tool starts a llama-server process and provides an interactive chat interface.
All options except --port are passed through to llama-server.

Common options:
  -h, --help                  Show this help
  -m,    --model FNAME        model path (default: `models/$filename` with filename from `--hf-file`
                              or `--model-url` if set, otherwise models/7B/ggml-model-f16.gguf)
  -hf,   -hfr, --hf-repo      <user>/<model>[:quant]
                              Hugging Face model repository; quant is optional, case-insensitive,
                              default to Q4_K_M, or falls back to the first file in the repo if
                              Q4_K_M doesn't exist.
                              mmproj is also downloaded automatically if available. to disable, add
                              --no-mmproj
                              example: unsloth/phi-4-GGUF:q4_k_m
                              (default: unused)
  -c, --ctx-size N            Context size
  -n, --predict N             Number of tokens to predict
  -t, --threads N             Number of threads

For all server options, run: llama-server --help
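
The help text above describes two behaviours that a short sketch can make concrete: every option except --port is forwarded to llama-server, and the -hf argument has the form <user>/<model>[:quant], with a case-insensitive quant that defaults to Q4_K_M. The Python below is only an illustration of those rules, not llama-run's actual implementation (which is not shown here). The helper names `strip_port`, `build_server_command`, and `parse_hf_repo` are hypothetical, and the sketch assumes --port is followed by its value as a separate argument and that the wrapper supplies its own port.

```python
DEFAULT_QUANT = "Q4_K_M"  # default quant named in the help text


def strip_port(args):
    """Return args without any `--port N` pair (hypothetical helper)."""
    out = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the value that followed --port; drop it too.
            skip_next = False
            continue
        if arg == "--port":
            skip_next = True
            continue
        out.append(arg)
    return out


def build_server_command(args, port):
    """Forward every option except --port, then append the wrapper's own port.

    Assumption: the wrapper decides the port itself; how llama-run
    actually picks it is not stated in the help text.
    """
    return ["llama-server", *strip_port(args), "--port", str(port)]


def parse_hf_repo(spec):
    """Split `<user>/<model>[:quant]` into (user, model, quant).

    The quant is upper-cased to model the case-insensitive match and
    defaults to Q4_K_M. The fallback to the first file in the repo
    needs a network lookup and is not modelled here.
    """
    repo, _, quant = spec.partition(":")
    user, slash, model = repo.partition("/")
    if not (user and slash and model):
        raise ValueError(f"expected <user>/<model>[:quant], got {spec!r}")
    return user, model, quant.upper() if quant else DEFAULT_QUANT


if __name__ == "__main__":
    cli_args = ["-hf", "unsloth/phi-4-GGUF:q4_k_m", "-c", "4096", "--port", "9000"]
    print(build_server_command(cli_args, 8081))
    print(parse_hf_repo("unsloth/phi-4-GGUF:q4_k_m"))
```

In this sketch a user-supplied --port is dropped silently rather than rejected, which matches the wording "all options except --port are passed through" but is one reading of it among others.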
