mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-31 08:51:55 +00:00

Files

Adrien Gallouët 4201deae9c common: introduce http.h for httplib-based client (#16373 )

* common: introduce http.h for httplib-based client

This change moves cpp-httplib based URL parsing and client setup into
a new header `common/http.h`, and integrates it in `arg.cpp` and `run.cpp`.

It is an iteration towards removing libcurl, while intentionally
minimizing changes to existing code to guarantee the same behavior when
`LLAMA_CURL` is used.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* tools : add missing WIN32_LEAN_AND_MEAN

Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>

2025-10-01 20:22:18 +03:00

linenoise.cpp

llama : move end-user examples to tools directory (#13249 )

2025-05-02 20:27:13 +02:00

CMakeLists.txt

cmake : Do not install tools on iOS targets (#15903 )

2025-09-16 09:54:44 +07:00

README.md

llama-run: add support for downloading models from ModelScope (#13370 )

2025-05-09 10:25:50 +01:00

run.cpp

common: introduce http.h for httplib-based client (#16373 )

2025-10-01 20:22:18 +03:00

README.md

llama.cpp/example/run

The purpose of this example is to demonstrate a minimal usage of llama.cpp for running models.

llama-run granite3-moe

Description:
  Runs a llm

Usage:
  llama-run [options] model [prompt]

Options:
  -c, --context-size <value>
      Context size (default: 2048)
  -n, -ngl, --ngl <value>
      Number of GPU layers (default: 0)
  --temp <value>
      Temperature (default: 0.8)
  -v, --verbose, --log-verbose
      Set verbosity level to infinity (i.e. log all messages, useful for debugging)
  -h, --help
      Show help message

Commands:
  model
      Model is a string with an optional prefix of
      huggingface:// (hf://), ollama://, https:// or file://.
      If no protocol is specified and a file exists in the specified
      path, file:// is assumed, otherwise if a file does not exist in
      the specified path, ollama:// is assumed. Models that are being
      pulled are downloaded with .partial extension while being
      downloaded and then renamed as the file without the .partial
      extension when complete.

Examples:
  llama-run llama3
  llama-run ollama://granite-code
  llama-run ollama://smollm:135m
  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run ms://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run modelscope://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run https://example.com/some-file1.gguf
  llama-run some-file2.gguf
  llama-run file://some-file3.gguf
  llama-run --ngl 999 some-file4.gguf
  llama-run --ngl 999 some-file5.gguf Hello World