* Add Pad Reflect 1D CUDA support
* Update ggml/src/ggml-cuda/pad_reflect_1d.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
- Use server_tokens in more places in server and util.cpp
- Convert most functions that used llama_tokens to server_tokens
- Modify input tokenizer to handle JSON objects as subprompts
- Break out MTMD prompt parsing into utility function
- Support JSON objects with multimodal_data arrays for MTMD prompts along with other existing types
- Add capability to model endpoint to indicate if client can send multimodal data
- Add tests.
* vulkan: Reuse conversion results in prealloc_y
Cache the pipeline and tensor that were most recently used to fill prealloc_y,
and skip the conversion if the current pipeline/tensor match.
* don't use shared pointer for prealloc_y_last_pipeline_used
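The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual Vulkan backend code: `Pipeline`, `Tensor`, `PreallocYCache`, `fill`, and `invalidate` are hypothetical names, and the sketch assumes that pointer identity is enough to decide that prealloc_y already holds the wanted result.

```cpp
// Hypothetical stand-ins for the real backend types.
struct Pipeline { int id; };
struct Tensor   { int id; };

// Remembers which pipeline/tensor pair last filled prealloc_y.
struct PreallocYCache {
    // Non-owning pointers: the cache only compares identity.
    const Pipeline * last_pipeline = nullptr;
    const Tensor   * last_tensor   = nullptr;
    int conversions = 0; // how many times the conversion actually ran

    // Returns true if the conversion ran, false if it was skipped.
    bool fill(const Pipeline * pipeline, const Tensor * tensor) {
        if (pipeline == last_pipeline && tensor == last_tensor) {
            return false; // prealloc_y already holds this result
        }
        ++conversions;    // placeholder for dispatching the real conversion
        last_pipeline = pipeline;
        last_tensor   = tensor;
        return true;
    }

    // Assumed to be needed whenever something else overwrites prealloc_y.
    void invalidate() {
        last_pipeline = nullptr;
        last_tensor   = nullptr;
    }
};
```

Calling `fill` twice in a row with the same pipeline and tensor runs the conversion only once; any other writer of the buffer would have to call `invalidate` first.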
* Changed the CI file to hw
* Added to sudoers for apt
* Removed the clone command and used checkout
* Added libcurl
* Added gcc-14
* Checking gcc --version
* added gcc-14 symlink
* added CC and C++ variables
* Added the gguf weight
* Changed the weights path
* Added system specification
* Removed white spaces
* ci: Replace Jenkins riscv native build Cloud-V pipeline with GitHub Actions workflow
Removed the legacy .devops/cloud-v-pipeline Jenkins CI configuration and introduced .github/workflows/build-riscv-native.yml for native RISC-V builds using GitHub Actions.
* removed trailing whitespaces
* Added the trigger at PR creation
* Corrected OS name
* Added ccache as setup package
* Added ccache for self-hosted runner
* Added directory for ccache size storage
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Changed the build command and added ccache debug log
* Added the base dir for the ccache
* Re-trigger CI
* Cleanup and refactored ccache steps
---------
Co-authored-by: Akif Ejaz <akifejaz40@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* examples : add model conversion tool/example
This commit adds an "example/tool" that is intended to help in the
process of converting models to GGUF. Currently it supports normal
causal models and embedding models. The readme contains instructions and
commands that guide the user through the process.
The motivation for this is to have a structured and repeatable process for
model conversions and hopefully with time improve upon it to make the
process easier and more reliable. We have started to use this for new
model conversions internally and will continue doing so and improve it
as we go along. Perhaps with time this should be placed in a different
directory than the examples directory, but for now it seems like a good
place to keep it while we are still developing it.
* squash! examples : add model conversion tool/example
Remove dependency on scikit-learn in model conversion example.
* squash! examples : add model conversion tool/example
Update the transformers dependency to use a non-dev version, and import
`AutoModelForCausalLM` instead of `AutoModel` to ensure compatibility
with the latest version.
* squash! examples : add model conversion tool/example
Remove the logits requirements file from the all requirements file.
* Fix -Werror=return-type so ci/run.sh can run
* Update tools/mtmd/clip.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Remove false now that we have abort
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Initial plan
* Initialize copilot instructions exploration
* Add comprehensive .github/copilot-instructions.md file
* Update Python environment and tools directory documentation
- Add instructions for using .venv Python environment
- Include flake8 and pyright linting tools from virtual environment
- Add tools/ as core directory in project layout
- Reference existing configuration files (.flake8, pyrightconfig.json)
* add more python dependencies to .venv
* Update copilot instructions: add backend hardware note and server testing
* Apply suggestions from code review
* Replace clang-format with git clang-format to format only changed code
* Minor formatting improvements: remove extra blank line and add trailing newline
* try installing git-clang-format
* try just clang-format
* Remove --binary flag from git clang-format and add git-clang-format installation to CI
* download 18.x release
* typo--
* remove --binary flag
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Make Mistral community chat templates optional
* Change the flag arg to disable instead of enable community chat templates
* Improve error message
* Improve help message
* Tone down the logger messages
This commit removes references to `make` in the examples, as the build
system now uses CMake directly and invoking `make` has generated an
error since commit 37f10f955f
("make : remove make in favor of CMake (#15449)").
This commit addresses an inconsistency during inference by adding a new
member to the `templates_params` struct to indicate whether the chat is
in inference mode. This allows the gpt-oss specific function
`common_chat_params_init_gpt_oss` to check this flag and the
`add_generation_prompt` flag to determine if it should replace the
`<|return|>` token with the `<|end|>` token in the prompt.
The motivation for this change is to ensure that the formatted prompt of
past messages in `common_chat_format_single` matches the output of the
formatted new message. The issue is that the gpt-oss template returns
different end tags: `<|return|>` when `add_generation_prompt` is false,
and `<|end|>` when `add_generation_prompt` is true. This causes the
substring function to start at an incorrect position, resulting in
tokenization starting with 'tart|>' instead of '<|start|>'.
Resolves: https://github.com/ggml-org/llama.cpp/issues/15417
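The offset problem described above can be reproduced with a small stand-alone sketch. The helper names `diff_suffix` and `replace_return_with_end` and the message contents are hypothetical and are not the actual `common_chat_format_single` implementation; only the `<|return|>`, `<|end|>` and `<|start|>` tokens come from the description.

```cpp
#include <string>

// Returns the part of the full rendering that follows the rendering of the
// past messages, i.e. the formatted new message.
std::string diff_suffix(const std::string & fmt_past, const std::string & fmt_full) {
    return fmt_full.substr(fmt_past.size());
}

// Replaces the last <|return|> with <|end|> so that the past-message
// rendering has the same length as the matching prefix of the full rendering.
std::string replace_return_with_end(std::string s) {
    const std::string from = "<|return|>";
    const std::string to   = "<|end|>";
    const auto pos = s.rfind(from);
    if (pos != std::string::npos) {
        s.replace(pos, from.size(), to);
    }
    return s;
}
```

`<|return|>` is three characters longer than `<|end|>`, so with a past rendering of `<|start|>assistant<|message|>Hi<|return|>` and a full rendering of `<|start|>assistant<|message|>Hi<|end|><|start|>user<|message|>Bye<|end|>`, `diff_suffix` skips three characters too many and returns text starting with `tart|>`. Passing the past rendering through `replace_return_with_end` first yields a suffix that starts with `<|start|>`.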
* Update docker.yml
Modify docker.yml so that the workflow no longer runs periodically; it can still be started manually when needed.
* feat: Modify the header file include path
1. There's no llava directory in the tools directory.
2. Because the `mtmd` CMakeLists.txt file calls `target_include_directories(mtmd PUBLIC .)`, other targets that link against `mtmd` automatically get the `mtmd` directory as a header search path. Therefore, `target_include_directories(${TARGET} PRIVATE ../llava)` can be removed, or replaced with `target_include_directories(${TARGET} PRIVATE ../mtmd)` to explicitly require the `llama-server` target to use header files from `mtmd`.
* Restore the docker.yml file
This commit removes the content from the Makefile and updates the
current deprecation message to state that `make` has been replaced by
CMake.
The message when `make` is invoked will now be the following:
```console
$ make
Makefile:6: *** Build system changed:
The Makefile build has been replaced by CMake.
For build instructions see:
https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
. Stop.
```
The motivation for this is that many, if not all, targets now fail to
build after changes to the system, and `make` has also been deprecated
for some time.
* musa: fix build warnings
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare]
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
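For context, one generic way to silence a -Wsign-compare warning between a `const int` and an `unsigned int` is sketched below. The function name `less_than` is hypothetical and the actual change in the MUSA code is not shown here; the sketch only illustrates the pattern of checking the sign before casting.

```cpp
// Compares a signed value against an unsigned limit without relying on the
// implicit int -> unsigned conversion that triggers -Wsign-compare.
bool less_than(const int value, const unsigned int limit) {
    // A negative value is always smaller than any unsigned limit.
    if (value < 0) {
        return true;
    }
    // value is non-negative here, so the cast cannot change its magnitude.
    return static_cast<unsigned int>(value) < limit;
}
```

A plain `value < limit` would convert a negative `value` to a large unsigned number and report it as not smaller than the limit, which is why the sign is checked explicitly before the cast.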