This commit add the trust_remote_code=True argument when loading models
using AutoConfig, AutoTokenizer, and AutoModelForCausalLM for the run
original model script.
The motivation for this is that some models require custom code to be
loaded properly, and setting trust_remote_code=True avoids a prompt
asking for user confirmation:
```console
(venv) $ make causal-run-original-model
The repository /path/to/model contains custom code which must be
executed to correctly load the model. You can inspect the repository
content at /path/to/model.
Do you wish to run the custom code? [y/N] N
```
Having this as the default seems like a safe choice as we have to clone
or download the models we convert and would be expecting to run any
custom code they have.
* model-conversion : add support for SentenceTransformers
This commit adds support for models that use SentenceTransformer layers.
The motivation for this is that if converted model includes any of the
numbered layers specified in the original models repository then these
changes enable these models to be used and verified. Currently the
model-conversion only support the base model output without any of
the additional transformation layers.
Usage:
Convert the model that also includes the SentenceTransformer layers:
```console
(venv) $ export EMBEDDING_MODEL_PATH="~/google/embeddinggemma-300M"
(venv) make embedding-convert-model
```
Verify the produced embeddings from the converted model against the
original model embeddings:
```console
(venv) make embedding-verify-logits-st
```
The original model can be run using SentenceTransformer:
```console
(venv) make embedding-run-original-model-st
```
Run the converted model using "SentenceTransformer" layers whic
enables pooling and normalization:
```console
(venv) make embedding-run-converted-model-st
```
* add model-conversion example requirements
* add support for -st flag in embedding model conversion
This commit add support for the -st flag in the embedding model
conversion script. This will enable models to be converted using
sentence transformers dense layers.
This commit adds support for passing a prompt file to the model
conversion targets/scripts. It also updates the logits.cpp to print out
embedding information in the same format as when running the original
embedding model.
The motivation for this is that it allows us to pass files of different
sizes when running the converted models and validating the logits.
This can be particularly important when testing the sliding window
functionality of models where the sequence length needs to exceed a
certain number of tokens to trigger the sliding window logic.
* feat: Extra debugging support for model conversion - added BF16 support for llama-callback-eval and support for dumping intermediate steps in run-org-model.py
This commit updates the modelcard.template file used in the model
conversion scripts for embedding models to include the llama-server
--embeddings flag in the recommended command to run the model.
The motivation for this change was that when using the model-conversion
"tool" to upload the EmbeddingGemma models to Hugging Face this flag was
missing and the embedding endpoint was there for not available when
copy&pasting the command.
* model-conversion : remove hardcoded /bin/bash shebangs [no ci]
This commit updates the bash scripts to use env instead of using
hardcoded /bin/bash in the shebang line.
The motivation for this is that some systems may have bash installed
in a different location, and using /usr/bin/env bash ensures that
the script will use the first bash interpreter found in the user's
PATH, making the scripts more portable across different environments.
* model-conversion : rename script to .py [no ci]
This commit renames run-casual-gen-embeddings-org.sh to
run-casual-gen-embeddings-org.py to reflect its Python nature.
This commit adds a curl script to the model-conversion examples
which is currently missing. This script is required for the running the
embedding server targets to test llama-server embeddings functionality.
This commit adds a new target to the Makefile for converting models that
are multimodal. This target will convert the original model and in
addition also create the mmproj GGUF model.
The motivation for this change is that for models that are multimodal,
for example those that contain a vision encoders, we will often want to
upload both the quantized model and the vision encoder model to
HuggingFace.
Example usage:
```console
$ make causal-convert-mm-model MODEL_PATH=~/work/ai/models/gemma-3-4b-it-qat-q4_0-unquantized/
...
The environment variable CONVERTED_MODEL can be set to this path using:
export CONVERTED_MODEL=/home/danbev/work/ai/llama.cpp/models/gemma-3-4b-it-qat-q4_0-unquantized.gguf
The mmproj model was created in /home/danbev/work/ai/llama.cpp/models/mmproj-gemma-3-4b-it-qat-q4_0-unquantized.gguf
```
The converted original model can then be quantized, and after that both
the quantized model and the mmproj file can then be uploaded to
HuggingFace.
Refs: https://huggingface.co/ggml-org/gemma-3-4b-it-qat-GGUF/tree/main
This commit adds two targets to the Makefile for quantizing of
Quantization Aware Trained (QAT) models to Q4_0 format.
The motivation for this is that this sets the token embedding and the
output tensors data types to Q8_0 instead of the default Q6_K. This is
someting that we wish to enforce for QAT Q4_0 models that are to be
uploaded to ggml-org on Huggingface to guarantee the best quality.
* model-conversion: add model card template for embeddings [no ci]
This commit adds a separate model card template (model repository
README.md template) for embedding models.
The motivation for this is that there server command for the embedding
model is a little different and some addition information can be useful
in the model card for embedding models which might not be directly
relevant for causal models.
* squash! model-conversion: add model card template for embeddings [no ci]
Fix pyright lint error.
* remove --pooling override and clarify embd_normalize usage
* examples : add model conversion tool/example
This commit adds an "example/tool" that is intended to help in the
process of converting models to GGUF. Currently it supports normal
causal models and embedding models. The readme contains instructions and
command to guide through the process.
The motivation for this to have a structured and repeatable process for
model conversions and hopefully with time improve upon it to make the
process easier and more reliable. We have started to use this for new
model conversions internally and will continue doing so and improve it
as we go along. Perhaps with time this should be placed in a different
directory than the examples directory, but for now it seems like a good
place to keep it while we are still developing it.
* squash! examples : add model conversion tool/example
Remove dependency on scikit-learn in model conversion example.
* squash! examples : add model conversion tool/example
Update transformer dep to use non-dev version. And also import
`AutoModelForCausalLM` instead of `AutoModel` to ensure compatibility
with the latest version.
* squash! examples : add model conversion tool/example
Remove the logits requirements file from the all requirements file.