Georgi Gerganov
2f37014073
lookahead : add sample command to readme ( #15447 )
...
* lookahead : add sample command to readme
* cont : build-agnostic command
2025-08-20 13:30:46 +03:00
R0CKSTAR
a094f38143
musa: fix build warnings ( #15258 )
...
* musa: fix build warnings
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare]
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
b6210
2025-08-20 10:17:37 +08:00
lhez
fb22dd07a6
opencl: mark argsort unsupported if cols exceed workgroup limit ( #15375 )
b6209
2025-08-19 11:25:51 -07:00
Georgi Gerganov
9ef6b0b835
model : add gpt-oss type strings ( #15424 )
b6208
2025-08-19 19:58:28 +03:00
Gian-Carlo Pascutto
1e19f5d462
common : Add top-nsigma sampler to help globally ( #15428 )
...
Fixes #15423 .
b6207
2025-08-19 19:58:14 +03:00
Georgi Gerganov
d2fcd91cf9
server : disable context shift by default ( #15416 )
...
* server : disable context shift by default
ggml-ci
* server : make scope of test parameters local
2025-08-19 16:46:37 +03:00
SHUAI YANG
a6d3cfe7fa
CANN: optimize rope operator ( #15335 )
...
* optimize rope ops
* amendment
* delete trailing whitespace
* change the variable name
b6205
2025-08-19 21:28:22 +08:00
R0CKSTAR
67f09a3a27
musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 ( #15413 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
b6204
2025-08-19 12:33:47 +02:00
Marvin Gießing
6424594c56
ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware ( #15385 )
...
* Added VSX intrinsics for Power9+ systems
Signed-off-by: mgiessing <marvin.giessing@gmail.com >
* Manual unrolling for minor perf improvement
Signed-off-by: mgiessing <marvin.giessing@gmail.com >
* Update ggml/src/ggml-cpu/arch/powerpc/quants.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Signed-off-by: mgiessing <marvin.giessing@gmail.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-08-19 11:54:31 +03:00
Xuan-Son Nguyen
e9288e8869
chat : clarify the meaning of reasoning_format ( #15408 )
...
* chat : clarify the meaning of reasoning_format
* add link to this PR
b6202
2025-08-19 10:29:36 +02:00
Georgi Gerganov
9d262f4bad
server : remove swa_full warning ( #15399 )
b6201
2025-08-19 08:45:26 +03:00
Georgi Gerganov
f0d3c7405c
batched-bench : use rand tokens ( #15398 )
2025-08-19 08:45:12 +03:00
Xuan-Son Nguyen
f08c4c0d8d
mtmd : clean up clip_n_output_tokens ( #15391 )
b6199
2025-08-18 22:53:52 +02:00
Georgi Gerganov
6d7f1117e3
codeowners : remove mmv.*
2025-08-18 22:06:44 +03:00
Georgi Gerganov
60212f1ead
sync : ggml
2025-08-18 22:06:44 +03:00
Georgi Gerganov
f0c541d315
scripts : update sync scripts
2025-08-18 22:06:44 +03:00
Sigbjørn Skjæret
baa9255a45
llama : merge conts and reshapes and remove unnecessary cont ( #15380 )
...
* remove unnecessary conts and merge reshapes
* restore necessary conts
* merge more conts and reshapes
* merge even more conts and reshapes
b6195
2025-08-18 19:30:17 +02:00
Georgi Gerganov
3007baf201
readme : update hot topics ( #15397 )
2025-08-18 18:11:44 +03:00
davidef
d1d8241600
server : fix incoming tasks not processed in order ( #15395 )
b6193
2025-08-18 17:51:42 +03:00
Dobri Danchev
618575c582
Fix broken build: require updated pip to support --break-system-packages ( #15357 )
...
* Revert "devops : fix compile bug when the BASE_CUDA_DEV_CONTAINER is based on Ubuntu 24.04 (#15005 )"
This reverts commit e4e915912c .
* devops: Allow pip to modify externally-managed python environment (system installation)
- Updated pip install commands to include the --break-system-packages
flag, ensuring compatibility when working with system-managed Python
environments (PEP 668).
- Note: The --break-system-packages option was introduced in 2023.
Ensure pip is updated to a recent version before using this flag.
fixes [#15004](https://github.com/danchev/llama.cpp/issues/15004)
2025-08-18 12:50:48 +02:00
compilade
f44f793172
ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors ( #15379 )
...
* ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors
* ggml-quants : avoid division by zero in make_q3_quants
b6191
2025-08-18 09:23:56 +02:00
Jeff Bolz
ae532eac2c
vulkan: disable spirv-opt for bfloat16 shaders ( #15352 )
b6190
2025-08-18 07:56:29 +02:00
Oleksandr Kuvshynov
e5155e6986
server : export max observed n_past value ( #15361 )
...
Add tracking for high watermark cache usage and make it available in /metrics endpoint.
Use-case: Tracking largest needed cache usage under realistic workload
to better understand memory requirements and be able to adjust
cache size/quantization for model/cache accordingly.
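The high-watermark idea described above can be sketched as follows. This is a minimal illustration only: the struct name, the field name, and the metric name `example_n_past_max` are hypothetical and are not taken from the server code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical tracker; names are illustrative, not the server's actual fields.
struct n_past_watermark {
    int32_t n_past_max = 0;

    // Record the n_past value observed after processing a batch.
    // Only the largest value seen so far is kept.
    void observe(int32_t n_past) {
        assert(n_past >= 0);
        n_past_max = std::max(n_past_max, n_past);
    }

    // Render one Prometheus-style line (the metric name is an assumption).
    std::string to_metrics_line() const {
        return "example_n_past_max " + std::to_string(n_past_max) + "\n";
    }
};
```

Calling `observe` with 10, 42 and 7 leaves `n_past_max` at 42, which is the value a metrics scrape would then report.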
b6189
2025-08-18 00:28:58 +02:00
Jeff Bolz
21c17b5bef
vulkan: Use larger workgroups for mul_mat_vec when M is small ( #15355 )
...
* vulkan: Use larger workgroups for mul_mat_vec when M is small
Also use subgroup instructions for (part of) the reduction when supported.
Without this, the more expensive reductions would eat into the benefits of
the larger workgroups.
* update heuristic for amd/intel
Co-authored-by: 0cc4m <picard12@live.de >
---------
Co-authored-by: 0cc4m <picard12@live.de >
b6188
2025-08-17 18:08:57 +02:00
Dong Won Kim
19f4decae0
vulkan: support sqrt ( #15370 )
b6187
2025-08-17 16:03:09 +02:00
Sigbjørn Skjæret
4d196981d4
convert : force patch_embd weights to F16 or F32 to avoid broken GGUFs ( #15367 )
...
* force patch_embd weights to f32
* use MmprojModel base tensor_force_quant instead
2025-08-17 14:47:42 +02:00
Sigbjørn Skjæret
b143fbc87a
ci : fix hang in windows-hip build/release ( #15365 )
...
* fix hang in windows-latest-cmake-hip
* apply fix to release as well
b6185
2025-08-17 13:30:23 +02:00
Jeff Bolz
de5627910d
vulkan: Optimize argsort ( #15354 )
...
- Launch an appropriate number of invocations (next larger power of two).
32 invocations is common and the barrier is much cheaper there.
- Specialize for "needs bounds checking" vs not.
- Make the code less branchy and [[unroll]] the loops. In the final code,
I see no branches inside the main loop (only predicated stores) when
needs_bounds_check is false.
- Always sort ascending, then apply the ascending vs descending option when
doing the final stores to memory.
- Copy the values into shared memory, which makes them slightly cheaper to access.
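Two of the points above (padding to the next power of two, and always sorting ascending with the requested order applied only at the final store) can be illustrated with a CPU-side sketch. This is not the shader code: it assumes a bitonic sorting network, and the function names are hypothetical. NaN inputs and stable ordering of ties are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Smallest power of two >= n (for n >= 1).
static size_t next_pow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Bitonic argsort sketch: one "invocation" per slot of the padded array.
std::vector<int> argsort_bitonic(const std::vector<float> & x, bool descending) {
    const size_t n = x.size();
    if (n == 0) return {};
    const size_t m = next_pow2(n);
    assert((m & (m - 1)) == 0 && m >= n);

    std::vector<int> idx(m);
    for (size_t i = 0; i < m; ++i) idx[i] = (int) i;

    // Padding slots (index >= n) compare greater than every real element,
    // so they end up at the tail of the ascending result.
    auto greater = [&](int a, int b) {
        if ((size_t) a >= n) return (size_t) b < n;
        if ((size_t) b >= n) return false;
        return x[a] > x[b];
    };

    // Always sort ascending.
    for (size_t k = 2; k <= m; k <<= 1) {
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            for (size_t i = 0; i < m; ++i) {
                const size_t l = i ^ j;
                if (l <= i) continue;
                const bool up = (i & k) == 0;
                if (up ? greater(idx[i], idx[l]) : greater(idx[l], idx[i])) {
                    std::swap(idx[i], idx[l]);
                }
            }
        }
    }

    // Apply ascending vs descending only when writing the output.
    std::vector<int> out(n);
    for (size_t i = 0; i < n; ++i) {
        out[i] = descending ? idx[n - 1 - i] : idx[i];
    }
    return out;
}
```

Because the requested order is applied only in the last loop, the sorting network itself needs no per-order branches.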
b6184
2025-08-17 10:41:45 +02:00
Tarek Dakhran
65349f26f2
model : support vision LiquidAI LFM2-VL family ( #15347 )
...
* wip lfm2 vision model
* Fix conv weight
* Implement dynamic resolution
* Fix cuda
* support LFM2-VL-450M
* happy CI
* Remove extra `ggml_conv` and put others into the right place
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
b6183
2025-08-16 23:33:54 +02:00
Jeff Bolz
1fe00296f5
vulkan: fuse adds ( #15252 )
...
* vulkan: fuse adds
Fuse adds that have the same shape, which are common in MoE models.
It will currently fuse up to 6 adds, because we assume no more than
8 descriptors per dispatch. But this could be changed.
* check runtimeDescriptorArray feature
* disable multi_add for Intel due to likely driver bug
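The descriptor budget mentioned above can be made concrete with a small sketch. It assumes that a fused chain of k adds reads k + 1 source tensors and writes one destination, so k + 2 descriptors are needed and 8 descriptors allow 6 adds. That accounting is an inference from the commit text, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int max_descriptors = 8;                   // assumed per-dispatch limit
constexpr int max_fused_adds  = max_descriptors - 2; // k adds -> k + 1 srcs + 1 dst

// Split a chain of n_adds same-shape adds into fused dispatches.
// Returns the number of adds handled by each dispatch.
std::vector<int> plan_fused_adds(int n_adds) {
    assert(n_adds >= 0);
    std::vector<int> plan;
    while (n_adds > 0) {
        const int k = std::min(n_adds, max_fused_adds);
        plan.push_back(k);
        n_adds -= k;
    }
    return plan;
}

// Reference for one fused dispatch: dst = srcs[0] + srcs[1] + ...
std::vector<float> multi_add(const std::vector<std::vector<float>> & srcs) {
    assert(srcs.size() >= 2);
    assert((int) srcs.size() + 1 <= max_descriptors);
    std::vector<float> dst(srcs[0].size(), 0.0f);
    for (const auto & s : srcs) {
        assert(s.size() == dst.size()); // only same-shape adds are fused
        for (size_t i = 0; i < dst.size(); ++i) dst[i] += s[i];
    }
    return dst;
}
```

Under these assumptions a chain of 8 adds is planned as one dispatch of 6 followed by one dispatch of 2.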
b6182
2025-08-16 11:48:22 -05:00
Jeff Bolz
de2192794f
vulkan: Support mul_mat_id with f32 accumulators ( #15337 )
...
* vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id
* vulkan: Support mul_mat_id with f32 accumulators, but they are not hooked up
- There's no explicit way to request f32 precision for mul_mat_id, but there
probably should be, and this gets the code in place for that.
- A couple fixes to check_results.
- Remove casts to fp16 in coopmat1 FA shader (found by inspection).
b6181
2025-08-16 11:18:31 +02:00
Jeff Bolz
2e2b22ba66
vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id ( #15334 )
b6180
2025-08-16 10:58:38 +02:00
rmatif
912ff8c119
OpenCL: add initial FA support ( #14987 )
...
* add F16/F16 fa support
* fix kernel init
* use mad instead of fma
* use inline function
* mark FA with sinks as unsupported for now
* add pragma unroll to loops
b6179
2025-08-16 01:05:55 -07:00
Daniel Bevenius
5e6229a840
common : fix double bos, use common_chat_templates for add_bos and add_eos ( #15326 )
...
This commit updates common_chat_templates_apply_jinja to use the
add_bos and add_eos parameters from the chat template instead of
the inputs.
The motivation for this is that currently, if the `add_bos` and `add_eos`
from the input parameters are used, there can be a
mismatch between the model and the chat template. In that case the
removal of duplicate BOS/EOS tokens in chat.cpp `apply` does not
happen, leading to two BOS tokens being added to the template.
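The failure mode can be sketched with a simplified string-level version of the duplicate-BOS removal. This is an illustration under stated assumptions, not the code in chat.cpp: the function name is hypothetical and the real implementation may operate differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: if the tokenizer is going to add BOS itself
// (add_bos == true), drop a BOS that the template already rendered.
std::string strip_duplicate_bos(std::string rendered, const std::string & bos, bool add_bos) {
    if (add_bos && !bos.empty() && rendered.compare(0, bos.size(), bos) == 0) {
        rendered.erase(0, bos.size());
    }
    return rendered;
}
```

With `add_bos == true` the rendered prompt `<s>Hello` becomes `Hello`, and the tokenizer adds the single BOS later. If the flag is wrongly `false` while the model still adds BOS, the string is left as `<s>Hello` and the final prompt carries two BOS tokens, which is the mismatch the commit describes.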
b6178
2025-08-15 19:50:52 +02:00
lhez
e2c1bfff53
opencl: add initial mxfp4 support via mv ( #15270 )
...
* opencl: add reference `mul_mv_mxfp4_f32`
* opencl: add reference `mul_mv_id` for mxfp4
* Q4_0 transpose fix for Adreno
---------
Co-authored-by: shawngu-quic <shawngu@qti.qualcomm.com >
b6177
2025-08-15 09:52:14 -07:00
Georgi Gerganov
5edf1592fd
vulkan : fix out-of-bounds access in argmax kernel ( #15342 )
...
ggml-ci
b6176
2025-08-15 16:16:36 +02:00
Georgi Gerganov
db3010bd23
vulkan : fix compile warnings on macos ( #15340 )
...
ggml-ci
b6175
2025-08-15 15:28:28 +02:00
Aaron Teo
ff27f80a74
ggml: initial IBM zDNN backend ( #14975 )
...
* ggml-zdnn: initial backend impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: temp change z17 to arch15
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: fix build bugs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: tensor->extra logging check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: add layout name mapping, ztensor information
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: separate logging into its own line
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: add shape comparison
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: add ggml_tensor shape log
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: fix incorrect shape logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add output buffer check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: run compute and store into tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add more loggers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update set_tensor logging to check only for matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: last working matmul version
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add comments to prevent accidentally deleting lines
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: support op out_prod
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update op out_prod to use tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rewrite the backend implementation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bugfix new impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix compiler warnings and bugfixes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: test ztensor finding in init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: implement at least 1 op to test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: assign tensor->extra to buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add check for view tensors to prevent init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rework init_tensor to create new buffers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: switch to std vector instead of array
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: switch buffers back and set to arbitrary number
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: impl init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update supports_op matmul matrix
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: impl matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix compiler error missing type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing data transform call
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add bias init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: tighten memory usage, change string allocation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add bias ztensor and data free
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add bias data transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add more debug info for extra buffer transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add logger to check if mat mul ops go through set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: activate bias transform in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: move weights transform into mulmat
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add more safeguards in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix sequencing of transforms
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bugfix transform ztensor vs origtensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: figure out why sigtrap is happening
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: move everything back to local declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: move bias data to local also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bring back working matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rewrite into mre
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing vector import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing vector import in header
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt to fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix invalid ztensor buffer release
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add logging to debug free buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: remove free_buffer debug info
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add parmblkformat detections
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add nnpa installed detection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add zdnn_init call for static libs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at fixing invalid buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: switch to using deque to fix pointer deref problem
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add weights logging to check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt to use unique ptr
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add tensor to pre_tfm_desc logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add inputs logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable op_none initialisation for testing
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing return from init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: load ztensors in cgraph exec
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: work on moving output ztensor as well
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable logging and breakpoints for full test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at manually changing the layout
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at using default nhwc format instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable global load ztensor for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix erroneous output load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add guards to prevent loading ztensor if transformed
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bring load ztensor back to init routine
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix ztensor deallocation abort
stabilise ggml <-> zdnn api
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: clean up matmul selection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: clean up project structure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update documentation, prepare for upstream
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* chore: add codeowners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable batched matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at fixing tensor views during matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: deny all view tensors directly
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix pr comments
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: update ops docs for zdnn
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: redo test-backend-ops for ops.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix typo in build-s390x.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* codeowners: remove taronaeo for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "codeowners: remove taronaeo for now"
This reverts commit 411ea4ed78 .
* ggml-zdnn: remove unused ggml_zdnn macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
b6174
2025-08-15 21:11:22 +08:00
Sigbjørn Skjæret
d3248d9b65
ci : fix ios-xcode-build ( #15324 )
...
* fix ios-xcode-build
* use xcode-select with fixed version
* switch to macos-15 to get xcode 16.4
b6173
2025-08-15 14:02:39 +02:00
Diego Devesa
7aeee88cfe
ci : move ccache action to ggml-org fork ( #15328 )
2025-08-15 12:27:02 +02:00
Johannes Gäßler
b07791aa1d
test-opt: fix backend support check ( #15317 )
...
* test-opt: fix backend support check
* Update tests/test-opt.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-08-15 11:23:17 +02:00
Johannes Gäßler
4227c9be42
CUDA: fix negative KV_max values in FA ( #15321 )
2025-08-14 23:21:24 +02:00
Georgi Gerganov
df36bce667
eval-callback : stop on first NaN ( #15320 )
...
* eval-callback : stop on first NaN
* cont : log error
2025-08-14 22:10:51 +03:00
Diego Devesa
f75b830647
chat : include kwargs in template example ( #15309 )
2025-08-14 10:28:29 -07:00
Daniel Bevenius
7a0de96045
llama : add 18-layer model type for Gemma 3-270m ( #15319 )
...
This commit adds support for the 18-layer model type in the Gemma3
series, which is the size of the Gemma3-270m model.
The motivation for this commit is that this was the only change required for
Gemma3-270m to be converted to GGUF format and used with llama.cpp.
Once the model has been converted and uploaded to Huggingface it can be
used like this:
```console
$ ./build/bin/llama-cli -hf ggml-org/gemma-3-270m-GGUF:Q8_0
```
2025-08-14 17:56:26 +02:00
simevo
e4e915912c
devops : fix compile bug when the BASE_CUDA_DEV_CONTAINER is based on Ubuntu 24.04 ( #15005 )
...
fixes #15004
Co-authored-by: Paolo Greppi <paolo.greppi@libpf.com >
2025-08-14 18:45:27 +03:00
uvos
5ba36f6103
HIP: Cleanup hipification header ( #15285 )
...
add explicit conversion operator to support older versions of rocm
Switch over to hip_bf16 from legacy hip_bfloat16
Simplify RDNA3 define
Reduce swap over of new hipblas api to rocm 6.5 as this version is used for rocm 7.0 previews
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de >
2025-08-14 16:23:56 +02:00
Aldehir Rojas
b204a5a234
gpt-oss: implement harmony parsing ( #15181 )
...
* model : add harmony parser for gpt-oss
* gpt-oss : fix grammar trigger from causing empty stack
* gpt-oss: tweak the grammar trigger again
* gpt-oss : add support for recipient in role header
* gpt-oss : fix ungrouped tool calls in grammar
* gpt-oss : loosen function name matching during parse
* gpt-oss : clean up workarounds
* gpt-oss : add template tests
* gpt-oss : simulate thinking and tool call tags
* gpt-oss : undo think tags when reasoning_format is none
* gpt-oss : set special tokens back to user defined
* gpt-oss : update openai-gpt-oss template
* server : filter out harmony thought messages
* gpt-oss : simplify parsing
2025-08-14 17:23:11 +03:00
Christian Kastner
646944cfa8
docker : Enable GGML_CPU_ALL_VARIANTS for ARM ( #15267 )
2025-08-14 16:22:58 +02:00
Georgi Gerganov
1a01899b61
readme : update hot topics ( #15315 )
2025-08-14 17:16:03 +03:00