compilade
5d46babdc2
llama : initial Mamba-2 support ( #9126 )
...
* llama : initial Mamba-2 support
* ggml : SIMD ggml_ssm_scan for Mamba-2
* ggml : improve ggml_mul speed when masking recurrent states
* llama : support running Mamba-Codestral-7B-v0.1
* llama : fix Mamba-2 conv state saving
* ggml : make the ggml_mul fast broadcast path more consistently formatted
* llama : remove unused variable
* llama : add missing break
* convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present
The tokenzier.json of Mamba-Codestral-7B-v0.1 otherwise requires
workarounds to work correctly.
* llama : avoid redundant state copy for Mamba 1 and 2
* metal : attempt to adapt SSM_SCAN for Mamba-2
* metal : fix SSM_SCAN pipeline scope
* metal : use log and exp instead of log1pf and expf in SSM_SCAN
* metal : remove unused arguments for SSM_SCAN
The max index is 31, so trimming the arguments is necessary.
* metal : add back n_seqs to SSM_SCAN args
Whoops, this is needed for the offset in the concatenated output.
* metal : fix SSM_SCAN state head offset
* metal : fix wrong number of tokens per sequence in SSM_SCAN
* ggml : remove unused fast broadcast path in GGML_MUL
This was initially added because states were masked with ggml_mul,
but this is no longer done and so this "optimisation" is no longer
necessary, or at least not worth the additional code complexity.
* ggml : avoid multiply by D in GGML_OP_SSM_SCAN
This makes the weight buft detection in src/llama.cpp simpler.
* convert : transpose Mamba-2 A, D and reshape SSM_NORM
This breaks existing conversions of Mamba-2 models
to avoid some reshapes.
Not sure if it's a good idea,
but it makes the graph slightly cleaner.
* llama : more appropriate SSM_SCAN and SSM_CONV buft support checks
* convert : fix flake8 lint
* metal : fix confusion between ; and ,
* metal : add missing args for nb references in ssm_scan_f32_group
* metal : single-user mamba2 inference works
* kv-cache : remove const_cast when setting inputs for s_copy
And also fix multi-user inference for recurrent models
by using cell_id instead of i as the kv cell index
when populating s_copy.
* convert : avoid AutoConfig for Mamba and Mamba2 hparams
* kv-cache : allow context shift for recurrent models
* graph : fix recurrent state copies when avoiding copies
Works, but using lambda functions might not be that clean.
* ggml : fix mamba2 ssm scan when compiled with SVE
* ggml-cpu : reorder SVE FMA for consistency with other SIMD arches
* cuda : implement ssm scan for Mamba2
There is still room for improvement, but it works!
* cuda : adapt Mamba1 ssm scan to shape changes from Mamba2
* mamba : fix mismatched new and delete size for llm_build_mamba
Subclasses of llm_graph_context cannot have extra fields,
because the called destructor is not the one from the subclass.
This otherwise would cause problems when runnning Mamba-(1|2) inference
when compiled -DGGML_SANITIZE_ADDRESS=ON
* cuda : graceful fallback for Mamba-1 models with weird embd size
2025-07-02 13:10:24 -04:00
Aaron Teo
60ef23d6c1
ggml-cpu: enable IBM NNPA Vector Intrinsics ( #14317 )
...
* ggml-cpu: add nnpa compile flag
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 4a9f60c201 )
* ggml-cpu: add fp16->fp32 nnpa first
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 8d4a7987f9 )
* ggml-cpu: add fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 0ff0d65162 )
* ggml-cpu: better variable names
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 2f58bbcbb8 )
* docs: update s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 01b929491b )
* ggml-cpu: add debugging prints to see if dlf16 is correct
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix print vs printf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix float placeholder
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: ensure fp16 and fp32 load and stores are called
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fp16 load ensured to hit
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove sigint from fp16 store
for some reason, the function is not getting a hit when debugged with
gdb. we will need to investigate further
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: nnpa switch to vec_xst test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: switch to vec_xst for 4 element loops also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: rework noop
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove noop, general code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: clarify variable naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add breakpoint for debugging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: test fix for conversion failure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: disable fp32->fp16 nnpa conversions for now
there are some conversion failures in nnpa that requires the eyes of an
ibm stsm. will create a separate pr to introduce the fp32->fp16 change.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: switch to elif macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix typo
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix compiler types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: change to typedef vector types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add 4 element loops for fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: clarified vector naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: bring back fp32->fp16 store nnpa
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add nnpa macro check in ggml-impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add missing __func__
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: diagnose why __NNPA__ macro is not being defined
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: import vecintrin.h to fix compiler errors
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: update macro tests
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 157f856c34 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: switch to importing ggml-cpu-impl instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix macro declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: test more macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add debug prints
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: bruteforce macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add ggml-impl.h to cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: switch to private macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 157f856c34 )
* ggml-cpu: move things around
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: bring back compile macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: switch to quotes for import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add compiler error macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add s390x detection in ggml-src
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: bring back compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: undo cmakelists work
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 18d79e1a30 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove typedefs.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove typedef from cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add ggml-impl.h future notes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: add todo comment for future reference
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: clarify naming of dlf16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove unnecessary target compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: update broken huggingface link for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix duplicate func names during compile
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: fix duplicate func names during compile"
This reverts commit fbb733451f .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu"
This reverts commit bd288e8fa5 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml: refactor fp16<->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix missing simd-mappings.h import in quants.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix missing simd-mappings.h within repack
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix amx mmq missing simd-mappings.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: attempt at fixing loongarch failing build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move nnpa together with other fp16<->fp32 simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix wrong refactor of ggml-base
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml: remove dependency on ggml-cpu from ggml-base
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: remove mistaken fallback macro
fallback logic was already implemented but i was too sleepy to realise
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures"
This reverts commit 32a3533564 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml: move ggml_table_f32_f16 to ggml-cpu"
This reverts commit 9e40d984ad .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 9e40d984ad )
* ggml: move ggml_table_f32_f16 to ggml-cpu.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: extern c ggml_table_f32_f16 + chore docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h
we rely on the variable declaration in ggml-cpu.c instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h"
This reverts commit f71b21d2f7 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: bring back ggml_table_f32_f16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "ggml-cpu: bring back ggml_table_f32_f16"
This reverts commit 2dce119178 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* fix ggml time initialization
* fix f32_f16 table init
* remove extra line
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
Co-authored-by: slaren <slarengh@gmail.com >
2025-06-25 23:49:04 +02:00
Aaron Teo
50d2227953
ggml-cpu: reduce asm calls for hsum ( #14037 )
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-06-18 18:10:08 +01:00
Vineel Abhinav
1b8fb8152d
ggml: aarch64: Implement SVE F32 kernels for vector functions ( #13843 )
...
* F32-Mamba-SVE
* F32-Mamba-SVE
* Resolve test errors-1
* Resolve test errors-2
* F32-vec-SVE
* F32-vec-SVE
* F32-vec-SVE
2025-05-29 09:01:33 +03:00
shalinib-ibm
416313773b
ggml : fix ppc64le build ( #13176 )
...
Build fails with compilation error on power pc.
This patch fixes the same.
Tested with unit tests run via
--build <build_dir> && cd <build_dir> && make test
Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com >
2025-04-30 13:17:08 +02:00
Aaron Teo
0fed24c347
ggml: fix compilation error s390x ( #12848 )
...
* ggml: fixes #12846 compilation error
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com >
* ggml: add documentation for code change
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com >
* ggml: refactor to type-cast and update documentation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com >
* ggml: update documentation to provide full issue link
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com >
---------
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com >
2025-04-11 08:20:07 +03:00
Prajwal B Mehendarkar
11d07e1e69
Fixes #12823 ( #12830 )
...
* Including limits file on AIX
* Fixes #12823
2025-04-10 01:18:01 +02:00
Georgi Gerganov
ff067dbcb9
ggml : simplify Arm fp16 CPU logic (ggml/1177)
...
* ggml : simlpify Arm fp16 CPU logic
ggml-ci
* cont : bring back CUDA/MUSA checks
ggml-ci
2025-04-07 18:44:17 +03:00
cmdr2
995083e4ed
cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167)
...
* cpu: refactor SIMD mappings and vectorized op functions into separate files
* Fix warning for ggml_float to float
* Fix warnings
* cpu: move all the operations (except mul_mat) to a separate c++ file
* fix whitespace
* Update ggml/src/ggml-cpu/vec.h
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* Fix PR comments - use GGML_UNUSED, use cassert in ops.cpp
* Reverse the order of import for ops.h and vec.h, to match what was present in ggml-cpu.c previously
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com >
2025-04-07 18:44:17 +03:00