llama.cpp/convert_hf_to_gguf.py at 3bc7103d2ef1c41cd380a1ad8d918cf9c26694d8

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2025-11-03 09:22:01 +00:00


Francis Couture-Harpin 3bc7103d2e ggml : avoid multiply by D in GGML_OP_SSM_SCAN

This makes the weight buft detection in src/llama.cpp simpler.

* convert : transpose Mamba-2 A, D and reshape SSM_NORM

This breaks existing conversions of Mamba-2 models
to avoid some reshapes.

Not sure if it's a good idea,
but it makes the graph slightly cleaner.

* llama : more appropriate SSM_SCAN and SSM_CONV buft support checks

2024-11-04 13:29:47 -05:00
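The commit message says that the D term is no longer multiplied inside `GGML_OP_SSM_SCAN`, and that the converter gives Mamba-2 `A` and `D` a different layout and reshapes `SSM_NORM`. The pure-Python sketch below illustrates why moving D out of the scan does not change the result. The skip connection is elementwise, so `scan_with_D(x) == scan_without_D(x) + D * x`. Several things here are assumptions, not taken from this page. The helper names (`convert_a_log`, `unsqueeze_last`, `reshape_norm`, `ssm_scan`, `add_d_skip`) are hypothetical and are not llama.cpp or ggml APIs. The scan is a simplified single-sequence, single-group Mamba-2-style recurrence. The "transpose" of `A` and `D` is modeled as adding a trailing unit dimension in row-major order. The norm reshape is modeled as `(n_group, d_inner // n_group)`.

```python
import math

# Hypothetical helpers for illustration only; these are not llama.cpp/ggml APIs.

def convert_a_log(a_log):
    """Assumed conversion: A = -exp(A_log), then (n_head,) -> (n_head, 1)."""
    return [[-math.exp(v)] for v in a_log]

def unsqueeze_last(vec):
    """Give a per-head vector (e.g. D) a trailing unit dim: (n_head,) -> (n_head, 1)."""
    return [[v] for v in vec]

def reshape_norm(weight, n_group):
    """Assumed layout: flat (d_inner,) norm weight -> (n_group, d_inner // n_group)."""
    size = len(weight) // n_group
    assert size * n_group == len(weight), "d_inner must be divisible by n_group"
    return [weight[g * size:(g + 1) * size] for g in range(n_group)]

def ssm_scan(x, dt, A, B, C, D=None):
    """Simplified single-sequence Mamba-2-style selective scan (sketch).

    x: [T][n_head][head_dim], dt: [T][n_head], A: [n_head][1],
    B, C: [T][d_state] (one group), D: optional [n_head][1].
    If D is given, the skip connection is applied inside the scan.
    """
    n_head, head_dim, d_state = len(A), len(x[0][0]), len(B[0])
    state = [[[0.0] * d_state for _ in range(head_dim)] for _ in range(n_head)]
    y = []
    for t in range(len(x)):
        y_t = []
        for h in range(n_head):
            decay = math.exp(dt[t][h] * A[h][0])  # one scalar decay per head
            row = []
            for p in range(head_dim):
                s = state[h][p]
                for n in range(d_state):
                    s[n] = s[n] * decay + x[t][h][p] * dt[t][h] * B[t][n]
                out = sum(s[n] * C[t][n] for n in range(d_state))
                if D is not None:
                    out += x[t][h][p] * D[h][0]  # skip term fused into the scan
                row.append(out)
            y_t.append(row)
        y.append(y_t)
    return y

def add_d_skip(y, x, D):
    """Apply the D skip connection as a separate elementwise step: y + x * D."""
    return [[[y[t][h][p] + x[t][h][p] * D[h][0] for p in range(len(x[t][h]))]
             for h in range(len(D))]
            for t in range(len(x))]
```

Usage: the two forms give the same output. The second form keeps the scan free of `D`, and under the assumptions above this is the separation the commit describes.

```python
A = convert_a_log([0.0, math.log(2.0)])
D = unsqueeze_last([0.5, -1.5])
x = [[[1.0, 2.0], [0.5, -1.0]], [[-0.5, 0.25], [2.0, 1.0]]]
dt = [[0.1, 0.2], [0.3, 0.4]]
B = [[1.0, 0.5, -0.5], [0.2, 0.1, 0.3]]
C = [[0.3, -0.2, 1.0], [1.0, 1.0, 1.0]]
fused = ssm_scan(x, dt, A, B, C, D)
split = add_d_skip(ssm_scan(x, dt, A, B, C), x, D)
```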

203 KiB

Executable File
