llama.cpp/ggml.c at 17f6c1ef3bdb8332393ea8da008023134291b0c3

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-09 10:17:06 +00:00


Francis Couture-Harpin 8fb57ac0fb llama : use im2col and mul_mat to perform convolution for Mamba

This removes the need for ggml_ssm_conv!!!
But performance seems slightly worse on my system,
especially for prompt processing.
Maybe ggml_mul_mat isn't optimized for small row sizes?
More performance testing is necessary before GGML_OP_SSM_CONV is removed.
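To illustrate why a convolution can be replaced by im2col followed by a matrix multiplication, here is a minimal sketch for a single channel of a causal 1D convolution. It assumes a kernel width of 4 (a common Mamba `d_conv` value) and an input buffer holding the `D_CONV - 1` previous inputs followed by the new tokens. The function names `conv_direct`, `im2col_1d` and `mat_vec` are hypothetical helpers written with plain loops; they are not the ggml API and do not reproduce ggml's tensor layouts or the actual `ggml_im2col`/`ggml_mul_mat` code paths.

```c
#include <stddef.h>

/* Kernel width; assumed to be 4 here (a common Mamba d_conv value). */
#define D_CONV 4

/* Hypothetical helper: direct causal 1D convolution of one channel.
 * x holds D_CONV-1 past inputs followed by n_tokens new inputs,
 * w holds the D_CONV kernel weights, y receives n_tokens outputs.
 * Like most deep-learning "convolutions", the kernel is not flipped. */
static void conv_direct(const float *x, const float *w, float *y, size_t n_tokens) {
    for (size_t t = 0; t < n_tokens; ++t) {
        float acc = 0.0f;
        for (size_t k = 0; k < D_CONV; ++k) {
            acc += x[t + k] * w[k];
        }
        y[t] = acc;
    }
}

/* Hypothetical helper: im2col for 1D. Copies the window that ends at
 * token t into row t of cols, giving a row-major n_tokens x D_CONV matrix.
 * Each input value is duplicated into up to D_CONV rows. */
static void im2col_1d(const float *x, float *cols, size_t n_tokens) {
    for (size_t t = 0; t < n_tokens; ++t) {
        for (size_t k = 0; k < D_CONV; ++k) {
            cols[t * D_CONV + k] = x[t + k];
        }
    }
}

/* Hypothetical helper: matrix-vector product y = cols * w.
 * Each output is a dot product over a row of only D_CONV elements. */
static void mat_vec(const float *cols, const float *w, float *y, size_t n_tokens) {
    for (size_t t = 0; t < n_tokens; ++t) {
        float acc = 0.0f;
        for (size_t k = 0; k < D_CONV; ++k) {
            acc += cols[t * D_CONV + k] * w[k];
        }
        y[t] = acc;
    }
}
```

For example, with a zeroed state, new inputs `{1, 2, 3, 4, 5}` and weights `{0.5, -1, 2, 1}`, calling `conv_direct` directly, or `im2col_1d` followed by `mat_vec`, produces the same five outputs. Note that with `D_CONV = 4` every row of the im2col matrix holds only four elements, which is the kind of small row size the commit message suspects `ggml_mul_mat` may not be optimized for.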

* ggml : make ggml_ssm_scan not modify its source tensors

* llama : fix shared recurrent tail cell count for small ubatch sizes

Otherwise it was impossible to run the 'parallel' example with '-ub 1'
with a Mamba or Jamba model.

2024-06-03 00:01:41 -04:00

738 KiB
