* 9B - query_pre_attn_scalar = 256 not 224
See 03e657582d
Gemma-2 9B should use query_pre_attn_scalar = 256, not 224: the 224 comes from self.config.hidden_size // self.config.num_attention_heads (3584 // 16), which does not match this model's head_dim of 256.
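For context, a minimal arithmetic sketch of why the two values differ for the 9B model (illustrative only, not the actual patch; the field names follow the Hugging Face Gemma-2 config):

    import math

    # Gemma-2 9B config values
    hidden_size = 3584
    num_attention_heads = 16
    head_dim = 256  # 16 * 256 = 4096 != 3584, so head_dim is set explicitly

    # Wrong: deriving the scalar from the hidden size yields 224
    wrong_scalar = hidden_size // num_attention_heads  # 224

    # Right: query_pre_attn_scalar must equal head_dim for this model
    query_pre_attn_scalar = head_dim  # 256

    # Queries are multiplied by 1/sqrt(query_pre_attn_scalar) before attention
    scale = 1.0 / math.sqrt(query_pre_attn_scalar)
    print(wrong_scalar, query_pre_attn_scalar, round(scale, 6))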
* llama : fix Gemma-2 Query scaling factor
ggml-ci
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>