* Support attention_bias on the LLaMA architecture

  Adds QKVO bias; should fix InternLM (https://github.com/ggerganov/llama.cpp/issues/3133) and works for LLaMAfied Qwen models (https://github.com/ggerganov/llama.cpp/pull/3743#issuecomment-1825923608).

* Check for the existence of the QKVO bias tensors while loading LLaMA models

  Tested on LLaMA2, CUDA and CPU.

* Update llama.cpp
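The core of the change is treating the Q/K/V/O bias tensors as optional: they are loaded only if present in the model file, and the bias add is skipped otherwise, so bias-free LLaMA checkpoints are unaffected. Below is a minimal sketch of that pattern in plain C++ rather than the actual ggml graph code; the `project` helper and dense `Mat`/`Vec` types are hypothetical stand-ins for illustration, not the llama.cpp API.

```cpp
#include <cstdio>
#include <optional>
#include <vector>

// Hypothetical dense types for illustration; llama.cpp uses ggml tensors.
using Vec = std::vector<float>;
using Mat = std::vector<Vec>; // row-major weight matrix

// Projection with an optional bias, mirroring the pattern this change adds:
// the Q/K/V/O weights are always present, but the bias is applied only if it
// was found in the model file (e.g. InternLM / LLaMAfied Qwen checkpoints).
Vec project(const Mat& w, const std::optional<Vec>& bias, const Vec& x) {
    Vec out(w.size(), 0.0f);
    for (size_t i = 0; i < w.size(); ++i) {
        for (size_t j = 0; j < x.size(); ++j) {
            out[i] += w[i][j] * x[j];
        }
        if (bias) {
            out[i] += (*bias)[i]; // skipped for bias-free LLaMA models
        }
    }
    return out;
}

int main() {
    // Toy 2x2 Q projection, once with a bias and once without.
    Mat wq = {{1.0f, 0.0f}, {0.0f, 1.0f}};
    Vec x  = {2.0f, 3.0f};

    Vec q_biased   = project(wq, Vec{1.0f, -1.0f}, x); // {3, 2}
    Vec q_unbiased = project(wq, std::nullopt, x);     // {2, 3}

    std::printf("biased:   %.1f %.1f\n", q_biased[0],   q_biased[1]);
    std::printf("unbiased: %.1f %.1f\n", q_unbiased[0], q_unbiased[1]);
    return 0;
}
```

In the real code the same idea appears twice: the loader tolerates a missing bias tensor instead of failing, and the graph build adds the bias after the matmul only when the tensor was loaded.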