* Support attention_bias on the LLaMA architecture

  Adds QKVO bias; should fix InternLM (https://github.com/ggerganov/llama.cpp/issues/3133) and works for LLaMAfied Qwen models (https://github.com/ggerganov/llama.cpp/pull/3743#issuecomment-1825923608).

* Check for the existence of the QKVO bias tensors while loading LLaMA models

  Tested on LLaMA2, CUDA and CPU.

* Update llama.cpp
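The core of the change is treating the Q/K/V/O bias tensors as optional: they are loaded only if present in the model file, and the bias add is skipped otherwise, so bias-free LLaMA checkpoints are unaffected. Below is a minimal sketch of that pattern in plain C++ rather than the actual ggml graph code; the `project` helper and dense `Mat`/`Vec` types are hypothetical stand-ins for illustration, not the llama.cpp API.

```cpp
#include <cstdio>
#include <optional>
#include <vector>

// Hypothetical dense types for illustration; llama.cpp uses ggml tensors.
using Vec = std::vector<float>;
using Mat = std::vector<Vec>; // row-major weight matrix

// Projection with an optional bias, mirroring the pattern this change adds:
// the Q/K/V/O weights are always present, but the bias is applied only if it
// was found in the model file (e.g. InternLM / LLaMAfied Qwen checkpoints).
Vec project(const Mat& w, const std::optional<Vec>& bias, const Vec& x) {
    Vec out(w.size(), 0.0f);
    for (size_t i = 0; i < w.size(); ++i) {
        for (size_t j = 0; j < x.size(); ++j) {
            out[i] += w[i][j] * x[j];
        }
        if (bias) {
            out[i] += (*bias)[i]; // skipped for bias-free LLaMA models
        }
    }
    return out;
}

int main() {
    // Toy 2x2 Q projection, once with a bias and once without.
    Mat wq = {{1.0f, 0.0f}, {0.0f, 1.0f}};
    Vec x  = {2.0f, 3.0f};

    Vec q_biased   = project(wq, Vec{1.0f, -1.0f}, x); // {3, 2}
    Vec q_unbiased = project(wq, std::nullopt, x);     // {2, 3}

    std::printf("biased:   %.1f %.1f\n", q_biased[0],   q_biased[1]);
    std::printf("unbiased: %.1f %.1f\n", q_unbiased[0], q_unbiased[1]);
    return 0;
}
```

In the real code the same idea appears twice: the loader tolerates a missing bias tensor instead of failing, and the graph build adds the bias after the matmul only when the tensor was loaded.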