llama.cpp/utils.h at 6f1ee4b640912211a4b07965c585d327e32e734d

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-06 09:46:50 +00:00

Georgi Gerganov 7a9b6c3a8b Reduce memory usage and allocate enough memory for largest context (#473)

* Reduce memory usage and allocate enough memory for large contexts

* Simpler scratch buffer usage

* Reenable BLAS for quantized mul_mat

* Fix number of layers in 30B and 65B

* Fix KV cache size for F32

2023-03-24 23:17:37 +02:00

2.1 KiB
