llama.cpp/utils.cpp at 04c6f5ed6fafd63601fa06757877ed5ccf9d5991

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-10-29 08:41:22 +00:00)

Georgi Gerganov 7a9b6c3a8b Reduce memory usage and allocate enough memory for largest context (#473)

* Reduce memory usage and allocate enough memory for large contexts

* Simpler scratch buffer usage

* Reenable BLAS for quantized mul_mat

* Fix number of layers in 30B and 65B

* Fix KV cache size for F32

2023-03-24 23:17:37 +02:00

9.9 KiB
