llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-27 08:21:30 +00:00

Files

Georgi Gerganov 0f0a3c2851 metal : make the backend async (#15906)

* metal : make the backend async

ggml-ci

* cont : add comments, extend op offload, clean up

ggml-ci

* metal : fix batch size for MUL_MAT_ID

* metal : remove deprecated ggml_backend_metal_buffer_from_ptr

* metal : create only metal buffers, no wrapping of host memory

ggml-ci

* metal : restore .alloc_buffer for buffer_from_ptr_type

ggml-ci

* metal : remove broken implementation of GGML_OP_SET

ggml-ci

* metal : clean-up loose ends, ready for tests

ggml-ci

* metal : support both private and shared buffers

ggml-ci

* metal : enable private buffers + add global device queue

* metal : disable host buffer to prevent races

ggml-ci

* metal : avoid extra copy during set_tensor

ggml-ci

* metal : use separate buffer types for shared and private Metal buffers

ggml-ci

* metal : simplify synchronization logic

ggml-ci

* metal : fix build

ggml-ci

* metal : do not implement cpy_tensor

ggml-ci

* metal : separate implementations for shared and private buffers

ggml-ci

2025-09-10 17:52:35 +03:00

ggml-alloc.h

ggml : upgrade init_tensor API to return a ggml_status (#11854)

2025-02-28 14:41:47 +01:00

ggml-backend.h

llama : separate compute buffer reserve from fattn check (#15696)

2025-08-31 15:49:03 +02:00

ggml-blas.h

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-cann.h

ggml : build backends as libraries (#10256)