Many Ascend operators compute internally in FP16. If the input data is FP32, it must first be cast to FP16 before computation and then cast back to FP32 afterwards, which introduces unnecessary cast operations. Moreover, FP16 computation involves significantly less work than FP32, giving a noticeable efficiency improvement. In this change, `get_rows`, `rms_norm`, and `flash_attn_ext` are extended to support multiple data types. Validation on the Qwen2 0.5B model shows correct accuracy and roughly a 10% performance gain in concurrent scenarios.

Co-authored-by: noemotiovon <757486878@qq.com>
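The sketch below is only an illustration of the dispatch idea described above, not the actual CANN backend code: when an operator accepts the tensor's dtype natively, FP16 inputs run directly and the cast-in/cast-out pair disappears; FP32 inputs keep the old cast path. All names here (`cast`, `launch_fp16_kernel`, `op_fp32_only`, `op_multi_dtype`) are hypothetical stand-ins.

```cpp
#include <cstdio>

// Hypothetical sketch of the cast-avoidance idea; not real CANN/aclnn calls.

enum class DType { F32, F16 };

static int g_casts = 0;  // counts how many cast operations a graph would issue

// Stand-ins for the backend's cast and FP16 compute launches.
static void cast(DType from, DType to) { (void) from; (void) to; ++g_casts; }
static void launch_fp16_kernel(const char * op) { std::printf("%s runs in FP16\n", op); }

// Before: the wrapper only handled FP32 I/O, so FP16 math always cost two casts.
static void op_fp32_only(const char * op) {
    cast(DType::F32, DType::F16);   // cast input to FP16
    launch_fp16_kernel(op);
    cast(DType::F16, DType::F32);   // cast result back to FP32
}

// After: the wrapper accepts multiple dtypes; FP16 tensors skip both casts.
static void op_multi_dtype(const char * op, DType input) {
    if (input == DType::F16) {
        launch_fp16_kernel(op);     // native path, no cast operations
    } else {
        op_fp32_only(op);           // FP32 path behaves as before
    }
}

int main() {
    op_fp32_only("rms_norm");                       // issues 2 casts
    op_multi_dtype("rms_norm", DType::F16);         // issues 0 casts
    std::printf("cast ops issued: %d\n", g_casts);  // prints 2
    return 0;
}
```

In a graph that already carries FP16 activations, removing these per-operator cast pairs is where the reported concurrent-scenario speedup would come from.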