Default Branch

945501f5ea · llama: fix leaked buffers for mmap + split files (#16765) · Updated 2025-10-27 08:17:31 +00:00

Branches

1180752835 · cuda : support Falcon-H1 state size for SSM_SCAN · Updated 2025-07-09 16:18:37 +00:00 · CS348Project · 901 behind, 1 ahead

4d6a179c68 · gguf-py : avoid adding duplicate tensor mappings for Jamba · Updated 2025-07-09 15:58:35 +00:00 · CS348Project · 901 behind, 61 ahead

b7c6ece5b5 · ggml-ci · Updated 2025-07-09 12:13:34 +00:00 · CS348Project · 906 behind, 24 ahead

7634d14d7a · test-model-random : fix seq_id buffer overflow · Updated 2025-07-08 22:23:58 +00:00 · CS348Project · 906 behind, 18 ahead

2ff3354c33 · memory : fix broken batch splits for recurrent cache · Updated 2025-07-08 01:23:14 +00:00 · CS348Project · 917 behind, 1 ahead

996195299e · up. · Updated 2025-07-07 21:42:40 +00:00 · CS348Project · 1069 behind, 6 ahead

bf8b39015f · metal : reuse graphs · Updated 2025-07-07 18:37:07 +00:00 · CS348Project · 925 behind, 3 ahead

886da0a2c5 · kv-cache : prepare K/V buffers for separation · Updated 2025-07-04 07:13:16 +00:00 · CS348Project · 936 behind, 1 ahead

dfceb012ee · llama : add "virtual sequences" · Updated 2025-07-02 17:26:55 +00:00 · CS348Project · 947 behind, 8 ahead

71bef66591 · cuda : graceful fallback for Mamba-1 models with weird embd size · Updated 2025-07-02 07:49:36 +00:00 · CS348Project · 956 behind, 44 ahead

6179578988 · batch : require non-coupled batch with sequential split_equal · Updated 2025-06-25 14:20:46 +00:00 · CS348Project · 1013 behind, 29 ahead

37bdfbef8c · wip 3 · Updated 2025-06-24 08:04:18 +00:00 · CS348Project · 1013 behind, 21 ahead

ae96333923 · metal : fix thread-safety · Updated 2025-06-20 13:42:54 +00:00 · CS348Project · 1039 behind, 1 ahead

6fb2f2e8a9 · ggml : fix repack work size for mul_mat_id · Updated 2025-06-20 07:34:16 +00:00 · CS348Project · 1042 behind, 1 ahead

59fee24c72 · recurrent : rework graph inputs + add TODOs · Updated 2025-06-18 06:29:51 +00:00 · CS348Project · 1066 behind, 31 ahead

d3d06debe3 · server : add pidfile option · Updated 2025-06-17 20:47:53 +00:00 · CS348Project · 1067 behind, 1 ahead

4b2233befb · Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer · Updated 2025-06-17 20:25:42 +00:00 · CS348Project · 1069 behind, 1 ahead

36fce98281 · server : re-enable swa speculative decoding · Updated 2025-06-12 08:51:15 +00:00 · CS348Project · 1110 behind, 1 ahead

ed99a8ea04 · cont : fix comments · Updated 2025-06-12 07:43:55 +00:00 · CS348Project · 1113 behind, 3 ahead

4b6fb6524b · context : round n_tokens to next multiple of n_seqs when reserving · Updated 2025-06-11 20:19:17 +00:00 · CS348Project · 1116 behind, 1 ahead