There isn't really a use case for fully-shuffled batches
* test-model-random : use F32 as the KV cache type
Temporary until F16 is fixed on ARM when using FP16_VECTOR_ARITHMETIC
This generates random models and then tests
different batch concurrencies to check if the output is consistent.
This can detect when e.g. the recurrent cache has been broken,
or anything else that would affect the consistency of the output
when running inference on multiple distinct sequences
(see the sketch at the end of this description).
More architectures will be added, but for now this starts with Mamba.
Eventually, consistency of pooled embeddings will also be tested.
The goal is to reduce accidental regressions
by making it easy to quickly test a lot of edge cases
on the supported architectures,
without having to download any model.
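
To make the batch-concurrency check concrete, here is a minimal, self-contained sketch of the idea. It does not use the actual test code or the llama.cpp API; the `toy_model` stand-in and all names are hypothetical. It evaluates the same per-sequence token streams in two ways (one sequence at a time, and interleaved in shared batches) and asserts that the per-sequence outputs are identical either way.

```cpp
// Hypothetical sketch of the consistency check: run the same sequences with
// two different batch interleavings and verify the outputs match.
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy stand-in for a recurrent model: one float of state per sequence,
// updated deterministically from each token.
struct toy_model {
    std::vector<float> state; // per-sequence recurrent state

    toy_model(int n_seq) : state(n_seq, 0.0f) {}

    // process one token for one sequence and return the "output"
    float eval(int seq_id, int32_t token) {
        state[seq_id] = state[seq_id] * 0.9f + (float) token;
        return state[seq_id];
    }
};

int main() {
    const int n_seq = 4;
    const int n_tok = 8;

    // deterministic pseudo-random token streams, one per sequence
    std::vector<std::vector<int32_t>> seqs(n_seq);
    for (int s = 0; s < n_seq; ++s) {
        for (int t = 0; t < n_tok; ++t) {
            seqs[s].push_back((s + 1) * 31 + t * 7);
        }
    }

    // reference: each sequence evaluated on its own, one token at a time
    toy_model ref(n_seq);
    std::vector<std::vector<float>> out_ref(n_seq);
    for (int s = 0; s < n_seq; ++s) {
        for (int t = 0; t < n_tok; ++t) {
            out_ref[s].push_back(ref.eval(s, seqs[s][t]));
        }
    }

    // variant: all sequences interleaved in shared batches (one token from
    // each sequence per batch), i.e. a different concurrency of the same work
    toy_model var(n_seq);
    std::vector<std::vector<float>> out_var(n_seq);
    for (int t = 0; t < n_tok; ++t) {
        for (int s = 0; s < n_seq; ++s) {
            out_var[s].push_back(var.eval(s, seqs[s][t]));
        }
    }

    // the outputs must be identical regardless of how the batches were split
    for (int s = 0; s < n_seq; ++s) {
        for (int t = 0; t < n_tok; ++t) {
            assert(out_ref[s][t] == out_var[s][t]);
        }
    }

    printf("outputs are consistent across batch splits\n");
    return 0;
}
```

In the real test, a broken recurrent cache would typically show up as per-sequence state leaking between sequences, which makes the two runs diverge.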