a6a8f8d09c  Update docs/backend/SYCL.md
  Author: Neo Zhang Jianyu
  Date: 2024-09-17 16:25:43 +08:00
  Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>

8241151f16  set context default to avoid memory issue, update guide
  Author: arthw
  Date: 2024-09-14 09:01:05 +08:00

feff4aa846  server : add loading html page while model is loading (#9468)
  Author: Xuan Son Nguyen
  Date: 2024-09-13 14:23:11 +02:00
  * Adding loading page for '/' server requests
  * set content when model is loading
  * removed loading html file
  * updated cmakelist
  * updated makefile
  * cleaned up whitespace
  * cleanup for PR removed error
  * updated server test to handle 503 HTML
  * updated server test to handle 503 HTML
  * catch 503 before parsing json
  * revert test
  * account for both api and web browser requests
  * precommit corrections
  * eol fix
  * revert changes to pre-commit
  * removed print statement
  * made loading message more descriptive
  * also support .html files
  Co-authored-by: VJHack <flymyplane21@gmail.com>
  Co-authored-by: Vinesh Janarthanan <36610342+VJHack@users.noreply.github.com>

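While the model is still being loaded, llama-server now answers requests with HTTP 503, serving an HTML placeholder page to web browsers and a JSON error to API clients (hence "catch 503 before parsing json" above). Below is a minimal client-side sketch of how a caller might wait for the server to become ready; the address http://localhost:8080 and the printed messages are illustrative assumptions, not taken from the PR.

```python
# Minimal readiness-wait sketch. Assumes llama-server listens on localhost:8080;
# the response bodies shown here are illustrative, only the 503 status is relied on.
import json
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:8080", timeout_s: float = 120.0) -> bool:
    """Poll the server root until it stops answering 503 (model still loading)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/"):
                return True  # 2xx: model is loaded and the normal page is served
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise  # a different error; retrying will not help
            # A 503 may carry an HTML loading page (browser) or JSON (API client),
            # so check the Content-Type before trying to parse JSON.
            body = err.read()
            if "application/json" in err.headers.get("Content-Type", ""):
                print("still loading:", json.loads(body))
            else:
                print("still loading (HTML placeholder page served)")
        time.sleep(1.0)
    return False


if __name__ == "__main__":
    print("server ready:", wait_until_ready())
```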
						 
				 
			
				
					
						
							
							
0abc6a2c25  llama : llama_perf + option to disable timings during decode (#9355)
  Author: Georgi Gerganov
  Date: 2024-09-13 09:53:38 +03:00
  * llama : llama_perf + option to disable timings during decode (ggml-ci)
  * common : add llama_arg
  * Update src/llama.cpp
  * perf : separate functions in the API (ggml-ci)
  * perf : safer pointer handling + naming update (ggml-ci)
  * minor : better local var name
  * perf : abort on invalid sampler pointer (ggml-ci)
  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

bd35cb0ae3  feat: remove a sampler from a chain (#9445)
  Author: Gilad S.
  Date: 2024-09-13 03:54:49 +02:00
  * feat: remove a sampler from a chain
  * fix: return removed sampler
  * fix: safer casting

78203641fe  server : Add option to return token pieces in /tokenize endpoint (#9108)
  Author: Mathijs Henquet
  Date: 2024-09-12 22:30:11 +02:00
  * server : added with_pieces functionality to /tokenize endpoint
  * server : Add tokenize with pieces tests to server.feature
  * Handle case if tokenizer splits along utf8 continuation bytes
  * Add example of token splitting
  * Remove trailing ws
  * Fix trailing ws
  * Maybe fix ci
  * maybe this fix windows ci?
  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

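Per the notes above, the /tokenize endpoint gained a with_pieces option so the response can carry, for each token, the text piece alongside the id; pieces that are not valid UTF-8 (for example when the tokenizer splits inside a multi-byte character) need special handling on the client side. A small usage sketch follows; the request field name comes from the PR description, while the default server address and the exact response shape are assumptions.

```python
# Hedged usage sketch for /tokenize with "with_pieces". The server address and
# the response structure are assumptions; only the field name comes from the PR notes.
import json
import urllib.request


def tokenize(text: str, base_url: str = "http://localhost:8080") -> dict:
    """Ask llama-server to tokenize `text`, requesting token pieces as well."""
    payload = json.dumps({"content": text, "with_pieces": True}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/tokenize",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Expect each returned token to include its id and, with with_pieces enabled,
    # the corresponding piece (non-UTF-8 pieces may be encoded as raw byte values).
    print(tokenize("Hello world"))
```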
						 
				 
			
				
					
						
							
							
e6b7801bd1  cann: Add host buffer type for Ascend NPU (#9406)
  Author: Dou Xinpeng
  Date: 2024-09-12 19:46:43 +08:00
  * feat: Add host buffer type for Ascend NPU (CANN backend)
  * fix some checking errors
  * Add a few comments

e665744317  llava : fix the script error in MobileVLM README (#9054)
  Author: fengerhu1
  Date: 2024-09-12 14:34:22 +03:00
  Signed-off-by: Erhu Feng <2748250768@qq.com>

d4c3c10fad  lora : raise error if lm_head is ignored (#9103)
  Author: Xuan Son Nguyen
  Date: 2024-09-12 14:33:57 +03:00
  * lora : raise error if lm_head is ignored
  * fix style
  * clarify comment

2a825116b6  cmake : fix for builds without GGML_CDEF_PUBLIC (#9338)
  Author: Michael Podvitskiy
  Date: 2024-09-12 14:30:01 +03:00
  * `GGML_TARGET_DEFINES-NOTFOUND` fix for builds without `GGML_CDEF_PUBLIC`
  * Update CMakeLists.txt, spaces fix

4dc4f5f14a  ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329)
  Author: Huang Qi
  Date: 2024-09-12 14:28:43 +03:00

c837981bba  py : add Phi-1.5/Phi-2 tokenizer (#9361)
  Author: daminho
  Date: 2024-09-12 14:28:20 +03:00
  * add phi2 tokenizer
  * add phi name to convert_hf_to_gguf_update.py
  * make tokenizer_pre consistent; llama.cpp work

3c26a1644d  ci : bump actions/checkout to v4 (#9377)
  Author: Trivikram Kamat
  Date: 2024-09-12 14:27:45 +03:00

ff76e18516  cmake : fixed the order of linking libraries for llama-quantize (#9450)
  Author: Michael Podvitskiy
  Date: 2024-09-12 14:27:14 +03:00

39f852f440  py : add special tokens in hf_converter for RWKV v6 (#9428)
  Author: Molly Sophia
  Date: 2024-09-12 14:25:16 +03:00
  Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

2b00fa7997  riscv : modify Makefile and add a RISCV_VECT to print log info (#9442)
  Author: Ahmad Tameem
  Date: 2024-09-12 14:24:31 +03:00
  - Added ggml_cpu_has_riscv_v() in GGML to print system info in log
  - Modified Makefile to only use flag when cross compiling for RISC-V

d6a04f872d  ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)
  Author: Georgi Gerganov
  Date: 2024-09-12 14:23:49 +03:00
  * ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (ggml-ci)
  * ggml : add ggml-impl.h to backends
  * ggml : fix compiler warnings (ggml-ci)
  * ggml : add assert upon adding nodes

c9c8575a1a  enhance run script to be easy to change the parameters (#9448)
  Author: Neo Zhang Jianyu
  Date: 2024-09-12 17:44:17 +08:00
  Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>

df4b7945ae  cann: Fix error when running a non-exist op (#9424)
  Author: Xinpeng Dou
  Date: 2024-09-12 09:02:35 +08:00

449ccfb6f5  Add Jais to list of supported models (#9439)
  Author: Faisal Zaghloul
  Date: 2024-09-12 02:29:53 +02:00
  Co-authored-by: fmz <quic_fzaghlou@quic.com>

1b28061400  llama : skip token bounds check when evaluating embeddings (#9437)
  Author: slaren
  Date: 2024-09-11 17:52:13 +02:00

8db003a19d  py : support converting local models (#7547)
  Author: Pavel Zloi
  Date: 2024-09-11 15:29:51 +03:00
  * Support of converting local models added to convert-hf-to-gguf-update.py
  * Description fixed
  * shutil added to imports

0996c5597f  llava : correct args for minicpmv-cli (#9429)
  Author: Xuan Son Nguyen
  Date: 2024-09-11 12:59:13 +02:00

5bb2c5dbd2  files : remove accidentally added lora_test submodule (#9430)
  Author: Xuan Son Nguyen
  Date: 2024-09-11 13:02:09 +03:00

67155ab7f5  feat: Implements retrying logic for downloading models using --model-url flag (#9255)
  Author: Farbod Bijary
  Date: 2024-09-11 11:22:37 +02:00
  * feat: Implements retrying logic for downloading models using --model-url flag
  * Update common/common.cpp
  * Update common/common.cpp
  * apply comments
  * implements a retry function to avoid duplication
  * fix editorconfig
  * change function name
  Co-authored-by: farbod <farbod.bjary82@gmail.com>
  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
  Co-authored-by: slaren <slarengh@gmail.com>
  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

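The retry behaviour itself lives in the C++ download helper in common/common.cpp; the sketch below only illustrates the retry-with-delay idea in Python, with made-up function names and retry parameters, and is not a transcription of the actual change.

```python
# Illustrative retry-with-delay sketch (hypothetical names and parameters;
# the real logic is implemented in C++ in common/common.cpp).
import time
import urllib.error
import urllib.request


def download_with_retries(url: str, dest: str, attempts: int = 3, delay_s: float = 2.0) -> bool:
    """Try to fetch `url` into `dest`, retrying a few times on transient failures."""
    for attempt in range(1, attempts + 1):
        try:
            urllib.request.urlretrieve(url, dest)
            return True
        except (urllib.error.URLError, OSError) as err:
            print(f"download attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts:
                time.sleep(delay_s)
    return False
```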
						 
				 
			
				
					
						
							
							
5af118efda  CUDA: fix --split-mode row race condition (#9413)
  Author: Johannes Gäßler
  Date: 2024-09-11 10:22:40 +02:00

d2b496bff4  batched-bench : remove unused code (#9305)
  Author: Georgi Gerganov
  Date: 2024-09-11 10:03:54 +03:00

b34e023480  musa: remove Clang builtins mapping (#9421)
  Author: R0CKSTAR
  Date: 2024-09-11 03:46:55 +02:00
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

51b6038636  sycl : update support conditions (#9394)
  Author: Alberto Cabrera Pérez
  Date: 2024-09-11 08:53:42 +08:00
  * sycl : update support condition to im2col
  * Added TODO to remind supporting FP32 im2col
  Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

cb9c933eb2  flake.lock: Update (#9360)
  Author: Georgi Gerganov
  Date: 2024-09-10 15:46:59 -07:00
  Flake lock file updates:
  • Updated input 'flake-parts':
      'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
    → 'github:hercules-ci/flake-parts/567b938d64d4b4112ee253b9274472dc3a346eb6?narHash=sha256-%2Bebgonl3NbiKD2UD0x4BszCZQ6sTfL4xioaM49o5B3Y%3D' (2024-09-01)
  • Updated input 'flake-parts/nixpkgs-lib':
      'a5d394176e356624c120

6cd4e03444  arg : bring back missing ifdef (#9411)
  Author: Xuan Son Nguyen
  Date: 2024-09-10 22:41:29 +02:00
  * arg : bring back missing ifdef
  * replace with llama_supports_gpu_offload

8d300bd35f  enable --special arg for llama-server (#9419)
  Author: matteo
  Date: 2024-09-10 22:40:59 +02:00
  Co-authored-by: matteo serva <matteo.serva@gmail.com>

49006c67b4  llama : move random seed generation to the samplers (#9398)
  Author: slaren
  Date: 2024-09-10 18:04:25 +02:00
  * llama_sampler_penalties : clamp penalty_last_n to zero

00ba2ff781  metal : fix compile warning with GGML_METAL_NDEBUG (#0)
  Author: Georgi Gerganov
  Date: 2024-09-10 10:17:43 +03:00

83008b7cfe  llama : update llm_build_copy_mask_state comment [no ci] (#9385)
  Author: Daniel Bevenius
  Date: 2024-09-10 10:03:21 +03:00
  This commit updates the comment in the copy_mask_state function, which seems to contain
  a typo or be outdated, changing the variable n_rs to n_kv. The intent of the comment is
  to copy the states that are not going to be used in the upcoming processing, i.e. the
  token states from n_seqs up to the number of possible token states n_kv.

0b4ac75772  RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)
  Author: Molly Sophia
  Date: 2024-09-10 10:02:30 +03:00
  Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

fb3f249815  make : do not run llama-gen-docs when building (#9399)
  Author: slaren
  Date: 2024-09-10 09:23:33 +03:00

bfe76d4a17  common : move arg parser code to arg.cpp (#9388)
  Author: Xuan Son Nguyen
  Date: 2024-09-09 23:36:09 +02:00
  * common : move arg parser to arg.cpp
  * better categorize args
  * add cmake
  * missing climits
  * missing cstdarg
  * common : more explicit includes
  * fix build
  * refactor gpt_params_parse
  * update server readme
  * fix test
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

293bebe077  rpc : fix segfault with nkvo (#9389)
  Author: Radoslav Gerganov
  Date: 2024-09-09 18:40:10 +03:00
  * rpc : fix nkvo
  * rpc : buf_size must not be static (ref: #9337)
  Co-authored-by: slaren <slarengh@gmail.com>

5fac4d5764  ggml : vector length agnostic SVE support (#9290)
  Author: Prashant Vithule
  Date: 2024-09-09 18:37:18 +03:00
  * Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
  * Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
  * Removed WhiteSpaces
  * ggml : style changes + fix 512-bit nb loop check
    - fix local scope in switch cases
    - consistent predicate names
    - empty lines when necessary
    - opening braces, spaces
    - const-correctness
    - add asserts
  * Update ggml/src/ggml-quants.c
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

5fb5e24811  llama : minor sampling refactor (2) (#9386)
  Author: slaren
  Date: 2024-09-09 17:10:46 +02:00

38ca6f644b  readme : update hot topics
  Author: Georgi Gerganov
  Date: 2024-09-09 15:51:37 +03:00

8e6e2fbe14  CUDA: fix variable name conflict for Windows build (#9382)
  Author: Johannes Gäßler
  Date: 2024-09-09 14:22:53 +02:00

5ed087573e  readme : add LLMUnity to UI projects (#9381)
  Author: Antonis Makropoulos
  Date: 2024-09-09 14:21:38 +03:00
  * add LLMUnity to UI projects
  * add newline to examples/rpc/README.md to fix editorconfig-checker unit test

54f376d0b9  rpc : update README [no ci] (#9320)
  Author: Radoslav Gerganov
  Date: 2024-09-09 11:04:39 +03:00
  Update README with instructions how to offload model layers to both local and remote devices

b2e89a3274  Arm AArch64: Documentation updates (#9321)
  Author: Dan Johansson
  Date: 2024-09-09 10:02:45 +03:00
  * Arm AArch64: Documentation updates
  * Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels
  * Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats
  * Add newline to the end of docs/build.md

daa9623ab0  Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118)
  Author: Markus Tavenrath
  Date: 2024-09-08 21:43:48 +02:00
  * Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.
  * fix compile issues
  * Fix issues where the last submit wasn't executed or handled properly.
  * remove trailing whitespace
  * Repair GGML_VULKAN_CHECK_RESULTS
  * Increase submit counter only if actual work has been submitted and increase submit count to 100.
  * Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.

e079bffb66  cuda : fix FA Q src index (1 -> 0) (#9374)
  Author: Georgi Gerganov
  Date: 2024-09-08 22:01:02 +03:00

3f7ccfd649  common : bring back missing args, add env var duplication check (#9375)
  Author: Xuan Son Nguyen
  Date: 2024-09-08 18:08:55 +02:00
  * common : bring back missing args
  * move duplication check to test-arg-parser
  * add check for duplicated env var
  * correct default values

a249843d89  common : restore --n-gpu-layers (#9371)
  Author: slaren
  Date: 2024-09-08 16:44:42 +02:00