llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-01 09:01:57 +00:00

Author	SHA1	Message	Date
Tobias Lütke	ff6e39f138	use javascript generators as much cleaner API. Also add ways to access completion as promise and EventSource	2023-07-05 15:03:01 -04:00
Tobias Lütke	efa86bf2a6	export llama_timings as struct and expose them in server	2023-07-04 21:52:04 -04:00
Tobias Lütke	c19daa4eb5	basic response formatting	2023-07-04 09:14:51 -04:00
Tobias Lütke	eee6d69e39	fix mobile, fix missing prompt cache	2023-07-04 09:14:51 -04:00
Tobias Lütke	fedce007c0	rework state management into session, expose historyTemplate to settings	2023-07-04 09:14:51 -04:00
Tobias Lütke	98e612cefd	slightly nicer css	2023-07-04 09:14:51 -04:00
Tobias Lütke	dd1df3f31c	add /completion.js file to make it easy to use the server from js	2023-07-04 09:14:50 -04:00
Tobias Lütke	8e1b04d319	enable server in Makefiles	2023-07-04 09:14:50 -04:00
Tobias Lütke	dc7dd0886a	let's try this with the xxd tool instead and see if msvc is happier with that	2023-07-04 09:14:50 -04:00
Tobias Lütke	34fc3c7e9f	remove need for @microsoft/fetch-event-source dep (-7kb)	2023-07-04 09:14:50 -04:00
Tobias Lütke	e192f950a3	revert log format changes	2023-07-04 09:14:50 -04:00
Tobias Lütke	0f95689c17	improvements	2023-07-04 09:14:50 -04:00
Tobias Lütke	7a3895641c	allow server to multithread. Because web browsers send a lot of garbage requests, we want the server to multithread when serving 404s for favicons etc. To avoid blowing up llama we just take a mutex when it's invoked.	2023-07-04 09:14:49 -04:00
Tobias Lütke	a30d4b2a8f	switched to fprintf logging and to access_log	2023-07-04 09:14:49 -04:00
tobi lutke	c8cedf5684	newline police	2023-07-04 09:14:05 -04:00
tobi lutke	022bf2bb48	embed index and add --path for choosing static dir	2023-07-04 09:14:05 -04:00
tobi lutke	e3fba85d14	minor aesthetic fixes	2023-07-04 09:14:05 -04:00
Georgi Gerganov	c1cb0e1db2	server : clear trailing whitespace	2023-07-04 09:14:05 -04:00
tobi lutke	b07b271358	tighter	2023-07-04 09:14:04 -04:00
tobi lutke	627d3ba8b5	expose simple web interface on root domain. Demonstrates how to use the stream option of generate.	2023-07-04 09:14:04 -04:00
Henri Vasserman	acc111caf9	Allow old Make to build server. (#2098 ) Also make server build by default. Tested with Make 3.82 master-acc111c	2023-07-04 15:38:04 +03:00
ZhouYuChen	23c7c6fc91	Update Makefile: clean simple (#2097 ) master-23c7c6f	2023-07-04 14:15:16 +02:00
Erik Scholz	698efad5fb	CI: make the brew update temporarily optional. (#2092 ) until they decide to fix the brew installation in the macos runners. see the open issues. eg https://github.com/actions/runner-images/pull/7710 master-698efad	2023-07-04 01:50:12 +02:00
Govlzkoy	14a2cc71f6	[ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088 )	2023-07-04 07:50:00 +08:00
Henri Vasserman	1cf14ccef1	fix server crashes (#2076 )	2023-07-04 00:05:23 +03:00
Howard Su	cc45a7feb8	Fix crash of test-tokenizer-0 under Debug build (#2064 ) * Fix crash of test-tokenizer-0 under Debug build * Change per comment	2023-07-03 20:43:55 +02:00
Howard Su	55dbb915cc	[llama] No need to check file version when loading vocab score (#2079 )	2023-07-03 19:58:58 +08:00
WangHaoranRobin	d7d2e6a0f0	server: add option to output probabilities for completion (#1962 ) * server: add option to output probabilities for completion * server: fix issue when handling probability output for incomplete tokens for multibyte character generation * server: fix llama_sample_top_k order * examples/common.h: put all bool variables in gpt_params together master-d7d2e6a	2023-07-03 00:38:44 +03:00
Georgi Gerganov	46088f7231	ggml : fix build with OpenBLAS (close #2066 ) master-46088f7	2023-07-02 09:46:46 +03:00
Johannes Gäßler	0bc2cdfc87	Better CUDA synchronization logic (#2057 ) master-0bc2cdf	2023-07-01 21:49:44 +02:00
Johannes Gäßler	befb3a3562	Test-based VRAM scratch size + context adjustment (#2056 )	2023-07-01 21:47:26 +02:00
Daniel Drake	b213227067	cmake : don't force -mcpu=native on aarch64 (#2063 ) It's currently not possible to cross-compile llama.cpp for aarch64 because CMakeLists.txt forces -mcpu=native for that target. -mcpu=native doesn't make sense if your build host is not the target architecture, and clang rejects it for that reason, aborting the build. This can be easily reproduced using the current Android NDK to build for aarch64 on an x86_64 host. If there is not a specific CPU-tuning target for aarch64 then -mcpu should be omitted completely. I think that makes sense, there is not enough variance in the aarch64 instruction set to warrant a fixed -mcpu optimization at this point. And if someone is building natively and wishes to enable any possible optimizations for the host device, then there is already the LLAMA_NATIVE option available. Fixes #495.	2023-07-01 21:31:44 +03:00
Aaron Miller	2f8cd979ec	metal : release buffers when freeing metal context (#2062 ) master-2f8cd97	2023-07-01 21:14:59 +03:00
Judd	471aab6e4c	convert : add support of baichuan-7b (#2055 ) Co-authored-by: Judd <foldl@boxvest.com>	2023-07-01 20:00:25 +03:00
Georgi Gerganov	463f2f4c4f	llama : fix return value of llama_load_session_file_internal (#2022 )	2023-07-01 19:05:09 +03:00
Rand Xie	cb44dbc7de	llama : catch llama_load_session_file_internal exceptions (#2022 ) * convert checks in llama_load_session_file to throw and handle them * make llama_load_session_file_internal static * address feedbacks to avoid using exceptions	2023-07-01 19:02:58 +03:00
Georgi Gerganov	79f634a19d	embd-input : fix returning ptr to temporary master-79f634a	2023-07-01 18:46:00 +03:00
Georgi Gerganov	04606a1599	train : fix compile warning	2023-07-01 18:45:44 +03:00
Qingyou Meng	b1ca8f36a9	ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995 ) Will not be scheduled unless explicitly enabled.	2023-07-01 18:42:43 +03:00
Howard Su	b8c8dda75f	Use unsigned for random seed (#2006 ) * Use unsigned for random seed. Keep -1 as the value to use a time based seed. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> master-b8c8dda	2023-06-29 06:15:15 -07:00
LostRuins	96a712ca1b	Porting the improved K-Quant CUDA kernels to OpenCL (#1966 ) * Added broken new q4k quant * xx + ib0 * Fix q2_k fast kernel * Use preprocessor for QK_K * Add q6_k fast matmul kernel * ported q3k speedup successfully * ported q2k and q5k speedups * remove old dot kernels and template * fixed global const struct types * fixing address spaces * fixed string too long CI issue --------- Co-authored-by: 0cc4m <picard12@live.de>	2023-06-29 05:56:43 +02:00
m3ndax	d3494bb86b	llama : replacing auto &kv with const auto &kv (#2041 ) * Replacing auto &kv with const auto &kv * Create codacy.yml * Delete codacy.yml master-d3494bb	2023-06-28 21:39:08 +03:00
Salvador E. Tropea	5b351e94d0	cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028 ) - Not used master-5b351e9	2023-06-28 20:27:31 +03:00
Salvador E. Tropea	6432aabb6d	cuda : fix missing const qualifier in casts (#2027 ) master-6432aab	2023-06-28 20:26:26 +03:00
Howard Su	b922bc351b	llama : remove shards weight file support (#2000 ) * Remove multiple shards * Remove multiple file loaders * Remove llama_load_tensor_shard class * Simplify load logic * Remove dead code guess_n_parts function * Remove vocab_only from constructor of llama_model_loader * Remove alignment_prevents_mmap which is not more needed. * Remove useless check master-b922bc3	2023-06-28 20:13:02 +03:00
Johannes Gäßler	7f9753fa12	CUDA GPU acceleration for LoRAs + f16 models (#1970 ) master-7f9753f	2023-06-28 18:35:54 +02:00
ningshanwutuobang	cfa0750bc9	llama : support input embeddings directly (#1910 ) * add interface for float input * fixed inpL shape and type * add examples of input floats * add test example for embd input * fixed sampling * add free for context * fixed add end condition for generating * add examples for llava.py * add READMD for llava.py * add READMD for llava.py * add example of PandaGPT * refactor the interface and fixed the styles * add cmake build for embd-input * add cmake build for embd-input * Add MiniGPT-4 example * change the order of the args of llama_eval_internal * fix ci error	2023-06-28 18:53:37 +03:00
Erik Scholz	9d23589d63	fix pthreads setaffinity usage on android (#2020 ) master-9d23589	2023-06-27 19:06:33 +02:00
Howard Su	0be54f75a6	baby-llama : fix build after ggml_rope change (#2016 ) master-0be54f7	2023-06-27 08:07:13 +03:00
Georgi Gerganov	181e8d9755	llama : fix rope usage after ChatGLM change	2023-06-27 00:37:33 +03:00

802 Commits