d74e94c1b3 | Olivier Chafik | 2025-05-26 14:56:49 +01:00
server: fix format of streamed tool call deltas (diff name, fix id location) (#13800)
    * fix deltas of tool_call.function.name
    * fix tool_call.id (was in tool_call.function.id!) + add function type
    * add tool_call.type
    * populate empty tool_call.function.arguments on first delta
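The delta-format fixes listed for d74e94c1b3 above pin down where each field lives in an OpenAI-compatible streamed tool-call chunk. The following is a minimal Python sketch of what a first delta might look like after those fixes; the call id and function name are invented for illustration, and the shape is an assumption based on the commit bullets, not the server's exact output:

```python
import json

# Hypothetical first streamed tool-call delta, per the commit bullets:
# id and type sit on the tool call itself (not inside "function"),
# and function.arguments starts out as an empty string.
first_delta = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "id": "call_abc123",        # id at tool_call level, not function.id
                "type": "function",         # tool_call.type added by the fix
                "function": {
                    "name": "get_weather",  # name arrives whole in the first delta
                    "arguments": "",        # populated empty on first delta
                },
            }]
        },
        "finish_reason": None,
    }]
}

print(json.dumps(first_delta["choices"][0]["delta"], indent=2))
```

Subsequent deltas would then append only `function.arguments` fragments under the same tool-call `index`.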
f13847cfb5 | Olivier Chafik | 2025-05-26 14:16:37 +01:00
server: fix regression on streamed non-chat completion w/ stops (#13785)
    * more forgiving message diffs: partial stop words aren't erased, full stops are
    * Add (slow) server test for completion + stream + stop
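The "more forgiving message diffs" fix for f13847cfb5 relies on detecting when the tail of the streamed text could still grow into a stop word, so that tail is held back rather than erased. A minimal Python sketch of that check, under my own naming (llama.cpp has a `string_find_partial_stop` helper, but this code is an independent illustration, not its implementation):

```python
def find_partial_stop(text: str, stop: str) -> int:
    """Return the index where a partial (proper-prefix) match of `stop`
    begins at the end of `text`, or -1 if the tail cannot grow into `stop`.
    Longer candidate prefixes are preferred."""
    for n in range(min(len(stop) - 1, len(text)), 0, -1):
        if text.endswith(stop[:n]):
            return len(text) - n
    return -1

# "<|e" could still grow into the stop word "<|end|>", so it is held back
# (not erased); everything before it is safe to stream immediately.
text = "The answer is <|e"
idx = find_partial_stop(text, "<|end|>")
safe_to_emit = text[:idx] if idx != -1 else text
print(repr(safe_to_emit))
```

A full occurrence of the stop word would be handled separately (the generation stops and the stop word is cut); this check only covers the ambiguous partial tail.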
79c137f776 | Georgi Gerganov | 2025-05-26 14:03:54 +03:00
examples : allow extracting embeddings from decoder contexts (#13797)
    ggml-ci

e121edc432 | Olivier Chafik | 2025-05-26 00:30:51 +01:00
server: add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
    Co-authored-by: ochafik <ochafik@google.com>
    Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

2f099b510f | Xuan-Son Nguyen | 2025-05-25 18:02:18 +01:00
webui : bump max upload file size to 500MB (#13779)

c508256db2 | Percy Piper | 2025-05-25 15:35:53 +03:00
rpc : Fix build on OpenBSD (#13541)

40aaa8a403 | Xuan-Son Nguyen | 2025-05-25 14:06:32 +02:00
mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760)
    * mtmd : add Qwen2-Audio support
    * small clean up
    * update discussion link
    * clarify mtmd_get_output_embd
    * clarification in multimodal.md
    * fix ultravox bug
    * ggml_cont

d785f9c1fd | Olivier Chafik | 2025-05-25 10:45:49 +01:00
server: fix/test add_generation_prompt (#13770)
    Co-authored-by: ochafik <ochafik@google.com>

f5cd27b71d | Olivier Chafik | 2025-05-25 01:48:08 +01:00
server: streaming of tool calls and thoughts when --jinja is on (#12379)
    * add common_json w/ support for truncated json healing
    * add common_chat_msg_diff
    * partial common_chat_parse
    * refactor parser w/ optionals
    * server: wire chat diffs in stream mode
    * fix trigger of thinking models (must happen after thoughts are closed)
    * fix functionary v3.2 raw python!
    * rename: common_chat_syntax (now contains format)
    * rm common_regex.at_start
    * don't return empty <think></think>
    * accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
    * fix QwQ 32B tool call parsing after thoughts (hermes2)
    * better logs for grammar triggers
    * consume spaces after parse_json_tool_calls
    * fix required tool calls w/ thinking models that have pre-opened thinking tags
    * fix thinking model's initial trigger + test qwq's template
    * run most test_tool_call tests in stream + non-stream modes
    * make functionary v3.2 parsing more strict (differentiate first match from others)
    * send final diff from server, to close off raw python arguments
    * support partial content streaming in Generic mode
    * tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
    * Update function-calling.md
    * Update tool_bench.py
    * chat-parser: remove input from exception (llm output may contain PII)
    Co-authored-by: ochafik <ochafik@google.com>
    Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>

9ecf3e66a3 | Xuan-Son Nguyen | 2025-05-23 11:03:47 +02:00
server : support audio input (#13714)
    * server : support audio input
    * add audio support on webui

8a1d206f1d | Georgi Gerganov | 2025-05-22 22:21:07 +03:00
tts : fix n_ubatch + make WavTokenizer cache-less (#13713)
    ggml-ci

797990c4bc | Xuan-Son Nguyen | 2025-05-22 20:42:48 +02:00
mtmd : add ultravox audio input (#13623)
    * convert ok, load ok
    * warmup ok
    * test
    * still does not work?
    * fix padding
    * temporary give up
    * fix merge conflict
    * build_ultravox()
    * rm test
    * add necessary mtmd APIs
    * first working version (only 4s of audio)
    * will this monster compile?
    * fix compile
    * please compile
    * fPIC
    * fix windows
    * various fixes
    * clean up audio_helpers
    * fix conversion
    * add some debug stuff
    * long audio input ok
    * adapt the api
    * add --audio arg
    * final touch UX
    * add miniaudio to readme
    * fix typo
    * refactor kv metadata
    * mtmd_default_marker()

cc74d5be99 | Georgi Gerganov | 2025-05-22 16:33:39 +03:00
server : pad small embedding batches (#13692)
    ggml-ci

5fbfe384d4 | Georgi Gerganov | 2025-05-21 19:46:56 +03:00
server : improve error reporting (#13680)

0d5c742161 | Robin Davidsson | 2025-05-21 15:15:27 +02:00
server : Add the endpoints /api/tags and /api/chat (#13659)
    * Add the endpoints /api/tags and /api/chat, and improve the model metadata response
    * Remove trailing whitespaces
    * Removed code that is not needed for copilot to work

42158ae2e8 | Dorin-Andrei Geman | 2025-05-21 15:07:57 +02:00
server : fix first message identification (#13634)
    * server : fix first message identification
      When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626) we noticed that the expected assistant role is missing in the first streaming message. Fix this by correctly checking for the first message.
    * server : Fix checks for first role message for stream=True
    Signed-off-by: Dorin Geman <dorin.geman@docker.com>
    Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

797f2ac062 | Georgi Gerganov | 2025-05-21 15:11:13 +03:00
kv-cache : simplify the interface (#13660)
    * kv-cache : simplify the interface
    * context : revert llama_batch_allocr position change
    ggml-ci

b7a17463ec | l3utterfly | 2025-05-20 18:55:30 +02:00
mtmd-helper : bug fix to token batching in mtmd (#13650)
    * Update mtmd-helper.cpp
    * Update tools/mtmd/mtmd-helper.cpp
    Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

e298d2fbd0 | Georgi Gerganov | 2025-05-20 08:05:46 +03:00
kv-cache : add SWA support (#13194)
    * kv-cache : prepare for SWA
    * kv-cache : initial iSWA implementation
    * kv-cache : rework error recovery logic
    * models : fix Phi-3 SWA parameters
    * model : adjust Granite to rope factor changes
    * server : check if context can do shifts
    * iswa : for now, always enable shifts (experiment)
    * kv-cache : simplify SWA logic
    * kv-cache : apply defrag when we fail to find slots for the batch
    * llama : update docs about llama_decode
    * kv-cache : update warning logs when no space for the batch is available
    * llama : add llama_kv_self_seq_pos_min()
    * kv-cache : keep track of partial SWA computes and print warnings
    * server : disallow use cases involving partial SWA context
    * llama : add param to control SWA cache size
    * minor : clean-up
    ggml-ci

f7c9429c85 | Nicolò Scipione | 2025-05-20 08:54:43 +08:00
sycl : Overcoming workaround for mmap() allocation on Windows (#13482)
    * Remove mmap workaround on windows
      After some testing I found that mmap is supported on windows and for many GPUs on Linux. Therefore I remove the workaround for windows since it is not necessary.
    * Update llama-bench README
      SYCL backend introduced a workaround that allows execution of llama-bench also without specifying `--mmp 0` flag

92ecdcc06a | Xuan-Son Nguyen | 2025-05-19 13:04:14 +02:00
mtmd : add vision support for llama 4 (#13282)
    * wip llama 4 conversion
    * rm redundant __init__
    * fix conversion
    * test impl
    * try this
    * reshape patch_embeddings_0
    * fix view
    * rm ffn_post_norm
    * cgraph ok
    * f32 for pos embd
    * add image marker tokens
    * Llama4UnfoldConvolution
    * correct pixel shuffle
    * fix merge conflicts
    * correct
    * add debug_graph
    * logits matched, but it still perceives the image incorrectly
    * fix style
    * add image_grid_pinpoints
    * handle llama 4 preprocessing
    * rm load_image_size
    * rm unused line
    * fix
    * small fix 2
    * add test & docs
    * fix llava-1.6 test
    * test: add notion of huge models
    * add comment
    * add warn about degraded quality

6a2bc8bfb7 | Isaac McFadyen | 2025-05-17 23:59:48 +02:00
server : added --no-prefill-assistant flag (#13608)
    * added no-prefill-assistant flag
    * reworded documentation comment
    * updated server README.md

6aa892ec2a | Xuan-Son Nguyen | 2025-05-16 21:50:00 +02:00
server : do not return error out of context (with ctx shift disabled) (#13577)

aea9f8b4e7 | Xuan-Son Nguyen | 2025-05-16 21:49:01 +02:00
webui : improve accessibility for visually impaired people (#13551)
    * webui : improve accessibility for visually impaired people
    * add a11y for extra contents
    * fix some labels being read twice
    * add skip to main content

6c8b91500e | Diego Devesa | 2025-05-15 15:46:55 +02:00
llama-bench : fix -ot with dl backends (#13563)

3cc1f1f1d2 | Xuan-Son Nguyen | 2025-05-15 14:24:50 +02:00
webui : handle PDF input (as text or image) + convert pasted long content to file (#13562)
    * webui : handle PDF input (as text or image)
    * handle the case where pdf image + server without mtmd
    * fix bug missing pages

c753d7bed0 | Piotr Wilkin (ilintar) | 2025-05-15 08:40:58 +02:00
server : proper error handling for missing elements in messages array (OpenAI compatible backend) (#13540)

b2838049cc | Georgi Gerganov | 2025-05-15 05:57:02 +03:00
bench : handle decode errors (#13548)
    ggml-ci

aa48e373f2 | Olivier Chafik | 2025-05-15 02:39:51 +01:00
server: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802)
    * Inject date_string in llama 3.x + fix for functionary v2
      https://github.com/ggml-org/llama.cpp/issues/12729
    * move/fix detection of functionary v3.1 before llama 3.x, fix & test their non-tool mode
    * generate more tokens in test_completion_with_required_tool_tiny_fast to avoid truncation
    Co-authored-by: ochafik <ochafik@google.com>
    Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

3198405e98 | Olivier Chafik | 2025-05-14 19:50:57 +01:00
common: add partial regex support (#12808)
    * move string_find_partial_stop & string_ends_with to common
    * add common_regex (supports partial matches)
    * Update common/regex-partial.cpp
    * Update common/regex-partial.h
    * partial regex: add missing iterator end checks
    * string utils: use string_views
    * direct throw to avoid ggml.h include
    * regex-partial: replace missed ggml_asserts
    Co-authored-by: ochafik <ochafik@google.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

053174436f | Georgi Gerganov | 2025-05-14 15:42:10 +03:00
server : passthrough the /models endpoint during loading (#13535)
    * server : passthrough the /models endpoint during loading
    * server : update readme + return json for "meta" field

360a9c98e1 | Xuan-Son Nguyen | 2025-05-14 13:35:07 +02:00
server : fix cache_tokens bug with no cache_prompt (#13533)

bb1681fbd5 | Xuan-Son Nguyen | 2025-05-14 10:26:12 +02:00
webui : use fflate for more deterministic gzip compress (#13525)
    * webui : use pako for more deterministic gzip compress
    * simpler code
    * use fflate instead of pako

d486dd3e8e | Luca Stefani | 2025-05-14 10:07:31 +02:00
webui: Allow pasting file from clipboard (#13526)
    * server: Allow pasting file from clipboard
    * server: Prevent default action on file paste
    * update build
    * format then build combined
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

e5c834f718 | Ed Addario | 2025-05-13 19:12:31 +02:00
quantize : improve tensor-type pattern matching (#13033)

71bdbdb587 | Xuan-Son Nguyen | 2025-05-13 17:07:21 +02:00
clip : clip.h become private API (⚠️ breaking change) (#13510)

b89d605a91 | Georgi Gerganov | 2025-05-13 18:01:53 +03:00
batched-bench : fix pp batch contents (#13492)

b4726345ac | Xuan-Son Nguyen | 2025-05-13 15:33:58 +02:00
mtmd : remove libllava, remove clip-quantize-cli (⚠️ breaking change) (#13460)
    * mtmd : remove libllava, remove clip-quantize-cli
    * rm clip_model_quantize

cf0a43bb64 | Diego Devesa | 2025-05-13 00:31:37 +02:00
llama-bench : add defrag-thold, check for invalid ranges (#13487)

de4c07f937 | Xuan-Son Nguyen | 2025-05-12 15:06:51 +02:00
clip : cap max image size 1024 for qwen vl model (#13478)

91159ee9df | Anudit Nagar | 2025-05-12 13:56:42 +02:00
server : allow content to be null in oaicompat_completion_params_parse (#13477)

22cdab343b | Diego Devesa | 2025-05-12 13:08:22 +02:00
llama-bench : accept ranges for integer parameters (#13410)

c104023994 | City | 2025-05-12 00:39:06 +02:00
mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459)

9a390c4829 | Anthony Umfer | 2025-05-11 17:08:26 +02:00
tools : fix uninitialized llama_batch in server (#13436)
    * add constructor to initialize server_context::batch, preventing destructor's call to llama_batch_free from causing an invalid free()
    * Update tools/server/server.cpp
    * use C++11 initializer syntax
    * switch from Copy-list-initialization to Direct-list-initialization
    Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

7f323a589f | David Huang | 2025-05-11 14:18:39 +02:00
Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386)

3eac209319 | City | 2025-05-11 11:35:52 +02:00
mtmd : support InternVL 3 38B and 78B mmproj (#13443)
    * Support InternVL 3 38B and 78B mmproj
    * Swap norms in clip.cpp
    * Group variables together

a634d75d1b | Xuan-Son Nguyen | 2025-05-11 11:34:23 +02:00
mtmd : move helpers to dedicated file (#13442)
    * mtmd : move helpers to dedicated file
    * fix windows build
    * rm redundant include

15e6125a39 | Xuan-Son Nguyen | 2025-05-10 19:57:54 +02:00
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434)
    * mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl
    * fix typo

3b24d26c22 | Xuan-Son Nguyen | 2025-05-10 18:44:49 +02:00
server : update docs (#13432)

053367d149 | Xuan-Son Nguyen | 2025-05-10 16:26:42 +02:00
mtmd : support InternVL 2.5 and 3 (#13422)
    * convert : internvl support
    * InternVL3-1B working
    * fix regression
    * rm mobilevlm from test
    * fix conversion
    * add test for internvl
    * add to list of pre-quant
    * restore boi/eoi check
    * add clarify comment for norm eps