mtmd: refactor preprocessing + support max/min pixels (#16878)

* mtmd: refactor preprocessing + support max/min pixels
* fix mlp type
* implement min/max pixels
* improve hparams
* better image preproc for qwen
* fix
* fix out of bound composite
* fix (2)
* fix token calculation
* get_merge_kernel_size()
* fix llama4 and lfm2
* gonna fix them all
* use simple resize for qwen
* qwen: increase min tokens
* no resize if dst size == src size
* restore to initial min/max tokens value for qwen
2025-11-08 10:07:01 +00:00 · 2025-11-01 15:51:36 +01:00
parent d8b860a219
commit cf659bbb8e
2 changed files with 432 additions and 332 deletions
--- a/tools/mtmd/clip-impl.h
+++ b/tools/mtmd/clip-impl.h
@@ -154,8 +154,8 @@ enum projector_type {
    PROJECTOR_TYPE_LFM2,
    PROJECTOR_TYPE_KIMIVL,
    PROJECTOR_TYPE_LIGHTONOCR,
-    PROJECTOR_TYPE_UNKNOWN,
    PROJECTOR_TYPE_COGVLM,
+    PROJECTOR_TYPE_UNKNOWN,
 };

 static std::map<projector_type, std::string> PROJECTOR_TYPE_NAMES = {