* 1. add "integrated" in ggml_cuda_device_info for distinguish whether it is Intergrate_gpu or discrete_gpu
2. Adjust the func:"ggml_backend_cuda_device_supports_buft" for this new feature
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Adjusted code indentation
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Fixed incorrect setting of variable types
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Adjusted the judgment logic
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* add a host_buft assert in case of integrated_cuda_device with func:'evaluate_and_capture_cuda_graph()'
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Add a defensive security assert
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Adjusted the support judgment logic.
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* revoke the suggest commit changes due to it's not applicable in jetson_device
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Add parentheses to enforce operator precedence
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Fix ci bug: add a spaces
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: yangxiao <yang_xl@tju.edu.cn>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: yangxiao <yangxl_zz@qq.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period
* ggml : remove MSVC warnings pragmas
This commit removes the MSVC-specific pragmas as these are now handled
in ggml/CMakeLists.txt.
* whisper : remove MSVC warning pragmas
This commit removes the MSVC-specific pragmas. These are now handled in
the ggml/CMakeLists.txt file.
* graph : make mla compatible with FA
* metal : add exp FA kernels for DeepSeek models
ggml-ci
* llama : minor naming updates
ggml-ci
* ggml : disable FA for DS head sizes
* tests : add FA tests for MLA shapes
ggml-ci