mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2025-10-27 08:21:30 +00:00
cuda : improve cuda pool efficiency using virtual memory (#4606)
* cuda : improve cuda pool efficiency using virtual memory * fix mixtral * fix cmake build * check for vmm support, disable for hip ggml-ci * fix hip build * clarify granularity * move all caps to g_device_caps * refactor error checking * add cuda_pool_alloc, refactor most pool allocations ggml-ci * fix hip build * CUBLAS_TF32_TENSOR_OP_MATH is not a macro * more hip crap * llama : fix msvc warnings * ggml : fix msvc warnings * minor * minor * cuda : fallback to CPU on host buffer alloc fail * Update ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * ensure allocations are always aligned * act_size -> actual_size --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
This commit is contained in:
2
ggml.h
2
ggml.h
@@ -255,6 +255,8 @@
|
||||
#define GGML_UNREACHABLE() GGML_ASSERT(!"statement should not be reached")
|
||||
#elif defined(__GNUC__)
|
||||
#define GGML_UNREACHABLE() __builtin_unreachable()
|
||||
#elif defined(_MSC_VER)
|
||||
#define GGML_UNREACHABLE() __assume(0)
|
||||
#else
|
||||
#define GGML_UNREACHABLE() ((void) 0)
|
||||
#endif
|
||||
|
||||
Reference in New Issue
Block a user