Commit
cuda : improve cuda pool efficiency using virtual memory (ggerganov#4606)

* cuda : improve cuda pool efficiency using virtual memory
* fix mixtral
* fix cmake build
* check for vmm support, disable for hip

  ggml-ci
* fix hip build
* clarify granularity
* move all caps to g_device_caps
* refactor error checking
* add cuda_pool_alloc, refactor most pool allocations

  ggml-ci
* fix hip build
* CUBLAS_TF32_TENSOR_OP_MATH is not a macro
* more hip crap
* llama : fix msvc warnings
* ggml : fix msvc warnings
* minor
* minor
* cuda : fallback to CPU on host buffer alloc fail
* Update ggml-cuda.cu

  Co-authored-by: Johannes Gäßler <[email protected]>
* Update ggml-cuda.cu

  Co-authored-by: Johannes Gäßler <[email protected]>
* ensure allocations are always aligned
* act_size -> actual_size

---------

Co-authored-by: Johannes Gäßler <[email protected]>