sync : llama.cpp #1045

ggerganov · 2024-12-10T15:16:33Z

No description provided.


* metal : Extend how Llama.cpp locates metal resources (llama/10675)
  * It also searches for the resource file in the directory where the current binary is located.
  * It resolves symbolic links.
  Rationale: when we plug this dependency into a Bazel build and run it in the context of Bazel (e.g. testing):
  * the execution directory is often very different from where the files are located, and we have no direct control over this (Bazel sandboxing);
  * the Bazel sandbox often uses symbolic links to make files available.
  With this patch, we can add the resource file to the target and build and run tests in the context of Bazel.
* Update ggml/src/ggml-metal/ggml-metal.m
  Co-authored-by: Georgi Gerganov
* Update ggml/src/ggml-metal/ggml-metal.m
  Co-authored-by: Georgi Gerganov

---------

Co-authored-by: Georgi Gerganov
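The metal commit above describes two lookup changes: also searching the directory of the running binary, and resolving symbolic links. The sketch below illustrates that idea in C++17 with `std::filesystem`; the real change is Objective-C in ggml-metal.m, and the function name `find_resource`, the search order (working directory first, then the binary's resolved directory), and the example file name are assumptions for illustration only.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper, not ggml's API: look for `name` in the working
// directory first, then in the directory of the binary after all symbolic
// links in `exe_path` have been resolved.
std::optional<fs::path> find_resource(const fs::path& exe_path,
                                      const std::string& name,
                                      const fs::path& cwd) {
    std::vector<fs::path> dirs{cwd};

    std::error_code ec;
    // fs::canonical follows every symlink in the path, so a binary exposed
    // through a Bazel-style symlink maps back to the directory of the real file.
    const fs::path real_exe = fs::canonical(exe_path, ec);
    if (!ec) {
        dirs.push_back(real_exe.parent_path());
    }

    for (const fs::path& dir : dirs) {
        const fs::path candidate = dir / name;
        if (fs::exists(candidate, ec)) {
            return candidate;
        }
    }
    return std::nullopt;  // not found in any searched directory
}
```

Resolving the link before taking `parent_path()` is what lets a sandboxed, symlinked binary still find a resource that was installed next to the real file.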

…ng (llama/10597)
* Vulkan: Implement VK_KHR_cooperative_matrix support in the matrix matrix multiplication shader
* Improve performance with better q4_k and q5_k dequant and store unrolling
* Add Vulkan MUL_MAT and MUL_MAT_ID accumulator precision selection
* Rework mulmat shader selection and compilation logic, avoid compiling shaders that won't get used by device
* Vulkan: Implement accumulator switch for specific mul mat mat shaders
* Vulkan: Unroll more loops for more mul mat mat performance
* Vulkan: Add VK_AMD_shader_core_properties2 support to read Compute Unit count for split_k logic
* Disable coopmat support on AMD proprietary driver
* Remove redundant checks
* Add environment variable GGML_VK_DISABLE_COOPMAT to disable VK_KHR_cooperative_matrix support
* Fix rebase typo
* Fix coopmat2 MUL_MAT_ID pipeline selection
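The commit above lists three conditions that gate VK_KHR_cooperative_matrix use: device support, the AMD proprietary driver exclusion, and the `GGML_VK_DISABLE_COOPMAT` environment variable. A minimal sketch of combining them follows; the helper names are hypothetical, and the assumption that setting the variable to any value disables the feature is not stated in the commit message.

```cpp
#include <cstdlib>

// Hypothetical decision helper (names are illustrative, not ggml's API).
// `disable_env_value` is the raw value of the environment variable, or
// nullptr when it is unset.
bool coopmat_enabled(bool device_supports_khr_coopmat,
                     bool is_amd_proprietary_driver,
                     const char* disable_env_value) {
    if (!device_supports_khr_coopmat) {
        return false;  // extension not exposed by the device
    }
    if (is_amd_proprietary_driver) {
        return false;  // disabled on this driver per the commit message
    }
    // Assumption: any value (even empty) counts as "disable".
    if (disable_env_value != nullptr) {
        return false;
    }
    return true;
}

// Reads the environment variable named in the commit message.
bool coopmat_enabled_from_env(bool device_supports_khr_coopmat,
                              bool is_amd_proprietary_driver) {
    return coopmat_enabled(device_supports_khr_coopmat,
                           is_amd_proprietary_driver,
                           std::getenv("GGML_VK_DISABLE_COOPMAT"));
}
```

Passing the environment value in as a parameter keeps the decision logic testable without mutating the process environment.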

* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
  - clean Q4_0_N_M and IQ4_0_N_M
  - remove from "file" tensor type
  - allow only with dynamic repack
  - extract cpu extra bufts and convert to C++
  - hbm
  - "aarch64"
  - more generic use of extra buffer
  - generalise extra_supports_op
  - new API for "cpu-accel":
    - amx
    - aarch64
* clang-format
* Clean Q4_0_N_M ref
  Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
  Co-authored-by: Georgi Gerganov
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
  Co-authored-by: Georgi Gerganov
* add debug logs on repacks.

---------

Co-authored-by: Georgi Gerganov
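The commit above mentions runtime repacking of Q4 tensors and a tensor-size check before repacking, but does not describe the layout. As a generic illustration only (this is not ggml's Q4_0_N_M format; the function `interleave_rows` and the use of plain `int` values as stand-ins for quantized blocks are hypothetical), interleaving groups of rows block by block and refusing shapes that do not divide evenly could look like this:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Illustrative only: each "block" is an int, not a real quantized block.
// Groups `n_rows_per_group` consecutive rows and stores block j of every row
// in the group next to each other. Returns std::nullopt when the shape does
// not divide evenly, i.e. the tensor-size check fails and no repack happens.
std::optional<std::vector<int>> interleave_rows(const std::vector<int>& src,
                                                std::size_t n_rows,
                                                std::size_t blocks_per_row,
                                                std::size_t n_rows_per_group) {
    if (n_rows_per_group == 0 || n_rows % n_rows_per_group != 0 ||
        src.size() != n_rows * blocks_per_row) {
        return std::nullopt;  // shape not repackable: keep the original layout
    }
    std::vector<int> dst;
    dst.reserve(src.size());
    for (std::size_t g = 0; g < n_rows; g += n_rows_per_group) {
        for (std::size_t j = 0; j < blocks_per_row; ++j) {
            for (std::size_t r = 0; r < n_rows_per_group; ++r) {
                dst.push_back(src[(g + r) * blocks_per_row + j]);
            }
        }
    }
    return dst;
}
```

With 4 rows of 3 blocks and groups of 2 rows, rows `{0,1,2}` and `{10,11,12}` become `0,10,1,11,2,12`, so a kernel can read the same block index from several rows with one contiguous load.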

ggml-ci



ggml-ci

jeffbolznv and others added 11 commits December 10, 2024 17:15

vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (llama/10206)

84359fb

ggml : disable iq4_nl interleave size 8 (llama/10709)

5837f93

ggml-ci

vulkan: compile a test shader in cmake to check for coopmat2 support (llama/10713)

93cf8ce

Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (llama/10723)

d171dd9

* Vulkan: fix NaN in tanh.comp
* Faster NaN-free tanh
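The tanh fix above is described only as "Faster NaN-free tanh". One common way a tanh formula produces NaN is the quotient form (e^{2x} - 1) / (e^{2x} + 1): for large x, e^{2x} overflows to infinity and the result is inf/inf. The sketch below, in C++ rather than GLSL, assumes the fix rearranges the formula to 1 - 2 / (e^{2x} + 1); the commit message does not confirm the exact shader change, and both function names are hypothetical.

```cpp
#include <cmath>

// Quotient form: std::exp(2x) overflows to +inf in float once x exceeds
// roughly 44, and (inf - 1) / (inf + 1) evaluates to NaN.
float tanh_naive(float x) {
    const float e = std::exp(2.0f * x);
    return (e - 1.0f) / (e + 1.0f);
}

// Rearranged form: for large x, 2 / (inf + 1) is 0 and the result is 1;
// for very negative x, exp(2x) is negligible next to 1 and the result is -1.
// No inf/inf division can occur.
float tanh_nan_free(float x) {
    return 1.0f - 2.0f / (std::exp(2.0f * x) + 1.0f);
}
```

The rearranged form also needs only one `exp`, one add, one divide, and one subtract per element.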

vulkan: fix compile warnings (llama/10731)

5b93f65

CUDA: fix shared memory access condition for mmv (llama/10740)

cc4184b

sync : llama.cpp

30ee1bf

ggml-ci

common : remove old types

916577c

ggml-ci

ggerganov merged commit 38e504a into master Dec 10, 2024
11 checks passed

ggerganov deleted the sync branch December 10, 2024 16:33
