sync : llama.cpp #1060

ggerganov · 2025-01-03T11:33:51Z

No description provided.

* server : add "tokens" output
  ggml-ci
* server : output embeddings for all tokens when pooling = none
  ggml-ci
* server : be explicit about the pooling type in the tests
  ggml-ci
* server : do not normalize embeddings when there is no pooling
  ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
  ggml-ci
* tts : add mathematical constant
  ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
  ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
  ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
  ggml-ci
* cont
  ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
  ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
  ggml-ci
* tts : minor fixes
* common : support HF download for vocoder

* ggml: GGML_NATIVE uses -mcpu=native on ARM
  Signed-off-by: Adrien Gallouët <[email protected]>
* ggml: Show detected features with GGML_NATIVE
  Signed-off-by: Adrien Gallouët <[email protected]>
* remove msvc support, add GGML_CPU_ARM_ARCH option
* disable llamafile in android example
* march -> mcpu, skip adding feature macros
  ggml-ci
---------
Signed-off-by: Adrien Gallouët <[email protected]>
Co-authored-by: Adrien Gallouët <[email protected]>
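With `-mcpu=native`, GCC and Clang predefine Arm C Language Extensions (ACLE) macros for the features the host CPU supports, so C code can check them at compile time instead of receiving extra `-D` flags. A minimal sketch of reporting such features, assuming the standard ACLE macro names; `detected_arm_features` is a hypothetical helper, not the CMake feature-listing logic these commits change.

```c
#include <assert.h>
#include <stddef.h>

/* Collects the names of ACLE feature macros the compiler predefined for the
 * current target (for example after -mcpu=native). On non-Arm targets none
 * of these macros are defined and the function reports zero features. */
static size_t detected_arm_features(const char **names, size_t cap) {
    size_t n = 0;
    assert(names != NULL || cap == 0);
#if defined(__ARM_NEON)
    if (n < cap) names[n++] = "NEON";
#endif
#if defined(__ARM_FEATURE_DOTPROD)
    if (n < cap) names[n++] = "DOTPROD";
#endif
#if defined(__ARM_FEATURE_MATMUL_INT8)
    if (n < cap) names[n++] = "MATMUL_INT8";
#endif
#if defined(__ARM_FEATURE_SVE)
    if (n < cap) names[n++] = "SVE";
#endif
    return n;
}
```

Because the check happens at compile time, the result describes the target the binary was built for, which matches the host only when `-mcpu=native` was used.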


* Migrate to tensor->buffer for checking backend buffer type: 1
* SYCL: common.cpp try to migrate away from tensor->backend
* SYCL: fix assertions and add proper comments
* SYCL: remove extra space
* SYCL: Add back static to ggml_backend_buffer_is_sycl_split function
* SYCL: Add pragma directive to suppress warning spam
* SYCL: Integrate debug logs with GGML_LOG and other fixes
* Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes"
  This reverts commit 2607b7de0f0d2f4f1f690226f86fa861aa39cb97.
  Let's keep the current SYCL specific logging mechanism for now
* SYCL: Use GGML_SYCL_DEBUG after reverting
* SYCL: reg_get_proc_address func, update to the current func signature
* SYCL: Refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d

…() (llama/10874)
* ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0()
  Signed-off-by: Adrien Gallouët <[email protected]>
* ggml-cpu: format code
  Signed-off-by: Adrien Gallouët <[email protected]>
---------
Signed-off-by: Adrien Gallouët <[email protected]>

Change the code to do 16b loads when possible and extract the appropriate component late, so the code is effectively decoding a pair of elements and then selecting one. This can allow more commoning to happen in the compiler when neighboring elements are loaded.
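The idea in the paragraph above (load a 16-bit pair, decode both elements, select the wanted one last) can be sketched in scalar C. This assumes a q8_0-style layout of signed 8-bit quants sharing one scale `d`; the actual change lives in Vulkan GLSL shaders, and `dequant_pair_select` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Dequantizes element idx of a block of n int8 quants with scale d.
 * Both elements of the 2-element pair containing idx are fetched with one
 * 2-byte copy and decoded; the requested one is selected at the end, so a
 * neighbouring call on idx ^ 1 performs the identical load and decode,
 * which gives the compiler a chance to share that work. */
static float dequant_pair_select(const int8_t *qs, size_t n, float d, size_t idx) {
    assert(qs != NULL && idx < n && n % 2 == 0);
    const size_t base = idx & ~(size_t)1;   /* first element of the pair */
    int8_t pair[2];
    memcpy(pair, qs + base, sizeof pair);   /* 2-byte copy, typically one 16-bit load */
    const float v0 = d * (float)pair[0];    /* decode both elements... */
    const float v1 = d * (float)pair[1];
    return (idx & 1) ? v1 : v0;             /* ...then select one late */
}
```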


ggml-ci

* more performance with llamafile tinyblas on x86_64:
  - add bf16 support
  - change dispatch strategy (thanks: ikawrakow/ik_llama.cpp#71)
  - reduce memory bandwidth
  simple tinyblas dispatch, more cache-friendly
* tinyblas dynamic dispatching
* sgemm: add M blocks.
* - git 2.47 uses short IDs of length 9.
  - show-progress is not part of GNU Wget2
* remove unstable test
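bf16 (bfloat16), to which the first bullet adds tinyblas support, keeps the sign, the 8 exponent bits and the top 7 mantissa bits of an IEEE-754 binary32 float, so conversion is essentially a 16-bit shift. A scalar sketch with hypothetical names; tinyblas itself works on SIMD vectors, and ggml has its own conversion helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widening is exact: the bf16 bits become the upper half of a float. */
static float bf16_to_f32(uint16_t h) {
    const uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrowing rounds to nearest, ties to even. NaNs are handled first and
 * kept quiet so that the rounding add cannot turn them into infinities. */
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {   /* NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    bits += 0x7fffu + ((bits >> 16) & 1u);      /* round to nearest even */
    return (uint16_t)(bits >> 16);
}
```

Halving the element size this way is also why bf16 inputs reduce the memory bandwidth a GEMM needs.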

Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for every struct field except the first)
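The first warning arises because `%ld` expects a `long`, while on 64-bit Windows `size_t` is 64 bits and `long` stays 32 bits (the LLP64 model). The commit text does not show which fix was applied; one portable option is C99's `%zu` length modifier, sketched here with a hypothetical `format_count` helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* %zu is the C99 length modifier for size_t, so it matches on every target,
 * whereas %ld only matches where long and size_t have the same width. */
static int format_count(char *buf, size_t cap, size_t count) {
    assert(buf != NULL && cap > 0);
    return snprintf(buf, cap, "count = %zu", count);
}
```

The second warning is C++ and Windows-specific and is not reproduced here; GCC's documentation states that `-Wmissing-field-initializers` does not warn about an empty `{}` initializer in C++.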




Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better. Share some code for reducing the result values to memory in mul_mat_vec_base.
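A scalar C sketch of the loop structure described above: each element of A is loaded once and reused for all `ncols` columns, with the B loads in the innermost loop. It assumes row-major A, B stored as `ncols` vectors of length K separated by a hypothetical row stride `stride_b`, and a compile-time cap `NUM_COLS_MAX` standing in for the `NUM_COLS` spec constant. The real shaders also split K across invocations and reduce the partial sums, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COLS_MAX 8  /* hypothetical cap standing in for the NUM_COLS spec constant */

/* y[c*nrows + r] = dot(row r of A, column vector c of B), for c < ncols.
 * A is nrows x K, row-major; column c of B starts at B + c*stride_b. */
static void mul_mat_vec_ncols(const float *A, const float *B, float *y,
                              size_t nrows, size_t K, size_t ncols, size_t stride_b) {
    assert(ncols >= 1 && ncols <= NUM_COLS_MAX && stride_b >= K);
    for (size_t r = 0; r < nrows; ++r) {
        float acc[NUM_COLS_MAX] = {0};
        for (size_t k = 0; k < K; ++k) {
            const float a = A[r*K + k];              /* A element loaded once... */
            for (size_t c = 0; c < ncols; ++c) {
                acc[c] += a * B[c*stride_b + k];     /* ...reused for every column */
            }
        }
        for (size_t c = 0; c < ncols; ++c) {
            y[c*nrows + r] = acc[c];                 /* write one result per column */
        }
    }
}
```

With N = 1 this degenerates to an ordinary matrix-vector product, so a single kernel can cover both cases.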


ggml-ci

JohannesGaessler and others added 22 commits January 3, 2025 13:33

tests: add tests for GGUF (llama/10830)

089373d

ggml: fix arm build with gcc (llama/10895)

66d5a45

Signed-off-by: Adrien Gallouët <[email protected]>

ggml : add test for SVE and disable when it fails (llama/10906)

88ac055

vulkan: build fixes for 32b (llama/10927)

02db05d

* vulkan: build fixes for 32b
  Should fix #10923
* vulkan: initialize some buffer/offset variables

ggml : fix run-time on FreeBSD in get_executable_path() (llama/10948)

f77c213

ggml : fix const usage in SSE path (llama/10962)

2f8aea9

ggml : fix arm enabled features check (llama/10961)

250b245

ggml : use wstring for backend search paths (llama/10960)

7f292ed

ggml-ci

vulkan: multi-row k quants (llama/10846)

e659119

* multi row k quant shaders!
* better row selection
* more row choices
* readjust row selection
* rm_kq=2 by default

vulkan: Use push constant offset to handle misaligned descriptors (llama/10987)

92d38ff
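Vulkan requires the offset in a storage-buffer descriptor to be a multiple of `VkPhysicalDeviceLimits::minStorageBufferOffsetAlignment`. One reading of this commit title: bind the buffer at the offset rounded down to that alignment and pass the remainder to the shader as a push constant, expressed in elements. A hypothetical sketch of that arithmetic only, assuming a power-of-two alignment and a remainder that is a whole number of elements.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t bind_offset;  /* aligned byte offset used in the descriptor */
    uint32_t elem_offset;  /* leftover offset in elements, sent as a push constant */
} split_offset;

static split_offset split_misaligned_offset(uint64_t byte_offset, uint64_t align,
                                            uint32_t elem_size) {
    assert(align != 0 && (align & (align - 1)) == 0);    /* power of two */
    assert(elem_size != 0);
    split_offset s;
    s.bind_offset = byte_offset & ~(align - 1);           /* round down */
    const uint64_t rem = byte_offset - s.bind_offset;     /* always < align */
    assert(rem % elem_size == 0);
    s.elem_offset = (uint32_t)(rem / elem_size);
    return s;
}
```

The shader would then add the pushed element offset to its indices, so the descriptor itself never needs a misaligned offset.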

vulkan: im2col and matmul optimizations for stable diffusion (llama/10942)

3bcc231

* tests: Add im2col perf tests
* vulkan: optimize im2col, more elements per thread
* vulkan: increase small tile size for NV_coopmat2
* vulkan: change im2col to 512 elements per workgroup
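For reference, im2col unrolls each convolution window into one row of a matrix so that the convolution becomes a matrix multiplication. A single-channel scalar sketch with stride and zero padding (dilation fixed at 1); the function name and row-major output layout are assumptions for illustration, not the layout of ggml's im2col op. A GPU version like the one in these commits instead spreads the output elements across threads, for example 512 per workgroup.

```c
#include <assert.h>
#include <stddef.h>

/* dst has OH*OW rows and KH*KW columns (row-major), where
 * OH = (H + 2*pad - KH)/stride + 1 and OW = (W + 2*pad - KW)/stride + 1. */
static void im2col_1ch(const float *src, size_t H, size_t W,
                       size_t KH, size_t KW, size_t stride, size_t pad,
                       float *dst) {
    assert(stride > 0 && H + 2*pad >= KH && W + 2*pad >= KW);
    const size_t OH = (H + 2*pad - KH) / stride + 1;
    const size_t OW = (W + 2*pad - KW) / stride + 1;
    for (size_t oy = 0; oy < OH; ++oy) {
        for (size_t ox = 0; ox < OW; ++ox) {
            float *row = dst + (oy*OW + ox) * (KH*KW);   /* one row per output pixel */
            for (size_t ky = 0; ky < KH; ++ky) {
                for (size_t kx = 0; kx < KW; ++kx) {
                    /* input coordinates shifted by the padding, in signed arithmetic */
                    const long iy = (long)(oy*stride + ky) - (long)pad;
                    const long ix = (long)(ox*stride + kx) - (long)pad;
                    const int inside = iy >= 0 && ix >= 0 && iy < (long)H && ix < (long)W;
                    row[ky*KW + kx] = inside ? src[(size_t)iy*W + (size_t)ix] : 0.0f;
                }
            }
        }
    }
}
```

For a 3x3 input holding 1..9 and a 2x2 kernel with stride 1 and no padding, the rows are {1,2,4,5}, {2,3,5,6}, {4,5,7,8} and {5,6,8,9}.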

ggml : fixes for AVXVNNI instruction set with MSVC and Clang (llama/11027)

43c2a5c

* Fixes for clang AVX VNNI
* enable AVX VNNI and alder lake build for MSVC
* Apply suggestions from code review
---------
Co-authored-by: slaren <[email protected]>

metal : avoid uint (llama/11019)

0937be4

sync : llama.cpp

add9f12

ggml-ci

ggerganov merged commit e61b9f5 into master Jan 3, 2025
10 checks passed

ggerganov deleted the sync-llama.cpp-25-01-03 branch January 3, 2025 12:03
