feat: add potential to run Jina Embeddings architecture #6826
Conversation
Hey @ggerganov, I would like to get some comments, especially on the
Force-pushed from e946cb0 to d7d6a4e
The way it is implemented now, ALiBi is not applied because the architecture is excluded from setting `need_kq_pos`, so `ggml_soft_max_ext` never receives a `KQ_pos` tensor:

```diff
diff --git a/llama.cpp b/llama.cpp
index 309f4eec..1230a4bc 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -4135,7 +4135,7 @@ static void llm_load_hparams(
model.ftype = ml.ftype;
- if (hparams.f_max_alibi_bias > 0.0f && model.arch != LLM_ARCH_JINA_BERT) {
+ if (hparams.f_max_alibi_bias > 0.0f) {
hparams.need_kq_pos = true;
}
@@ -7984,11 +7984,8 @@ struct llm_build_context {
struct ggml_tensor * cur;
struct ggml_tensor * inpL;
- struct ggml_tensor * inp_pos = nullptr;
+ struct ggml_tensor * inp_pos = build_inp_pos();
- if (model.arch != LLM_ARCH_JINA_BERT) {
- inp_pos = build_inp_pos();
- }
struct ggml_tensor * inp_mean = build_inp_mean();
struct ggml_tensor * inp_cls = build_inp_cls();
@@ -8010,6 +8007,9 @@ struct llm_build_context {
// KQ_mask (mask for 1 head, it will be broadcasted to all heads)
struct ggml_tensor * KQ_mask = build_inp_KQ_mask(false);
+ // positions of the tokens in the KV cache
+ struct ggml_tensor * KQ_pos = build_inp_KQ_pos();
+
// iterate layers
for (int il = 0; il < n_layer; ++il) {
struct ggml_tensor * cur = inpL;
@@ -8065,7 +8065,7 @@ struct llm_build_context {
struct ggml_tensor * kq = ggml_mul_mat(ctx0, k, q);
cb(kq, "kq", il);
- kq = ggml_soft_max_ext(ctx0, kq, KQ_mask, nullptr, 1.0f/sqrtf(float(n_embd_head)), hparams.f_max_alibi_bias);
+ kq = ggml_soft_max_ext(ctx0, kq, KQ_mask, KQ_pos, 1.0f/sqrtf(float(n_embd_head)), hparams.f_max_alibi_bias);
cb(kq, "kq_soft_max_ext", il);
struct ggml_tensor * v = ggml_cont(ctx0, ggml_transpose(ctx0, ggml_reshape_2d(ctx0, Vcur, n_embd_gqa, n_tokens)));
@@ -11131,7 +11131,7 @@ static int llama_decode_internal(
}
// non-causal masks do not use the KV cache
- if (hparams.causal_attn) {
+ if (hparams.causal_attn || model.arch == LLM_ARCH_JINA_BERT) {
llama_kv_cache_update(&lctx);
// if we have enough unused cells before the current head ->
```
But this still does not work because the …
Let's revisit this PR after merging #5021 - I think the fix should be relatively simple, but it will be easier to resolve conflicts after we merge #5021.
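For context, passing `KQ_pos` instead of `nullptr` is what lets the fused soft-max add an ALiBi term to every attention score before normalizing. A rough, illustrative C++ sketch of one row of that computation (simplified names, not the actual ggml kernel):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Conceptual stand-in for one row of the fused
// ggml_soft_max_ext(kq, mask, pos, scale, max_bias):
//   out[j] = softmax_j( kq[j]*scale + mask[j] + slope*pos[j] )
// When pos is null (or max_bias is 0) the ALiBi term is skipped entirely,
// which is why the bias had no effect before the patch above.
static void soft_max_row(std::vector<float> & row, const float * mask,
                         const int32_t * pos, float scale, float slope) {
    float max_val = -INFINITY;
    for (size_t j = 0; j < row.size(); ++j) {
        row[j] = row[j]*scale
               + (mask ? mask[j]      : 0.0f)   // padding / causal mask
               + (pos  ? slope*pos[j] : 0.0f);  // ALiBi bias with this head's slope
        max_val = std::max(max_val, row[j]);
    }
    float sum = 0.0f;
    for (float & v : row) { v = std::exp(v - max_val); sum += v; }
    for (float & v : row) { v /= sum; }
}
```

In ggml itself the per-head slope is not passed in explicitly; it is derived inside the fused kernel from `hparams.f_max_alibi_bias` and the head index.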
Hey @ggerganov, I have a couple of comments on the suggestions you made:
Force-pushed from da96368 to d9b8dd6
Hey @ggerganov, is there anything missing?
Hey @ggerganov, I fixed the last conflicts.
Yup, thanks. I'll be looking a bit more today - I think the ALiBi stuff still needs some changes/improvements. Hope to be ready soon.
Hello @ggerganov,

Thanks for having this awesome project. I have been trying to add support for Jina Embeddings (https://huggingface.co/jinaai/jina-embeddings-v2-base-en) in `llama.cpp`.

This PR aims to make it possible to run the Jina Embeddings architecture in `llama.cpp`. For this, the changes made are:
- Add `JinaBertModel` into `convert-hf-to-gguf.py` to be able to extract the tensors into GGUF.
- Make it possible for `ollama` to load the model with proper vocab settings (add EOS and BOS tokens).
- Add the `LLM_ARCH_JINA_BERT` architecture and adapt the tensors used by the implementation.
- Update the `build_bert` model to adapt to some small changes needed by the model (like not having positional embeddings).
- Update the ALiBi computation of the `softmax` so that the slope is multiplied by the distance to the diagonal of the specific attention head (see the sketch below).
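To make the last point concrete, here is a minimal, hypothetical C++ sketch (not code from this PR) of the usual power-of-two ALiBi slope schedule and the symmetric "distance to the diagonal" bias that a bidirectional encoder like Jina BERT needs:

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Per-head ALiBi slopes: with n_head heads, head h (0-based) gets
//   slope[h] = 2^(-8*(h+1)/n_head)
// (the common power-of-two schedule; llama.cpp derives it from f_max_alibi_bias).
static std::vector<float> alibi_slopes(int n_head) {
    std::vector<float> slopes(n_head);
    for (int h = 0; h < n_head; ++h) {
        slopes[h] = std::pow(2.0f, -8.0f*(h + 1)/n_head);
    }
    return slopes;
}

// Bidirectional (encoder-style) bias for query position i and key position j:
// the head's slope times the distance to the diagonal, |i - j|, subtracted
// from the raw attention score before the softmax.
static float alibi_bias(float slope, int i, int j) {
    return -slope * std::abs(i - j);
}
```

The schedule above assumes `n_head` is a power of two; the ALiBi paper uses an interpolated variant otherwise.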