train : fix KQ_pos allocation #3392

Merged
ggerganov merged 2 commits into master from train-fix-kq-pos on Sep 29, 2023
Conversation

ggerganov
Owner

fix #3389

The changes in #3228 seem to have broken the train examples. I think this should fix it.

@ggerganov ggerganov requested a review from xaedes September 29, 2023 08:52
@xaedes
Collaborator

xaedes commented Sep 29, 2023

I am currently letting it run a test finetune & train to see if it actually works, but from looking at it I think this should be correct.

On a side note, a function ggml_range(ctx, dtype, start, stop, step), analogous to numpy.arange, would be nice to have for filling a tensor with a sequence of values.
It is an obvious basic primitive for array/tensor processing libraries and would avoid the need for this manual for loop:

struct ggml_tensor * KQ_pos = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, N);
ggml_allocr_alloc(alloc, KQ_pos);
if (!ggml_allocr_is_measure(alloc)) {
    int * data = (int *) KQ_pos->data;
    for (int i = 0; i < N; ++i) {
        data[i] = n_past + i;
    }
}

would then just look like this:

struct ggml_tensor * KQ_pos = ggml_range(ctx, GGML_TYPE_I32, n_past, n_past + N, 1);
ggml_allocr_alloc(alloc, KQ_pos);
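To make the proposed semantics concrete, here is a minimal standalone sketch of an arange-style fill for an int32 buffer. It is not ggml code: the names range_count and range_fill_i32 are hypothetical, and the half-open [start, stop) convention is assumed from the comparison with numpy.arange above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of ggml): number of values in the
// half-open range [start, stop) when stepping by `step`.
// Returns 0 for a zero step or an empty range.
static size_t range_count(int32_t start, int32_t stop, int32_t step) {
    if (step == 0) return 0;
    if (step > 0 && start >= stop) return 0;
    if (step < 0 && start <= stop) return 0;
    // Use 64-bit arithmetic so the span cannot overflow.
    int64_t span     = step > 0 ? (int64_t) stop - start : (int64_t) start - stop;
    int64_t abs_step = step > 0 ? (int64_t) step : -(int64_t) step;
    return (size_t) ((span + abs_step - 1) / abs_step); // ceil(span / abs_step)
}

// Hypothetical helper (not part of ggml): writes start, start + step, ...
// into data, never more than `capacity` elements. Returns the number of
// elements written. The caller must pass a valid buffer, so this must not
// be called while the tensor data pointer is still NULL.
static size_t range_fill_i32(int32_t * data, size_t capacity,
                             int32_t start, int32_t stop, int32_t step) {
    assert(data != NULL);
    size_t n = range_count(start, stop, step);
    if (n > capacity) n = capacity;
    for (size_t i = 0; i < n; ++i) {
        data[i] = (int32_t) (start + (int64_t) i * step);
    }
    return n;
}
```

With such a helper, the body of the manual loop above would reduce to something like range_fill_i32((int32_t *) KQ_pos->data, N, n_past, n_past + N, 1), still inside the !ggml_allocr_is_measure(alloc) guard, since the data pointer is not usable during the measure pass.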

@ggerganov
Owner Author

Yup, ggml_range() is a great idea - we will add it

@xaedes
Collaborator

xaedes commented Sep 29, 2023

> Yup, ggml_range() is a great idea - we will add it

OK, after testing train & finetune with this PR, I will open a PR for ggml_range.

@slaren
Collaborator

slaren commented Sep 29, 2023

ggml_range looks useful, but as it is now, implementing it would require adding an implementation in every backend.

Collaborator

@xaedes xaedes left a comment

The finetune example was missing the step that makes sure KQ_pos is not reallocated.

I ran some finetune and train tests; the results indicate that it works.

@ggerganov ggerganov merged commit bc34dd4 into master Sep 29, 2023
@ggerganov ggerganov deleted the train-fix-kq-pos branch September 29, 2023 16:05
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 2, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp:
  ggml-cuda : perform cublas mat mul of quantized types as f16 (ggerganov#3412)
  llama.cpp : add documentation about rope_freq_base and scale values (ggerganov#3401)
  train : fix KQ_pos allocation (ggerganov#3392)
  llama : quantize up to 31% faster on Linux and Windows with mmap (ggerganov#3206)
  readme : update hot topics + model links (ggerganov#3399)
  readme : add link to grammars app (ggerganov#3388)
  swift : fix build on xcode 15 (ggerganov#3387)
  build : enable more non-default compiler warnings (ggerganov#3200)
  ggml_tensor: update the structure comments. (ggerganov#3283)
  ggml : release the requested thread pool resource (ggerganov#3292)
  llama.cpp : split llama_context_params into model and context params (ggerganov#3301)
  ci : multithreaded builds (ggerganov#3311)
  train : finetune LORA (ggerganov#2632)
  gguf : basic type checking in gguf_get_* (ggerganov#3346)
  gguf : make token scores and types optional (ggerganov#3347)
  ci : disable freeBSD builds due to lack of VMs (ggerganov#3381)
  llama : custom attention mask + parallel decoding + no context swaps (ggerganov#3228)
  docs : mark code as Bash (ggerganov#3375)
  readme : add Mistral AI release 0.1 (ggerganov#3362)
  ggml-cuda : perform cublas fp16 matrix multiplication as fp16 (ggerganov#3370)
yusiwen pushed a commit to yusiwen/llama.cpp that referenced this pull request Oct 7, 2023
* train : fix KQ_pos allocation

* make sure KQ_pos is not reallocated in finetune

---------

Co-authored-by: xaedes <[email protected]>
Development

Successfully merging this pull request may close these issues.

train-text-from-scratch.cpp: dereferenced NULL KQ_pos->data
3 participants