
Implement context shifting in executor base #714

Merged
merged 3 commits into SciSharp:master on May 4, 2024

Conversation

ksanman
Contributor

@ksanman ksanman commented May 2, 2024

Based on the discussion in #713

This will allow the Interactive/Instruct executors to avoid NoKvSlot errors when the chat history/prompt is longer than the model context.

This is the same logic as the stateless executor and the llama.cpp main example.

@martindevans
Member

There are some interesting differences between the llama.cpp example code you linked and the implementation in your code (and the stateless executor).

Stateless:

llama_kv_cache_seq_rm(ctx,  0, inferenceParams.TokensKeep + 1, inferenceParams.TokensKeep + n_discard + 1);
llama_kv_cache_seq_add(ctx, 0, inferenceParams.TokensKeep + 1 + n_discard, n_past, -n_discard);

llama.cpp example:

llama_kv_cache_seq_rm (ctx, 0, params.n_keep            , params.n_keep + n_discard);
llama_kv_cache_seq_add(ctx, 0, params.n_keep + n_discard, n_past, -n_discard);

Some weird +1 bits in there.

Digging through the llama.cpp history, it looks like the example code used to look like that until this PR.

Would you be up for modifying it to work like the new example code, testing that, and if it works also porting that change back across to the stateless executor?

@ksanman
Contributor Author

ksanman commented May 3, 2024

I have added a method that adds +1 to tokensKeep if the BOS token is available, and refactored the executors to be in line with the example code.

The models I have are working fine, but I need to test some other models that do not have BOS (like Falcon) to see if they work.

@martindevans
Member

Looks good so far! Thanks for doing the extra work.

@SignalRT
Collaborator

SignalRT commented May 4, 2024

It seems ok to me.

@ksanman
Contributor Author

ksanman commented May 4, 2024

I tested all the supported models, and they are working correctly.

@martindevans
Member

Excellent, thanks for the hard work @ksanman!

@martindevans martindevans merged commit 9906871 into SciSharp:master May 4, 2024
4 checks passed