
Introduced a new BatchedExecutor #503

Merged: martindevans merged 5 commits into SciSharp:master from batched_executor_again on Feb 15, 2024

Conversation

martindevans (Member) commented on Feb 9, 2024

Created a new BatchedExecutor which processes multiple "Conversations" in a single inference batch. This is faster even when the conversations are unrelated, and much faster if the conversations share some overlap (e.g. a common system prompt prefix).

Conversations can be "forked" to create a copy of a conversation at a given point. This allows, for example, prompting a conversation with a system prefix just once and then forking it repeatedly, once per individual conversation. Conversations can also be "rewound" to an earlier state.

Added two new examples, demonstrating forking and rewinding.
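
Roughly, the forking pattern looks like the following sketch. This is a sketch only: Fork() is assumed from the "forked" terminology above, and the executor/Infer usage follows the steps in "How To Use" below.

```csharp
// Evaluate a shared system prefix once, then fork it for each conversation.
var root = executor.Prompt("You are a helpful assistant.");
await executor.Infer();          // the shared prefix is evaluated only once

var left = root.Fork();          // copies share the prefix's KV cache
var right = root.Fork();

// ...continue left and right independently using the Infer/Sample/Prompt
// loop described under "How To Use" below.
```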

This is currently much "lower level" than the existing executors, and is really just a minimum viable system for moving LLamaSharp over to batching. More work is needed in the future:

  • Wrap Conversation with functionality that the existing executors have (e.g. prompt templates, sampling)
  • Extend Conversation with new capabilities (whatever is needed to handle running out of context)
  • Extend the executor with new capabilities, such as saving/loading the entire batch

How To Use

Brief guide to using the BatchedExecutor:

  1. Create a BatchedExecutor
  2. Create one or more new Conversations with executor.Prompt("hello");
  3. Call executor.Infer() to run inference for all conversations that need it, simultaneously in a single batch
  4. Sample each conversation individually (conversation.Sample())
  5. Prompt the conversation with the token chosen by sampling, or with more user input
  6. goto 3

Conversation objects have flags which indicate what state they're in (waiting for sampling, waiting for inference) and will throw exceptions if you try to use them incorrectly.
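
Putting those steps together, a minimal sketch of the loop might look like this. Only BatchedExecutor, Conversation, executor.Prompt(...), executor.Infer() and conversation.Sample() come from this PR; the namespaces, the model-loading calls (ModelParams, LLamaWeights.LoadFromFile), awaiting Infer() and the ChooseNextToken helper are assumptions/placeholders rather than the exact API.

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;

// Load a model (ModelParams / LLamaWeights.LoadFromFile are assumed here,
// they are not part of this PR).
var parameters = new ModelParams("path/to/model.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);

// 1. Create a BatchedExecutor
using var executor = new BatchedExecutor(model, parameters);

// 2. Create a new Conversation
var conversation = executor.Prompt("hello");

for (var i = 0; i < 32; i++)
{
    // 3. Run inference for every conversation which currently needs it,
    //    all in one batch.
    await executor.Infer();

    // 4. Sample this conversation individually. ChooseNextToken is a
    //    hypothetical placeholder for whatever sampling logic you apply to
    //    the result of Sample().
    var token = ChooseNextToken(conversation.Sample());

    // 5. Prompt the conversation with the sampled token (or more user input),
    //    then loop back to step 3.
    conversation.Prompt(token);
}
```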

…ns" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix).

Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state.

Added two new examples, demonstrating forking and rewinding.
martindevans force-pushed the batched_executor_again branch from 225ba47 to b0acecf on February 9, 2024 at 23:57
…* access to directly modify the KV cache.

 - Re-implemented `Rewind` as an extension method using `Modify` internally
 - Implemented `ShiftLeft`, which shifts everything over except for some starting tokens. This is the same as the `StatelessExecutor` out-of-context handling.
 - The batch now starts at epoch 1, which ensures that conversations (which start at epoch zero) are below the current epoch. It also means `0` can always be used as a value guaranteed to be below the current epoch.
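
A rough sketch of how these additions might be used (the method names come from the bullets above, but their exact signatures and parameter order are assumptions):

```csharp
// Rewind: discard, for example, the last 8 tokens from the conversation so
// they can be re-prompted or replaced.
conversation.Rewind(8);

// ShiftLeft: drop 32 tokens and shift everything over, leaving the first 4
// tokens untouched (the same idea as the StatelessExecutor out-of-context
// handling). Parameter order here is an assumption.
conversation.ShiftLeft(32, 4);
```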
martindevans merged commit d03c1a9 into SciSharp:master on Feb 15, 2024
3 checks passed
martindevans deleted the batched_executor_again branch on February 15, 2024 at 14:27