Created a new `BatchedExecutor` which processes multiple "Conversations" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix).

Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state.
Added two new examples, demonstrating forking and rewinding.
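For illustration only, here is a minimal sketch of the fork/rewind pattern described above. It assumes a `BatchedExecutor` named `executor` has already been created (construction is sketched after the usage guide below), and the `Fork()` and `Rewind()` method names are assumptions based on this description rather than confirmed signatures; the two new examples are the authoritative reference.

```csharp
// Sketch only: Fork() and Rewind() are assumed method names taken from the
// description above, not verified API. `executor` is a pre-built BatchedExecutor.

// Evaluate the shared system prefix once.
var root = executor.Prompt("You are a helpful assistant.");
executor.Infer();

// Fork copies of the conversation at this point, so the shared prefix
// does not need to be evaluated again for each copy.
var userA = root.Fork();
var userB = root.Fork();

// A conversation can also be rewound to an earlier state,
// e.g. undoing the last few tokens (argument meaning assumed).
userA.Rewind(5);
```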
This is currently much "lower level" than the existing executors, and is really just a minimum viable system to move LLamaSharp over to batching. There is more work needed in the future:
Conversation
with functionality that the existing executors have (e.g. prompt templates, sampling)Conversation
with new capabilities (whatever is needed to handle out-of-context)How To Use
Brief guide to using the `BatchedExecutor`:

 - Create a `BatchedExecutor`
 - Create `Conversation`s with `executor.Prompt("hello");`
 - Call `executor.Infer()` to run inference for all conversations which need inference simultaneously
 - Sample each conversation (`conversation.Sample()`)

`Conversation` objects have flags which indicate what state they're in (waiting for sampling, waiting for inference) and will throw exceptions if you try to use them incorrectly.
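To make the steps above concrete, here is a hedged end-to-end sketch. It follows the method names quoted in this description (`executor.Prompt`, `executor.Infer`, `conversation.Sample`); the constructor arguments, namespaces, return types and whether `Infer()` should be awaited are assumptions, so treat the two included examples as the real reference.

```csharp
// Sketch of the flow above; only the method names come from this description,
// the signatures and return types are assumptions.
using LLama;
using LLama.Batched; // assumed namespace for BatchedExecutor
using LLama.Common;

var parameters = new ModelParams("path/to/model.gguf"); // hypothetical model path
using var model = LLamaWeights.LoadFromFile(parameters);

// 1. Create a BatchedExecutor
using var executor = new BatchedExecutor(model, parameters);

// 2. Create Conversations
var hello = executor.Prompt("hello");
var story = executor.Prompt("Tell me a story");

// 3. Run inference for all conversations which need it, in one batch
//    (assumed to be awaitable).
await executor.Infer();

// 4. Sample each conversation (return type not specified in this description)
var helloResult = hello.Sample();
var storyResult = story.Sample();
```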