-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chat session state management #560
Chat session state management #560
Conversation
… stateful executors and context
… from history and process system message methods for pre-processing prompts. Serializing executor state to JSON, to avoid saved states from being updated by reference.
…ration that resets chat to original point of history without extra processing
Thanks for this PR! I'm not very familiar with the higher level executors, but I've just left a few comments on things I spotted :) As a general note. I'm personally working on bringing a brand new executor to LLamaSharp based on the |
…ssionState.ContextState no nullable
Sorry, I've over-thought it a bit with nullable ExecutorBaseState and context State. I've just made those non-nullable and removed all the |
Huh, it indeed does, I should've checked it out. Some notes on what would make BatchExecutor viable for my use-case:
I'd be happy to help, if that direction would be useful for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing those previous comments, this looks good to me 👍
I won't merge it myself immediately, since I'm not very familiar with the high level chat/executor stuff. If no one else has any comments I'll merge it before the next release happens :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution! The part of pre-filling prompt looks good to me while I have some concerns about the the part of session state. I'm open for any discussion and please feel free to ping me if there's something blocking you. :)
@eublefar > If I process batches too big - frames per second suffer.... I'm not sure if this will help, but it helped me in a similar situation. By default, we transfer control to await UniTask.NextFrame(); and we believe that this is sufficient from the point of view of control logic.
At a fixed frequency of 35 FPS
During execution, FPS may drop to 7-6 frames for the entire time of inference. Quite a crude technique, but you can simply add additional await UniTask.NextFrame();
When adding one additional UniTask.NextFrame(); FPS reaches up to 25-35. This has almost no effect on the speed of the withdrawal itself, and eliminates the Moire effect (frieze), since regardless of the generation length, only the generation time of one token matters. Additional question. Why is the "seed" set in ModelParams. When it is more logical in inferenceParams, where it is needed. |
LLamaSharp has IModelParams which is everything required to load a model into memory and (If you want to ask any followup questions please open an issue and ping me, to keep this PR on topic) |
@eublefar Hi, how's it going? Many thanks for your contribution. We'll be happy if you could complete this PR but it shouldn't be blamed if you don't have time to continue. Please let us know if you're not available in the future two weeks and I'll finish this work myself. :) |
@AsakusaRinne Hey, sorry, just got back from the vacation. I'll get on it today :) |
@AsakusaRinne I've suffered a bit with serializing/deserializing transforms, but It's working now I think. |
Tests pass on my machine and I can't figure out which one is crashing here, any suggestions? |
Tests are a little flakey at the moment (language models are huge, so I think we're just trying to do too much work in some tests for the github runners to handle). Since you passed on Linux, and this PR isn't really platform specific, you're probably ok. I've restarted the failed CI runs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, sorry for the delay of the review. It's impressing and the overall looks good to me, with a few comments left. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, many thanks for this contribution! :)
As discussed here #559
I've added few things:
StatefulExecutorBase.AddPromptAsync
- basically runs the text through decode without sampling new tokens.I am basically running InferInternal twice with WaitForInput = true, but it's very reliant on specific implementation of child classes and not the abstraction itself, so maybe you'll have some suggestions on how to improve it.
void LoadSession(SessionState state)
andSessionState GetSessionState()
for ChatSession class to save session state at arbitrary point.static Task<ChatSession> ChatSession.InitializeSessionFromHistoryAsync(ILLamaExecutor executor, ChatHistory history, CancellationToken cancellationToken = default)
andTask<ChatSession> ChatSession.ProcessSystemMessage(string content)
to pre-process KV cache.ChatSessionWithRestart
to show off how it worksI didn't find any ChatSession unit tests so I did not write any.
BR,
Mykyta