Fix/async chat serving #2727
Conversation
Sorry about the delay in review. A more sensible fix seems to be changing the `__init__` arguments of the super class `OpenAIServing` to accept any asyncio task to be run during its `_post_init` function, so we are sure that the chat template can be loaded after the tokenizer.
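To illustrate the suggestion, here is a minimal sketch, assuming hypothetical simplified signatures (the real vLLM classes do considerably more): the base class accepts an optional coroutine function and awaits it at the end of its post-init step, so any subclass work such as loading the chat template is guaranteed to run after the tokenizer exists.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class OpenAIServing:
    """Base class sketch: accepts an extra asyncio task for _post_init."""

    def __init__(self,
                 extra_init_task: Optional[Callable[[], Awaitable[None]]] = None):
        self.tokenizer = None
        self._extra_init_task = extra_init_task

    async def _post_init(self) -> None:
        await asyncio.sleep(0)        # stands in for the real async setup
        self.tokenizer = "tokenizer"  # tokenizer is guaranteed ready from here
        if self._extra_init_task is not None:
            await self._extra_init_task()  # subclass task runs afterwards


class OpenAIServingChat(OpenAIServing):
    def __init__(self, raw_template: str):
        # Pass the bound coroutine method up; it is stored, not called yet.
        super().__init__(extra_init_task=self._load_chat_template)
        self._raw_template = raw_template
        self.chat_template: Optional[str] = None

    async def _load_chat_template(self) -> None:
        assert self.tokenizer is not None  # safe by construction
        self.chat_template = self._raw_template


async def main() -> str:
    chat = OpenAIServingChat("{{ messages }}")
    await chat._post_init()
    return chat.chat_template


print(asyncio.run(main()))  # -> {{ messages }}
```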
@simon-mo No worries about the delay, I very much appreciate the fact that you maintain this OSS project in your free time. Thanks for the feedback, I'll have a look into your suggestion and see what I can do.
@simon-mo Open questions:
We had faced the same issue when using it with Ray Serve. This fix worked for us!
@simon-mo Would it be possible to get some feedback on this PR? The code changes are pretty minimal, and seeing as there was another PR/bug report last week, it seems like there are multiple people facing this problem.
This looks good. If you fix the conflict, I will click merge.
@simon-mo Thanks a lot. Merge conflicts are resolved and the remaining test issues are fixed.
```diff
@@ -356,7 +359,10 @@ async def chat_completion_full_generator(
         return response

-    def _load_chat_template(self, chat_template: Optional[str]):
+    async def _load_chat_template(self, chat_template: Optional[str]):
+        while self.tokenizer is None:
```
I think this line is faulty. There is no code that actually initializes `self.tokenizer = None` before `self._load_chat_template` is called. You might want to instead use `hasattr` to check the existence of the attribute.
@DarkLight1337 You are right. Until two weeks ago, `ServingEngine` set `self.tokenizer = None` in its `__init__`, but that changed in this commit. The tests still pass because by the time `_load_chat_template` is awaited, the tokenizer is now already there (which was the idea behind this in the first place). How do you want to handle this?
I could open another PR simply replacing `while self.tokenizer is None` with `while getattr(self, "tokenizer", None) is None`.
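A minimal sketch of why that replacement matters, with hypothetical stand-in classes (`Serving` and the module-level `load_chat_template` are illustrative, not the real vLLM code): when the attribute is never initialized, `self.tokenizer is None` raises `AttributeError`, while `getattr` with a default tolerates the attribute not existing yet.

```python
import asyncio


class Serving:
    # Deliberately no `self.tokenizer = None` in __init__,
    # mirroring the commit discussed above.
    pass


async def load_chat_template(serving: Serving, template: str) -> None:
    # `serving.tokenizer is None` would raise AttributeError on the first
    # iteration; getattr with a default handles the missing attribute.
    while getattr(serving, "tokenizer", None) is None:
        await asyncio.sleep(0.01)
    serving.chat_template = template


async def main() -> str:
    s = Serving()
    task = asyncio.create_task(load_chat_template(s, "{{ messages }}"))
    await asyncio.sleep(0.05)   # simulate slow engine start-up
    s.tokenizer = "tokenizer"   # tokenizer only becomes available later
    await task
    return s.chat_template


print(asyncio.run(main()))  # -> {{ messages }}
```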
I guess inside `tests/async_engine/test_chat_template.py`, you can use another `MockServingChat` that doesn't have the `chat_template` attribute. However, I am doubtful about the value of such a test, since it does not depend on `OpenAIServing.__init__`, making it useless if another commit changes the logic.

IMO it would be more useful to test a situation where the tokenizer takes considerably longer to load, making it more likely that the chat template will be accessed before the engine has fully loaded.
Honestly, I think the current design is not that great, as it puts `async` logic into `__init__`. It would be better if `__init__` required `tokenizer` and `chat_template` upfront, so that developers are encouraged to place the `async` logic outside of the constructor.

To maintain the same functionality as the current `__init__`, we can have a separate async `staticmethod` factory that does both the `async` work and the initialization.
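As a rough sketch of that factory idea, under assumed simplified signatures (`create`, `load_tokenizer`, and `load_template` are hypothetical names, not the real API): the constructor stays fully synchronous and receives only finished objects, while all awaiting happens in the factory before construction.

```python
import asyncio


async def load_tokenizer() -> str:
    await asyncio.sleep(0)          # stand-in for the real async load
    return "tokenizer"


async def load_template(path: str) -> str:
    await asyncio.sleep(0)          # stand-in for reading the template file
    return f"template:{path}"


class OpenAIServingChat:
    def __init__(self, tokenizer: str, chat_template: str):
        # Plain synchronous constructor: no awaits, no half-built state.
        self.tokenizer = tokenizer
        self.chat_template = chat_template

    @staticmethod
    async def create(template_path: str) -> "OpenAIServingChat":
        # All async setup happens here, before the constructor runs.
        tokenizer = await load_tokenizer()
        chat_template = await load_template(template_path)
        return OpenAIServingChat(tokenizer, chat_template)


serving = asyncio.run(OpenAIServingChat.create("chat.jinja"))
print(serving.chat_template)  # -> template:chat.jinja
```

This removes any need to poll for the tokenizer: by the time the object exists, its dependencies are guaranteed to exist too.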
I agree with that sentiment. The only reason `_post_init` is async in the first place is that `engine.get_model_config` is async, and that in turn is only async in order to let the engine workers use Ray. So 95% of the code is already synchronous, and the remaining 5% is only artificially asynchronous to enable Ray workers.
Loading the chat template used to be synchronous (before this PR), but this didn't mesh with the async code in the `ServingEngine`'s `__init__`.
> Honestly, I think the current design is not that great, as it puts async logic into `__init__`. It would be better if `__init__` required `tokenizer` and `chat_template` upfront, so that developers are encouraged to place the async logic outside of the constructor.
>
> To maintain the same functionality as the current `__init__`, we can have a separate async `staticmethod` factory that does both the async work and the initialization.

If you don't mind, I'll work on a PR regarding this issue. This should make it no longer necessary to test the async behaviour of `_load_chat_template`, so the relevant tests will be removed.
Feel free to open a PR, but I'm not sure what you mean regarding the tests. As long as `OpenAIServingChat` can still be initiated in an async function, I'm fine with everything ;)
I mean that the chat template tests will be reverted to sync functions as opposed to async.
Edit: I have opened #4674, feel free to take a look.
This fixes Issue 2683

Changes:
- Added `OpenAIServingChat._load_chat_template_async`, which waits for the tokenizer to become available.
- Adapted `OpenAIServingChat.__init__` to accommodate this (analogous implementation to `ServingEngine.__init__`).