
Fix/async chat serving #2727

Merged: 13 commits into vllm-project:main on May 3, 2024

Conversation

@schoennenbeck (Contributor) commented on Feb 2, 2024:

This fixes issue #2683.

Changes:

  • Added OpenAIServingChat._load_chat_template_async, which waits for the tokenizer to become available (a rough sketch follows after this list).
  • Corresponding changes to OpenAIServingChat.__init__ to accommodate this (analogous to the implementation in ServingEngine.__init__).
  • Added a test to check the correct behaviour.
  • Changed the scope of the server fixture in an existing test to make sure GPU memory is freed once the server is no longer needed.
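
A rough sketch of the waiting behaviour described in the first bullet (illustrative only, not the PR's actual code: the class name, the sleep interval, and the scheduling in __init__ are assumptions, and the engine setup that eventually assigns self.tokenizer is omitted):

```python
import asyncio
from typing import Optional


class ChatServingSketch:
    """Illustrative stand-in for OpenAIServingChat, not the actual vLLM code."""

    def __init__(self, chat_template: Optional[str] = None):
        self.tokenizer = None  # set later, once the async engine setup finishes
        # Schedule template loading instead of blocking __init__, mirroring
        # how ServingEngine defers its own async setup.
        asyncio.get_running_loop().create_task(
            self._load_chat_template_async(chat_template))

    async def _load_chat_template_async(self, chat_template: Optional[str]):
        # Wait for the tokenizer to become available before touching it.
        while self.tokenizer is None:
            await asyncio.sleep(0.1)
        if chat_template is not None:
            self.tokenizer.chat_template = chat_template
```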

@simon-mo self-assigned this on Feb 2, 2024

@simon-mo (Collaborator) commented:

Sorry about the delay in review. A more sensible fix seems to be changing the __init__ arguments of the superclass OpenAIServing to accept any asyncio task to be run during its _post_init function, so we can be sure that the chat template is loaded after the tokenizer.

@schoennenbeck (Contributor, Author) commented:

@simon-mo No worries about the delay, I very much appreciate the fact that you maintain this OSS project in your free time.

Thanks for the feedback, I'll have a look into your suggestion and see what I can do.

@schoennenbeck (Contributor, Author) commented:

@simon-mo
New implementation:

  • OpenAIServing's __init__ now accepts an optional await_post_init, which can be any awaitable that will be awaited at the end of the _post_init method (a rough sketch follows below).
  • Some corresponding changes to related tests.

Open questions:

  • Naming of the argument: I feel await_post_init is not a fantastic name.
  • Interface: One could also make await_post_init a list of awaitables that are awaited in order at the end of _post_init, to be even more flexible. I'm open to suggestions.
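
A minimal sketch of this interface, assuming a much-simplified base class (the loop handling, the placeholder tokenizer, and the class names are illustrative, not the real vLLM implementation):

```python
import asyncio
from types import SimpleNamespace
from typing import Awaitable, Optional


class ServingBaseSketch:
    def __init__(self, await_post_init: Optional[Awaitable] = None):
        self.tokenizer = None
        try:
            # Inside a running event loop (e.g. under Ray Serve), schedule the
            # async setup as a task instead of blocking on it.
            asyncio.get_running_loop().create_task(
                self._post_init(await_post_init))
        except RuntimeError:
            # No running loop: run the setup to completion right away.
            asyncio.run(self._post_init(await_post_init))

    async def _post_init(self, await_post_init: Optional[Awaitable]):
        # ... fetch the model config from the async engine, build the tokenizer ...
        self.tokenizer = SimpleNamespace()  # placeholder for the real tokenizer
        if await_post_init is not None:
            await await_post_init


class ServingChatSketch(ServingBaseSketch):
    def __init__(self, chat_template: Optional[str] = None):
        # Pass the template loader as the awaitable to run after post-init.
        super().__init__(
            await_post_init=self._load_chat_template(chat_template))

    async def _load_chat_template(self, chat_template: Optional[str]):
        # Runs only after _post_init has set up the tokenizer.
        if chat_template is not None:
            self.tokenizer.chat_template = chat_template
```

Because the awaitable runs at the very end of _post_init, the chat template is only applied once the tokenizer exists, regardless of whether the class is constructed inside or outside a running event loop.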

@abin-tiger commented:

We faced the same issue when using vLLM with Ray Serve. This fix worked for us!

@schoennenbeck (Contributor, Author) commented:

@simon-mo Would it be possible to get some feedback on this PR? The code changes are pretty minimal, and seeing as there was another PR/bug report last week, it seems that multiple people are facing this problem.

@simon-mo (Collaborator) commented on May 2, 2024:

This looks good. If you fix the conflict, I will click merge.

@schoennenbeck (Contributor, Author) commented:

@simon-mo Thanks a lot. The merge conflicts are resolved and the remaining issues with the tests are fixed.

@simon-mo merged commit f8e7add into vllm-project:main on May 3, 2024. 59 checks passed.
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 6, 2024
The review thread below refers to this hunk from the diff, where _load_chat_template becomes async:

```diff
@@ -356,7 +359,10 @@ async def chat_completion_full_generator(

         return response

-    def _load_chat_template(self, chat_template: Optional[str]):
+    async def _load_chat_template(self, chat_template: Optional[str]):
+        while self.tokenizer is None:
```
@DarkLight1337 (Member) commented:
I think this line is faulty. There is no code that actually initializes self.tokenizer = None before self._load_chat_template is called. You might want to instead use hasattr to check the existence of the attribute.

@schoennenbeck (Contributor, Author) commented on May 7, 2024:

@DarkLight1337 You are right. Until two weeks ago, ServingEngine set self.tokenizer = None in its __init__, but that changed in this commit.
The tests still pass because, by the time _load_chat_template is awaited, the tokenizer is already there (which was the idea behind this in the first place). How do you want to handle this?

@schoennenbeck (Contributor, Author) commented:

I could open another PR that simply replaces while self.tokenizer is None with while getattr(self, "tokenizer", None) is None.

@DarkLight1337 (Member) commented:

I guess inside tests/async_engine/test_chat_template.py, you can use another MockServingChat that doesn't have the chat_template attribute. However, I am doubtful about the value of such a test since it does not depend on OpenAIServing.__init__, making it useless if another commit changes the logic.

IMO, it would be more useful to test a situation where the tokenizer takes considerably longer to load, making it more likely that the chat template will be accessed before the engine has fully loaded.
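
A possible shape for such a test, assuming pytest-asyncio and a hypothetical mock whose tokenizer only appears after a delay (none of these names come from the vLLM test suite):

```python
import asyncio

import pytest


class SlowMockServingChat:
    """Hypothetical mock: the tokenizer only becomes available after a delay."""

    def __init__(self):
        self.tokenizer = None
        self.chat_template = None

    async def finish_loading(self, delay: float):
        await asyncio.sleep(delay)
        self.tokenizer = object()

    async def load_chat_template(self, chat_template: str):
        # Same waiting pattern as the PR: poll until the tokenizer exists.
        while self.tokenizer is None:
            await asyncio.sleep(0.01)
        self.chat_template = chat_template


@pytest.mark.asyncio
async def test_chat_template_waits_for_slow_tokenizer():
    serving = SlowMockServingChat()
    # Kick off template loading before the tokenizer is ready.
    load_task = asyncio.create_task(
        serving.load_chat_template("{{ messages }}"))
    await serving.finish_loading(delay=0.5)
    await load_task
    assert serving.chat_template == "{{ messages }}"
```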

@DarkLight1337 (Member) commented on May 7, 2024:

Honestly, I think the current design is not that great as it puts async logic into __init__. It would be better if __init__ requires tokenizer and chat_template upfront so that developers are encouraged to place the async logic outside of the constructor.

To maintain the same functionality as the current __init__, we can have a separate async staticmethod factory that does both the async work and the initialization.
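
A sketch of what such a factory could look like (illustrative; the engine interface and the tokenizer construction are simplified placeholders, and it is written as a classmethod rather than a staticmethod):

```python
from types import SimpleNamespace
from typing import Optional


class ChatServingFactorySketch:
    def __init__(self, tokenizer, chat_template: Optional[str]):
        # Plain synchronous construction: dependencies arrive ready-made.
        self.tokenizer = tokenizer
        self.chat_template = chat_template

    @classmethod
    async def create(cls, engine, chat_template: Optional[str]):
        # All async work happens here, outside the constructor.
        model_config = await engine.get_model_config()
        tokenizer = SimpleNamespace(model_config=model_config)  # placeholder
        return cls(tokenizer, chat_template)


# Usage from async startup code:
#   serving_chat = await ChatServingFactorySketch.create(engine, chat_template)
```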

@schoennenbeck (Contributor, Author) commented:

I agree with that sentiment. The only reason _post_init is async in the first place is that engine.get_model_config is async, and that in turn is only async in order to let the engine workers use Ray. So 95% of the code is already synchronous, and the remaining 5% is only artificially asynchronous to enable Ray workers.

@schoennenbeck (Contributor, Author) commented:

Loading the chat_template used to be synchronous (before this PR), but that didn't mesh with the async code in ServingEngine's __init__.

@DarkLight1337 (Member) commented:

> Honestly, I think the current design is not that great as it puts async logic into __init__. It would be better if __init__ requires tokenizer and chat_template upfront so that developers are encouraged to place the async logic outside of the constructor.
>
> To maintain the same functionality as the current __init__, we can have a separate async staticmethod factory that does both async and initialization.

If you don't mind, I'll work on a PR regarding this issue. This should make it no longer necessary to test the async behaviour of _load_chat_template, so the relevant tests will be removed.

@schoennenbeck (Contributor, Author) commented:

Feel free to open a PR, but I'm not sure what you mean regarding the tests. As long as OpenAIServingChat can still be instantiated in an async function, I'm fine with everything ;)

@DarkLight1337 (Member) commented on May 8, 2024:

I mean that the chat template tests will be reverted to sync functions as opposed to async.

Edit: I have opened #4674, feel free to take a look.

z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 7, 2024

Successfully merging this pull request may close this issue: OpenAIServingChat cannot be instantiated within a running event loop (#2683).

4 participants