[Core][Frontend][Doc] Initial support for LLaVA-NeXT and GPT-4V Chat Completions API #3978
Conversation
- Refactor `OpenAIServingChat` and add a function for loading images
- Move `pillow` dev dependency to common
- Add example chat template for LLaVA model
- Add general guide for using VLMs (see the request sketch below)
- Add LLaVA to list of supported models
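For context, a request described in such a guide would look roughly like the following. This is only a sketch: the model name, server address, and image URL are placeholders, and it assumes a vLLM OpenAI-compatible server started with a LLaVA checkpoint, queried through the standard `openai` Python client.

```python
# Sketch of a GPT-4V-style request against an OpenAI-compatible vLLM server.
# Model name, host/port, and image URL are placeholders, not the final docs.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed vLLM server address
    api_key="EMPTY",                      # vLLM does not validate the key by default
)

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",     # placeholder model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```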
- Move `ServerRunner` to common file
- Incorrect loading of config (also rename `openai_api` to `image_openai`)
- Incorrect await of stream generator
- Also, use the type definitions from `openai` directly
…ions API and legacy Completions API
- Remove channel conversion and resizing from OpenAI server preprocessing, since the image processor in HuggingFace should be able to handle that
- `MultiModalData` is now an abstract class that outputs additional kwargs to be input into the model (see the sketch after this list). This was initially done to support LLaVA-NeXT's `image_size` parameter but can be extended to other models as well.
- The application of the image processor is now defined inside `MultiModalData`, so there is no need to extensively edit the engine to support other types of data
- New `MultiModalData` subclasses: `ImagePixelData` and `ImageFeatureData` to better differentiate the two cases of image input
- Refactored the LLaVA-1.5 model to make it easier to inherit from when defining the LLaVA-NeXT model
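To make the abstraction above concrete, here is a rough sketch of how such a hierarchy could look. The class and method names below are assumptions for illustration, not necessarily the exact code in this branch; the point is that each subclass knows how to turn its raw payload into the extra keyword arguments the model forward pass needs.

```python
# Illustrative-only sketch of the MultiModalData hierarchy described above;
# the actual names and signatures in the PR may differ.
from abc import ABC, abstractmethod
from typing import Any, Dict

import torch
from PIL import Image


class MultiModalData(ABC):
    """Base class for non-text inputs attached to a request."""

    @abstractmethod
    def get_model_kwargs(self, hf_processor) -> Dict[str, Any]:
        """Return extra kwargs (e.g. pixel_values) to pass into the model."""


class ImagePixelData(MultiModalData):
    """Raw image; the HuggingFace image processor runs inside this class."""

    def __init__(self, image: Image.Image) -> None:
        self.image = image

    def get_model_kwargs(self, hf_processor) -> Dict[str, Any]:
        out = hf_processor(images=self.image, return_tensors="pt")
        # image_sizes is the extra input LLaVA-NeXT needs besides pixel_values.
        return {
            "pixel_values": out["pixel_values"],
            "image_sizes": torch.tensor([[self.image.height, self.image.width]]),
        }


class ImageFeatureData(MultiModalData):
    """Pre-computed image features that bypass the vision tower."""

    def __init__(self, image_features: torch.Tensor) -> None:
        self.image_features = image_features

    def get_model_kwargs(self, hf_processor) -> Dict[str, Any]:
        return {"image_features": self.image_features}
```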
@ywang96 Regarding your latest comment in #3042:
Actually, I have been working on supporting LLaVA-NeXT as well. As part of the effort, I have further refactored the image processing pipeline to output a dictionary which is used to input …
Yep - this is exactly what I had in mind as well, but I think there are more issues in addition to it that we may want to address. For example, do we want to support the prompt format the same way as HuggingFace to make the user experience easier, or at least keep it the same at the interface level? I do think these are worth discussing (and the refactoring can happen later after we merge model support and API server support, depending on how much work it will be).
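To make that trade-off concrete: the HuggingFace LLaVA-1.5 checkpoints expect a prompt such as `USER: <image>\n<question> ASSISTANT:`, whereas the OpenAI-style interface only sees role/content messages, so some layer has to render one into the other. A hypothetical helper (not part of this PR or of any library) might look like this:

```python
# Hypothetical helper, for discussion only: render OpenAI-style chat messages
# into the prompt format that the HuggingFace LLaVA-1.5 checkpoints expect.
# Assumes a single image, placed in the first user turn.
from typing import Dict, List


def render_llava_prompt(messages: List[Dict[str, str]]) -> str:
    parts = []
    for i, msg in enumerate(messages):
        if msg["role"] == "user":
            # <image> marks where the image embeddings are spliced in.
            prefix = "USER: <image>\n" if i == 0 else "USER: "
            parts.append(prefix + msg["content"])
        elif msg["role"] == "assistant":
            parts.append(f"ASSISTANT: {msg['content']}")
    parts.append("ASSISTANT:")  # generation prompt
    return " ".join(parts)


print(render_llava_prompt([{"role": "user", "content": "What is in the image?"}]))
# USER: <image>
# What is in the image? ASSISTANT:
```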
- Now, `ImagePixelData` only accepts `PIL.Image` input
- Also move the `torch` import out of `TYPE_CHECKING`, as it is loaded anyway when importing `SamplingParams`
- Note the patch in `ImagePixelData`. To fully leverage the potential of LLaVA-NeXT, we should allow images of any size, but the feature size would then be variable.
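The reason is that LLaVA-NeXT splits the image into a resolution-dependent grid of patches, so the number of image tokens varies with the input size. A rough sketch of what pinning the input size amounts to (the names and the resolution are assumptions, not the exact patch):

```python
# Sketch of the fixed-size restriction mentioned above (names are illustrative).
# A variable image size implies a variable feature length in LLaVA-NeXT, which
# the current engine plumbing does not yet handle.
from PIL import Image

EXPECTED_SIZE = (336, 336)  # assumed fixed input resolution


def validate_or_resize(image: Image.Image) -> Image.Image:
    """Force the image to the size the engine was configured for."""
    if image.size != EXPECTED_SIZE:
        # Resizing keeps the feature size constant, at the cost of losing the
        # resolution-adaptive behaviour that makes LLaVA-NeXT attractive.
        image = image.resize(EXPECTED_SIZE)
    return image
```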
I have just added support for LLaVA-NeXT, with one big caveat: the size of the input image is fixed, otherwise the feature size (i.e. number of …
These force pushes consolidate the fixes to the LLaVA test and example code.
- Note that we now load the images directly instead of from `.pt` files
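In other words, the tests and examples now open the image files themselves rather than pre-serialized tensors; a minimal illustration, with placeholder paths:

```python
# Before: pre-processed tensors were loaded from disk.
# import torch
# pixel_values = torch.load("images/stop_sign_pixel_values.pt")  # placeholder path

# After: the raw image is loaded and the image processor runs inside the pipeline.
from PIL import Image

image = Image.open("images/stop_sign.jpg")  # placeholder path
```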
@ywang96 I think that this PR is suffering from scope creep. Perhaps I should break apart the changes into smaller segments to facilitate the conversation in #4194? I could split the changes as follows, with each item being its own PR:
Edit: Added links to the child PRs.
I agree - I think the OpenAI API server will be a good starting point since the interface should agree with the OpenAI protocol anyway, and I'm sorry that this PR suffered :/ One suggestion I have for a big change like this is that it's probably good to have a series of PRs anyway. Take a look at Speculative decoding or Chunked Prefill - those are great examples.
I have created the child PRs.
- These changes are propagated to the child PRs
All of the child PRs have been completed, so I'm closing this now.
To combat scope creep, this PR has been split into smaller ones:

- [Frontend] Support GPT-4V Chat Completions API #4200
- [Frontend] Add OpenAI Vision API Support #5237

The branch associated with this PR has been frozen (except for critical fixes). Once all dependencies have been merged, I will compare this branch against the merged (`main`) branch to verify that I didn't miss any changes.