
[Model]: Support for InternVL2 #6321

Closed · Weiyun1025 opened this issue Jul 11, 2024 · 1 comment · Fixed by #6514
Labels
new model (Requests to new models)

Comments

@Weiyun1025

🚀 The feature, motivation and pitch

InternVL2 is currently the most powerful open-source Multimodal Large Language Model (MLLM). The InternVL2 family includes models ranging from a 2B model, suitable for edge devices, to a 108B model, which is significantly more powerful. With larger-scale language models, InternVL2-Pro demonstrates outstanding multimodal understanding capabilities, matching the performance of commercial closed-source models across various benchmarks.

Given the significant potential of InternVL2, we believe that integrating it with vLLM would greatly benefit both the vLLM community and users of this model. We kindly request your assistance in enabling the deployment of InternVL2 using the vLLM framework.

We look forward to your positive response and are eager to collaborate on this exciting endeavor.

Alternatives

No response

Additional context

Blog: https://internvl.github.io/blog/2024-07-02-InternVL-2.0/
Model Family: https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e
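
To make the request concrete, here is a minimal sketch of what InternVL2 inference through vLLM could look like once support lands. The `<image>` prompt placeholder and the `multi_modal_data` input format are assumptions modeled on vLLM's existing vision-language models, not a confirmed interface:

```python
# Hypothetical usage sketch for InternVL2 via vLLM, assuming support
# follows the pattern of vLLM's existing vision-language models.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="OpenGVLab/InternVL2-40B",  # any size in the family
    trust_remote_code=True,           # InternVL2 ships custom modeling code
)

outputs = llm.generate(
    {
        # "<image>" placeholder is an assumption about the prompt format
        "prompt": "<image>\nDescribe this image in detail.",
        "multi_modal_data": {"image": Image.open("example.jpg")},
    },
    SamplingParams(temperature=0.2, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```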

@DarkLight1337 added the new model (Requests to new models) label and removed the feature request label on Jul 11, 2024
@DarkLight1337 changed the title from [Feature]: Suport for InternVL2 to [Model]: Support for InternVL2 on Jul 11, 2024
@ywang96 (Member) commented Jul 14, 2024

Hey @Weiyun1025! Thanks for opening this issue. I took a brief look at the model repo https://huggingface.co/OpenGVLab/InternVL2-40B/tree/main, and it seems that supporting this model should be fairly straightforward (similar to what we did with Phi-3-vision).

Are you planning to open a pull request for this? If so, feel free to look at the other vision-language model implementations in vLLM, and let us know if you run into any issues. We're happy to help you get this model supported.

If you cannot make a pull request, I'll see whether I have the bandwidth to make a PR for this myself. Feel free to check out #4194 for the full roadmap around multi-modality.

Thanks!
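
For reference, a contributed implementation would ultimately be wired into vLLM's model registry so that the architecture name in the Hugging Face config (`InternVLChatModel` for this family) resolves to the vLLM model class. A minimal, hypothetical sketch, where the implementation module and class name are placeholders:

```python
# Hypothetical sketch: registering an out-of-tree InternVL2 implementation
# with vLLM's model registry.
from vllm import ModelRegistry

# Placeholder import: the actual implementation would live in vLLM's
# model directory (or an out-of-tree module) once written.
from my_internvl2_impl import InternVL2ForConditionalGeneration

ModelRegistry.register_model(
    "InternVLChatModel",  # architecture string from the HF config.json
    InternVL2ForConditionalGeneration,
)
```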
