[Doc] Move guide for multimodal model and other improvements #6168

Merged (11 commits) on Jul 6, 2024
6 changes: 3 additions & 3 deletions docs/source/dev/input_processing/model_inputs_index.rst
@@ -5,10 +5,10 @@ Input Processing

.. currentmodule:: vllm.inputs

vLLM provides a mechanism for defining input processors for each model so that the inputs are processed
in :class:`~vllm.LLMEngine` before they are passed to model executors.
Each model can override parts of vLLM's :ref:`input processing pipeline <input_processing_pipeline>` via
:data:`~vllm.inputs.INPUT_REGISTRY` and :data:`~vllm.multimodal.MULTIMODAL_REGISTRY`.

- Currently, this mechanism is only utilized in :ref:`multi-modal models <multi_modality>` for preprocessing multi-modal input
+ Currently, this mechanism is only utilized in :ref:`multi-modal <multi_modality>` models for preprocessing multi-modal input
data in addition to input prompt, but it can be extended to text-only language models when needed.
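
For orientation, a model might opt into this pipeline roughly as follows. This is a minimal sketch rather than the documented API: the decorator-style registration mirrors the multi-modal models in the vLLM source tree of this era, and ``MyModelForCausalLM`` and ``input_processor_for_mymodel`` are hypothetical names.

.. code-block:: python

    from torch import nn

    from vllm.inputs import INPUT_REGISTRY
    from vllm.multimodal import MULTIMODAL_REGISTRY


    def input_processor_for_mymodel(ctx, llm_inputs):
        # Hypothetical hook: inspect or rewrite the prompt/token IDs before
        # they reach the model (e.g. insert image placeholder tokens).
        return llm_inputs


    @MULTIMODAL_REGISTRY.register_image_input_mapper()  # fall back to the default image mapper
    @INPUT_REGISTRY.register_input_processor(input_processor_for_mymodel)
    class MyModelForCausalLM(nn.Module):
        ...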

Guides
20 changes: 6 additions & 14 deletions docs/source/dev/multimodal/multimodal_index.rst
@@ -7,25 +7,17 @@ Multi-Modality

vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package.

- :class:`vllm.inputs.PromptStrictInputs` accepts an additional attribute ``multi_modal_data``
- which allows you to pass in multi-modal input alongside text and token prompts.
+ Multi-modal input can be passed alongside text and token prompts to :ref:`supported models <supported_vlms>`
+ via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptStrictInputs`.
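
As a usage sketch (hedged: the model name and prompt format are illustrative, and whether ``multi_modal_data`` takes a raw PIL image or a wrapper type depends on the vLLM version):

.. code-block:: python

    from PIL import Image

    from vllm import LLM, SamplingParams

    llm = LLM(model="llava-hf/llava-1.5-7b-hf")
    image = Image.open("example.jpg")

    outputs = llm.generate(
        {
            "prompt": "USER: <image>\nWhat is shown in this image?\nASSISTANT:",
            "multi_modal_data": {"image": image},
        },
        SamplingParams(temperature=0.0, max_tokens=64),
    )
    print(outputs[0].outputs[0].text)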

.. note::
``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through
- :class:`vllm.multimodal.MULTIMODAL_REGISTRY`.
+ the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`.

- By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, please follow :ref:`the guide for adding a new multimodal model. <adding_a_new_multimodal_model>`.
+ To implement a new multi-modal model in vLLM, please follow :ref:`this guide <enabling_multimodal_inputs>`.


- # TODO: Add more instructions on how to do that once embeddings is in.
-
- Guides
- ++++++
-
- .. toctree::
-    :maxdepth: 1
-
-    adding_multimodal_model
+ ..
+     TODO: Add more instructions on how to add new plugins once embeddings is in.

Module Contents
+++++++++++++++
2 changes: 2 additions & 0 deletions docs/source/index.rst
@@ -92,6 +92,7 @@ Documentation

models/supported_models
models/adding_model
+ models/enabling_multimodal_inputs
models/engine_args
models/lora
models/vlm
@@ -116,6 +117,7 @@ Documentation
automatic_prefix_caching/details

.. toctree::
+ :maxdepth: 2
:caption: Developer Documentation

dev/sampling_params
38 changes: 21 additions & 17 deletions docs/source/models/adding_model.rst
@@ -10,6 +10,10 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor
The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM.
However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.

+ .. note::
+     By default, vLLM models do not support multi-modal inputs. To enable multi-modal support,
+     please follow :ref:`this guide <enabling_multimodal_inputs>` after implementing the model here.

.. tip::
If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ repository.
We will be happy to help you out!
@@ -44,23 +48,23 @@ Next, you need to rewrite the :meth:`~torch.nn.Module.forward` method of your mo

.. code-block:: diff

def forward(
    self,
    input_ids: torch.Tensor,
-   attention_mask: Optional[torch.Tensor] = None,
-   position_ids: Optional[torch.LongTensor] = None,
-   past_key_values: Optional[List[torch.FloatTensor]] = None,
-   inputs_embeds: Optional[torch.FloatTensor] = None,
-   labels: Optional[torch.LongTensor] = None,
-   use_cache: Optional[bool] = None,
-   output_attentions: Optional[bool] = None,
-   output_hidden_states: Optional[bool] = None,
-   return_dict: Optional[bool] = None,
-) -> Union[Tuple, CausalLMOutputWithPast]:
+   positions: torch.Tensor,
+   kv_caches: List[torch.Tensor],
+   attn_metadata: AttentionMetadata,
+) -> Optional[SamplerOutput]:

1. Update the code by considering that :code:`input_ids` and :code:`positions` are now flattened tensors.
2. Replace the attention operation with either :code:`PagedAttention`, :code:`PagedAttentionWithRoPE`, or :code:`PagedAttentionWithALiBi` depending on the model's architecture.
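
To make step 1 concrete, here is a purely conceptual illustration (made-up token IDs) of what "flattened" means: vLLM concatenates the tokens of all sequences in a batch into one 1-D tensor instead of padding them into a ``[batch, seq_len]`` matrix.

.. code-block:: python

    import torch

    # Two sequences of different lengths in the same batch:
    seq_a = [101, 7592, 2088]        # 3 tokens
    seq_b = [101, 2129, 2024, 2017]  # 4 tokens

    # HuggingFace-style batched inputs would be padded to shape [2, 4];
    # vLLM instead passes a single flattened tensor of shape [7]:
    input_ids = torch.tensor(seq_a + seq_b)
    positions = torch.tensor([0, 1, 2, 0, 1, 2, 3])

    assert input_ids.shape == positions.shape == torch.Size([7])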

docs/source/models/enabling_multimodal_inputs.rst (renamed from docs/source/dev/multimodal/adding_multimodal_model.rst)
@@ -1,26 +1,21 @@
- .. _adding_a_new_multimodal_model:
+ .. _enabling_multimodal_inputs:

- Adding a New Multimodal Model
- =============================
+ Enabling Multimodal Inputs
+ ==========================

- This document provides a high-level guide on integrating a :ref:`multi-modal model <multi_modality>` into vLLM.
+ This document walks you through the steps to extend a vLLM model so that it accepts :ref:`multi-modal <multi_modality>` inputs.

- .. note::
-     The complexity of adding a new model depends heavily on the model's architecture.
-     The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM.
-     However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.
-
- .. tip::
-     If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ repository.
-     We will be happy to help you out!
+ .. seealso::
Comment on lines -8 to -15

Member: Maybe keep some of this?

Member Author: I've already linked users to the base "Adding a New Model" which includes the same content.

+     :ref:`adding_a_new_model`


- 1. Set up the base vLLM model
+ 1. Update the base vLLM model
-----------------------------

- As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model in vLLM, but note the following:
+ It is assumed that you have already implemented the model in vLLM according to :ref:`these steps <adding_a_new_model>`.
+ Further update the model as follows:

- - You should additionally implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface.
+ - Implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface.

.. code-block:: diff

@@ -33,19 +28,19 @@ As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model
The model class does not have to be named :code:`*ForCausalLM`.
Check out `the HuggingFace Transformers documentation <https://huggingface.co/docs/transformers/model_doc/auto#multimodal>`__ for some examples.
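
A minimal sketch of that change (the class name is hypothetical): implementing the interface usually amounts to adding it as a mixin to the model class.

.. code-block:: diff

    + from vllm.model_executor.models.interfaces import SupportsVision

    - class MyModelForConditionalGeneration(nn.Module):
    + class MyModelForConditionalGeneration(nn.Module, SupportsVision):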

- - While implementing the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter
+ - If you haven't already done so, reserve a keyword parameter in :meth:`~torch.nn.Module.forward`
for each input tensor that corresponds to a multi-modal input, as shown in the following example:

.. code-block:: diff

def forward(
    self,
    input_ids: torch.Tensor,
    positions: torch.Tensor,
    kv_caches: List[torch.Tensor],
    attn_metadata: AttentionMetadata,
+   pixel_values: torch.Tensor,
) -> SamplerOutput:


2. Register input mappers
@@ -68,8 +63,8 @@ A default mapper is available for each modality in the core vLLM library. This i
:ref:`input_processing_pipeline`
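
As an illustrative sketch (``MyModelForConditionalGeneration`` is a hypothetical class; passing no argument to the decorator falls back to the default mapper, while a custom callable taking the input context and the raw data and returning the keyword-argument tensors for ``forward`` can be passed instead):

.. code-block:: python

    from torch import nn

    from vllm.model_executor.models.interfaces import SupportsVision
    from vllm.multimodal import MULTIMODAL_REGISTRY


    @MULTIMODAL_REGISTRY.register_image_input_mapper()  # no argument -> default image mapper
    class MyModelForConditionalGeneration(nn.Module, SupportsVision):
        ...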


- 3. Register maximum number of multimodal tokens
- ----------------------------------------------------------
+ 3. Register maximum number of multi-modal tokens
+ ------------------------------------------------

For each modality type that the model accepts as input, calculate the maximum possible number of tokens
and register it via :meth:`MULTIMODAL_REGISTRY.register_max_multimodal_tokens <vllm.multimodal.MultiModalRegistry.register_max_multimodal_tokens>`.
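
For example, an image encoder that produces a fixed number of patch embeddings could be registered as follows. This is a hedged sketch: ``get_max_mymodel_image_tokens`` is a hypothetical helper, the 336/14 geometry is illustrative, ``ctx`` is the input context passed in by the registry, and the image-specific wrapper ``register_max_image_tokens`` is assumed to be available alongside the generic method.

.. code-block:: python

    from torch import nn

    from vllm.model_executor.models.interfaces import SupportsVision
    from vllm.multimodal import MULTIMODAL_REGISTRY


    def get_max_mymodel_image_tokens(ctx) -> int:
        # Hypothetical: a 336x336 image split into 14x14 patches
        # yields (336 // 14) ** 2 = 576 image tokens.
        vision_config = ctx.get_hf_config().vision_config
        return (vision_config.image_size // vision_config.patch_size) ** 2


    @MULTIMODAL_REGISTRY.register_max_image_tokens(get_max_mymodel_image_tokens)
    class MyModelForConditionalGeneration(nn.Module, SupportsVision):
        ...
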
2 changes: 1 addition & 1 deletion docs/source/models/supported_models.rst
@@ -192,7 +192,7 @@ Vision Language Models
-

If your model uses one of the above model architectures, you can seamlessly run your model with vLLM.
- Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Adding a New Multimodal Model <adding_a_new_multimodal_model>`
+ Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Enabling Multimodal Inputs <enabling_multimodal_inputs>`
for instructions on how to implement support for your model.
Alternatively, you can raise an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ project.

2 changes: 1 addition & 1 deletion vllm/inputs/registry.py
@@ -141,7 +141,7 @@ def dummy_data_for_profiling(self, model_config: "ModelConfig",
The model is identified by ``model_config``.

See also:
- :ref:`adding_a_new_multimodal_model`
+ :ref:`enabling_multimodal_inputs`
"""
# Avoid circular import
from vllm.model_executor.model_loader import get_model_architecture
11 changes: 6 additions & 5 deletions vllm/multimodal/base.py
@@ -162,8 +162,8 @@ def register_input_mapper(
If `None` is provided, then the default input mapper is used instead.

See also:
:ref:`input_processing_pipeline`
:ref:`adding_a_new_multimodal_model`
- :ref:`input_processing_pipeline`
- :ref:`enabling_multimodal_inputs`
"""

def wrapper(model_cls: N) -> N:
@@ -192,7 +192,8 @@ def map_input(self, model_config: ModelConfig,
TypeError: If the data type is not supported.

See also:
:ref:`adding_a_new_multimodal_model`
- :ref:`input_processing_pipeline`
- :ref:`enabling_multimodal_inputs`
"""
# Avoid circular import
from vllm.model_executor.model_loader import get_model_architecture
@@ -230,7 +231,7 @@ def register_max_multimodal_tokens(
If `None` is provided, then the default calculation is used instead.

See also:
- :ref:`adding_a_new_multimodal_model`
+ :ref:`enabling_multimodal_inputs`
"""

def wrapper(model_cls: N) -> N:
@@ -260,7 +261,7 @@ def get_max_multimodal_tokens(self, model_config: ModelConfig) -> int:
The model is identified by ``model_config``.

See also:
- :ref:`adding_a_new_multimodal_model`
+ :ref:`enabling_multimodal_inputs`
"""
# Avoid circular import
from vllm.model_executor.model_loader import get_model_architecture