[Doc] Guide for adding multi-modal plugins (vllm-project#6205)
DarkLight1337 authored and dtrifiro committed Jul 17, 2024
1 parent a6eb1fb commit 6b715a2
Showing 7 changed files with 64 additions and 23 deletions.
1 change: 1 addition & 0 deletions docs/source/_templates/sections/header.html
@@ -5,6 +5,7 @@
justify-content: center;
align-items: center;
font-size: 16px;
padding: 0 6px 0 6px;
}
.notification-bar p {
margin: 0;
17 changes: 17 additions & 0 deletions docs/source/dev/multimodal/adding_multimodal_plugin.rst
@@ -0,0 +1,17 @@
.. _adding_multimodal_plugin:

Adding a Multimodal Plugin
==========================

This document teaches you how to add a new modality to vLLM.

Each modality in vLLM is represented by a :class:`~vllm.multimodal.MultiModalPlugin` and registered to :data:`~vllm.multimodal.MULTIMODAL_REGISTRY`.
For vLLM to recognize a new modality type, you have to create a new plugin and then pass it to :meth:`~vllm.multimodal.MultiModalRegistry.register_plugin`.

The remainder of this document details how to define custom :class:`~vllm.multimodal.MultiModalPlugin` subclasses.
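
For illustration, here is a minimal sketch of such a plugin. The ``AudioPlugin``
name and its payload are hypothetical; ``get_data_key`` mirrors the built-in
``ImagePlugin``, and ``_default_input_mapper`` is assumed to be the remaining
abstract hook on :class:`~vllm.multimodal.MultiModalPlugin`:

.. code-block:: python

    from vllm.inputs import InputContext
    from vllm.multimodal import (MULTIMODAL_REGISTRY, MultiModalInputs,
                                 MultiModalPlugin)


    class AudioPlugin(MultiModalPlugin):
        """Hypothetical plugin for a new "audio" modality."""

        def get_data_key(self) -> str:
            # Key under which this modality appears in ``multi_modal_data``.
            return "audio"

        def _default_input_mapper(self, ctx: InputContext,
                                  data: object) -> MultiModalInputs:
            # Convert the raw audio payload into keyword arguments
            # for the model's forward pass.
            raise NotImplementedError("Map audio data to model inputs here.")


    # Register the plugin so vLLM recognizes the new modality key.
    MULTIMODAL_REGISTRY.register_plugin(AudioPlugin())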

.. note::
This article is a work in progress.

..
TODO: Add more instructions on how to add new plugins once embeddings is in.
24 changes: 16 additions & 8 deletions docs/source/dev/multimodal/multimodal_index.rst
@@ -7,17 +7,21 @@ Multi-Modality

vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package.

Multi-modal input can be passed alongside text and token prompts to :ref:`supported models <supported_vlms>`
Multi-modal inputs can be passed alongside text and token prompts to :ref:`supported models <supported_vlms>`
via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptStrictInputs`.

.. note::
``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through
the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`.
Currently, vLLM only has built-in support for image data. You can extend vLLM to process additional modalities
by following :ref:`this guide <adding_multimodal_plugin>`.
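
For example, a hedged sketch of passing an image alongside a text prompt (the
model name and prompt template below are illustrative, not requirements):

.. code-block:: python

    from PIL import Image

    from vllm import LLM

    llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # any supported vision-language model
    image = Image.open("example.jpg")  # hypothetical local file

    outputs = llm.generate({
        "prompt": "USER: <image>\nWhat is shown in this image? ASSISTANT:",
        "multi_modal_data": {"image": image},
    })
    print(outputs[0].outputs[0].text)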

To implement a new multi-modal model in vLLM, please follow :ref:`this guide <enabling_multimodal_inputs>`.
Looking to add your own multi-modal model? Please follow the instructions listed :ref:`here <enabling_multimodal_inputs>`.

..
TODO: Add more instructions on how to add new plugins once embeddings is in.
Guides
++++++

.. toctree::
:maxdepth: 1

adding_multimodal_plugin

Module Contents
+++++++++++++++
@@ -36,10 +40,14 @@ Registry
Base Classes
------------

.. autoclass:: vllm.multimodal.MultiModalDataDict
.. autodata:: vllm.multimodal.BatchedTensors

.. autoclass:: vllm.multimodal.MultiModalDataBuiltins
:members:
:show-inheritance:

.. autodata:: vllm.multimodal.MultiModalDataDict

.. autoclass:: vllm.multimodal.MultiModalInputs
:members:
:show-inheritance:
5 changes: 3 additions & 2 deletions vllm/multimodal/__init__.py
@@ -1,5 +1,5 @@
from .base import (BatchedTensors, MultiModalDataDict, MultiModalInputs,
MultiModalPlugin)
from .base import (BatchedTensors, MultiModalDataBuiltins, MultiModalDataDict,
MultiModalInputs, MultiModalPlugin)
from .registry import MultiModalRegistry

MULTIMODAL_REGISTRY = MultiModalRegistry()
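# The global registry above is used by the model runners; it is also the hook
# point for custom modalities via MULTIMODAL_REGISTRY.register_plugin(...).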
@@ -13,6 +13,7 @@

__all__ = [
"BatchedTensors",
"MultiModalDataBuiltins",
"MultiModalDataDict",
"MultiModalInputs",
"MultiModalPlugin",
21 changes: 13 additions & 8 deletions vllm/multimodal/base.py
@@ -43,9 +43,6 @@ def try_concat(
*,
device: torch.types.Device,
) -> BatchedTensors:
# Avoid initializing CUDA too early
import torch

unbatched_shape = tensors[0].shape[1:]

for tensor in tensors:
@@ -84,16 +81,21 @@ def batch(


class MultiModalDataBuiltins(TypedDict, total=False):
"""Modality types that are predefined by vLLM."""

image: Image.Image
"""The input image."""


MultiModalDataDict = Union[MultiModalDataBuiltins, Dict[str, Any]]
"""
A dictionary containing an item for each modality type to input.
The data belonging to each modality is converted into keyword arguments
to the model by the corresponding mapper. By default, the mapper of
the corresponding plugin with the same modality key is applied.
Note:
This dictionary also accepts modality keys defined outside
:class:`MultiModalDataBuiltins` as long as a customized plugin is registered
through the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`.
Read more on that :ref:`here <adding_multimodal_plugin>`.
"""

MultiModalInputMapper = Callable[[InputContext, object], MultiModalInputs]
@@ -123,6 +125,9 @@ class MultiModalPlugin(ABC):
process the same data differently). This registry is in turn used by
:class:`~MultiModalRegistry` which acts at a higher level
(i.e., the modality of the data).
See also:
:ref:`adding_multimodal_plugin`
"""

def __init__(self) -> None:
@@ -183,8 +188,8 @@ def wrapper(model_cls: N) -> N:
def map_input(self, model_config: ModelConfig,
data: object) -> MultiModalInputs:
"""
Apply an input mapper to a data passed
to the model, transforming the data into a dictionary of model inputs.
Transform the data into a dictionary of model inputs using the
input mapper registered for that model.
The model is identified by ``model_config``.
1 change: 1 addition & 0 deletions vllm/multimodal/image.py
@@ -100,6 +100,7 @@ def repeat_and_pad_image_tokens(


class ImagePlugin(MultiModalPlugin):
"""Plugin for image data."""

def get_data_key(self) -> str:
return "image"
18 changes: 13 additions & 5 deletions vllm/multimodal/registry.py
@@ -15,10 +15,8 @@

class MultiModalRegistry:
"""
A registry to dispatch data processing
according to its modality and the target model.
The registry handles both external and internal data input.
A registry that dispatches data processing to the
:class:`~vllm.multimodal.MultiModalPlugin` for each modality.
"""

DEFAULT_PLUGINS = (ImagePlugin(), )
@@ -30,6 +28,12 @@ def __init__(
self._plugins = {p.get_data_key(): p for p in plugins}

def register_plugin(self, plugin: MultiModalPlugin) -> None:
"""
Register a multi-modal plugin so it can be recognized by vLLM.
See also:
:ref:`adding_multimodal_plugin`
"""
data_type_key = plugin.get_data_key()

if data_type_key in self._plugins:
@@ -75,7 +79,11 @@ def map_input(self, model_config: ModelConfig,
data: MultiModalDataDict) -> MultiModalInputs:
"""
Apply an input mapper to the data passed to the model.
The data belonging to each modality is passed to the corresponding
plugin, which in turn converts the data into keyword arguments
via the input mapper registered for that model.
See :meth:`MultiModalPlugin.map_input` for more details.
"""
merged_dict: Dict[str, torch.Tensor] = {}
