From be37713e1bf316ad22e8d5ab44ee91d809f781e0 Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 05:00:27 +0000 Subject: [PATCH 01/10] Fix unintentional bold font --- docs/source/dev/multimodal/multimodal_index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/dev/multimodal/multimodal_index.rst b/docs/source/dev/multimodal/multimodal_index.rst index c2d1b771e27e3..643c22d34b8c1 100644 --- a/docs/source/dev/multimodal/multimodal_index.rst +++ b/docs/source/dev/multimodal/multimodal_index.rst @@ -12,7 +12,7 @@ which allows you to pass in multi-modal input alongside text and token prompts. .. note:: ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through - :class:`vllm.multimodal.MULTIMODAL_REGISTRY`. + the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`. By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, please follow :ref:`the guide for adding a new multimodal model. `. From 75377327e9c667bbfbe14317c1a4b2dc9e04045a Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 05:00:47 +0000 Subject: [PATCH 02/10] Limit nesting --- docs/source/index.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/index.rst b/docs/source/index.rst index e99a0a9a13899..b3f6d19758f2b 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -116,6 +116,7 @@ Documentation automatic_prefix_caching/details .. toctree:: + :maxdepth: 2 :caption: Developer Documentation dev/sampling_params From 206e38b54480923e9a9fadeff14a8ee932f00093 Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 05:02:17 +0000 Subject: [PATCH 03/10] Update link --- docs/source/dev/multimodal/adding_multimodal_model.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/dev/multimodal/adding_multimodal_model.rst b/docs/source/dev/multimodal/adding_multimodal_model.rst index 32f62003f0e2f..1de497535024f 100644 --- a/docs/source/dev/multimodal/adding_multimodal_model.rst +++ b/docs/source/dev/multimodal/adding_multimodal_model.rst @@ -3,7 +3,7 @@ Adding a New Multimodal Model ============================= -This document provides a high-level guide on integrating a :ref:`multi-modal model ` into vLLM. +This document provides a high-level guide on integrating a :ref:`multi-modal ` model into vLLM. .. note:: The complexity of adding a new model depends heavily on the model's architecture. From 2541ca13980a2e9c34d1a3cddc0330fa9569131b Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 05:07:38 +0000 Subject: [PATCH 04/10] Move guide for adding multimodal model --- docs/source/dev/multimodal/multimodal_index.rst | 12 ++---------- docs/source/index.rst | 1 + .../adding_multimodal_model.rst | 0 3 files changed, 3 insertions(+), 10 deletions(-) rename docs/source/{dev/multimodal => models}/adding_multimodal_model.rst (100%) diff --git a/docs/source/dev/multimodal/multimodal_index.rst b/docs/source/dev/multimodal/multimodal_index.rst index 643c22d34b8c1..008100c4f9b79 100644 --- a/docs/source/dev/multimodal/multimodal_index.rst +++ b/docs/source/dev/multimodal/multimodal_index.rst @@ -16,16 +16,8 @@ which allows you to pass in multi-modal input alongside text and token prompts. By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, please follow :ref:`the guide for adding a new multimodal model. `. 
- -# TODO: Add more instructions on how to do that once embeddings is in. - -Guides -++++++ - -.. toctree:: - :maxdepth: 1 - - adding_multimodal_model +.. + TODO: Add more instructions on how to do that once embeddings is in. Module Contents +++++++++++++++ diff --git a/docs/source/index.rst b/docs/source/index.rst index b3f6d19758f2b..37c5df28c92dd 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -92,6 +92,7 @@ Documentation models/supported_models models/adding_model + models/adding_multimodal_model models/engine_args models/lora models/vlm diff --git a/docs/source/dev/multimodal/adding_multimodal_model.rst b/docs/source/models/adding_multimodal_model.rst similarity index 100% rename from docs/source/dev/multimodal/adding_multimodal_model.rst rename to docs/source/models/adding_multimodal_model.rst From 241441037d4d23cc0df88718b8562bf44ff7c049 Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 06:05:45 +0000 Subject: [PATCH 05/10] Update index page --- docs/source/dev/input_processing/model_inputs_index.rst | 4 ++-- docs/source/dev/multimodal/multimodal_index.rst | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/source/dev/input_processing/model_inputs_index.rst b/docs/source/dev/input_processing/model_inputs_index.rst index 2dde251aa1447..bcefc44f05c2b 100644 --- a/docs/source/dev/input_processing/model_inputs_index.rst +++ b/docs/source/dev/input_processing/model_inputs_index.rst @@ -5,8 +5,8 @@ Input Processing .. currentmodule:: vllm.inputs -vLLM provides a mechanism for defining input processors for each model so that the inputs are processed -in :class:`~vllm.LLMEngine` before they are passed to model executors. +Each model can override parts of vLLM's :ref:`input processing pipeline ` via +:data:`~vllm.inputs.INPUT_REGISTRY` and :data:`~vllm.multimodal.MULTIMODAL_REGISTRY`. Currently, this mechanism is only utilized in :ref:`multi-modal models ` for preprocessing multi-modal input data in addition to input prompt, but it can be extended to text-only language models when needed. diff --git a/docs/source/dev/multimodal/multimodal_index.rst b/docs/source/dev/multimodal/multimodal_index.rst index 008100c4f9b79..3ed4f46fa3234 100644 --- a/docs/source/dev/multimodal/multimodal_index.rst +++ b/docs/source/dev/multimodal/multimodal_index.rst @@ -14,10 +14,11 @@ which allows you to pass in multi-modal input alongside text and token prompts. ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`. -By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, please follow :ref:`the guide for adding a new multimodal model. `. +By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, +please follow :ref:`this guide for adding a new multi-modal model. `. .. - TODO: Add more instructions on how to do that once embeddings is in. + TODO: Add more instructions on how to add new plugins once embeddings is in. 
Module Contents +++++++++++++++ From 5e751a72c1aa056edd998253715526f997e89d6a Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 06:10:16 +0000 Subject: [PATCH 06/10] Cleanup --- .../input_processing/model_inputs_index.rst | 2 +- docs/source/models/adding_model.rst | 34 +++++++++---------- .../source/models/adding_multimodal_model.rst | 20 +++++------ 3 files changed, 28 insertions(+), 28 deletions(-) diff --git a/docs/source/dev/input_processing/model_inputs_index.rst b/docs/source/dev/input_processing/model_inputs_index.rst index bcefc44f05c2b..5d895837590ba 100644 --- a/docs/source/dev/input_processing/model_inputs_index.rst +++ b/docs/source/dev/input_processing/model_inputs_index.rst @@ -8,7 +8,7 @@ Input Processing Each model can override parts of vLLM's :ref:`input processing pipeline ` via :data:`~vllm.inputs.INPUT_REGISTRY` and :data:`~vllm.multimodal.MULTIMODAL_REGISTRY`. -Currently, this mechanism is only utilized in :ref:`multi-modal models ` for preprocessing multi-modal input +Currently, this mechanism is only utilized in :ref:`multi-modal ` models for preprocessing multi-modal input data in addition to input prompt, but it can be extended to text-only language models when needed. Guides diff --git a/docs/source/models/adding_model.rst b/docs/source/models/adding_model.rst index f282b594590be..d3aa6e239d702 100644 --- a/docs/source/models/adding_model.rst +++ b/docs/source/models/adding_model.rst @@ -44,23 +44,23 @@ Next, you need to rewrite the :meth:`~torch.nn.Module.forward` method of your mo .. code-block:: diff - def forward( - self, - input_ids: torch.Tensor, - - attention_mask: Optional[torch.Tensor] = None, - - position_ids: Optional[torch.LongTensor] = None, - - past_key_values: Optional[List[torch.FloatTensor]] = None, - - inputs_embeds: Optional[torch.FloatTensor] = None, - - labels: Optional[torch.LongTensor] = None, - - use_cache: Optional[bool] = None, - - output_attentions: Optional[bool] = None, - - output_hidden_states: Optional[bool] = None, - - return_dict: Optional[bool] = None, - -) -> Union[Tuple, CausalLMOutputWithPast]: - + positions: torch.Tensor, - + kv_caches: List[torch.Tensor], - + attn_metadata: AttentionMetadata, - +) -> Optional[SamplerOutput]: + def forward( + self, + input_ids: torch.Tensor, + - attention_mask: Optional[torch.Tensor] = None, + - position_ids: Optional[torch.LongTensor] = None, + - past_key_values: Optional[List[torch.FloatTensor]] = None, + - inputs_embeds: Optional[torch.FloatTensor] = None, + - labels: Optional[torch.LongTensor] = None, + - use_cache: Optional[bool] = None, + - output_attentions: Optional[bool] = None, + - output_hidden_states: Optional[bool] = None, + - return_dict: Optional[bool] = None, + - ) -> Union[Tuple, CausalLMOutputWithPast]: + + positions: torch.Tensor, + + kv_caches: List[torch.Tensor], + + attn_metadata: AttentionMetadata, + + ) -> Optional[SamplerOutput]: 1. Update the code by considering that :code:`input_ids` and :code:`positions` are now flattened tensors. 2. Replace the attention operation with either :code:`PagedAttention`, :code:`PagedAttentionWithRoPE`, or :code:`PagedAttentionWithALiBi` depending on the model's architecture. diff --git a/docs/source/models/adding_multimodal_model.rst b/docs/source/models/adding_multimodal_model.rst index 1de497535024f..0c13e1f873d58 100644 --- a/docs/source/models/adding_multimodal_model.rst +++ b/docs/source/models/adding_multimodal_model.rst @@ -38,14 +38,14 @@ As usual, follow :ref:`these steps ` to implement the model .. 
code-block:: diff - def forward( - self, - input_ids: torch.Tensor, - positions: torch.Tensor, - kv_caches: List[torch.Tensor], - attn_metadata: AttentionMetadata, - + pixel_values: torch.Tensor, - ) -> SamplerOutput: + def forward( + self, + input_ids: torch.Tensor, + positions: torch.Tensor, + kv_caches: List[torch.Tensor], + attn_metadata: AttentionMetadata, + + pixel_values: torch.Tensor, + ) -> SamplerOutput: 2. Register input mappers @@ -68,8 +68,8 @@ A default mapper is available for each modality in the core vLLM library. This i :ref:`input_processing_pipeline` -3. Register maximum number of multimodal tokens ----------------------------------------------------------- +3. Register maximum number of multi-modal tokens +------------------------------------------------ For each modality type that the model accepts as input, calculate the maximum possible number of tokens and register it via :meth:`INPUT_REGISTRY.register_dummy_data `. From 55419d7e0d659ec7da1b273e8a138fca6741e33f Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 06:40:45 +0000 Subject: [PATCH 07/10] Avoid duplicating content from the base model guide --- .../dev/multimodal/multimodal_index.rst | 7 +++--- docs/source/models/adding_model.rst | 4 +++ .../source/models/adding_multimodal_model.rst | 25 ++++++++----------- docs/source/models/supported_models.rst | 2 +- 4 files changed, 18 insertions(+), 20 deletions(-) diff --git a/docs/source/dev/multimodal/multimodal_index.rst b/docs/source/dev/multimodal/multimodal_index.rst index 3ed4f46fa3234..46bfddda461ba 100644 --- a/docs/source/dev/multimodal/multimodal_index.rst +++ b/docs/source/dev/multimodal/multimodal_index.rst @@ -7,15 +7,14 @@ Multi-Modality vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package. -:class:`vllm.inputs.PromptStrictInputs` accepts an additional attribute ``multi_modal_data`` -which allows you to pass in multi-modal input alongside text and token prompts. +Multi-modal input can be passed alongside text and token prompts to :ref:`supported models ` +via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptStrictInputs`. .. note:: ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`. -By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, -please follow :ref:`this guide for adding a new multi-modal model. `. +To implement a new multi-modal model in vLLM, please follow :ref:`this guide `. .. TODO: Add more instructions on how to add new plugins once embeddings is in. diff --git a/docs/source/models/adding_model.rst b/docs/source/models/adding_model.rst index d3aa6e239d702..cce3f683562b2 100644 --- a/docs/source/models/adding_model.rst +++ b/docs/source/models/adding_model.rst @@ -10,6 +10,10 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM. However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex. +.. note:: + By default, vLLM models do not support multi-modal inputs. To enable multi-modal support, + please follow :ref:`this guide ` after implementing the model. + .. tip:: If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub `_ repository. 
We will be happy to help you out! diff --git a/docs/source/models/adding_multimodal_model.rst b/docs/source/models/adding_multimodal_model.rst index 0c13e1f873d58..25185eb90b9ed 100644 --- a/docs/source/models/adding_multimodal_model.rst +++ b/docs/source/models/adding_multimodal_model.rst @@ -1,26 +1,21 @@ .. _adding_a_new_multimodal_model: -Adding a New Multimodal Model -============================= +Enabling Multimodal Inputs +========================== -This document provides a high-level guide on integrating a :ref:`multi-modal ` model into vLLM. +This document walks you through the steps to extend a vLLM model so that it accepts :ref:`multi-modal ` inputs. -.. note:: - The complexity of adding a new model depends heavily on the model's architecture. - The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM. - However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex. - -.. tip:: - If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub `_ repository. - We will be happy to help you out! +.. seealso:: + :ref:`adding_a_new_model` -1. Set up the base vLLM model +1. Update the base vLLM model ----------------------------- -As usual, follow :ref:`these steps ` to implement the model in vLLM, but note the following: +It is assumed that you have already implemented the model in vLLM according to :ref:`these steps `. +Further update the model as follows: -- You should additionally implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface. +- Implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface. .. code-block:: diff @@ -33,7 +28,7 @@ As usual, follow :ref:`these steps ` to implement the model The model class does not have to be named :code:`*ForCausalLM`. Check out `the HuggingFace Transformers documentation `__ for some examples. -- While implementing the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter +- In the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter for each input tensor that corresponds to a multi-modal input, as shown in the following example: .. code-block:: diff diff --git a/docs/source/models/supported_models.rst b/docs/source/models/supported_models.rst index f5511580d1957..688bf35ef8ef3 100644 --- a/docs/source/models/supported_models.rst +++ b/docs/source/models/supported_models.rst @@ -192,7 +192,7 @@ Vision Language Models - If your model uses one of the above model architectures, you can seamlessly run your model with vLLM. -Otherwise, please refer to :ref:`Adding a New Model ` and :ref:`Adding a New Multimodal Model ` +Otherwise, please refer to :ref:`Adding a New Model ` and :ref:`Enabling Multimodal Inputs ` for instructions on how to implement support for your model. Alternatively, you can raise an issue on our `GitHub `_ project. 
From b001f09d4c95ec80f9e9e1d72142ba12f3801968 Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 06:44:42 +0000 Subject: [PATCH 08/10] Rename --- docs/source/dev/multimodal/multimodal_index.rst | 2 +- docs/source/index.rst | 2 +- docs/source/models/adding_model.rst | 2 +- ...ultimodal_model.rst => enabling_multimodal_inputs.rst} | 2 +- docs/source/models/supported_models.rst | 2 +- vllm/inputs/registry.py | 2 +- vllm/multimodal/base.py | 8 ++++---- 7 files changed, 10 insertions(+), 10 deletions(-) rename docs/source/models/{adding_multimodal_model.rst => enabling_multimodal_inputs.rst} (99%) diff --git a/docs/source/dev/multimodal/multimodal_index.rst b/docs/source/dev/multimodal/multimodal_index.rst index 46bfddda461ba..39daf30a3338f 100644 --- a/docs/source/dev/multimodal/multimodal_index.rst +++ b/docs/source/dev/multimodal/multimodal_index.rst @@ -14,7 +14,7 @@ via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptStrictInputs`. ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`. -To implement a new multi-modal model in vLLM, please follow :ref:`this guide `. +To implement a new multi-modal model in vLLM, please follow :ref:`this guide `. .. TODO: Add more instructions on how to add new plugins once embeddings is in. diff --git a/docs/source/index.rst b/docs/source/index.rst index 37c5df28c92dd..67c039f25e98d 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -92,7 +92,7 @@ Documentation models/supported_models models/adding_model - models/adding_multimodal_model + models/enabling_multimodal_inputs models/engine_args models/lora models/vlm diff --git a/docs/source/models/adding_model.rst b/docs/source/models/adding_model.rst index cce3f683562b2..53c19e5829218 100644 --- a/docs/source/models/adding_model.rst +++ b/docs/source/models/adding_model.rst @@ -12,7 +12,7 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor .. note:: By default, vLLM models do not support multi-modal inputs. To enable multi-modal support, - please follow :ref:`this guide ` after implementing the model. + please follow :ref:`this guide ` after implementing the model here. .. tip:: If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub `_ repository. diff --git a/docs/source/models/adding_multimodal_model.rst b/docs/source/models/enabling_multimodal_inputs.rst similarity index 99% rename from docs/source/models/adding_multimodal_model.rst rename to docs/source/models/enabling_multimodal_inputs.rst index 25185eb90b9ed..599f43704e60a 100644 --- a/docs/source/models/adding_multimodal_model.rst +++ b/docs/source/models/enabling_multimodal_inputs.rst @@ -1,4 +1,4 @@ -.. _adding_a_new_multimodal_model: +.. _enabling_multimodal_inputs: Enabling Multimodal Inputs ========================== diff --git a/docs/source/models/supported_models.rst b/docs/source/models/supported_models.rst index 688bf35ef8ef3..e64a072394680 100644 --- a/docs/source/models/supported_models.rst +++ b/docs/source/models/supported_models.rst @@ -192,7 +192,7 @@ Vision Language Models - If your model uses one of the above model architectures, you can seamlessly run your model with vLLM. 
-Otherwise, please refer to :ref:`Adding a New Model ` and :ref:`Enabling Multimodal Inputs ` +Otherwise, please refer to :ref:`Adding a New Model ` and :ref:`Enabling Multimodal Inputs ` for instructions on how to implement support for your model. Alternatively, you can raise an issue on our `GitHub `_ project. diff --git a/vllm/inputs/registry.py b/vllm/inputs/registry.py index 9396296ffd907..4a7e5c5832917 100644 --- a/vllm/inputs/registry.py +++ b/vllm/inputs/registry.py @@ -141,7 +141,7 @@ def dummy_data_for_profiling(self, model_config: "ModelConfig", The model is identified by ``model_config``. See also: - :ref:`adding_a_new_multimodal_model` + :ref:`enabling_multimodal_inputs` """ # Avoid circular import from vllm.model_executor.model_loader import get_model_architecture diff --git a/vllm/multimodal/base.py b/vllm/multimodal/base.py index 56cee73bd388a..8fe03a119ab73 100644 --- a/vllm/multimodal/base.py +++ b/vllm/multimodal/base.py @@ -163,7 +163,7 @@ def register_input_mapper( See also: :ref:`input_processing_pipeline` - :ref:`adding_a_new_multimodal_model` + :ref:`enabling_multimodal_inputs` """ def wrapper(model_cls: N) -> N: @@ -192,7 +192,7 @@ def map_input(self, model_config: ModelConfig, TypeError: If the data type is not supported. See also: - :ref:`adding_a_new_multimodal_model` + :ref:`enabling_multimodal_inputs` """ # Avoid circular import from vllm.model_executor.model_loader import get_model_architecture @@ -230,7 +230,7 @@ def register_max_multimodal_tokens( If `None` is provided, then the default calculation is used instead. See also: - :ref:`adding_a_new_multimodal_model` + :ref:`enabling_multimodal_inputs` """ def wrapper(model_cls: N) -> N: @@ -260,7 +260,7 @@ def get_max_multimodal_tokens(self, model_config: ModelConfig) -> int: The model is identified by ``model_config``. See also: - :ref:`adding_a_new_multimodal_model` + :ref:`enabling_multimodal_inputs` """ # Avoid circular import from vllm.model_executor.model_loader import get_model_architecture From 40de7fdccb3d1ac7e5e8fe31f5b4ca8f29983ad7 Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 06:48:03 +0000 Subject: [PATCH 09/10] Fix format --- vllm/multimodal/base.py | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/vllm/multimodal/base.py b/vllm/multimodal/base.py index 8fe03a119ab73..0e31816a8e8ac 100644 --- a/vllm/multimodal/base.py +++ b/vllm/multimodal/base.py @@ -162,8 +162,8 @@ def register_input_mapper( If `None` is provided, then the default input mapper is used instead. See also: - :ref:`input_processing_pipeline` - :ref:`enabling_multimodal_inputs` + - :ref:`input_processing_pipeline` + - :ref:`enabling_multimodal_inputs` """ def wrapper(model_cls: N) -> N: @@ -192,7 +192,8 @@ def map_input(self, model_config: ModelConfig, TypeError: If the data type is not supported. 
         See also:
-            :ref:`enabling_multimodal_inputs`
+            - :ref:`input_processing_pipeline`
+            - :ref:`enabling_multimodal_inputs`
         """
         # Avoid circular import
         from vllm.model_executor.model_loader import get_model_architecture

From 40de7fdccb3d1ac7e5e8fe31f5b4ca8f29983ad7 Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Sat, 6 Jul 2024 06:50:59 +0000
Subject: [PATCH 10/10] Reword

---
 docs/source/models/enabling_multimodal_inputs.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/models/enabling_multimodal_inputs.rst b/docs/source/models/enabling_multimodal_inputs.rst
index 599f43704e60a..20be920b5f699 100644
--- a/docs/source/models/enabling_multimodal_inputs.rst
+++ b/docs/source/models/enabling_multimodal_inputs.rst
@@ -28,7 +28,7 @@ Further update the model as follows:
     The model class does not have to be named :code:`*ForCausalLM`.
     Check out `the HuggingFace Transformers documentation `__ for some examples.
 
-- In the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter
+- If you haven't already done so, reserve a keyword parameter in :meth:`~torch.nn.Module.forward`
   for each input tensor that corresponds to a multi-modal input, as shown in the following example:
 
 .. code-block:: diff
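Taken together, the documentation touched by this series describes a three-step recipe for enabling multi-modal inputs on an existing vLLM model: implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface, reserve a keyword parameter in :meth:`~torch.nn.Module.forward` for each multi-modal input tensor, and register an input mapper together with the maximum number of multi-modal tokens. The sketch below is illustrative only and uses a hypothetical ``MyVLMForCausalLM``; the decorator spellings are assumptions inferred from the registry methods referenced in these patches (``register_input_mapper``, ``register_max_multimodal_tokens``), and a given vLLM release may expose modality-specific variants of them instead.

.. code-block:: python

    from typing import List, Optional

    import torch
    from torch import nn

    from vllm.attention import AttentionMetadata
    from vllm.model_executor.models.interfaces import SupportsVision
    from vllm.multimodal import MULTIMODAL_REGISTRY
    from vllm.sequence import SamplerOutput


    # Hypothetical model used only to illustrate the registration pattern.
    # The decorator names follow the registry methods mentioned in this
    # patch series and are assumptions; check your vLLM version for the
    # exact (possibly image-specific) entry points.
    @MULTIMODAL_REGISTRY.register_input_mapper()              # step 2: fall back to the default mapper
    @MULTIMODAL_REGISTRY.register_max_multimodal_tokens(576)  # step 3: assumed worst-case image token count
    class MyVLMForCausalLM(nn.Module, SupportsVision):        # step 1: implement the SupportsVision interface

        def forward(
            self,
            input_ids: torch.Tensor,
            positions: torch.Tensor,
            kv_caches: List[torch.Tensor],
            attn_metadata: AttentionMetadata,
            pixel_values: Optional[torch.Tensor] = None,      # keyword reserved for the image input
        ) -> SamplerOutput:
            # Encode ``pixel_values``, merge the resulting embeddings with the
            # text embeddings, then run the language model; omitted in this sketch.
            ...

Once registered this way, the model receives multi-modal prompts through the ``multi_modal_data`` field covered in ``multimodal_index.rst``.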