From be37713e1bf316ad22e8d5ab44ee91d809f781e0 Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 05:00:27 +0000 Subject: [PATCH 01/10] Fix unintentional bold font --- docs/source/dev/multimodal/multimodal_index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/dev/multimodal/multimodal_index.rst b/docs/source/dev/multimodal/multimodal_index.rst index c2d1b771e27e3..643c22d34b8c1 100644 --- a/docs/source/dev/multimodal/multimodal_index.rst +++ b/docs/source/dev/multimodal/multimodal_index.rst @@ -12,7 +12,7 @@ which allows you to pass in multi-modal input alongside text and token prompts. .. note:: ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through - :class:`vllm.multimodal.MULTIMODAL_REGISTRY`. + the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`. By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, please follow :ref:`the guide for adding a new multimodal model. `. From 75377327e9c667bbfbe14317c1a4b2dc9e04045a Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 05:00:47 +0000 Subject: [PATCH 02/10] Limit nesting --- docs/source/index.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/index.rst b/docs/source/index.rst index e99a0a9a13899..b3f6d19758f2b 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -116,6 +116,7 @@ Documentation automatic_prefix_caching/details .. toctree:: + :maxdepth: 2 :caption: Developer Documentation dev/sampling_params From 206e38b54480923e9a9fadeff14a8ee932f00093 Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 05:02:17 +0000 Subject: [PATCH 03/10] Update link --- docs/source/dev/multimodal/adding_multimodal_model.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/dev/multimodal/adding_multimodal_model.rst b/docs/source/dev/multimodal/adding_multimodal_model.rst index 32f62003f0e2f..1de497535024f 100644 --- a/docs/source/dev/multimodal/adding_multimodal_model.rst +++ b/docs/source/dev/multimodal/adding_multimodal_model.rst @@ -3,7 +3,7 @@ Adding a New Multimodal Model ============================= -This document provides a high-level guide on integrating a :ref:`multi-modal model ` into vLLM. +This document provides a high-level guide on integrating a :ref:`multi-modal ` model into vLLM. .. note:: The complexity of adding a new model depends heavily on the model's architecture. From 2541ca13980a2e9c34d1a3cddc0330fa9569131b Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 05:07:38 +0000 Subject: [PATCH 04/10] Move guide for adding multimodal model --- docs/source/dev/multimodal/multimodal_index.rst | 12 ++---------- docs/source/index.rst | 1 + .../adding_multimodal_model.rst | 0 3 files changed, 3 insertions(+), 10 deletions(-) rename docs/source/{dev/multimodal => models}/adding_multimodal_model.rst (100%) diff --git a/docs/source/dev/multimodal/multimodal_index.rst b/docs/source/dev/multimodal/multimodal_index.rst index 643c22d34b8c1..008100c4f9b79 100644 --- a/docs/source/dev/multimodal/multimodal_index.rst +++ b/docs/source/dev/multimodal/multimodal_index.rst @@ -16,16 +16,8 @@ which allows you to pass in multi-modal input alongside text and token prompts. By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, please follow :ref:`the guide for adding a new multimodal model. `. 
- -# TODO: Add more instructions on how to do that once embeddings is in. - -Guides -++++++ - -.. toctree:: - :maxdepth: 1 - - adding_multimodal_model +.. + TODO: Add more instructions on how to do that once embeddings is in. Module Contents +++++++++++++++ diff --git a/docs/source/index.rst b/docs/source/index.rst index b3f6d19758f2b..37c5df28c92dd 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -92,6 +92,7 @@ Documentation models/supported_models models/adding_model + models/adding_multimodal_model models/engine_args models/lora models/vlm diff --git a/docs/source/dev/multimodal/adding_multimodal_model.rst b/docs/source/models/adding_multimodal_model.rst similarity index 100% rename from docs/source/dev/multimodal/adding_multimodal_model.rst rename to docs/source/models/adding_multimodal_model.rst From 241441037d4d23cc0df88718b8562bf44ff7c049 Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 06:05:45 +0000 Subject: [PATCH 05/10] Update index page --- docs/source/dev/input_processing/model_inputs_index.rst | 4 ++-- docs/source/dev/multimodal/multimodal_index.rst | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/source/dev/input_processing/model_inputs_index.rst b/docs/source/dev/input_processing/model_inputs_index.rst index 2dde251aa1447..bcefc44f05c2b 100644 --- a/docs/source/dev/input_processing/model_inputs_index.rst +++ b/docs/source/dev/input_processing/model_inputs_index.rst @@ -5,8 +5,8 @@ Input Processing .. currentmodule:: vllm.inputs -vLLM provides a mechanism for defining input processors for each model so that the inputs are processed -in :class:`~vllm.LLMEngine` before they are passed to model executors. +Each model can override parts of vLLM's :ref:`input processing pipeline ` via +:data:`~vllm.inputs.INPUT_REGISTRY` and :data:`~vllm.multimodal.MULTIMODAL_REGISTRY`. Currently, this mechanism is only utilized in :ref:`multi-modal models ` for preprocessing multi-modal input data in addition to input prompt, but it can be extended to text-only language models when needed. diff --git a/docs/source/dev/multimodal/multimodal_index.rst b/docs/source/dev/multimodal/multimodal_index.rst index 008100c4f9b79..3ed4f46fa3234 100644 --- a/docs/source/dev/multimodal/multimodal_index.rst +++ b/docs/source/dev/multimodal/multimodal_index.rst @@ -14,10 +14,11 @@ which allows you to pass in multi-modal input alongside text and token prompts. ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`. -By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, please follow :ref:`the guide for adding a new multimodal model. `. +By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, +please follow :ref:`this guide for adding a new multi-modal model. `. .. - TODO: Add more instructions on how to do that once embeddings is in. + TODO: Add more instructions on how to add new plugins once embeddings is in. 
Module Contents +++++++++++++++ From 5e751a72c1aa056edd998253715526f997e89d6a Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 06:10:16 +0000 Subject: [PATCH 06/10] Cleanup --- .../input_processing/model_inputs_index.rst | 2 +- docs/source/models/adding_model.rst | 34 +++++++++---------- .../source/models/adding_multimodal_model.rst | 20 +++++------ 3 files changed, 28 insertions(+), 28 deletions(-) diff --git a/docs/source/dev/input_processing/model_inputs_index.rst b/docs/source/dev/input_processing/model_inputs_index.rst index bcefc44f05c2b..5d895837590ba 100644 --- a/docs/source/dev/input_processing/model_inputs_index.rst +++ b/docs/source/dev/input_processing/model_inputs_index.rst @@ -8,7 +8,7 @@ Input Processing Each model can override parts of vLLM's :ref:`input processing pipeline ` via :data:`~vllm.inputs.INPUT_REGISTRY` and :data:`~vllm.multimodal.MULTIMODAL_REGISTRY`. -Currently, this mechanism is only utilized in :ref:`multi-modal models ` for preprocessing multi-modal input +Currently, this mechanism is only utilized in :ref:`multi-modal ` models for preprocessing multi-modal input data in addition to input prompt, but it can be extended to text-only language models when needed. Guides diff --git a/docs/source/models/adding_model.rst b/docs/source/models/adding_model.rst index f282b594590be..d3aa6e239d702 100644 --- a/docs/source/models/adding_model.rst +++ b/docs/source/models/adding_model.rst @@ -44,23 +44,23 @@ Next, you need to rewrite the :meth:`~torch.nn.Module.forward` method of your mo .. code-block:: diff - def forward( - self, - input_ids: torch.Tensor, - - attention_mask: Optional[torch.Tensor] = None, - - position_ids: Optional[torch.LongTensor] = None, - - past_key_values: Optional[List[torch.FloatTensor]] = None, - - inputs_embeds: Optional[torch.FloatTensor] = None, - - labels: Optional[torch.LongTensor] = None, - - use_cache: Optional[bool] = None, - - output_attentions: Optional[bool] = None, - - output_hidden_states: Optional[bool] = None, - - return_dict: Optional[bool] = None, - -) -> Union[Tuple, CausalLMOutputWithPast]: - + positions: torch.Tensor, - + kv_caches: List[torch.Tensor], - + attn_metadata: AttentionMetadata, - +) -> Optional[SamplerOutput]: + def forward( + self, + input_ids: torch.Tensor, + - attention_mask: Optional[torch.Tensor] = None, + - position_ids: Optional[torch.LongTensor] = None, + - past_key_values: Optional[List[torch.FloatTensor]] = None, + - inputs_embeds: Optional[torch.FloatTensor] = None, + - labels: Optional[torch.LongTensor] = None, + - use_cache: Optional[bool] = None, + - output_attentions: Optional[bool] = None, + - output_hidden_states: Optional[bool] = None, + - return_dict: Optional[bool] = None, + - ) -> Union[Tuple, CausalLMOutputWithPast]: + + positions: torch.Tensor, + + kv_caches: List[torch.Tensor], + + attn_metadata: AttentionMetadata, + + ) -> Optional[SamplerOutput]: 1. Update the code by considering that :code:`input_ids` and :code:`positions` are now flattened tensors. 2. Replace the attention operation with either :code:`PagedAttention`, :code:`PagedAttentionWithRoPE`, or :code:`PagedAttentionWithALiBi` depending on the model's architecture. diff --git a/docs/source/models/adding_multimodal_model.rst b/docs/source/models/adding_multimodal_model.rst index 1de497535024f..0c13e1f873d58 100644 --- a/docs/source/models/adding_multimodal_model.rst +++ b/docs/source/models/adding_multimodal_model.rst @@ -38,14 +38,14 @@ As usual, follow :ref:`these steps ` to implement the model .. 
code-block:: diff - def forward( - self, - input_ids: torch.Tensor, - positions: torch.Tensor, - kv_caches: List[torch.Tensor], - attn_metadata: AttentionMetadata, - + pixel_values: torch.Tensor, - ) -> SamplerOutput: + def forward( + self, + input_ids: torch.Tensor, + positions: torch.Tensor, + kv_caches: List[torch.Tensor], + attn_metadata: AttentionMetadata, + + pixel_values: torch.Tensor, + ) -> SamplerOutput: 2. Register input mappers @@ -68,8 +68,8 @@ A default mapper is available for each modality in the core vLLM library. This i :ref:`input_processing_pipeline` -3. Register maximum number of multimodal tokens ----------------------------------------------------------- +3. Register maximum number of multi-modal tokens +------------------------------------------------ For each modality type that the model accepts as input, calculate the maximum possible number of tokens and register it via :meth:`INPUT_REGISTRY.register_dummy_data `. From 55419d7e0d659ec7da1b273e8a138fca6741e33f Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 06:40:45 +0000 Subject: [PATCH 07/10] Avoid duplicating content from the base model guide --- .../dev/multimodal/multimodal_index.rst | 7 +++--- docs/source/models/adding_model.rst | 4 +++ .../source/models/adding_multimodal_model.rst | 25 ++++++++----------- docs/source/models/supported_models.rst | 2 +- 4 files changed, 18 insertions(+), 20 deletions(-) diff --git a/docs/source/dev/multimodal/multimodal_index.rst b/docs/source/dev/multimodal/multimodal_index.rst index 3ed4f46fa3234..46bfddda461ba 100644 --- a/docs/source/dev/multimodal/multimodal_index.rst +++ b/docs/source/dev/multimodal/multimodal_index.rst @@ -7,15 +7,14 @@ Multi-Modality vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package. -:class:`vllm.inputs.PromptStrictInputs` accepts an additional attribute ``multi_modal_data`` -which allows you to pass in multi-modal input alongside text and token prompts. +Multi-modal input can be passed alongside text and token prompts to :ref:`supported models ` +via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptStrictInputs`. .. note:: ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`. -By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, -please follow :ref:`this guide for adding a new multi-modal model. `. +To implement a new multi-modal model in vLLM, please follow :ref:`this guide `. .. TODO: Add more instructions on how to add new plugins once embeddings is in. diff --git a/docs/source/models/adding_model.rst b/docs/source/models/adding_model.rst index d3aa6e239d702..cce3f683562b2 100644 --- a/docs/source/models/adding_model.rst +++ b/docs/source/models/adding_model.rst @@ -10,6 +10,10 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM. However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex. +.. note:: + By default, vLLM models do not support multi-modal inputs. To enable multi-modal support, + please follow :ref:`this guide ` after implementing the model. + .. tip:: If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub `_ repository. 
We will be happy to help you out! diff --git a/docs/source/models/adding_multimodal_model.rst b/docs/source/models/adding_multimodal_model.rst index 0c13e1f873d58..25185eb90b9ed 100644 --- a/docs/source/models/adding_multimodal_model.rst +++ b/docs/source/models/adding_multimodal_model.rst @@ -1,26 +1,21 @@ .. _adding_a_new_multimodal_model: -Adding a New Multimodal Model -============================= +Enabling Multimodal Inputs +========================== -This document provides a high-level guide on integrating a :ref:`multi-modal ` model into vLLM. +This document walks you through the steps to extend a vLLM model so that it accepts :ref:`multi-modal ` inputs. -.. note:: - The complexity of adding a new model depends heavily on the model's architecture. - The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM. - However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex. - -.. tip:: - If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub `_ repository. - We will be happy to help you out! +.. seealso:: + :ref:`adding_a_new_model` -1. Set up the base vLLM model +1. Update the base vLLM model ----------------------------- -As usual, follow :ref:`these steps ` to implement the model in vLLM, but note the following: +It is assumed that you have already implemented the model in vLLM according to :ref:`these steps `. +Further update the model as follows: -- You should additionally implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface. +- Implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface. .. code-block:: diff @@ -33,7 +28,7 @@ As usual, follow :ref:`these steps ` to implement the model The model class does not have to be named :code:`*ForCausalLM`. Check out `the HuggingFace Transformers documentation `__ for some examples. -- While implementing the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter +- In the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter for each input tensor that corresponds to a multi-modal input, as shown in the following example: .. code-block:: diff diff --git a/docs/source/models/supported_models.rst b/docs/source/models/supported_models.rst index f5511580d1957..688bf35ef8ef3 100644 --- a/docs/source/models/supported_models.rst +++ b/docs/source/models/supported_models.rst @@ -192,7 +192,7 @@ Vision Language Models - If your model uses one of the above model architectures, you can seamlessly run your model with vLLM. -Otherwise, please refer to :ref:`Adding a New Model ` and :ref:`Adding a New Multimodal Model ` +Otherwise, please refer to :ref:`Adding a New Model ` and :ref:`Enabling Multimodal Inputs ` for instructions on how to implement support for your model. Alternatively, you can raise an issue on our `GitHub `_ project. 
From b001f09d4c95ec80f9e9e1d72142ba12f3801968 Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 06:44:42 +0000 Subject: [PATCH 08/10] Rename --- docs/source/dev/multimodal/multimodal_index.rst | 2 +- docs/source/index.rst | 2 +- docs/source/models/adding_model.rst | 2 +- ...ultimodal_model.rst => enabling_multimodal_inputs.rst} | 2 +- docs/source/models/supported_models.rst | 2 +- vllm/inputs/registry.py | 2 +- vllm/multimodal/base.py | 8 ++++---- 7 files changed, 10 insertions(+), 10 deletions(-) rename docs/source/models/{adding_multimodal_model.rst => enabling_multimodal_inputs.rst} (99%) diff --git a/docs/source/dev/multimodal/multimodal_index.rst b/docs/source/dev/multimodal/multimodal_index.rst index 46bfddda461ba..39daf30a3338f 100644 --- a/docs/source/dev/multimodal/multimodal_index.rst +++ b/docs/source/dev/multimodal/multimodal_index.rst @@ -14,7 +14,7 @@ via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptStrictInputs`. ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`. -To implement a new multi-modal model in vLLM, please follow :ref:`this guide `. +To implement a new multi-modal model in vLLM, please follow :ref:`this guide `. .. TODO: Add more instructions on how to add new plugins once embeddings is in. diff --git a/docs/source/index.rst b/docs/source/index.rst index 37c5df28c92dd..67c039f25e98d 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -92,7 +92,7 @@ Documentation models/supported_models models/adding_model - models/adding_multimodal_model + models/enabling_multimodal_inputs models/engine_args models/lora models/vlm diff --git a/docs/source/models/adding_model.rst b/docs/source/models/adding_model.rst index cce3f683562b2..53c19e5829218 100644 --- a/docs/source/models/adding_model.rst +++ b/docs/source/models/adding_model.rst @@ -12,7 +12,7 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor .. note:: By default, vLLM models do not support multi-modal inputs. To enable multi-modal support, - please follow :ref:`this guide ` after implementing the model. + please follow :ref:`this guide ` after implementing the model here. .. tip:: If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub `_ repository. diff --git a/docs/source/models/adding_multimodal_model.rst b/docs/source/models/enabling_multimodal_inputs.rst similarity index 99% rename from docs/source/models/adding_multimodal_model.rst rename to docs/source/models/enabling_multimodal_inputs.rst index 25185eb90b9ed..599f43704e60a 100644 --- a/docs/source/models/adding_multimodal_model.rst +++ b/docs/source/models/enabling_multimodal_inputs.rst @@ -1,4 +1,4 @@ -.. _adding_a_new_multimodal_model: +.. _enabling_multimodal_inputs: Enabling Multimodal Inputs ========================== diff --git a/docs/source/models/supported_models.rst b/docs/source/models/supported_models.rst index 688bf35ef8ef3..e64a072394680 100644 --- a/docs/source/models/supported_models.rst +++ b/docs/source/models/supported_models.rst @@ -192,7 +192,7 @@ Vision Language Models - If your model uses one of the above model architectures, you can seamlessly run your model with vLLM. 
-Otherwise, please refer to :ref:`Adding a New Model ` and :ref:`Enabling Multimodal Inputs ` +Otherwise, please refer to :ref:`Adding a New Model ` and :ref:`Enabling Multimodal Inputs ` for instructions on how to implement support for your model. Alternatively, you can raise an issue on our `GitHub `_ project. diff --git a/vllm/inputs/registry.py b/vllm/inputs/registry.py index 9396296ffd907..4a7e5c5832917 100644 --- a/vllm/inputs/registry.py +++ b/vllm/inputs/registry.py @@ -141,7 +141,7 @@ def dummy_data_for_profiling(self, model_config: "ModelConfig", The model is identified by ``model_config``. See also: - :ref:`adding_a_new_multimodal_model` + :ref:`enabling_multimodal_inputs` """ # Avoid circular import from vllm.model_executor.model_loader import get_model_architecture diff --git a/vllm/multimodal/base.py b/vllm/multimodal/base.py index 56cee73bd388a..8fe03a119ab73 100644 --- a/vllm/multimodal/base.py +++ b/vllm/multimodal/base.py @@ -163,7 +163,7 @@ def register_input_mapper( See also: :ref:`input_processing_pipeline` - :ref:`adding_a_new_multimodal_model` + :ref:`enabling_multimodal_inputs` """ def wrapper(model_cls: N) -> N: @@ -192,7 +192,7 @@ def map_input(self, model_config: ModelConfig, TypeError: If the data type is not supported. See also: - :ref:`adding_a_new_multimodal_model` + :ref:`enabling_multimodal_inputs` """ # Avoid circular import from vllm.model_executor.model_loader import get_model_architecture @@ -230,7 +230,7 @@ def register_max_multimodal_tokens( If `None` is provided, then the default calculation is used instead. See also: - :ref:`adding_a_new_multimodal_model` + :ref:`enabling_multimodal_inputs` """ def wrapper(model_cls: N) -> N: @@ -260,7 +260,7 @@ def get_max_multimodal_tokens(self, model_config: ModelConfig) -> int: The model is identified by ``model_config``. See also: - :ref:`adding_a_new_multimodal_model` + :ref:`enabling_multimodal_inputs` """ # Avoid circular import from vllm.model_executor.model_loader import get_model_architecture From 40de7fdccb3d1ac7e5e8fe31f5b4ca8f29983ad7 Mon Sep 17 00:00:00 2001 From: DarkLight1337 Date: Sat, 6 Jul 2024 06:48:03 +0000 Subject: [PATCH 09/10] Fix format --- vllm/multimodal/base.py | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/vllm/multimodal/base.py b/vllm/multimodal/base.py index 8fe03a119ab73..0e31816a8e8ac 100644 --- a/vllm/multimodal/base.py +++ b/vllm/multimodal/base.py @@ -162,8 +162,8 @@ def register_input_mapper( If `None` is provided, then the default input mapper is used instead. See also: - :ref:`input_processing_pipeline` - :ref:`enabling_multimodal_inputs` + - :ref:`input_processing_pipeline` + - :ref:`enabling_multimodal_inputs` """ def wrapper(model_cls: N) -> N: @@ -192,7 +192,8 @@ def map_input(self, model_config: ModelConfig, TypeError: If the data type is not supported. 
         See also:
-            :ref:`enabling_multimodal_inputs`
+            - :ref:`input_processing_pipeline`
+            - :ref:`enabling_multimodal_inputs`
         """
         # Avoid circular import
         from vllm.model_executor.model_loader import get_model_architecture

From 40de7fdccb3d1ac7e5e8fe31f5b4ca8f29983ad7 Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Sat, 6 Jul 2024 06:50:59 +0000
Subject: [PATCH 10/10] Reword

---
 docs/source/models/enabling_multimodal_inputs.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/models/enabling_multimodal_inputs.rst b/docs/source/models/enabling_multimodal_inputs.rst
index 599f43704e60a..20be920b5f699 100644
--- a/docs/source/models/enabling_multimodal_inputs.rst
+++ b/docs/source/models/enabling_multimodal_inputs.rst
@@ -28,7 +28,7 @@ Further update the model as follows:
     The model class does not have to be named :code:`*ForCausalLM`.
     Check out `the HuggingFace Transformers documentation `__ for some examples.
 
-- In the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter
+- If you haven't already done so, reserve a keyword parameter in :meth:`~torch.nn.Module.forward`
   for each input tensor that corresponds to a multi-modal input, as shown in the following example:
 
 .. code-block:: diff
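Taken together, the documentation touched by this series describes a three-step recipe for enabling multi-modal inputs on an existing vLLM model: implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface, reserve a keyword parameter in :meth:`~torch.nn.Module.forward` for each multi-modal input tensor, and register an input mapper together with the maximum number of multi-modal tokens. The sketch below is illustrative only and uses a hypothetical ``MyVLMForCausalLM``; the decorator spellings are assumptions inferred from the registry methods referenced in these patches (``register_input_mapper``, ``register_max_multimodal_tokens``), and a given vLLM release may expose modality-specific variants of them instead.

.. code-block:: python

    from typing import List, Optional

    import torch
    from torch import nn

    from vllm.attention import AttentionMetadata
    from vllm.model_executor.models.interfaces import SupportsVision
    from vllm.multimodal import MULTIMODAL_REGISTRY
    from vllm.sequence import SamplerOutput


    # Hypothetical model used only to illustrate the registration pattern.
    # The decorator names follow the registry methods mentioned in this
    # patch series and are assumptions; check your vLLM version for the
    # exact (possibly image-specific) entry points.
    @MULTIMODAL_REGISTRY.register_input_mapper()              # step 2: fall back to the default mapper
    @MULTIMODAL_REGISTRY.register_max_multimodal_tokens(576)  # step 3: assumed worst-case image token count
    class MyVLMForCausalLM(nn.Module, SupportsVision):        # step 1: implement the SupportsVision interface

        def forward(
            self,
            input_ids: torch.Tensor,
            positions: torch.Tensor,
            kv_caches: List[torch.Tensor],
            attn_metadata: AttentionMetadata,
            pixel_values: Optional[torch.Tensor] = None,      # keyword reserved for the image input
        ) -> SamplerOutput:
            # Encode ``pixel_values``, merge the resulting embeddings with the
            # text embeddings, then run the language model; omitted in this sketch.
            ...

Once registered this way, the model receives multi-modal prompts through the ``multi_modal_data`` field covered in ``multimodal_index.rst``.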