[Doc] Create a new "Usage" section #10827

Merged · 9 commits · Dec 5, 2024

5 changes: 1 addition & 4 deletions docs/source/design/multimodal/multimodal_index.rst
@@ -7,17 +7,14 @@ Multi-Modality

vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package.

Multi-modal inputs can be passed alongside text and token prompts to :ref:`supported models <supported_vlms>`
Multi-modal inputs can be passed alongside text and token prompts to :ref:`supported models <supported_mm_models>`
via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptType`.
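
For example, an image can be attached to a text prompt roughly as follows (a minimal sketch; the model name,
image path, and prompt template are illustrative assumptions, not part of this change):

.. code-block:: python

    from PIL import Image

    from vllm import LLM

    # Any multi-modal model supported by vLLM; LLaVA-1.5 is used here purely as an example.
    llm = LLM(model="llava-hf/llava-1.5-7b-hf")

    # Load an example image from disk (the path is a placeholder).
    image = Image.open("example.jpg")

    outputs = llm.generate({
        "prompt": "USER: <image>\nWhat is in this image?\nASSISTANT:",
        "multi_modal_data": {"image": image},
    })
    print(outputs[0].outputs[0].text)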

Currently, vLLM only has built-in support for image data. You can extend vLLM to process additional modalities
by following :ref:`this guide <adding_multimodal_plugin>`.

Looking to add your own multi-modal model? Please follow the instructions listed :ref:`here <enabling_multimodal_inputs>`.

..
TODO: Add usage of --limit-mm-per-prompt when multi-image input is officially supported

Guides
++++++

25 changes: 15 additions & 10 deletions docs/source/index.rst
@@ -85,12 +85,8 @@ Documentation
serving/deploying_with_nginx
serving/distributed_serving
serving/metrics
serving/env_vars
serving/usage_stats
serving/integrations
serving/tensorizer
serving/compatibility_matrix
serving/faq

.. toctree::
:maxdepth: 1
@@ -99,12 +95,21 @@
models/supported_models
models/adding_model
models/enabling_multimodal_inputs
models/engine_args
models/lora
models/vlm
models/structured_outputs
models/spec_decode
models/performance

.. toctree::
:maxdepth: 1
:caption: Usage

usage/lora
usage/multimodal_inputs
usage/structured_outputs
usage/spec_decode
usage/compatibility_matrix
usage/performance
usage/faq
usage/engine_args
usage/env_vars
usage/usage_stats

.. toctree::
:maxdepth: 1
2 changes: 1 addition & 1 deletion docs/source/models/enabling_multimodal_inputs.rst
@@ -3,7 +3,7 @@
Enabling Multimodal Inputs
==========================

This document walks you through the steps to extend a vLLM model so that it accepts :ref:`multi-modal <multi_modality>` inputs.
This document walks you through the steps to extend a vLLM model so that it accepts :ref:`multi-modal inputs <multimodal_inputs>`.

.. seealso::
:ref:`adding_a_new_model`
19 changes: 17 additions & 2 deletions docs/source/models/supported_models.rst
@@ -471,6 +471,8 @@ Sentence Pair Scoring
.. note::
These models are supported in both offline and online inference via the Score API.

.. _supported_mm_models:

Multimodal Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -489,8 +491,6 @@ On the other hand, modalities separated by :code:`/` are mutually exclusive.

- e.g.: :code:`T / I` means that the model supports text-only and image-only inputs, but not text-with-image inputs.

.. _supported_vlms:

Text Generation
---------------

@@ -646,6 +646,21 @@ Text Generation
| :sup:`E` Pre-computed embeddings can be inputted for this modality.
| :sup:`+` Multiple items can be inputted per text prompt for this modality.

.. important::
To enable multiple multi-modal items per text prompt, you have to set :code:`limit_mm_per_prompt` (offline inference)
or :code:`--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:

.. code-block:: python

llm = LLM(
model="Qwen/Qwen2-VL-7B-Instruct",
limit_mm_per_prompt={"image": 4},
)

.. code-block:: bash

vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4

.. note::
vLLM currently only supports adding LoRA to the language backbone of multimodal models.

4 changes: 2 additions & 2 deletions docs/source/serving/openai_compatible_server.md
@@ -32,7 +32,7 @@ We currently support the following OpenAI APIs:
- [Completions API](https://platform.openai.com/docs/api-reference/completions)
- *Note: `suffix` parameter is not supported.*
- [Chat Completions API](https://platform.openai.com/docs/api-reference/chat)
- [Vision](https://platform.openai.com/docs/guides/vision)-related parameters are supported; see [Using VLMs](../models/vlm.rst).
- [Vision](https://platform.openai.com/docs/guides/vision)-related parameters are supported; see [Multimodal Inputs](../usage/multimodal_inputs.rst).
- *Note: `image_url.detail` parameter is not supported.*
- We also support `audio_url` content type for audio files.
- Refer to [vllm.entrypoints.chat_utils](https://github.com/vllm-project/vllm/tree/main/vllm/entrypoints/chat_utils.py) for the exact schema.
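
As a rough illustration of the vision support above (a sketch only — the server URL, API key, model, and image URL
are assumptions, not part of this change), an image can be sent through the Chat Completions API like this:

```python
from openai import OpenAI

# Point the official OpenAI client at a locally running vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",  # any vision-capable model served by vLLM
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```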
@@ -41,7 +41,7 @@ We currently support the following OpenAI APIs:
- [Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
- Instead of `inputs`, you can pass in a list of `messages` (same schema as Chat Completions API),
which will be treated as a single prompt to the model according to its chat template.
- This enables multi-modal inputs to be passed to embedding models, see [Using VLMs](../models/vlm.rst).
- This enables multi-modal inputs to be passed to embedding models; see [this page](../usage/multimodal_inputs.rst) for details.
- *Note: You should run `vllm serve` with `--task embedding` to ensure that the model is being run in embedding mode.*
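
As a sketch of a `messages`-based embedding request (assuming a vLLM server started with `--task embedding`;
the host, port, model name, and image URL below are illustrative, not part of this change):

```python
import requests

# vLLM accepts `messages` in place of `inputs` for the Embeddings API,
# which lets multi-modal content be embedded via the chat template.
response = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "TIGER-Lab/VLM2Vec-Full",  # an example multi-modal embedding model
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text", "text": "Represent the given image."},
            ],
        }],
    },
)
print(response.json()["data"][0]["embedding"][:8])
```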

## Score API for Cross Encoder Models
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions docs/source/serving/faq.rst → docs/source/usage/faq.rst
@@ -1,3 +1,5 @@
.. _faq:

Frequently Asked Questions
===========================

4 changes: 2 additions & 2 deletions docs/source/models/lora.rst → docs/source/usage/lora.rst
@@ -1,7 +1,7 @@
.. _lora:

Using LoRA adapters
===================
LoRA Adapters
=============

This document shows you how to use `LoRA adapters <https://arxiv.org/abs/2106.09685>`_ with vLLM on top of a base model.
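
In outline (a minimal sketch — the base model, adapter name, and adapter path below are placeholders, not taken
from this change), serving requests through a LoRA adapter looks roughly like:

.. code-block:: python

    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    # Enable LoRA support on the base model (the model name is a placeholder).
    llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

    sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

    # Each request may reference a different adapter via LoRARequest(name, id, path).
    outputs = llm.generate(
        "Translate to SQL: how many users signed up last week?",
        sampling_params,
        lora_request=LoRARequest("sql_adapter", 1, "/path/to/sql-lora-adapter"),
    )
    print(outputs[0].outputs[0].text)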
