OpenEMMA is an open-source implementation of Waymo's End-to-End Multimodal Model for Autonomous Driving (EMMA), offering an end-to-end framework for motion planning in autonomous vehicles. OpenEMMA leverages the pretrained world knowledge of Vision Language Models (VLMs), such as GPT-4 and LLaVA, to integrate text and front-view camera inputs, enabling precise predictions of future ego waypoints and providing decision rationales. Our goal is to provide accessible tools for researchers and developers to advance autonomous driving research and applications.
Figure 2. OpenEMMA: Our Open-Source End-to-End Autonomous Driving Framework based on Pre-trained VLMs.
- [2024/12/19] 🔥 We released OpenEMMA, an open-source project for end-to-end motion planning in autonomous driving. Explore our paper for more details.
To get started with OpenEMMA, follow these steps to set up your environment and dependencies.
- **Environment Setup**

  Set up a Conda environment for OpenEMMA with Python 3.8:

  ```bash
  conda create -n openemma python=3.8
  conda activate openemma
  ```
- **Clone OpenEMMA Repository**

  Clone the OpenEMMA repository and navigate to the root directory:

  ```bash
  git clone git@github.com:taco-group/OpenEMMA.git
  cd OpenEMMA
  ```
- **Install Dependencies**

  Ensure you have cudatoolkit installed. If not, use the following command:

  ```bash
  conda install nvidia/label/cuda-12.4.0::cuda-toolkit
  ```

  To install the core packages required for OpenEMMA, run the following command:

  ```bash
  pip install -r requirements.txt
  ```

  This will install all dependencies, including those for YOLO-3D, an external tool used for critical object detection. The weights needed to run YOLO-3D will be automatically downloaded during the first execution.
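  To confirm that the GPU is visible after installation, a quick sanity check like the one below can be used. It assumes PyTorch with CUDA support is pulled in by `requirements.txt`; the snippet itself is not part of OpenEMMA.

  ```python
  # Optional sanity check: verify that PyTorch sees CUDA and at least one GPU.
  # Assumes PyTorch is installed via requirements.txt; not part of OpenEMMA itself.
  import torch

  print("PyTorch version:", torch.__version__)
  print("CUDA available:", torch.cuda.is_available())
  if torch.cuda.is_available():
      print("GPU:", torch.cuda.get_device_name(0))
  ```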
- **Set up GPT-4 API Access**

  To enable GPT-4's reasoning capabilities, obtain an API key from OpenAI. You can add your API key directly in the code where prompted or set it up as an environment variable:

  ```bash
  export OPENAI_API_KEY="your_openai_api_key"
  ```

  This allows OpenEMMA to access GPT-4 for generating future waypoints and decision rationales.
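  As a quick check that the key is visible from Python, a minimal sketch like the following can be used; it assumes the `openai` Python package (1.x client) is installed, and the model name `gpt-4o` is only an example:

  ```python
  # Minimal sketch: confirm the API key is set and that a simple request succeeds.
  # Assumes the `openai` (>=1.0) package is installed; the model name is an example.
  import os
  from openai import OpenAI

  assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"

  client = OpenAI()  # reads OPENAI_API_KEY from the environment
  response = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "user", "content": "Reply with 'ok' if you can read this."}],
  )
  print(response.choices[0].message.content)
  ```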
After setting up the environment, you can start using OpenEMMA with the following instructions:
- **Prepare Input Data**

  Download and extract the nuScenes dataset.
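  Before running OpenEMMA, you can verify that the dataset extracted correctly with the `nuscenes-devkit` (install it with `pip install nuscenes-devkit` if it is not already present); the split name `v1.0-mini` and the path below are placeholders for your own setup:

  ```python
  # Optional sanity check: load the nuScenes index to confirm the dataset layout.
  # `version` and `dataroot` are placeholders -- adjust them to your download.
  from nuscenes.nuscenes import NuScenes

  nusc = NuScenes(version="v1.0-mini", dataroot="/data/nuscenes", verbose=True)
  print("Scenes found:", len(nusc.scene))
  ```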
- **Run OpenEMMA**

  Use the following command to execute OpenEMMA's main script:

  ```bash
  python main.py \
      --model-path qwen \
      --dataroot [dir-of-nuscenes-dataset] \
      --version [version-of-nuscenes-dataset] \
      --method openemma
  ```
  Currently, we support the following models: GPT-4o, LLaVA-1.6-Mistral-7B, Llama-3.2-11B-Vision-Instruct, and Qwen2-VL-7B-Instruct. To use a specific model, simply pass `gpt`, `llava`, `llama`, or `qwen` as the argument to `--model-path`.
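  For example, to run the LLaVA backend on the nuScenes mini split, the invocation might look like this (the dataset path and split name are placeholders for your own setup):

  ```bash
  # Example invocation; --dataroot and --version are placeholders for your setup.
  python main.py \
      --model-path llava \
      --dataroot /data/nuscenes \
      --version v1.0-mini \
      --method openemma
  ```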
- **Output Interpretation**

  After running the model, OpenEMMA generates the following output in the `./qwen-results` directory:

  - **Waypoints**: A list of future waypoints predicting the ego vehicle’s trajectory (see the plotting sketch after this list).
  - **Decision Rationales**: Text explanations of the model’s reasoning, including scene context, critical objects, and behavior decisions.
  - **Annotated Images**: Visualizations of the planned trajectory and detected critical objects overlaid on the original images.
  - **Compiled Video**: A video (e.g., `output_video.mp4`) created from the annotated images, showing the predicted path over time.
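  To inspect the predicted trajectory yourself, a minimal plotting sketch like the one below may help; the file name `waypoints.json` and the `[[x, y], ...]` layout are assumptions for illustration, so adapt them to the files actually written to the output directory:

  ```python
  # Hypothetical sketch: plot predicted ego waypoints with matplotlib.
  # The file name and JSON layout ([[x, y], ...]) are assumptions -- adapt them
  # to the actual files OpenEMMA writes to its output directory.
  import json
  import matplotlib.pyplot as plt

  with open("qwen-results/waypoints.json") as f:
      waypoints = json.load(f)

  xs = [p[0] for p in waypoints]
  ys = [p[1] for p in waypoints]

  plt.plot(xs, ys, marker="o")
  plt.xlabel("x (m)")
  plt.ylabel("y (m)")
  plt.title("Predicted ego trajectory")
  plt.axis("equal")
  plt.show()
  ```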
For help or issues using this package, please submit a GitHub issue.
For personal communication related to this project, please contact Shuo Xing ([email protected]).
We hope this code is helpful to your work. If you use our code or extend our work, please consider citing our paper:
```bibtex
@article{openemma,
  author  = {Xing, Shuo and Qian, Chengyuan and Wang, Yuping and Hua, Hongyuan and Tian, Kexin and Zhou, Yang and Tu, Zhengzhong},
  title   = {OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving},
  journal = {arXiv},
  year    = {2024},
  month   = dec,
  eprint  = {2412.15208},
  doi     = {10.48550/arXiv.2412.15208}
}
```