Songyan Zhang1*, Wenhui Huang1*, Zihui Gao2, Hao Chen2, Chen Lv1†
Nanyang Technological University1, Zhejiang University2
*Equal Contributions, †Corresponding Author
An overview of the capabilities of our proposed WiseAD, a specialized vision-language model for end-to-end autonomous driving with extensive fundamental driving knowledge. Given a video clip, WiseAD can answer a variety of driving-related questions and perform knowledge-augmented trajectory planning toward the target waypoints.
Our WiseAD is built on MobileVLM V2 1.7B and fine-tuned on a mixture of datasets, including LingoQA, DRAMA, and Carla, which can be downloaded from their respective sites.
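The README does not state the mixing ratios used for fine-tuning, but the idea of drawing training samples proportionally from several QA sources can be sketched as follows (dataset names are from the README; the sampling scheme and weights are purely illustrative):

```python
import random

# Illustrative weights only: the actual mixture ratios used to
# fine-tune WiseAD are not stated in this README.
MIXTURE = {"LingoQA": 0.5, "DRAMA": 0.3, "Carla": 0.2}

def sample_source(rng: random.Random) -> str:
    # Draw one dataset name proportionally to its mixture weight.
    names, weights = zip(*MIXTURE.items())
    return rng.choices(names, weights=weights, k=1)[0]

# Count which source each of 1000 simulated training draws comes from.
rng = random.Random(0)
counts = {name: 0 for name in MIXTURE}
for _ in range(1000):
    counts[sample_source(rng)] += 1
```

In practice, a data loader would use such a draw to pick which dataset to fetch the next training example from.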
Our WiseAD is now available on Hugging Face. Enjoy playing with it!
- Clone this repository and navigate to the WiseAD folder

```shell
git clone https://github.com/wyddmw/WiseAD.git
cd WiseAD
```
- Install packages

```shell
conda create -n wisead python=3.10 -y
conda activate wisead
pip install --upgrade pip
pip install torch==2.0.1
pip install -r requirements.txt
```
- Run the inference demo

```shell
python run_infr.py
```
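For loading the released checkpoint programmatically, something along these lines should work with the transformers library. This is a hedged sketch: the repo id `wyddmw/WiseAD` is an assumption (check the project's Hugging Face page for the actual name), and `trust_remote_code` is used because WiseAD builds on MobileVLM V2's custom model class.

```python
def load_wisead(repo_id: str = "wyddmw/WiseAD"):
    """Load the WiseAD tokenizer and model from the Hugging Face Hub.

    The repo id is an assumption, not confirmed by this README.
    """
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    model.eval()  # inference mode
    return tokenizer, model
```

For the exact prompt format and video preprocessing, follow `run_infr.py` in the repository.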
- [✓] Release Hugging Face model and inference demo.
- Carla closed-loop evaluation (coming soon).
- Training data and code (coming soon).
We appreciate the awesome open-source projects MobileVLM and LMDrive.
If you find WiseAD useful in your research or applications, please consider giving it a star ⭐ and citing it with the following BibTeX:
@article{zhang2024wisead,
title={WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model},
author={Zhang, Songyan and Huang, Wenhui and Gao, Zihui and Chen, Hao and Lv, Chen},
journal={arXiv preprint arXiv:2412.09951},
year={2024}
}