wyddmw/WiseAD

WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model

ArXiv

Songyan Zhang1*, Wenhui Huang1*, Zihui Gao2, Hao Chen2, Lv Chen1†

Nanyang Technological University1, Zhejiang University2

*Equal Contributions, †Corresponding Author


An overview of the framework of our WiseAD.

✨Capabilities

WiseAD is a specialized vision-language model for end-to-end autonomous driving, equipped with extensive fundamental driving knowledge. Given a video clip, WiseAD can answer a variety of driving-related questions and perform knowledge-augmented trajectory planning according to the target waypoints.

🦙 Data & Model Zoo

WiseAD is built on MobileVLM V2 1.7B and fine-tuned on a mixture of datasets, including LingoQA, DRAMA, and CARLA data, which can be downloaded from their respective sites.
WiseAD is now available on Hugging Face. Enjoy playing with it!

🛠️ Install

  1. Clone this repository and navigate to the WiseAD folder

    git clone https://github.com/wyddmw/WiseAD.git
    cd WiseAD
  2. Install the required packages

    conda create -n wisead python=3.10 -y
    conda activate wisead
    pip install --upgrade pip
    pip install torch==2.0.1
    pip install -r requirements.txt
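After installation, it can help to confirm that the active environment matches the versions pinned above (Python 3.10 and torch 2.0.1). The following is a minimal sketch using only the standard library; the helper names `version_matches` and `check_environment` are hypothetical and not part of this repository.

```python
import sys
from importlib import metadata


def version_matches(installed: str, expected: str) -> bool:
    """Compare versions, ignoring a local build tag such as '+cu118'."""
    return installed.split("+", 1)[0] == expected


def check_environment(expected_python=(3, 10), expected_torch="2.0.1"):
    """Return a list of human-readable problems; an empty list means OK.

    The expected versions are taken from the install commands above.
    """
    problems = []

    # Compare only the major and minor parts of the interpreter version.
    if sys.version_info[:2] != tuple(expected_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in expected_python)
        problems.append(f"Python {wanted} expected, found {found}")

    # Read the installed torch version without importing torch itself.
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("torch is not installed")
    else:
        if not version_matches(torch_version, expected_torch):
            problems.append(
                f"torch {expected_torch} expected, found {torch_version}"
            )

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script inside the `wisead` conda environment prints one warning per mismatch and nothing if the versions agree.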

🗝️ Quick Start

Example of answering driving-related questions.

    python run_infr.py

🔨 TODO LIST

  • [✓] Release Hugging Face model and inference demo.
  • [ ] CARLA closed-loop evaluation (coming soon).
  • [ ] Training data and code (coming soon).

Reference

We appreciate the awesome open-source projects MobileVLM and LMDrive.

✏️ Citation

If you find WiseAD useful in your research or applications, please consider giving it a star ⭐ and citing it using the following BibTeX:

@article{zhang2024wisead,
  title={WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model},
  author={Zhang, Songyan and Huang, Wenhui and Gao, Zihui and Chen, Hao and Lv, Chen},
  journal={arXiv preprint arXiv:2412.09951},
  year={2024}
}

About

This is the official implementation of WiseAD.
