Songyan Zhang1*, Wenhui Huang1*, Zihui Gao2, Hao Chen2, Chen Lv1†
Nanyang Technological University1, Zhejiang University2
*Equal Contributions, †Corresponding Author
An overview of the capabilities of our proposed WiseAD, a specialized vision-language model for end-to-end autonomous driving with extensive fundamental driving knowledge. Given a video clip, WiseAD can answer a variety of driving-related questions and perform knowledge-augmented trajectory planning toward the target waypoints.
Our WiseAD is built on MobileVLM V2 1.7B and fine-tuned on a mixture of datasets, including LingoQA, DRAMA, and Carla, which can be downloaded from their respective sites.
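The README does not state the mixing ratios used for fine-tuning, but the idea of drawing training samples proportionally from several QA sources can be sketched as follows (dataset names are from the README; the sampling scheme and weights are purely illustrative):

```python
import random

# Illustrative weights only: the actual mixture ratios used to
# fine-tune WiseAD are not stated in this README.
MIXTURE = {"LingoQA": 0.5, "DRAMA": 0.3, "Carla": 0.2}

def sample_source(rng: random.Random) -> str:
    # Draw one dataset name proportionally to its mixture weight.
    names, weights = zip(*MIXTURE.items())
    return rng.choices(names, weights=weights, k=1)[0]

# Count which source each of 1000 simulated training draws comes from.
rng = random.Random(0)
counts = {name: 0 for name in MIXTURE}
for _ in range(1000):
    counts[sample_source(rng)] += 1
```

In practice, a data loader would use such a draw to pick which dataset to fetch the next training example from.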
Our WiseAD is now available on Hugging Face. Enjoy playing with it!
- Clone this repository and navigate to the WiseAD folder

```shell
git clone https://github.com/wyddmw/WiseAD.git
cd WiseAD
```
- Install packages

```shell
conda create -n wisead python=3.10 -y
conda activate wisead
pip install --upgrade pip
pip install torch==2.0.1
pip install -r requirements.txt
```
- Run the inference demo

```shell
python run_infr.py
```
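For loading the released checkpoint programmatically, something along these lines should work with the transformers library. This is a hedged sketch: the repo id `wyddmw/WiseAD` is an assumption (check the project's Hugging Face page for the actual name), and `trust_remote_code` is used because WiseAD builds on MobileVLM V2's custom model class.

```python
def load_wisead(repo_id: str = "wyddmw/WiseAD"):
    """Load the WiseAD tokenizer and model from the Hugging Face Hub.

    The repo id is an assumption, not confirmed by this README.
    """
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    model.eval()  # inference mode
    return tokenizer, model
```

For the exact prompt format and video preprocessing, follow `run_infr.py` in the repository.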
- [✓] Release Hugging Face model and inference demo.
- Carla closed-loop evaluation (coming soon).
- Training data and code (coming soon).
We appreciate the awesome open-source projects MobileVLM and LMDrive.
If you find WiseAD useful in your research or applications, please consider giving it a star ⭐ and citing it with the following BibTeX:
@article{zhang2024wisead,
title={WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model},
author={Zhang, Songyan and Huang, Wenhui and Gao, Zihui and Chen, Hao and Lv, Chen},
journal={arXiv preprint arXiv:2412.09951},
year={2024}
}