Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
👋 Join our QQ Chat Group
2025/02/08: Many thanks to the open-source community contributors for making the ComfyUI version of Sonic a reality. Your efforts are truly appreciated! See the ComfyUI version of Sonic.
2025/02/06: Commercialization: note that our license is non-commercial. For commercial use, please use the Tencent Cloud Video Creation Large Model: Introduction / API documentation
2025/01/17: Our online Hugging Face demo is released.
2025/01/17: Thank you to NewGenAI for promoting Sonic and creating a Windows-based tutorial on YouTube.
2024/12/16: Our online demo is released.
Input | Output | Input | Output |
---|---|---|---|
![]() | anime1.mp4 | ![]() | female_diaosu.mp4 |
![]() | hair.mp4 | ![]() | leonnado.mp4 |
For more visual demos, please visit our Page.
If you develop or use Sonic in your projects, please let us know.
- ComfyUI version of Sonic: ComfyUI_Sonic
2025/01/14: Our inference code and weights are released. Stay tuned; we will continue to polish the model.
- An NVIDIA GPU with CUDA support is required.
- The model is tested on a single GPU with 32 GB of memory.
- Tested operating system: Linux
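
The hardware requirements above can be checked before installation. The following is a minimal sketch, not part of the Sonic repository: it asks `nvidia-smi` for each GPU's name and total memory. The function names and the 30000 MiB threshold are assumptions made for illustration (cards sold as 32 GB typically report slightly less than 32768 MiB).

```python
import shutil
import subprocess

# Assumption: treat anything at or above ~30000 MiB as comparable to the
# tested 32G card, since nominal 32 GB GPUs report a bit less than 32768 MiB.
MIN_MEMORY_MIB = 30000


def parse_gpu_lines(text):
    """Parse 'name, memory_mib' CSV lines as printed by nvidia-smi."""
    gpus = []
    for line in text.strip().splitlines():
        # Split on the last comma so GPU names containing commas survive.
        name, _, mem = line.rpartition(",")
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus():
    """Return [(name, total_memory_mib)], or [] if no usable nvidia-smi is found."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_lines(result.stdout)


def report():
    """Print one line per GPU, noting whether it meets the assumed threshold."""
    gpus = query_gpus()
    if not gpus:
        print("No NVIDIA GPU detected via nvidia-smi.")
    for name, mem in gpus:
        note = "comparable to the tested setup" if mem >= MIN_MEMORY_MIB else "less memory than the tested setup"
        print(f"{name}: {mem} MiB ({note})")
```

Calling `report()` prints the detected GPUs. A card with less memory is not necessarily unusable; it is simply outside the configuration the authors state they tested.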
- Install PyTorch, then install the remaining Python dependencies:
pip3 install -r requirements.txt
- All models are stored in checkpoints/ by default, and the file structure is as follows:
Sonic
├──checkpoints
│ ├──Sonic
│ │ ├──audio2bucket.pth
│ │ ├──audio2token.pth
│ │ ├──unet.pth
│ ├──stable-video-diffusion-img2vid-xt
│ │ ├──...
│ ├──whisper-tiny
│ │ ├──...
│ ├──RIFE
│ │ ├──flownet.pkl
│ ├──yoloface_v5m.pt
├──...
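
To confirm that a local download matches the tree above, a small check like the following may help. It is a sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and only the files and directories explicitly named in the tree are checked (the contents elided as "..." are not).

```python
from pathlib import Path

# Relative paths copied from the documented tree; "..." entries are skipped.
EXPECTED_FILES = [
    "Sonic/audio2bucket.pth",
    "Sonic/audio2token.pth",
    "Sonic/unet.pth",
    "RIFE/flownet.pkl",
    "yoloface_v5m.pt",
]
EXPECTED_DIRS = [
    "stable-video-diffusion-img2vid-xt",
    "whisper-tiny",
]


def missing_checkpoints(root="checkpoints"):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    missing = [f for f in EXPECTED_FILES if not (root / f).is_file()]
    # Directories are reported with a trailing slash to tell them apart.
    missing += [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing
```

An empty list from `missing_checkpoints()` means every named entry was found; it does not verify that the files are complete or uncorrupted.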
Download the models with huggingface-cli as follows:
python3 -m pip install "huggingface_hub[cli]"
huggingface-cli download LeonJoe13/Sonic --local-dir checkpoints
huggingface-cli download stabilityai/stable-video-diffusion-img2vid-xt --local-dir checkpoints/stable-video-diffusion-img2vid-xt
huggingface-cli download openai/whisper-tiny --local-dir checkpoints/whisper-tiny
Or manually download the pretrained model, svd-xt, and whisper-tiny to checkpoints/.
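
The same downloads can also be scripted from Python. This is a sketch that assumes the `huggingface_hub` package is installed; it uses that library's `snapshot_download` function with the repository IDs and target directories from the commands above. The names `DOWNLOADS` and `download_all` are illustrative and do not come from the Sonic repository.

```python
# (repo_id, local_dir) pairs copied from the huggingface-cli commands.
DOWNLOADS = [
    ("LeonJoe13/Sonic", "checkpoints"),
    ("stabilityai/stable-video-diffusion-img2vid-xt",
     "checkpoints/stable-video-diffusion-img2vid-xt"),
    ("openai/whisper-tiny", "checkpoints/whisper-tiny"),
]


def download_all(downloads=DOWNLOADS):
    """Fetch each repository snapshot into its target directory."""
    # Imported here so the list above can be inspected without the package.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in downloads:
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_all()` requires network access and enough disk space for all three repositories.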
python3 demo.py \
'/path/to/input_image' \
'/path/to/input_audio' \
'/path/to/output_video'
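
To animate several portraits in one go, the command above can be wrapped in a short script. The following is a sketch under stated assumptions: `demo.py` is called with the same three positional arguments shown above, while the helper names, the `outputs` directory, and the `.mp4` output naming are hypothetical choices made for illustration.

```python
import subprocess
import sys
from pathlib import Path


def build_command(image, audio, output, script="demo.py"):
    """Mirror the documented call: demo.py <image> <audio> <output>."""
    return [sys.executable, script, str(image), str(audio), str(output)]


def run_batch(jobs, out_dir="outputs"):
    """Run demo.py once per (image, audio) pair, stopping on the first failure."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for image, audio in jobs:
        # Assumption: name each output video after its input image.
        output = out_dir / (Path(image).stem + ".mp4")
        subprocess.run(build_command(image, audio, output), check=True)
```

For example, `run_batch([("face1.png", "speech1.wav"), ("face2.png", "speech2.wav")])` would run the demo twice from the repository root, one process per pair, so the model is reloaded for each job.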
If you find our work helpful for your research, please consider citing our work.
@article{ji2024sonic,
title={Sonic: Shifting Focus to Global Audio Perception in Portrait Animation},
author={Ji, Xiaozhong and Hu, Xiaobin and Xu, Zhihong and Zhu, Junwei and Lin, Chuming and He, Qingdong and Zhang, Jiangning and Luo, Donghao and Chen, Yi and Lin, Qin and others},
journal={arXiv preprint arXiv:2411.16331},
year={2024}
}
@article{ji2024realtalk,
title={Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network},
author={Ji, Xiaozhong and Lin, Chuming and Ding, Zhonggan and Tai, Ying and Zhu, Junwei and Hu, Xiaobin and Luo, Donghao and Ge, Yanhao and Wang, Chengjie},
journal={arXiv preprint arXiv:2406.18284},
year={2024}
}
Explore our related research:
- [Super-fast talk: real-time with less GPU computation] Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network