I am Jiazheng Xu, a third-year PhD student at Tsinghua University.
- 🔭 Interested in multimodal generative models, especially RLHF and alignment. Find my up-to-date publication list on Google Scholar!
- 🌱 Some of the works I am proud to have led on RLHF for multimodal generative models:
- ImageReward (NeurIPS'23): the first general-purpose text-to-image human preference reward model (RM) for RLHF, outperforming CLIP, BLIP, and Aesthetic scores by 30% on human preference prediction (a minimal usage sketch is at the end of this page).
- VisionReward: a fine-grained and multi-dimensional reward model for image and video generation, outperforming VideoScore by 17.2% and enabling multi-objective optimization.
- 🌱 I'm also honored to work with the team on multimodal foundation models:
- CogVLM (NeurIPS'24): a powerful open-source visual language model (VLM) that achieves state-of-the-art performance on 10 classic cross-modal benchmarks.
- CogAgent (CVPR'24): a visual agent that can return a plan, the next action, and specific operations with coordinates for any given task on any GUI screenshot, enhancing GUI-related question answering.
- CogVideoX: a large-scale diffusion transformer model designed for generating videos from text prompts.
- 💬 Feel free to drop me an email for:
- Any form of collaboration
- Any issues with my work or code
- Interesting ideas to discuss, or just a chat
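
If you want a quick taste of ImageReward, here is a minimal scoring sketch. It assumes the `image-reward` pip package and its published `RM.load` / `score` / `inference_rank` interface; the prompt and image file names are placeholders.

```python
# Minimal sketch: scoring candidate generations with ImageReward
# (assumes `pip install image-reward`; prompt and image paths are placeholders)
import ImageReward as RM

model = RM.load("ImageReward-v1.0")  # load the pretrained reward model

prompt = "a painting of an ocean with clouds and birds, daytime"
images = ["sample_1.png", "sample_2.png"]  # candidates generated for the same prompt

rewards = model.score(prompt, images)  # one human-preference score per image
ranking, rewards = model.inference_rank(prompt, images)  # or rank the candidates directly
print(ranking, rewards)
```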