PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

Zheng Zhang*, Yeyao Ma*, Enming Zhang*, Xiang Bai

(* Equal contribution)

arXiv Paper

Features

A powerful extension of the Large Multi-modal Model for generic (panoptic, instance, semantic) segmentation, referring segmentation, and interactive segmentation.
Supports joint training across multiple segmentation tasks and visual-language tasks.
Demonstrates zero-shot capabilities on unseen tasks, such as open-vocabulary segmentation, generalized referring segmentation, and video object segmentation.

Updates

Release evaluation code
Release training code

Installation

See Installation instructions.

Getting Started

See Preparing Datasets for PSALM.

See Getting Started with PSALM.

Model Zoo

Download PSALM here.

Citation

If you find this work useful for your research, please cite it using the following BibTeX entry.

@inproceedings{zhang2025psalm,
  title={Psalm: Pixelwise segmentation with large multi-modal model},
  author={Zhang, Zheng and Ma, Yeyao and Zhang, Enming and Bai, Xiang},
  booktitle={European Conference on Computer Vision},
  pages={74--91},
  year={2025},
  organization={Springer}
}

Acknowledgement

Thanks to these awesome works: Mask2former, Mask2former-Simplify, and LLaVA. Our code is based on them.
