Welcome to the GLaMM Model Zoo! This repository contains a collection of state-of-the-art models from the GLaMM (Pixel Grounding Large Multimodal Model) family. Each model targets a specific multimodal task, combining visual and textual processing.
The following table gives an overview of the available models in our zoo. For each model, it links to the Hugging Face page and provides a brief summary; a minimal download sketch follows the table.
- To evaluate the pretrained models, please follow the instructions in `evaluation.md`.
- To run the offline demo, please follow the instructions in `offline_demo.md`.
| Model Name | Hugging Face Link | Summary |
|---|---|---|
| GLaMM-GranD-Pretrained | Hugging Face | Pretrained on the GranD dataset. |
| GLaMM-FullScope | Hugging Face | Recommended model for the offline demo. |
| GLaMM-GCG | Hugging Face | Finetuned on the GranD-f dataset for the Grounded Conversation Generation (GCG) task. |
| GLaMM-RefSeg | Hugging Face | Finetuned on the RefCOCO, RefCOCO+, and RefCOCOg datasets for the referring expression segmentation task. |
| GLaMM-RegCap-RefCOCOg | Hugging Face | Finetuned on RefCOCOg for the region captioning task. |
| GLaMM-RegCap-VG | Hugging Face | Finetuned on the Visual Genome dataset for the region captioning task. |
Note that all task-specific models above are finetuned from GLaMM-GranD-Pretrained.
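
To fetch one of these checkpoints programmatically, the snippet below is a minimal sketch using `huggingface_hub`. The repo ID `MBZUAI/GLaMM-FullScope` is an assumption for illustration only; substitute the ID from the corresponding Hugging Face link in the table above.

```python
# Minimal sketch: download a GLaMM checkpoint from the Hugging Face Hub.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

# NOTE: the repo ID below is an assumption for illustration; replace it
# with the ID from the model's Hugging Face link in the table above.
local_dir = snapshot_download(repo_id="MBZUAI/GLaMM-FullScope")
print(f"Checkpoint files downloaded to: {local_dir}")
```

The downloaded directory can then be passed as the checkpoint path when following `evaluation.md` or `offline_demo.md`.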