This repository contains the codebase of prompt learning techniques integrated with CAT-Seg (CVPR'24) to adapt the Vision-Language Model CLIP to the downstream task of semantic segmentation in an open-vocabulary setting.
The following prompt learning techniques are included in this repository:
- **Context Optimization (CoOp, IJCV'22)**
  - Models a prompt's context with a set of learnable vectors, which are optimized by minimizing the downstream loss.
  - Instead of the vanilla template "a photo of a [CLASS]", learnable context vectors are used as the prompt, e.g. "X X X X [CLASS]".
  - The integration of this technique into CAT-Seg can be found in `class CLIP` of `./catseg/third_party/model_vpt.py` on branch `main`.
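A minimal PyTorch sketch of the CoOp idea, learnable context vectors prepended to the class-name embedding, is shown below. The class name `CoOpPromptLearner` and all dimensions are illustrative assumptions, not values taken from this repository.

```python
import torch
import torch.nn as nn

class CoOpPromptLearner(nn.Module):
    """Sketch of CoOp: a shared set of learnable context vectors
    replaces the hand-crafted prompt template. Sizes are illustrative."""

    def __init__(self, n_classes: int, n_ctx: int = 4, ctx_dim: int = 512):
        super().__init__()
        # "X X X X" -> n_ctx learnable context vectors shared by all classes
        self.ctx = nn.Parameter(torch.empty(n_ctx, ctx_dim))
        nn.init.normal_(self.ctx, std=0.02)
        # Frozen class-name token embeddings would come from CLIP's
        # token embedding layer; random tensors stand in here.
        self.register_buffer("cls_emb", torch.randn(n_classes, 1, ctx_dim))

    def forward(self) -> torch.Tensor:
        # Prepend the shared context to every class embedding:
        # output shape (n_classes, n_ctx + 1, ctx_dim)
        ctx = self.ctx.unsqueeze(0).expand(self.cls_emb.size(0), -1, -1)
        return torch.cat([ctx, self.cls_emb], dim=1)

prompts = CoOpPromptLearner(n_classes=10)()
print(prompts.shape)  # torch.Size([10, 5, 512])
```

In the real method, these prompt tensors are fed through CLIP's frozen text encoder, and only `self.ctx` receives gradients.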
- **Conditional Context Optimization (CoCoOp, CVPR'22)**
  - Follows a similar approach to CoOp, but here the context vectors are conditioned on the image features via a lightweight meta-network.
  - This augments the learnable prompts with the image context as a prior.
  - The integration of this technique into CAT-Seg can be found in `class CLIP` of `./catseg/third_party/model_vpt.py` on branch `CoCoOp`.
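A minimal sketch of the CoCoOp conditioning step: a small meta-network maps each image feature to a bias that shifts the shared context vectors, making the prompt instance-conditional. The class name, the meta-network layout, and all dimensions are illustrative assumptions, not this repository's actual configuration.

```python
import torch
import torch.nn as nn

class CoCoOpPromptLearner(nn.Module):
    """Sketch of CoCoOp: shared learnable context plus an
    image-conditioned shift produced by a bottleneck MLP (meta-net)."""

    def __init__(self, n_classes=10, n_ctx=4, ctx_dim=512, img_dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.empty(n_ctx, ctx_dim))
        nn.init.normal_(self.ctx, std=0.02)
        # Meta-net: image feature -> per-image context shift
        self.meta_net = nn.Sequential(
            nn.Linear(img_dim, img_dim // 16),
            nn.ReLU(inplace=True),
            nn.Linear(img_dim // 16, ctx_dim),
        )
        # Stand-in for frozen class-name embeddings from CLIP
        self.register_buffer("cls_emb", torch.randn(n_classes, 1, ctx_dim))

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (batch, img_dim) -> bias: (batch, 1, ctx_dim)
        bias = self.meta_net(img_feat).unsqueeze(1)
        ctx = self.ctx.unsqueeze(0) + bias              # (batch, n_ctx, d)
        n_cls = self.cls_emb.size(0)
        # One prompt per (image, class): (batch, n_cls, n_ctx + 1, d)
        ctx = ctx.unsqueeze(1).expand(-1, n_cls, -1, -1)
        cls = self.cls_emb.unsqueeze(0).expand(ctx.size(0), -1, -1, -1)
        return torch.cat([ctx, cls], dim=2)

prompts = CoCoOpPromptLearner()(torch.randn(2, 512))
print(prompts.shape)  # torch.Size([2, 10, 5, 512])
```

Because the prompts now depend on each image, the text encoder must be run once per image rather than once per dataset, which is the main computational cost of CoCoOp over CoOp.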
- **Textual-based Class-aware Prompt tuning for Visual-Language Model (TCP, CVPR'24)**
  - This technique injects textual knowledge about each class into the learnable prompts.
  - This enhances generalizability across unseen classes by combining prior textual knowledge with the fine-tuned learnable prompts.
  - The integration of this technique into CAT-Seg can be found in `class CLIP` of `./catseg/third_party/model_vpt.py` on branch `TCP`.
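A heavily simplified sketch of the TCP idea: frozen class-level text features are projected into class-aware prompt tokens and combined with shared learnable context. The class name `TCPPromptLearner`, the single linear projection, and all dimensions are illustrative assumptions and do not reproduce the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TCPPromptLearner(nn.Module):
    """Sketch of TCP's core idea: a learnable projection maps each
    class's frozen text feature into a class-aware prompt token,
    concatenated with shared learnable context. Sizes illustrative."""

    def __init__(self, n_classes=10, n_ctx=4, ctx_dim=512, txt_dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.empty(n_ctx, ctx_dim))
        nn.init.normal_(self.ctx, std=0.02)
        # Projection injecting textual knowledge into the prompt
        self.knowledge_proj = nn.Linear(txt_dim, ctx_dim)
        # Stand-in for frozen class-name embeddings from CLIP
        self.register_buffer("cls_emb", torch.randn(n_classes, 1, ctx_dim))

    def forward(self, class_text_feat: torch.Tensor) -> torch.Tensor:
        # class_text_feat: (n_classes, txt_dim) frozen CLIP text features
        n_cls = class_text_feat.size(0)
        class_token = self.knowledge_proj(class_text_feat).unsqueeze(1)
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        # Shared context + class-aware token + class-name embedding:
        # output shape (n_classes, n_ctx + 2, ctx_dim)
        return torch.cat([ctx, class_token, self.cls_emb], dim=1)

prompts = TCPPromptLearner()(torch.randn(10, 512))
print(prompts.shape)  # torch.Size([10, 6, 512])
```

Because the class-aware tokens are derived from frozen text features, the prior textual knowledge is preserved even as the shared context and projection are fine-tuned.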