liuruixun/MV-CC

MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption

Ruixun Liu*, Kaiyu Li*, Jiayi Song*, Dongwei Sun, Xiangyong Cao

Preparation

  • Install the required packages: pip install -r requirements.txt
  • Download a remote sensing change captioning dataset. This repository uses LEVIR-MCI.
  • The LEVIR-MCI data should be organized as follows:
├─/root/Data/LEVIR-MCI-dataset/
        ├─LevirCCcaptions.json
        ├─images
             ├─train
             │  ├─A
             │  ├─B
             │  ├─label
             │  ├─semi_mask
             │  ├─supervised_mask
             │  ├─video_data
             ├─val
             │  ├─A
             │  ├─B
             │  ├─label
             │  ├─semi_mask
             │  ├─supervised_mask
             │  ├─video_data
             ├─test
             │  ├─A
             │  ├─B
             │  ├─label
             │  ├─semi_mask
             │  ├─supervised_mask
             │  ├─video_data

Folder A contains the pre-phase images and folder B contains the post-phase images.

The semi_mask folder holds the masks obtained with the 5% semi-supervised method, while supervised_mask holds the masks obtained with the supervised method.

Both sets of masks can be generated with the SemiCD method.
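Before running the scripts, it can help to confirm that the dataset matches the tree above. The following is a minimal sketch, not part of this repository: the function name find_missing_entries is hypothetical, and the folder names are copied from the tree. It is not stated here whether video_data must exist before generate_MP4.py is run, so the list of subfolders can be narrowed through the subfolders argument.

```python
from pathlib import Path

# Names copied from the directory tree above.
SPLITS = ("train", "val", "test")
SUBFOLDERS = ("A", "B", "label", "semi_mask", "supervised_mask", "video_data")


def find_missing_entries(dataset_root, subfolders=SUBFOLDERS):
    """Return the expected paths that are absent under dataset_root.

    Hypothetical helper: it only checks that the caption file and the
    per-split folders exist, not what they contain.
    """
    root = Path(dataset_root)
    missing = []

    captions = root / "LevirCCcaptions.json"
    if not captions.is_file():
        missing.append(captions)

    for split in SPLITS:
        for sub in subfolders:
            folder = root / "images" / split / sub
            if not folder.is_dir():
                missing.append(folder)
    return missing
```

Calling find_missing_entries("/root/Data/LEVIR-MCI-dataset") returns an empty list when every expected entry is present; otherwise it lists the paths that still need to be created or downloaded.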

To generate the video and prepare the data:

$ python generate_MP4.py
$ python preprocess_data.py

Download video_encoder.pth from MV-CC checkpoints and put it in the checkpoints folder.

Download the model from InternVideo2_Chat_8B_InternLM2_5 and place it in the /root/video_model folder.

Training

  • After completing the data preparation steps above, train the model as follows:
$ python train_video_sty.py

NOTE: If the program fails with the error "'Meteor' object has no attribute 'lock'", we recommend installing Java with sudo apt install openjdk-11-jdk to resolve the issue.
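The suggested fix implies that the error comes from a missing Java runtime. Assuming that is the cause, a quick check before training can save a failed run. The sketch below is not part of this repository, and the function name java_available is hypothetical; it only looks for a java executable on PATH.

```python
import shutil


def java_available():
    """Return True if a `java` executable can be found on PATH."""
    return shutil.which("java") is not None
```

If java_available() returns False, install a JDK as described in the note and run the check again.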

Alternatively, you can obtain our pretrained models from MV-CC checkpoints.

Caption Generation

  • To generate captions, run the following command:
$ python test_video_sty.py

Mask mode

The mask mode is set by args.mode, which takes one of the following values:

  • semi_mask: masks from the 5% semi-supervised method
  • supervised_mask: masks from the supervised method
  • label: ground-truth (GT) masks
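Each mode value matches a folder name in the dataset tree, so the mode can be read as the name of the mask folder to load from. The sketch below illustrates that reading under stated assumptions: the --mode flag name is inferred from args.mode, and build_parser and mask_folder are hypothetical helpers. How train_video_sty.py and test_video_sty.py actually consume the argument is not shown here.

```python
import argparse
from pathlib import Path

# The three mode values listed above; each matches a dataset folder name.
MASK_MODES = ("semi_mask", "supervised_mask", "label")


def build_parser():
    """Build a parser with a single, hypothetical --mode option."""
    parser = argparse.ArgumentParser(description="Mask mode selection (sketch)")
    # The flag name is an assumption based on the attribute name args.mode.
    parser.add_argument("--mode", choices=MASK_MODES, required=True)
    return parser


def mask_folder(dataset_root, split, mode):
    """Return the mask folder for a split, assuming the mode names the folder."""
    if mode not in MASK_MODES:
        raise ValueError(f"unknown mask mode: {mode!r}")
    return Path(dataset_root) / "images" / split / mode
```

For example, parsing ["--mode", "semi_mask"] and calling mask_folder("/root/Data/LEVIR-MCI-dataset", "train", args.mode) yields the train/semi_mask folder from the tree in the Preparation section.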

Paper

MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption

Please cite the following paper if you find it useful for your research:

@misc{liu2024mvccmaskenhancedvideo,
      title={MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption}, 
      author={Ruixun Liu and Kaiyu Li and Jiayi Song and Dongwei Sun and Xiangyong Cao},
      year={2024},
      eprint={2410.23946},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.23946}, 
}

Acknowledgement

The authors would like to thank the contributors to the LEVIR-MCI dataset.

License

This repo is distributed under the MIT License. The code can be used for academic purposes only.
