From 81f48752d9e9bc48b37ed2e8182a0ae764dc2b2c Mon Sep 17 00:00:00 2001
From: zhengjinaling
Date: Tue, 23 Jul 2024 21:47:58 +0200
Subject: [PATCH] add news

---
 README.md  |  3 ++-
 index.html | 21 ++++++++++++++++++++-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 0893ba4..4b07d45 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@
 [[paper]](https://arxiv.org/abs/2405.19783) [[project page]](https://2toinf.github.io/IVM/)
 
+### 🔥 IVM has been selected as an outstanding paper at the MFM-EAI workshop @ICML2024
+
 ## Introduction
 
 We introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMMs and robot models. By constructing visual masks for instruction-irrelevant regions, IVM-enhanced multimodal models can effectively focus on task-relevant image regions to better align with complex instructions. Specifically, we design a visual masking data generation pipeline and create the IVM-Mix-1M dataset with 1 million image-instruction pairs. We further introduce a new learning technique, Discriminator Weighted Supervised Learning (DWSL), for preferential IVM training that prioritizes high-quality data samples. Experimental results on generic multimodal tasks such as VQA and embodied robotic control demonstrate the versatility of IVM, which, as a plug-and-play tool, significantly boosts the performance of diverse multimodal models.
 
@@ -95,7 +97,6 @@
 Robot Infrastructure: [https://github.com/rail-berkeley/bridge_data_robot](https://github.com/rail-berkeley/bridge_data_robot)
 
 This work is built upon [LLaVA](https://github.com/haotian-liu/LLaVA) and [SAM](https://github.com/facebookresearch/segment-anything), and we borrow ideas from [LISA](https://github.com/dvlab-research/LISA).
 
-
 ## Citation
 ```
diff --git a/index.html b/index.html
index 60bc7fc..bcd4116 100644
--- a/index.html
+++ b/index.html
@@ -102,7 +102,26 @@
 ✉Corresponding author: zhanxianyuan@air.tsinghua.edu.cn
+Exciting News!
+Our paper has been selected as an outstanding paper at the MFM-EAI workshop @ICML2024