An official implementation for paper: "MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning"
In this paper, we propose MixMAS, a framework for sampling-based mixer architecture search for multimodal fusion and learning. Our framework automatically selects an adequate MLP-based architecture for a given multimodal machine learning (MML) task.
Requirements: Python 3.11 or higher.

To install the dependencies, run `pip install -r requirements.txt`. To launch the framework, run `python run.py`.
AV-MNIST is a multimodal dataset formed by combining MNIST images with spoken-digit recordings from the FSDD dataset. It is divided into 55000 training, 5000 validation, and 10000 test instances. Given the near-perfect performance already achieved by existing unimodal models on these datasets, we follow the M2-Mixer setup and reduce the information in each modality. Finally, we convert the audio samples to spectrograms.
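The spectrogram conversion is not detailed here; as a rough illustration, it could look like the following minimal sketch, assuming `torchaudio` is used (the transform parameters are placeholders, not necessarily the values used in this repository):

```python
import torch
import torchaudio

# Hypothetical illustration of converting an FSDD waveform to a spectrogram.
# n_fft and hop_length are assumptions, not the repository's actual settings.
spectrogram = torchaudio.transforms.Spectrogram(n_fft=256, hop_length=64)

def audio_to_spectrogram(path: str) -> torch.Tensor:
    waveform, sample_rate = torchaudio.load(path)  # FSDD clips are mono, 8 kHz
    spec = spectrogram(waveform)                   # (1, freq_bins, time_frames)
    return torch.log1p(spec)                       # log scale for numerical stability
```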
MM-IMDB is a multimodal dataset with two modalities, image and text: it combines movie posters and plot summaries for movie genre classification. The dataset is divided into 15552 training, 2608 validation, and 7799 test samples in a stratified manner. We use BERT to generate the text embeddings.
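A minimal sketch of how such text embeddings can be produced with Hugging Face `transformers` is shown below; the model variant (`bert-base-uncased`) and mean pooling are assumptions for illustration, not necessarily the exact setup used in this repository:

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumed setup: bert-base-uncased with mean pooling over token embeddings.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def embed_plot(text: str) -> torch.Tensor:
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)        # (768,) pooled plot embedding
```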
MIMIC-III is a large, freely available clinical database of de-identified health data from over 40,000 critical care patients admitted to the Beth Israel Deaconess Medical Center between 2001 and 2012. The data is organized into two modalities: a time-series modality consisting of 12 medical measurements recorded every hour for 24 hours, and a tabular modality containing static medical information about each patient. The training, validation, and test splits contain 26093, 3261, and 3261 samples respectively.
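Concretely, each sample pairs a (24, 12) time-series with a tabular feature vector. The sketch below is only a hypothetical way to package such a sample; the tensor shapes follow the description above, and the tabular dimension is a placeholder rather than the dataset's actual feature count:

```python
from typing import NamedTuple
import torch

class MimicSample(NamedTuple):
    """One MIMIC-III instance with its two modalities (illustrative only)."""
    time_series: torch.Tensor  # (24, 12): 12 measurements recorded hourly over 24 hours
    tabular: torch.Tensor      # (n_tabular,): static patient information; size is a placeholder
    label: torch.Tensor        # task label

# Dummy example; n_tabular = 5 is chosen arbitrarily for illustration.
sample = MimicSample(
    time_series=torch.zeros(24, 12),
    tabular=torch.zeros(5),
    label=torch.tensor(0),
)
```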
We compare MixMAS's performance against M2-Mixer. The hyperparameters of M2-Mixer are as follows:
| Dataset | Hidden Dim. | Patch Sizes | Token Dim. | Channel Dim. | Blocks (modality 1 / modality 2 / fusion) | Params (M) |
|---|---|---|---|---|---|---|
| MM-IMDB | 256 | 16 Image / 512 Text | 32 | 3072 | 4 / 4 / 2 | 16.7 |
| AV-MNIST | 128 | 14 Image / 56 Audio | 32 | 3072 | 4 / 4 / 2 | 8.3 |
| MIMIC-III | 64 | 24 Time-series / - | 16 | 64 | 1 / 2 / 1 | 0.029 |
All blocks in M2-Mixer are MLP-Mixer blocks. For MixMAS, the hyperparameters are the same; only the block types differ, as they are selected during the micro-benchmarking step, illustrated by the sketch below.
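Conceptually, micro-benchmarking tries candidate block types for each module (the two modality encoders and the fusion module), trains each candidate briefly, and keeps the best-scoring combination. The following is only a minimal sketch of that idea; the block definitions, search space, and scoring are assumptions for illustration, not the actual MixMAS implementation:

```python
import itertools
import torch.nn as nn

# Hypothetical candidate MLP-based blocks. These are simplified stand-ins,
# not faithful MLP-Mixer or gMLP blocks; the real search space may differ.
CANDIDATE_BLOCKS = {
    "mlp_mixer": lambda dim: nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU()),
    "gmlp":      lambda dim: nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.SiLU()),
}

def micro_benchmark(build_model, quick_eval, hidden_dim=128):
    """Enumerate (modality-1, modality-2, fusion) block combinations and return
    the best one according to a short training/validation run.

    `build_model` and `quick_eval` are placeholders for a model constructor
    and a quick train/validate routine supplied by the user."""
    best_score, best_choice = float("-inf"), None
    for combo in itertools.product(CANDIDATE_BLOCKS, repeat=3):
        blocks = [CANDIDATE_BLOCKS[name](hidden_dim) for name in combo]
        model = build_model(*blocks)
        score = quick_eval(model)  # e.g. validation accuracy after a few epochs
        if score > best_score:
            best_score, best_choice = score, combo
    return best_choice, best_score
```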