MultimodalHugs is a streamlined extension of Hugging Face designed for training, evaluating, and deploying multimodal AI models. Built atop Hugging Face’s powerful ecosystem, MultimodalHugs integrates seamlessly with standard pipelines while providing additional functionalities to handle multilingual and multimodal inputs—reducing boilerplate and simplifying your codebase.
- Unified Framework: Train and evaluate multimodal models (e.g., image-to-text, pose-to-text, signwriting-to-text) using a consistent API.
- Minimal Code Changes: Leverage Hugging Face’s pipelines with only minor modifications.
- Data in TSV: Avoid the complexity of numerous hyperparameters by maintaining data splits in `.tsv` files—easily specify prompts, languages, targets, or other attributes in dedicated columns (a minimal sketch of such a file follows this list).
- Modular Design: Use or extend any of the components (datasets, models, modules, processors) to suit your custom tasks.
- Examples Included: Refer to the `examples/` directory for guided scripts, configurations, and best practices.
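As an illustration of the TSV-driven setup, the sketch below writes a tiny training split with Python's standard `csv` module. The column names (`source_prompt`, `source_signal`, `target_text`) and values are illustrative placeholders, not the exact schema MultimodalHugs expects; see the per-task documentation under `examples/` for the real columns.

```python
import csv

# Hypothetical columns for illustration only; each task documents its own
# schema under examples/multimodal_translation/.
rows = [
    {"source_prompt": "Translate into English:", "source_signal": "clips/sample_0001.pose", "target_text": "Hello, how are you?"},
    {"source_prompt": "Translate into German:", "source_signal": "clips/sample_0002.pose", "target_text": "Guten Morgen."},
]

with open("train.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
```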
- Clone the repository:

  ```bash
  git clone https://github.com/GerrySant/multimodalhugs.git
  ```

- Navigate into the repository and install the package:

  - Standard installation:

    ```bash
    cd multimodalhugs
    pip install .
    ```

  - Developer installation:

    ```bash
    cd multimodalhugs
    pip install -e .[dev]
    ```
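After installing, a quick import check confirms the package is visible from the active environment (this assumes the distribution exposes the top-level module `multimodalhugs`, matching the repository name):

```python
# Verify the installation by importing the package and printing its location.
import multimodalhugs

print(multimodalhugs.__file__)
```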
Explore the `examples/multimodal_translation/` directory for an end-to-end workflow demonstrating how to:
- Preprocess Data: Convert raw data into `.tsv` format with columns for prompts, languages, target labels, etc. (a loading sketch appears after the note below).
- Configure Training: Tune model hyperparameters via YAML or Python script.
- Train & Evaluate: Utilize the included training scripts and Hugging Face's Trainer for effortless experimentation.
- Extend & Adapt: Incorporate custom datasets, tokenizers, or specialized processing modules.
Note: Each example folder (e.g., `image2text_translation`, `pose2text_translation`) contains its own detailed documentation. Refer there for more specifics.
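To give a feel for how these splits plug into the Hugging Face ecosystem, here is a generic sketch that loads a `.tsv` split with the `datasets` library. The file paths and columns are placeholders, and the snippet deliberately stops short of MultimodalHugs' own processors and training entry points; the example folders show the actual configuration for each task.

```python
from datasets import load_dataset

# Placeholder paths; point these at your own .tsv splits.
data_files = {"train": "data/train.tsv", "validation": "data/dev.tsv"}

# .tsv files load through the generic "csv" builder by setting the delimiter.
dataset = load_dataset("csv", data_files=data_files, delimiter="\t")

print(dataset["train"].column_names)  # e.g. prompt, language, and target columns
print(dataset["train"][0])            # inspect the first training example
```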
```
multimodalhugs
├── examples
│   └── multimodal_translation
│       ├── image2text_translation
│       ├── pose2text_translation
│       └── signwriting2text_translation
├── multimodalhugs
│   ├── custom_datasets
│   ├── data
│   ├── models
│   ├── modules
│   ├── multimodalhugs_cli
│   ├── processors
│   ├── tasks
│   ├── training_setup
│   └── utils
└── tests
```
- `examples/`: Contains ready-to-run demos for various multimodal tasks.
- `multimodalhugs/`: Core library code (datasets, models, modules, etc.).
- `tests/`: Automated tests to ensure code integrity.
All contributions—bug reports, feature requests, or pull requests—are welcome. Please see our GitHub repository to get involved.
This project is licensed under the terms of the MIT License.
If you use MultimodalHugs in your research or applications, please cite:
```bibtex
@misc{multimodalhugs2024,
  title={MultimodalHugs: Extending HuggingFace for Generalized Multimodal AI Model Training and Evaluation},
  author={Sant, Gerard and Moryossef, Amit},
  howpublished={\url{https://github.com/GerrySant/multimodalhugs}},
  year={2024}
}
```