In this repository, we present Muzeeglot, a prototype illustrating how multilingual music genre embeddings can be leveraged to generate cross-lingual music genre annotations for DBpedia music entities (artists, albums, tracks, etc.).
Muzeeglot includes a web interface to visualize these multilingual music genre embeddings.
- ❓ How it works
- ⚙️ Architecture
- 🚀 Deployment
- 💻 Development
- 📖 Cite
Based on annotations from one or several source languages, our system automatically predicts the corresponding annotations in a target language.
Languages supported:
- 🇫🇷 French
- 🇬🇧 English
- 🇪🇸 Spanish
- 🇳🇱 Dutch
- 🇨🇿 Czech
- 🇯🇵 Japanese
You will find more information about application usage here.
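To make the idea concrete, here is a minimal Python sketch of one way such cross-lingual prediction can work. It assumes that tags from all languages live in a single shared embedding space and are keyed as `lang:tag`; both that key convention and the toy vectors are hypothetical. The sketch averages the embeddings of the source-language tags and ranks target-language tags by cosine similarity to that average. It is a simplified illustration, not necessarily the exact method implemented in Muzeeglot or described in the papers cited below.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_target_tags(
    source_tags: List[str],
    embeddings: Dict[str, Vector],
    target_lang: str,
    top_k: int = 3,
) -> List[str]:
    """Rank target-language tags by similarity to the mean source-tag embedding.

    Hypothetical convention: tags are keyed as "<lang>:<tag>" in one shared space.
    """
    known = [embeddings[t] for t in source_tags if t in embeddings]
    if not known:
        return []  # no usable source annotation, nothing to predict from
    dim = len(known[0])
    # Mean of the source-tag vectors, dimension by dimension.
    centroid = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    candidates = [t for t in embeddings if t.startswith(f"{target_lang}:")]
    candidates.sort(key=lambda t: cosine(centroid, embeddings[t]), reverse=True)
    return candidates[:top_k]


# Toy 3-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "en:rock": [0.9, 0.1, 0.0],
    "en:hard rock": [0.8, 0.3, 0.0],
    "fr:rock": [0.88, 0.12, 0.0],
    "fr:hard rock": [0.78, 0.32, 0.0],
    "fr:rap": [0.0, 0.1, 0.95],
}

if __name__ == "__main__":
    print(predict_target_tags(["en:rock", "en:hard rock"], toy_embeddings, "fr", top_k=2))
    # → ['fr:rock', 'fr:hard rock']
```

A fixed `top_k` keeps the sketch short; a similarity threshold would be a natural alternative when entities should receive a variable number of target tags.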
Muzeeglot follows a classic N-tier architecture composed of:
- A Redis instance as the storage engine.
- A REST API developed in Python with FastAPI.
- A frontend developed with VueJS as a single-page application (SPA).
The overall stack is load-balanced using an Nginx web server.
Data such as entities, tags, and languages are stored in the Redis instance. Additionally, a full-text search index based on Whoosh is maintained over entity names, using n-gram tokenization.
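The following standard-library Python sketch shows why n-gram tokenization suits entity-name search: names and queries are both split into overlapping character n-grams, so partial or slightly misspelled queries still share many n-grams with the right name. It is a toy inverted index written for illustration only; it does not use Whoosh's API, and the n-gram sizes (2 to 4) and the DBpedia-style identifiers are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List, Set


def ngrams(text: str, min_size: int = 2, max_size: int = 4) -> Set[str]:
    """All lower-cased character n-grams of `text` with lengths in [min_size, max_size]."""
    text = text.lower()
    grams = set()
    for n in range(min_size, max_size + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


class NgramIndex:
    """Toy inverted index: n-gram -> set of entity identifiers."""

    def __init__(self, min_size: int = 2, max_size: int = 4) -> None:
        self.min_size = min_size
        self.max_size = max_size
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, entity_id: str, name: str) -> None:
        for gram in ngrams(name, self.min_size, self.max_size):
            self._postings[gram].add(entity_id)

    def search(self, query: str, limit: int = 5) -> List[str]:
        # Score each entity by the number of query n-grams it shares.
        scores: Dict[str, int] = defaultdict(int)
        for gram in ngrams(query, self.min_size, self.max_size):
            for entity_id in self._postings.get(gram, ()):
                scores[entity_id] += 1
        # Highest score first; ties broken alphabetically for stable output.
        return sorted(scores, key=lambda e: (-scores[e], e))[:limit]


# Hypothetical entities, for illustration only.
index = NgramIndex()
index.add("dbr:Daft_Punk", "Daft Punk")
index.add("dbr:Punk_rock", "Punk rock")
index.add("dbr:Radiohead", "Radiohead")

if __name__ == "__main__":
    print(index.search("punk"))      # both names containing "punk"
    print(index.search("radiohed"))  # misspelled query still finds Radiohead
```

The trade-off is index size: every name contributes many n-grams, which is usually acceptable for short strings such as artist, album, or track names.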
Deploying Muzeeglot requires the following tools to be installed: Git, Docker, Docker Compose and GNU Make.
You can then clone this repository and start Muzeeglot¹:
git clone https://github.com/deezer/muzeeglot
cd muzeeglot
make start
Behind the scenes, this builds the required Docker images and runs a Compose file that starts everything locally in daemon mode.
¹ The first deployment takes a long time, as it requires data ingestion and indexing.
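Before running `make start`, a small Python check like the one below can confirm that the tools the deployment relies on are on your `PATH`. The tool list is an assumption inferred from the commands used in this section (`git`, `make`, Docker and Docker Compose), not an official requirements list. Recent Docker releases ship Compose as the `docker compose` plugin, so a standalone `docker-compose` binary may be absent; adapt the list to however the Makefile actually invokes Compose.

```python
import shutil
from typing import Dict, Iterable

# Assumed prerequisites, inferred from the commands used in this README.
REQUIRED_TOOLS = ("git", "make", "docker", "docker-compose")


def check_prerequisites(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, bool]:
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool:15} {'ok' if found else 'MISSING'}")
```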
To deploy Muzeeglot with SSL using Let's Encrypt, you first need to create a certificate using the provided bot challenge. Start by editing the following configuration files to add your target domain:
- frontend/nginx/certificate-builder.conf
- frontend/nginx/muzeeglot-ssl.conf
Once this is done, run the following command to generate the SSL certificates:
make letsencrypt DOMAIN=mydomain.tld
This creates a Docker volume and provisions it with the certificate. You can then start Muzeeglot with SSL enabled:
make ssl start
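Because `make letsencrypt` relies on both configuration files having been edited, a quick pre-flight check can catch a forgotten file. The Python sketch below only tests whether the domain string appears in each file. The helper name is hypothetical, and the plain substring test is a deliberately loose assumption, since the exact directive layout of these Nginx files is not described here.

```python
from pathlib import Path
from typing import Dict, Union

# The two files this README asks you to edit before generating certificates.
SSL_CONFIG_FILES = (
    "frontend/nginx/certificate-builder.conf",
    "frontend/nginx/muzeeglot-ssl.conf",
)


def domain_configured(domain: str, repo_root: Union[str, Path] = ".") -> Dict[str, bool]:
    """For each SSL config file, report whether it exists and mentions `domain`.

    Hypothetical helper: a plain substring test, not a real Nginx config parser.
    """
    root = Path(repo_root)
    status = {}
    for relative in SSL_CONFIG_FILES:
        path = root / relative
        status[relative] = path.is_file() and domain in path.read_text(encoding="utf-8")
    return status


if __name__ == "__main__":
    for name, ok in domain_configured("mydomain.tld").items():
        print(f"{name}: {'ok' if ok else 'domain not found'}")
```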
The project can be managed with GNU Make through the following goals:
Goal | Description |
---|---|
api | Build api image |
frontend | Build frontend image |
run | Start the entire stack using docker-compose |
start | Start the entire stack in daemon mode |
stop | Stop the entire stack using docker-compose |
logs | Display stack logs when running in daemon mode |
clean | Clean docker volume for storage and indexes |
letsencrypt | Generate certificate volume |
Additional goals can be combined with the goals above to pass extra parameters, as in make ssl start:
Goal | Description |
---|---|
no-cache | Build images using --no-cache flag |
ssl | Enable SSL support |
To use your own data, provide the following CSV files in the api/data directory²:
- embeddings.csv: tag embeddings (such as music genres).
- embeddings_reduced.csv: reduced embeddings used for display.
- languages.csv: supported languages.
- entites.csv: indexed entities.
- corpus.csv: test corpus.
² When you redeploy, you need to clean the data storage and indexes (make clean) to force data ingestion.
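Before redeploying with your own data, a small Python check such as the one below can confirm that all five expected files are present and non-empty in `api/data`. It only checks presence and size: the expected CSV columns are not documented in this section, so schema validation is left out rather than guessed. The function name is hypothetical; the file names are copied verbatim from the list above.

```python
from pathlib import Path
from typing import List, Union

# File names as listed in this README (copied verbatim).
EXPECTED_DATA_FILES = (
    "embeddings.csv",
    "embeddings_reduced.csv",
    "languages.csv",
    "entites.csv",
    "corpus.csv",
)


def missing_data_files(data_dir: Union[str, Path] = "api/data") -> List[str]:
    """Return the expected files that are absent or empty in `data_dir`."""
    directory = Path(data_dir)
    missing = []
    for name in EXPECTED_DATA_FILES:
        path = directory / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_data_files()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All expected data files found; run `make clean` before redeploying.")
```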
@inproceedings{epure2020muzeeglot,
title={Muzeeglot: annotation multilingue et multi-sources d'entit{\'e}s musicales {\`a} partir de repr{\'e}sentations de genres musicaux},
author={Epure, Elena V and Salha, Guillaume and Voituret, F{\'e}lix and Baranes, Marion and Hennequin, Romain},
booktitle={Actes de la 6e conf{\'e}rence conjointe Journ{\'e}es d'{\'E}tudes sur la Parole (JEP, 31e {\'e}dition), Traitement Automatique des Langues Naturelles (TALN, 27e {\'e}dition), Rencontre des {\'E}tudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (R{\'E}CITAL, 22e {\'e}dition). Volume 4: D{\'e}monstrations et r{\'e}sum{\'e}s d'articles internationaux},
pages={18--21},
year={2020},
organization={ATALA}
}
For more details on how we learn multilingual music genre embeddings:
@inproceedings{epure2020modeling,
title={Modeling the Music Genre Perception across Language-Bound Cultures},
  author={Epure, Elena V and Salha, Guillaume and Moussallam, Manuel and Hennequin, Romain},
booktitle={The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)},
month = nov,
year={2020},
publisher = {Association for Computational Linguistics},
}