This is the implementation of our paper "EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts". The paper has been accepted at AACL-IJCNLP 2022. You can find the paper here.
For the low-resource Bangla language, work on detecting emotions in text is limited by small datasets and poor cross-domain adaptability. In our paper, we propose a manually annotated dataset of 22,698 Bangla public comments from social media sites covering 12 different domains such as Personal, Politics, and Health, labeled for 6 fine-grained emotion categories of the Junto Emotion Wheel. We invest effort in data preparation to 1) preserve linguistic richness and 2) challenge any classification model. Our experiments to develop a benchmark classification system show that random baselines perform better than neural networks and pre-trained language models, while hand-crafted features provide superior performance.
- Khondoker Ittehadul Islam 1
- Tanvir Hossain Yuvraz 1
- Md Saiful Islam 1,2
- Enamul Hassan 1
1 Shahjalal University of Science and Technology, Bangladesh
2 University of Alberta, Canada
The EmoNoBa dataset is available here. It contains the following files:
- Train.csv
- Val.csv
- Test.csv
| Column Title | Description |
|---|---|
| Data | Social media comment |
| Love | 0, 1. '1' for Love, '0' for Not Love |
| Joy | 0, 1. '1' for Joy, '0' for Not Joy |
| Surprise | 0, 1. '1' for Surprise, '0' for Not Surprise |
| Anger | 0, 1. '1' for Anger, '0' for Not Anger |
| Sadness | 0, 1. '1' for Sadness, '0' for Not Sadness |
| Fear | 0, 1. '1' for Fear, '0' for Not Fear |
| Topic | Topic of the comment |
| Domain | Source of the comment from {Youtube, Facebook and Twitter} |
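The column layout above can be read with Python's standard `csv` module. The following is a minimal sketch, assuming the CSV header names match the "Column Title" entries exactly and the files are comma-delimited; the two sample rows are made-up placeholders, not taken from the dataset, and `read_rows` is a hypothetical helper that is not part of this repository.

```python
import csv
import io

# Emotion columns as listed in the table above.
EMOTIONS = ["Love", "Joy", "Surprise", "Anger", "Sadness", "Fear"]


def read_rows(handle):
    """Return (comments, labels) parsed from an open CSV handle.

    Each entry of labels is a list of six 0/1 integers in EMOTIONS order.
    """
    comments, labels = [], []
    for row in csv.DictReader(handle):
        comments.append(row["Data"])
        labels.append([int(row[name]) for name in EMOTIONS])
    return comments, labels


# Made-up placeholder rows for illustration only (assumed header layout).
SAMPLE = (
    "Data,Love,Joy,Surprise,Anger,Sadness,Fear,Topic,Domain\n"
    "placeholder comment one,0,1,0,0,0,0,Personal,Youtube\n"
    "placeholder comment two,0,0,0,1,1,0,Politics,Facebook\n"
)

comments, labels = read_rows(io.StringIO(SAMPLE))
print(comments)
print(labels)
```

To read a real split, the same function could be called on `open("Train.csv", encoding="utf-8", newline="")`, assuming the file is UTF-8 encoded. Each comment has six independent 0/1 labels, so a row may carry more than one emotion.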
Requires the following packages:
- Python 3.10.7 or higher
We recommend using a virtual environment tool such as virtualenv. Follow the steps below to set up the project:
- Clone this repository via
git clone https://github.com/KhondokerIslam/EmoNoBa.git
- Use this command to install required packages
pip install -r requirements.txt
- Run the setup.sh file to download additional data and set up pre-processing
- Download the EmoNoBa dataset from here
- Unzip the folder
- Ensure the folder name is "EmoNoBa Dataset"
- Go to data_processing folder and run
python preprocess.py
to obtain the preprocessed data.
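Before running the preprocessing step, it can help to confirm that the dataset folder is in place. The sketch below assumes that the unzipped "EmoNoBa Dataset" folder sits in the current working directory and directly contains Train.csv, Val.csv, and Test.csv; the README does not state the exact layout, and `missing_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# File names taken from the dataset listing; their location is an assumption.
EXPECTED_FILES = ("Train.csv", "Val.csv", "Test.csv")


def missing_files(dataset_dir):
    """Return the expected CSV names that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Assumed folder name and location; adjust the path if your layout differs.
missing = missing_files("EmoNoBa Dataset")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All three splits found.")
```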
- Go to Models folder
- Use
python feature_based.py
- When prompted in the console, type the model name
- Model names (see the paper for details of the experiments):
- W1
- W2
- W3
- W4
- W1+W2
- W1+W2+W3
- W1+W2+W3+W4
- C2
- C3
- C4
- C5
- C1+C2+C3
- C1+C2+C3+C4
- C1+C2+C3+C4+C5
- W1+C1+C2+C3+C4+C5
- W1+W2+W3+C1+C2+C3
- W1+W2+W3+W4+C1+C2+C3
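The model names above combine feature tokens with `+`. As a hedged illustration of how such a name can be decomposed, the sketch below assumes that the `W` and `C` prefixes stand for word-level and character-level n-grams of the given order; this README does not state that, so treat the paper as the authority. `parse_model_name` is a hypothetical helper and is not how `feature_based.py` necessarily handles its input.

```python
import re

# One token is a W or C prefix followed by a single-digit n-gram order.
TOKEN = re.compile(r"([WC])([1-9])")


def parse_model_name(name):
    """Split a name such as 'W1+W2+C3' into n-gram orders per level.

    Assumption: W = word-level n-grams, C = character-level n-grams.
    """
    spec = {"word": [], "char": []}
    for token in name.strip().split("+"):
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised feature token: {token!r}")
        level = "word" if match.group(1) == "W" else "char"
        spec[level].append(int(match.group(2)))
    return spec


print(parse_model_name("W1+W2+W3+C1+C2+C3"))
```

Rejecting unknown tokens with a `ValueError` surfaces a mistyped model name immediately instead of letting it select an empty feature set.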
- Go to Models folder
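- Use "python neural_network_(random).py" to run an experiment. In bash and similar shells, quote the file name, because unquoted parentheses are treated as shell syntax.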
- Go to Models folder
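- Use "python neural_network_(embedding).py" to run an experiment. In bash and similar shells, quote the file name, because unquoted parentheses are treated as shell syntax.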
- Go to Models folder
- Use "python bangla-bert.py" to run an experiment.
@inproceedings{islam2022emonoba,
title={EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts},
author={Islam, Khondoker Ittehadul and Yuvraz, Tanvir and Islam, Md Saiful and Hassan, Enamul},
booktitle={Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing},
pages={128--134},
year={2022}
}