Hungarian Sentiment Analyser

Overview

This project implements and trains a sentiment analysis model for the Hungarian language, based on the architecture proposed in the research paper listed under References. The model was first presented at the CINTI 2023 conference and has been trained on a dataset of over 300,000 reviews. This README describes the project's structure, setup, and usage.

Model Description

The model architecture is adapted from the referenced paper, with adjustments for the Hungarian language and for the dataset.

Dataset Description

The mock dataset used for demonstration purposes is a scaled-down version of the original dataset. It has gone through the essential preprocessing steps: duplicate removal, NaN handling, and character count restrictions. Although smaller than the full dataset, it serves as a representative sample for testing and development.
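
The three preprocessing steps named above can be illustrated with a minimal sketch. This is not the project's actual code: the function name `clean_reviews` and the `min_chars`/`max_chars` bounds are hypothetical, and reviews are assumed to arrive as plain strings with `None` or NaN marking missing values.

```python
import math


def clean_reviews(reviews, min_chars=10, max_chars=1000):
    """Drop missing values, out-of-range lengths, and exact duplicates.

    The length bounds are illustrative assumptions, not the project's values.
    """
    seen = set()
    cleaned = []
    for text in reviews:
        # NaN handling: skip None and float NaN placeholders.
        if text is None or (isinstance(text, float) and math.isnan(text)):
            continue
        text = text.strip()
        # Character count restriction.
        if not (min_chars <= len(text) <= max_chars):
            continue
        # Duplicate removal: keep only the first occurrence of each text.
        if text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


sample = [
    "Nagyon jó termék, ajánlom!",
    None,
    "Nagyon jó termék, ajánlom!",
    "ok",
    float("nan"),
]
print(clean_reviews(sample))
```

With this sample, only one copy of the first review survives: the missing values are skipped, the repeat is dropped, and "ok" falls below the assumed minimum length.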

Installation Instructions

First clone the repository

git clone

Enter the directory

cd hu_sentiment_analyser

Install dependencies

poetry install

Run the training

poetry run python .\src\main.py

Start the MLflow UI to see the metrics, or go to your DagsHub account.

mlflow ui

File Structure

Data Ingestion: Downloads the dataset.
Data Preprocess: Splits the dataset into three parts: training, validation, and test data.
Prepare Base Model: Contains the base model architecture.
Training: Trains the model on the training dataset. Check the config.yaml file for its output location.
Evaluation: Evaluates the model and uses MLflow to store the experiments and models.
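
As an illustration of the three-way split performed in the Data Preprocess stage, here is a minimal sketch. The function name `split_dataset`, the 80/10/10 proportions, and the fixed seed are assumptions for the example, not values taken from the project.

```python
import random


def split_dataset(rows, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle rows and split them into training, validation, and test parts.

    The fractions and seed are illustrative assumptions.
    """
    rows = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_fraction)
    n_test = int(len(rows) * test_fraction)
    n_train = len(rows) - n_val - n_test
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Shuffling before slicing avoids any ordering in the source file leaking into one of the parts, and the three slices never overlap.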

Usage Instructions

Follow these steps to run each stage of the pipeline:

Data Ingestion: Run data_ingest.py to download and preprocess the dataset.
Data Preprocess: Execute data_preprocess.py to prepare the data.
Prepare Base Model: Use model_init.py to initialize the model architecture.
Training: Run train_model.py to train the model on the dataset.
Evaluation: Execute evaluate_model.py to generate the evaluation report.

Configuration Details

Refer to .env.example for the environment variable setup. There you can set up your DagsHub account; if you leave it empty, the evaluation output is stored in the local mlflow directory. You can change the model's parameters and the training settings in the params.yaml file.
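
The fallback described above (remote tracking when configured, local directory otherwise) could look roughly like the sketch below. The variable names used by this project's .env.example are not shown here, so the example assumes MLflow's standard `MLFLOW_TRACKING_URI` environment variable and a local `./mlruns` directory; the helper name `resolve_tracking_uri` is hypothetical.

```python
import os


def resolve_tracking_uri(env=None, default="file:./mlruns"):
    """Return the configured tracking URI, or a local directory if unset.

    Assumes the MLFLOW_TRACKING_URI variable; the project's own variable
    names may differ.
    """
    env = os.environ if env is None else env
    uri = (env.get("MLFLOW_TRACKING_URI") or "").strip()
    # An empty or missing value means "store results locally".
    return uri or default


print(resolve_tracking_uri({}))
print(resolve_tracking_uri({"MLFLOW_TRACKING_URI": "https://example.invalid/repo.mlflow"}))
```

Passing the environment as an argument keeps the function easy to test without modifying the real process environment.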

Evaluation Metrics and Results

The model is evaluated on accuracy, precision, and recall. A summary of these metrics, along with a confusion matrix, is stored in the DagsHub account configured in the environment file or, if none is set, in the local mlflow directory.
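
For reference, the sketch below shows how accuracy, precision, and recall follow from the confusion matrix counts. It assumes binary labels (1 for positive, 0 for negative), which this README does not state; the function names are hypothetical and the project's own evaluation code may compute these differently.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: share of predicted positives that are correct.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of actual positives that were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
print(binary_metrics(y_true, y_pred))
```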

References

Original research paper: Link to Paper
Additional materials and resources used in this project.
