
The aim of this task is to build a model that predicts the dialect given the text.


FatimaMHelmy/NLP-Dialect-Detection

 
 


NLP-Dialect-Detection

Arabic Dialect Detection

This repository provides an application for detecting and identifying Arabic dialects using machine learning (ML) and deep learning (DL) models.

Demo:

bandicam.2023-05-13.19-38-05-633.mp4

App Pipeline

[Figure: nlp-pipeline diagram]

Project Pipeline:

01 Data Fetching

  • Used an SQLite connection and pandas to run a join query and store the result in a DataFrame.
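The README does not give the database schema or the query itself. As a minimal sketch, assuming two hypothetical tables, `id_text(id, text)` and `id_dialect(id, dialect)`, joined on `id`, the fetching step could look like this with the standard `sqlite3` module and `pandas.read_sql_query`. The table names, columns, toy rows and the `fetch_dataset` helper are all illustrative, not taken from the repo.

```python
import sqlite3

import pandas as pd

# Hypothetical schema (the README only says a join query was used):
#   id_text(id, text)       -> raw text samples
#   id_dialect(id, dialect) -> dialect label for each sample
JOIN_QUERY = """
SELECT t.id, t.text, d.dialect
FROM id_text AS t
JOIN id_dialect AS d ON t.id = d.id
"""


def fetch_dataset(conn: sqlite3.Connection) -> pd.DataFrame:
    """Run the join query on an open SQLite connection and return a DataFrame."""
    return pd.read_sql_query(JOIN_QUERY, conn)


# Usage with a throwaway in-memory database holding two toy rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE id_text (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE id_dialect (id INTEGER PRIMARY KEY, dialect TEXT);
INSERT INTO id_text VALUES (1, 'ازيك'), (2, 'شلونك');
INSERT INTO id_dialect VALUES (1, 'EG'), (2, 'KW');
""")
df = fetch_dataset(conn)
conn.close()
print(df)
```

Passing an open connection (rather than a path) keeps the helper easy to reuse with a real database file via `sqlite3.connect("path/to/file.db")`.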

02 Data Preprocessing

  • A preprocessing pipeline is applied to the fetched dataset:
    • Removing punctuation
    • Removing symbols
    • Removing emojis
    • Removing diacritics
    • Removing non-Arabic characters
    • Removing repeated characters
    • Applying lemmatisation
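The README lists the cleaning steps but not their implementation. A rough sketch, assuming regex-based cleaning over Unicode ranges, is shown below. The exact ranges, the rule that three or more repeats of a character collapse to one, and the `preprocess` name are illustrative choices, not the repo's code. Punctuation, symbols, emojis and Latin text are all covered by a single "keep only Arabic letters" rule. Lemmatisation is left out because the README does not say which lemmatiser was used.

```python
import re

# Arabic diacritics (tashkeel: fathatan .. sukun) plus superscript alef.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel / kashida, used to stretch words visually.
TATWEEL = re.compile(r"\u0640")
# Anything that is not a basic Arabic letter or whitespace:
# punctuation, symbols, emojis, digits, Latin letters, ...
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
# The same character repeated three or more times (elongation).
REPEATED = re.compile(r"(.)\1{2,}")
WHITESPACE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Apply the cleaning steps in order and return normalised text."""
    text = DIACRITICS.sub("", text)   # must run before NON_ARABIC, or words get split
    text = TATWEEL.sub("", text)      # tatweel lies inside the Arabic letter range
    text = NON_ARABIC.sub(" ", text)  # replace with a space to keep word boundaries
    text = REPEATED.sub(r"\1", text)  # collapse elongated letters to one
    return WHITESPACE.sub(" ", text).strip()


print(preprocess("كــتاب!!! 😀 hello"))
```

Note that this keeps only the core Arabic block, so letters such as Persian `ی` or Arabic-Indic digits are also removed; widen `NON_ARABIC` if those matter for the dataset.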

03 ML Model Training

  • Text representation using TF-IDF
  • Model selection:
    • SVC: F1 score of 82%
    • LightGBM: F1 score of 75%
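A TF-IDF + SVC setup can be sketched with a scikit-learn `Pipeline`. This is an illustration under stated assumptions, not the repo's training code: the vectorizer uses default settings, the SVC uses a linear kernel, the F1 averaging is assumed to be macro (the README does not say), and the four training sentences are toy examples rather than real data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy examples for illustration only; the project trains on the fetched dataset.
train_texts = [
    "ازيك عامل ايه",
    "انا كويس الحمد لله",
    "شلونك شخبارك",
    "زين وايد الحمد لله",
]
train_labels = ["EG", "EG", "KW", "KW"]

model = Pipeline([
    ("tfidf", TfidfVectorizer()),   # word-level TF-IDF with default settings
    ("svc", SVC(kernel="linear")),  # linear-kernel support vector classifier
])
model.fit(train_texts, train_labels)

test_texts = ["عامل ايه النهارده", "شخبارك اليوم"]
test_labels = ["EG", "KW"]
preds = model.predict(test_texts)

# Averaging mode is an assumption; "macro" weights every dialect equally.
score = f1_score(test_labels, preds, average="macro")
print(preds, score)
```

To compare against LightGBM, the `svc` step could be swapped for `lightgbm.LGBMClassifier()`, which also accepts the sparse TF-IDF matrix.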

04 DL Model Training

  • AraBERT (via Hugging Face): accuracy of 84%
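The README does not name the AraBERT checkpoint or the training setup. A minimal sketch, assuming the public `aubmindlab/bert-base-arabertv02` checkpoint and the `transformers` `Trainer` API, shows how the model could be loaded with a classification head and how an accuracy metric like the one reported could be computed. The checkpoint choice and both helper names are assumptions.

```python
import numpy as np

# Assumed checkpoint; the README only says "Hugging Face AraBert".
ARABERT_CHECKPOINT = "aubmindlab/bert-base-arabertv02"


def load_arabert(num_labels: int):
    """Load an AraBERT tokenizer and a sequence-classification model.

    Downloads the weights on first call; imported lazily so the metric
    below can be used without transformers installed.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(ARABERT_CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(
        ARABERT_CHECKPOINT, num_labels=num_labels
    )
    return tokenizer, model


def compute_metrics(eval_pred):
    """Accuracy, in the form expected by transformers.Trainer(compute_metrics=...)."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return {"accuracy": float((preds == labels).mean())}
```

`compute_metrics` receives the logits and gold labels from each evaluation pass, so the reported number is the fraction of test texts whose top-scoring dialect matches the label.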

05 Deployment

  • Converted the trained model to ONNX format
  • Deployed it as an API with FastAPI


Languages

  • Jupyter Notebook 95.6%
  • Python 4.4%