This repository provides an application for detecting and identifying Arabic dialects using machine learning (ML) and deep learning (DL) models.
Demo:
bandicam.2023-05-13.19-38-05-633.mp4
- Used a SQLite connection and pandas to run a join query and store the result in a DataFrame.
- Applied a preprocessing pipeline to the fetched dataset:
  - Removing Punctuation
  - Removing Symbols
  - Removing Emojis
  - Removing Diacritics
  - Removing Non-Arabic Characters
  - Removing Repeated
  - Applying Lemmatisation
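
The cleaning steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual code: the function name `clean_text` and the regular expressions are assumptions, "Removing Repeated" is interpreted here as collapsing a character repeated three or more times, and lemmatisation is left out because it depends on an external Arabic NLP library that this README does not name.

```python
import re

# Arabic diacritics (tashkeel), the superscript alef, and the tatweel
# elongation mark. These are deleted outright so words stay intact.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
# Anything that is not an Arabic letter or whitespace. This one pattern
# covers punctuation, symbols, emojis, digits, and non-Arabic text.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
# Assumption: "repeated" means the same character three or more times in a row.
REPEATED = re.compile(r"(.)\1{2,}")
WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Hypothetical helper that applies the cleaning steps in order."""
    text = DIACRITICS.sub("", text)    # strip diacritics before anything else
    text = NON_ARABIC.sub(" ", text)   # replace everything non-Arabic with a space
    text = REPEATED.sub(r"\1", text)   # collapse long runs to a single character
    return WHITESPACE.sub(" ", text).strip()
```

Diacritics are removed first because the non-Arabic pattern would otherwise replace them with spaces and split words apart.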
- Text representation using TF-IDF
- Model selection:
  - SVC: F1 score of 82%
  - LightGBM: F1 score of 75%
  - Hugging Face AraBERT: accuracy of 84%
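
As a rough sketch of how a TF-IDF representation can feed an SVC classifier, the example below uses scikit-learn. Everything specific in it is an assumption made for illustration: the four toy sentences, the `EG` and `LV` labels, the linear kernel, the default vectorizer settings, and the macro-averaged F1. It does not reproduce the scores reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy placeholder data, not taken from the project's dataset.
texts = [
    "ازيك عامل ايه",
    "عايز اروح البيت",
    "كيفك شو اخبارك",
    "بدي اروح عالبيت",
]
labels = ["EG", "EG", "LV", "LV"]  # hypothetical dialect labels

# TF-IDF turns each text into a sparse weighted term vector;
# the SVC is then trained on those vectors.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(texts, labels)

# Scoring on the training data only shows the API shape;
# a real evaluation needs a held-out test split.
predictions = model.predict(texts)
score = f1_score(labels, predictions, average="macro")
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary learned at training time attached to the model, so new text can be passed to `model.predict` directly.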
- Convert our model to the ONNX format
- Deploy with FastAPI