This repository provides an implementation of a spell checker built with NLTK and NumPy. The project is designed to efficiently detect and correct spelling errors in text using probabilistic models and distance-based algorithms. It is split into two Python source files: one dedicated to the Levenshtein edit distance, and the other applying the spelling correction itself using preprocessing, the Levenshtein edit distance, and a naive probability approach.
- Find misspelled words in the query
- Compute the edit distance between each misspelled word and each term in the vocabulary
- Store the candidate corrections
- Compute the probability of each candidate
- Pick the candidate with the highest probability
- Replace the misspelled term with the chosen candidate
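The steps above can be sketched end to end in plain Python. This is an illustrative mock-up, not the repository's actual code: the vocabulary, the distance threshold, and the unigram counts are assumptions made for the example.

```python
from collections import Counter

def levenshtein_distance(source, target):
    # Classic dynamic-programming edit distance.
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def correct_query(query, corpus_words, max_distance=2):
    # corpus_words is an assumed list of words the model was "trained" on.
    vocabulary = set(corpus_words)
    counts = Counter(corpus_words)
    total = sum(counts.values())
    corrected = []
    for word in query.split():
        if word in vocabulary:              # step 1: detect misspellings
            corrected.append(word)
            continue
        # steps 2-3: collect candidates within the distance threshold
        candidates = [t for t in vocabulary
                      if levenshtein_distance(word, t) <= max_distance]
        if candidates:
            # steps 4-5: naive unigram probability, keep the most probable
            best = max(candidates, key=lambda t: counts[t] / total)
            corrected.append(best)          # step 6: replace the term
        else:
            corrected.append(word)
    return " ".join(corrected)
```
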
```python
from spelling_correction.Levenshtein import levenshtein

edit_distance_calculator = levenshtein(source="play", target="stay")
levenshtein_matrix = edit_distance_calculator.distance_matrix
```
Levenshtein.py
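For reference, a distance matrix of this kind is typically built with the standard dynamic-programming recurrence. The NumPy sketch below shows that standard construction; it is not necessarily identical to the repository's implementation.

```python
import numpy as np

def levenshtein_matrix(source, target):
    # Build the full (len(source)+1) x (len(target)+1) DP matrix;
    # the bottom-right cell holds the edit distance.
    m, n = len(source), len(target)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0] = np.arange(m + 1)   # cost of deleting every source prefix
    d[0, :] = np.arange(n + 1)   # cost of inserting every target prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,          # deletion
                          d[i, j - 1] + 1,          # insertion
                          d[i - 1, j - 1] + cost)   # substitution
    return d
```

For `source="play"` and `target="stay"`, the bottom-right cell of the matrix is 2 (two substitutions).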
```python
from spelling_correction.SpellingCorrector import SpellCorrector

query = "Iranin financal banks are strongss"
corrector = SpellCorrector(string=query)
corrected_query = corrector.retrive_corrected()
```
SpellingCorrector.py
It could be interesting to implement different probability computations in the `__compute_probabilities__(self)` method of the `SpellCorrector` class; the current implementation is a very naive way of computing word probabilities. I suggest trying other approaches, such as Kernighan's noisy channel model.
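As a starting point, a noisy-channel score combines a language-model prior with an error-model likelihood: P(candidate | misspelled) ∝ P(misspelled | candidate) · P(candidate). The sketch below is purely illustrative — the function names are hypothetical, and the error model is a crude placeholder (a real Kernighan-style implementation would estimate edit probabilities from confusion matrices of observed typos).

```python
import math
from collections import Counter

def edit_distance(a, b):
    # Row-by-row Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def noisy_channel_score(misspelled, candidate, counts, total, char_error_prob=0.001):
    # Language model: smoothed unigram prior P(candidate).
    prior = (counts[candidate] + 1) / (total + len(counts))
    # Error model (placeholder): likelihood decays with edit distance.
    likelihood = char_error_prob ** edit_distance(misspelled, candidate)
    return math.log(prior) + math.log(likelihood)

# Toy usage with made-up counts:
counts = Counter({"financial": 50, "fanatical": 5})
best = max(counts, key=lambda c: noisy_channel_score("financal", c, counts,
                                                     sum(counts.values())))
```
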
Emilio Garzia, 2024