EmilioGarzia/Spelling-Correction

Spelling Correction

Intro

This repository implements a spell checker built on NLTK and NumPy. It detects and corrects spelling errors in text using a probabilistic model and a distance-based algorithm. The project is split into two Python source files: one dedicated to the Levenshtein edit distance, and the other applying spelling correction through preprocessing, Levenshtein edit distance, and a naive probability approach.

Main steps for spelling correction

  1. Find the misspelled words in the query
  2. Compute the edit distance between each misspelled word and every term in the vocabulary
  3. Store the candidates
  4. Compute the probability of each candidate
  5. Pick the candidate with the highest probability
  6. Replace the misspelled term with the chosen candidate
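
The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code in SpellingCorrector.py: the names `edit_distance` and `correct_query`, the `max_distance` threshold, the lowercasing, and the toy word counts are all hypothetical, and "probability" here is simply the relative frequency of a word in the counts.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, keeping a single row in memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct_query(query: str, word_counts: Counter, max_distance: int = 2) -> str:
    """Hypothetical pipeline: detect, collect candidates, score, replace."""
    total = sum(word_counts.values())
    corrected = []
    for token in query.lower().split():
        # Step 1: a token missing from the vocabulary is treated as misspelled.
        if token in word_counts:
            corrected.append(token)
            continue
        # Steps 2-3: store every vocabulary term within max_distance edits.
        candidates = {}
        for word in word_counts:
            distance = edit_distance(token, word)
            if distance <= max_distance:
                candidates[word] = distance
        if not candidates:
            corrected.append(token)  # nothing close enough: leave unchanged
            continue
        # Steps 4-5: naive unigram probability; ties go to the closer word.
        best = max(candidates,
                   key=lambda w: (word_counts[w] / total, -candidates[w]))
        # Step 6: replace the misspelled term.
        corrected.append(best)
    return " ".join(corrected)


# Toy vocabulary with made-up counts (an assumption for this example).
toy_counts = Counter({"iranian": 3, "financial": 5, "banks": 4,
                      "are": 10, "strong": 6, "bank": 2})
print(correct_query("Iranin financal banks are strongss", toy_counts))
# iranian financial banks are strong
```

Ranking by frequency alone ignores how likely a particular typo is, which is why the selection step is the natural place to plug in a better model.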

How to use

from spelling_correction.Levenshtein import levenshtein
# Build the calculator for a source/target pair
edit_distance_calculator = levenshtein(source="play", target="stay")
# Read the computed edit-distance matrix
levenshtein_matrix = edit_distance_calculator.distance_matrix

Levenshtein.py
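
For readers unfamiliar with the distance matrix, here is a minimal sketch of how such a matrix is typically filled. It is not taken from Levenshtein.py: the function name `levenshtein_matrix` is hypothetical, every operation is assumed to cost 1 (the repository may weight substitutions differently), and plain nested lists are used instead of NumPy to keep the example dependency-free.

```python
def levenshtein_matrix(source: str, target: str) -> list:
    """Return the full dynamic-programming matrix for two strings.

    Cell [i][j] holds the edit distance between source[:i] and target[:j].
    """
    rows, cols = len(source) + 1, len(target) + 1
    matrix = [[0] * cols for _ in range(rows)]
    # Turning a prefix into the empty string takes one deletion per character,
    # and building a prefix from the empty string takes one insertion each.
    for i in range(rows):
        matrix[i][0] = i
    for j in range(cols):
        matrix[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            matrix[i][j] = min(matrix[i - 1][j] + 1,         # deletion
                               matrix[i][j - 1] + 1,         # insertion
                               matrix[i - 1][j - 1] + cost)  # substitution
    return matrix


matrix = levenshtein_matrix("play", "stay")
print(matrix[-1][-1])  # 2 with unit costs: p->s and l->t
```

The bottom-right cell is the distance between the two full words; the rest of the matrix can be traced backwards to recover the individual edit operations.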

from spelling_correction.SpellingCorrector import SpellCorrector
# Query containing misspelled words
query = "Iranin financal banks are strongss"
corrector = SpellCorrector(string=query)
# Retrieve the corrected version of the query
corrected_query = corrector.retrive_corrected()

SpellingCorrector.py

Possible improvements

It would be interesting to implement a different probability computation in the __compute_probabilities__(self) method of the SpellCorrector class, since the current implementation is a very naive way to compute word probabilities. I suggest trying other approaches, such as the Kernighan or noisy channel models.
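
As a starting point, a noisy channel score multiplies a prior P(word) by a channel term P(typo | word). The sketch below is a simplified illustration, not the Kernighan method itself: Kernighan-style models estimate the channel term from character confusion matrices, whereas this example assumes a flat `error_rate` per edit. The function name `rank_candidates`, the `error_rate` value, and the sample counts are hypothetical.

```python
import math
from collections import Counter


def rank_candidates(candidates: dict, word_counts: Counter,
                    error_rate: float = 0.05) -> list:
    """Rank candidates by log P(word) + log P(typo | word), best first.

    candidates maps each candidate word to its edit distance from the typo.
    The channel term is assumed to be error_rate ** distance.
    """
    total = sum(word_counts.values())
    vocab_size = len(word_counts)

    def score(word: str) -> float:
        # Add-one smoothing keeps the prior non-zero for unseen words.
        prior = (word_counts[word] + 1) / (total + vocab_size)
        return math.log(prior) + candidates[word] * math.log(error_rate)

    return sorted(candidates, key=score, reverse=True)


# Made-up counts: "final" is far more frequent than "financial".
counts = Counter({"final": 50, "financial": 5})
# Edit distances from the typo "financal" with unit costs.
ranking = rank_candidates({"financial": 1, "final": 3}, counts)
print(ranking[0])  # financial
```

Here the channel term outweighs the frequency gap, so the closer word wins even though it is rarer; with a pure frequency ranking the more common word would have been chosen.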

Dependencies

Author

Emilio Garzia, 2024

About

Simple spelling corrector using Levenshtein edit distance
