NLP-Search-Engine

This repository contains my project, developed during the "Natural Language Processing" course at university.

The aim of the project is to develop an NLP search engine using NLTK (Natural Language Toolkit): given a query string, the engine retrieves the top $k$ documents in the corpus that are most similar to the query. In this project I explored the main tools used in NLP, such as:

Corpus loading
Preprocessing of text data
- Stopwords removal
- Lemmatization
- Tokenization
- Punctuation removal
- Part-of-speech (POS) tagging
- Data cleaning in general
Document representation
- Continuous Bag of Words (CBOW)
- Word embeddings (Word2Vec)
Document representation
- Embedding average for documents representation
- Doc2Vec model
- TF-IDF
Cosine similarity
K-means algorithm
t-SNE dimensionality reduction
Evaluation of the model (Precision, Recall, F1)
Spelling correction (using Levenshtein edit distance)
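
As a rough illustration of the retrieval step (TF-IDF document vectors ranked by cosine similarity to the query), here is a minimal sketch that uses only the Python standard library. It is not the code from the notebook: the function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine_similarity`, `search`), the toy corpus, and the smoothed IDF formula are all assumptions made for this example, and the naive tokenizer stands in for the NLTK-based preprocessing listed above.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, split on whitespace, strip
    # surrounding punctuation. A stand-in for the NLTK preprocessing.
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def build_idf(tokenized_docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n_docs = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {term: math.log((1 + n_docs) / (1 + count)) + 1
            for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped.
    tf = Counter(tok for tok in tokens if tok in idf)
    return {term: freq * idf[term] for term, freq in tf.items()}


def cosine_similarity(u, v):
    # Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, corpus, k=3):
    # Return the k (document index, score) pairs with the highest similarity.
    tokenized = [tokenize(doc) for doc in corpus]
    idf = build_idf(tokenized)
    doc_vectors = [tfidf_vector(doc, idf) for doc in tokenized]
    query_vector = tfidf_vector(tokenize(query), idf)
    scores = [(i, cosine_similarity(query_vector, vec))
              for i, vec in enumerate(doc_vectors)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy corpus, invented for this example.
corpus = [
    "The cat sat on the mat.",
    "Dogs chase cats in the garden.",
    "Stock markets fell sharply today.",
]
results = search("cat on a mat", corpus, k=2)
for index, score in results:
    print(index, round(score, 3), corpus[index])
```

In this toy run the first document ranks highest, while "cats" in the second document does not match the query term "cat". That gap is the kind of mismatch that lemmatization in the preprocessing stage is meant to close.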

Dependencies

You can install these dependencies from requirements.txt using pip in your environment, as shown below:

pip install -r requirements.txt

Author

Emilio Garzia, 2024

