html-indexer

A simple Python indexer that analyzes a set of HTML documents and returns the direct and inverted indexes.
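To illustrate the two structures, here is a minimal sketch that uses only the standard library. It is not the code in html_indexer.py: the names `TextExtractor`, `tokenize`, and `build_indexes`, the sample documents, and the choice of term counts as index values are all assumptions made for illustration. It also skips the stopword removal and WordNet-based normalization that the NLTK downloads in the install step suggest the real tool performs. The direct index maps each document to the terms it contains; the inverted index maps each term to the documents that contain it.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def tokenize(html):
    """Return lowercase alphanumeric tokens from the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


def build_indexes(docs):
    """Build (direct, inverted) indexes from a {document name: html} mapping.

    direct:   document name -> {term: count}
    inverted: term -> {document name: count}
    """
    direct = {}
    inverted = defaultdict(dict)
    for name, html in docs.items():
        counts = Counter(tokenize(html))
        direct[name] = dict(counts)
        for term, n in counts.items():
            inverted[term][name] = n
    return direct, dict(inverted)


# Hypothetical sample documents, for illustration only.
docs = {
    "a.html": "<html><body><p>Cats chase mice</p></body></html>",
    "b.html": "<html><body><p>Mice eat cheese</p><script>var x;</script></body></html>",
}
direct, inverted = build_indexes(docs)
```

Looking up `inverted["mice"]` lists every document containing that term, while `direct["a.html"]` lists every term in that document. A real indexer would typically filter stopwords and normalize word forms before counting.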

install

run pip install -r requirements.txt

run nltk.py

run python -m nltk.downloader wordnet stopwords

run

python html_indexer.py input_folder special_files_folder result_folder
