Duplicate Detection Algorithm, developed as part of the Data Profiling / Data Cleansing seminar
Install Python 2.7 and pip, then install the dependencies with pip install -r requirements.txt. To run the detector:
cd source
python duplicate_detector.py <csv_file>
csv_file must be a well-formatted CSV file that includes a header row.
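The input requirement can be checked before running the detector. The following is a minimal sketch and not part of this project: check_csv and open_csv are hypothetical helpers, and "well-formatted" is assumed here to mean that the header names are non-empty and unique and that every record has as many fields as the header. It cannot tell whether the first row really is a header rather than data.

```python
import csv
import sys


def open_csv(path):
    # Hypothetical helper: the csv module expects binary mode on Python 2
    # and text mode with newline="" on Python 3.
    if sys.version_info[0] < 3:
        return open(path, "rb")
    return open(path, "r", newline="")


def check_csv(path):
    """Return a list of problems; an empty list means no problem was found.

    Assumption: the first record is treated as the header row.
    """
    with open_csv(path) as handle:
        rows = list(csv.reader(handle))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header row contains an empty column name")
    if len(set(header)) != len(header):
        problems.append("header row contains duplicate column names")

    # Records are counted from 1, so the first data record is record 2.
    for number, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            problems.append("record %d has %d fields, expected %d"
                            % (number, len(row), len(header)))
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    for problem in check_csv(sys.argv[1]):
        print(problem)
```

Saved under a name of your choice, the script takes the CSV path as its only argument and prints one line per problem; no output means the checks passed.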