georgwiese/DPDC-Duplicate-Detection

DPDC-Duplicate-Detection

Duplicate Detection Algorithm, developed as part of the Data Profiling / Data Cleansing seminar

Setup

Install Python 2.7 and pip, then run pip install -r requirements.txt

Execution

cd source
python duplicate_detector.py <csv_file>

csv_file must be a well-formed CSV file with a header row.
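
To check an input file before running the detector, something like the following sketch may help. It is not part of this repository: the function name check_csv_rows and the specific checks (non-empty, unique column names and a consistent field count per row) are assumptions about what "well-formed with headers" could mean here, and the detector itself may be stricter or more lenient.

```python
import csv


def check_csv_rows(lines):
    """Return (column_names, data_row_count) or raise ValueError.

    `lines` is any iterable of text lines, for example an open file object.
    Hypothetical helper for illustration; not used by duplicate_detector.py.
    """
    reader = csv.reader(lines)
    try:
        header = next(reader)
    except StopIteration:
        raise ValueError("empty input: no header row")

    names = [name.strip() for name in header]
    if not names or any(name == "" for name in names):
        raise ValueError("header row has an empty column name")
    if len(set(names)) != len(names):
        raise ValueError("header row has duplicate column names")

    row_count = 0
    for row in reader:
        if not row:
            # csv.reader yields an empty list for blank lines; skip them.
            continue
        if len(row) != len(names):
            raise ValueError(
                "line %d has %d fields, expected %d"
                % (reader.line_num, len(row), len(names))
            )
        row_count += 1
    return names, row_count


# Made-up sample data, only to show the call.
sample = ["id,name,city", "1,Ann,Berlin", "2,Bob,Potsdam"]
header, count = check_csv_rows(sample)
```

Passing an open file object instead of the sample list works the same way. A ValueError names the first offending line, which is usually enough to locate a stray delimiter or a missing header.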
