Deduplicate reports over (domain, org, reportid), not just over reportid #113

Merged 1 commit on Jul 15, 2023

@mwander commented Mar 6, 2023

This pull request closes #112.

WARNING: The schema of a running database must be changed to use the new index definition. Example for MySQL:
ALTER TABLE report DROP INDEX domain, ADD UNIQUE KEY domain (domain, org, reportid);

Any ideas on how to automate this?
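One possible direction for the automation question, sketched in Python rather than the parser's own Perl: read the column list of the existing `domain` index from MySQL's `information_schema.STATISTICS` and emit the `ALTER TABLE` statement only when the index is outdated. The helper name `migration_sql` and the constant names are hypothetical and not part of dmarcts-report-parser.pl. The assumption that the old index covers `(domain, reportid)` is inferred from the duplicate-entry message in this PR, not confirmed from the schema.

```python
# Hypothetical helper, NOT part of dmarcts-report-parser.pl.
# Index layout this PR wants on the report table.
WANTED = ("domain", "org", "reportid")

# MySQL query that lists the columns of the existing 'domain' index in order.
# Run it with any MySQL client and pass the resulting column names to
# migration_sql() below.
INDEX_COLUMNS_QUERY = """
SELECT COLUMN_NAME
  FROM information_schema.STATISTICS
 WHERE TABLE_SCHEMA = DATABASE()
   AND TABLE_NAME = 'report'
   AND INDEX_NAME = 'domain'
 ORDER BY SEQ_IN_INDEX
"""


def migration_sql(current_columns):
    """Return the ALTER statement to run, or None if the index is up to date."""
    cols = tuple(c.lower() for c in current_columns)
    if cols == WANTED:
        return None  # already migrated, nothing to do
    if not cols:
        # No index named 'domain' exists yet, so there is nothing to drop.
        return "ALTER TABLE report ADD UNIQUE KEY domain (domain, org, reportid);"
    # Old definition (assumed to be (domain, reportid)): replace it.
    return (
        "ALTER TABLE report DROP INDEX domain, "
        "ADD UNIQUE KEY domain (domain, org, reportid);"
    )
```

Because the check compares against the wanted column list, running it on every parser start would be idempotent: a migrated database yields `None` and no statement is executed.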

Without this change, the following failure occurs when the parser encounters a (domain, reportid) tuple that is already stored, even if the report comes from a different reporter:
DBD::mysql::db do failed: Duplicate entry 'wander.science-wander.science.1677801600.1677888000' for key 'domain' at dmarcts-report-parser.pl line 859.
dmarcts-report-parser.pl: aperture-labs.org: wander.science.1677801600.1677888000: Cannot add report to database. Skipped.
dmarcts-report-parser.pl: Skipping IMAP message with UID #2869 due to database errors.
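The effect of the two index definitions can be reproduced in isolation. The following is a minimal sketch that uses Python's built-in `sqlite3` module instead of MySQL, with a three-column stand-in for the real `report` table. The function name `try_insert_two_reporters` and the second reporter `other-reporter.example` are made up for illustration, and reading `aperture-labs.org` as the reporting org is an interpretation of the log above.

```python
import sqlite3


def try_insert_two_reporters(unique_columns):
    """Insert the same (domain, reportid) from two reporters; return rows stored."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the report table; the real schema has more columns.
    conn.execute(
        "CREATE TABLE report (domain TEXT, org TEXT, reportid TEXT, "
        f"UNIQUE ({unique_columns}))"
    )
    rows = [
        ("wander.science", "aperture-labs.org",
         "wander.science.1677801600.1677888000"),
        # Hypothetical second reporter that reuses the same report ID.
        ("wander.science", "other-reporter.example",
         "wander.science.1677801600.1677888000"),
    ]
    stored = 0
    for row in rows:
        try:
            conn.execute("INSERT INTO report VALUES (?, ?, ?)", row)
            stored += 1
        except sqlite3.IntegrityError:
            # Corresponds to the "Duplicate entry ... for key 'domain'" failure.
            pass
    conn.close()
    return stored
```

Calling `try_insert_two_reporters("domain, reportid")` stores only the first report, while `try_insert_two_reporters("domain, org, reportid")` stores both, which is the behavior this pull request is after.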

WARNING: This change requires altering your database index. MySQL:
ALTER TABLE report DROP INDEX domain, ADD UNIQUE KEY domain (domain, org, reportid);
Successfully merging this pull request may close these issues.

Parser deduplicates reports by Report-ID only, but should consider the reporter, too