[Core] Simplify duplicate feature detection #1602

Merged 1 commit on Apr 8, 2019

@mpkorstanje commented Apr 7, 2019

Summary

Somewhere between #165 and #259, Cucumber started ignoring duplicate
features by taking the MD5 sum of each feature and comparing it against
the sums of newly parsed features. Replacing the set of MD5 sums with a
map of source to features simplifies the code and allows duplicates to
be logged as warnings.
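
The map-based approach can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `DuplicateFeatureFilter` and its nested `Feature` class are hypothetical names, and the warning is collected in a list rather than sent to a real logger.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keeps the first feature seen for each distinct source
// text and records a warning for every later feature with identical source.
final class DuplicateFeatureFilter {

    // Hypothetical stand-in for a parsed feature (not the CucumberFeature API).
    static final class Feature {
        final String uri;
        final String source;

        Feature(String uri, String source) {
            this.uri = uri;
            this.source = source;
        }
    }

    // Keyed by the feature source; String.hashCode and equals do the matching.
    private final Map<String, Feature> bySource = new LinkedHashMap<>();
    private final List<String> warnings = new ArrayList<>();

    void add(Feature feature) {
        // putIfAbsent returns the existing value when the key is already present.
        Feature existing = bySource.putIfAbsent(feature.source, feature);
        if (existing != null) {
            warnings.add("Duplicate feature ignored: " + feature.uri
                    + " has the same source as " + existing.uri);
        }
    }

    List<Feature> features() {
        return new ArrayList<>(bySource.values());
    }

    List<String> warnings() {
        return warnings;
    }
}
```

Because the map value is the feature that was kept, the warning can name both the ignored file and the one it duplicates, which a bare set of checksums could not do.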

I couldn't discover any good reason to use an MD5 hash over Java's
hashCode and equals.

  1. Memory consumption doesn't seem to be a problem. CucumberFeature
    already keeps a reference to the original source.

  2. Collisions don't appear to be a problem. hashCode produces a
    32-bit hash, so by the birthday paradox math we'd need approximately
    9000 feature files for a 1% chance of collision.
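
The figure in point 2 can be checked with the usual birthday approximation, p ≈ 1 - exp(-n² / (2N)), solved for n. The sketch below assumes hash values are uniformly distributed over N = 2^32 buckets, which is an idealization of `String.hashCode`; `BirthdayBound` is a hypothetical helper name.

```java
// Hypothetical helper: estimates how many items are needed before the chance
// of at least one hash collision reaches a given probability.
final class BirthdayBound {

    // Birthday approximation solved for n:
    //   p ≈ 1 - exp(-n^2 / (2 * buckets))  =>  n ≈ sqrt(2 * buckets * ln(1 / (1 - p)))
    static double itemsForCollisionProbability(double buckets, double p) {
        return Math.sqrt(2.0 * buckets * Math.log(1.0 / (1.0 - p)));
    }

    public static void main(String[] args) {
        double buckets = Math.pow(2, 32); // number of distinct 32-bit hash values
        double n = itemsForCollisionProbability(buckets, 0.01);
        System.out.printf("Items for a 1%% collision chance: %.0f%n", n);
    }
}
```

Under that assumption the estimate comes out at roughly 9,300, in line with the "approximately 9000" above. A hash collision between map keys also does not produce a false duplicate by itself, since `equals` still has to match.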

Types of changes

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).

Checklist:

  • I've added tests for my code.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@coveralls

Coverage increased (+0.02%) to 86.207% when pulling 350d99d on simplify-duplicate-feature-detection into 0caf37c on master.

@mpkorstanje mpkorstanje merged commit b6a58ea into master Apr 8, 2019
@mpkorstanje mpkorstanje deleted the simplify-duplicate-feature-detection branch August 2, 2019 13:19