Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved performance of @UniqueElements(by=NOT_SET)` #491

Merged
merged 1 commit into from
Jun 18, 2023

Conversation

DirkToewe
Copy link
Contributor

Overview

This PR aims at significantly improving the performance of @UniqueElements(by=NOT_SET), mainly by replacing calls to uniqueElements(x->x) by uniqueElements() wherever possible.

Details

If we look at the following two property tests:

@PropertyDefaults( tries = 100_000 )
public class UniqueElementsTest
{
  @Property void testA( @ForAll @UniqueElements int[] a ) {
    Arrays.sort(a);
    for( int i=1; i < a.length; i++ )
      assert a[i-1] <= a[i];
  }

  @Provide Arbitrary<int[]> arrays() {
    return Arbitraries.integers().array(int[].class).uniqueElements();
  }

  @Property void testB( @ForAll("arrays") int[] a ) {
    Arrays.sort(a);
    for( int i=1; i < a.length; i++ )
      assert a[i-1] <= a[i];
  }
}

On my machine, it takes 20 seconds for testA to finish while testB only takes 2 seconds. The problem becomes much more emphasized with larger array sizes.


I hereby agree to the terms of the jqwik Contributor Agreement.

// instead of `items instanceof HashSet`, because subclasses
// of HashSet may break Set conventions.
return items -> items.getClass().equals(HashSet.class)
|| items.size() == items.stream().distinct().count();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a suboptimal way to check uniqueness. You can stop as soon as you detect the first non-unique element.

At the same time, elements in hashset are always unique, aren't they?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh You're right, it would be much faster to stop early. I will fix that.

Yes that's why added the HashSet clause. It might speed things up sometimes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the logic. Now it should stop early.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. There's one more case doing stream.distinct.count:

long uniqueCount = elements.stream().map(this::applySafe).distinct().count();

I wonder if it makes sense to unify them.

WDYT?

@jlink
Copy link
Collaborator

jlink commented Jun 17, 2023

@DirkToewe As soon as you deem it ready, just rebase it, so that I can merge it.

@DirkToewe
Copy link
Contributor Author

I did a soft reset and re-committed the changes. All merge conflicts should be gone now.

The PR now also includes the change to FeatureExtractor::areUnique suggested by @vlsi. It felt like it fits the theme of the PR. I hope that's okay.

@jlink jlink merged commit b87a3ed into jqwik-team:main Jun 18, 2023
@jlink
Copy link
Collaborator

jlink commented Jun 18, 2023

Published in 1.7.4-SNAPSHOT

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants