Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uniqueItems check only validates the first two entries for types with a __len__ implementation #866

Closed
omaskery opened this issue Oct 25, 2021 · 5 comments · Fixed by #875
Labels
Bug Something doesn't work the way it should.

Comments

@omaskery
Copy link

omaskery commented Oct 25, 2021

After updating to a newer version of jsonschema, we now find that it does not notice violations of the uniqueItems constraint on lists.

By stepping through the code, I think I've found the issue in this code (originally introduced in this commit in this PR, but now found here):

        sort = sorted(unbool(i) for i in container)
        sliced = itertools.islice(sort, 1, None)

        for i, j in zip(sort, sliced):
            return not _sequence_equal(i, j)

Notice that the for loop will always return on the first iteration, without examining subsequent values.

This code looks like it's supposed to be seeing if any two successive elements is the same, but with the return statement it would only detect unique constraint violations between the first two elements.

I think this bug has escaped notice so far for two reasons:

  • I believe that all of the unit tests for this code only test unique constraint violations in the first two elements
  • This issue only presents if the elements of the container have a __len__ implementation. If a TypeError or NotImplementedError is raised then the "slow, brute force" path runs - and it does not have the same bug.

I wonder if this code is instead supposed to be:

         sort = sorted(unbool(i) for i in container)
         sliced = itertools.islice(sort, 1, None)
 
         for i, j in zip(sort, sliced):
-            return not _sequence_equal(i, j)
+            if _sequence_equal(i, j):
+                return False

For what it's worth, here's some noddy test code I put together to explore the issue:

def test(container):
    print(f"uniq({container}) = {uniq(container)}")


TEST_CASES = [
    [],
    [1],
    [1, 2],
    [1, 2, 3, 4, 5],
    [1, 2, 3, 3, 5],
    [1, 1, 3, 4, 5],

    ["a"],
    ["a", "b"],
    ["a", "b", "c", "d", "e"],
    ["a", "b", "c", "c", "e"],
    ["a", "a", "c", "d", "e"],
]


for test_case in TEST_CASES:
    test(test_case)

And here is the output I got:

uniq([]) = True
uniq([1]) = True
uniq([1, 2]) = True
uniq([1, 2, 3, 4, 5]) = True
uniq([1, 2, 3, 3, 5]) = False
uniq([1, 1, 3, 4, 5]) = False
uniq(['a']) = True
uniq(['a', 'b']) = True
uniq(['a', 'b', 'c', 'd', 'e']) = True
uniq(['a', 'b', 'c', 'c', 'e']) = True
uniq(['a', 'a', 'c', 'd', 'e']) = False
@Julian Julian added the Bug Something doesn't work the way it should. label Oct 26, 2021
@Julian
Copy link
Member

Julian commented Oct 30, 2021

Thanks for the detailed report. Any chance you're interested in a PR? The first step would be adding some tests to the official test suite here: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/master/tests/draft2020-12/uniqueItems.json (along with the 2019 draft).

After that your suggested change looks like what was intended as well.

@DrGFreeman
Copy link
Contributor

@Julian, I'm interested in submitting a PR in the official test suite at https://github.com/json-schema-org/JSON-Schema-Test-Suite. I got changes to the tests that can trigger the current issue. My understanding is that I should make the changes in the tests of all drafts that support the uniqueItems attribute (all)? I'm asking because here you refer to the 2020-12 and 2019 drafts only.

@Julian
Copy link
Member

Julian commented Nov 3, 2021

Great! Yes please, I believe IIRC all drafts have uniqueItems

@DrGFreeman
Copy link
Contributor

@Julian, I see that the changes from the test suite are already merged here. If it's OK with you, I'll work on a PR to fix the current issue.

@Julian
Copy link
Member

Julian commented Nov 3, 2021

Yep, sounds great thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something doesn't work the way it should.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants