Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ak.approx_equal #2198

Merged
merged 15 commits into from
Feb 3, 2023
Merged

feat: add ak.approx_equal #2198

merged 15 commits into from
Feb 3, 2023

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Feb 3, 2023

Fixes #1105

@agoose77 agoose77 requested a review from jpivarski February 3, 2023 14:14
@codecov
Copy link

codecov bot commented Feb 3, 2023

Codecov Report

Merging #2198 (daf5ce3) into main (0871316) will increase coverage by 0.04%.
The diff coverage is 94.44%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/_util.py 74.52% <ø> (-0.29%) ⬇️
src/awkward/operations/ak_almost_equal.py 94.33% <94.33%> (ø)
src/awkward/operations/__init__.py 100.00% <100.00%> (ø)
src/awkward/forms/form.py 83.71% <0.00%> (+0.37%) ⬆️

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, but @ivirshup brought up the point that the functions in np.testing are all assertion functions, which this one is not. Since the motivation for introducing the ak.testing submodule was to follow the NumPy interface, this would have to be an assert_* type function. See #1105 (comment) and following.

The main benefit of those functions over boolean-returning functions (in my opinion) is that they can give helpful error messages. A failed assert statement (in pytest) would try to print out the two arrays, put an ellipsis in the middle because their str representations are huge, and then underline the differences, which might not even be differences in the part of the str that's not hidden.

A boolean-returning "approximately equal to" function can be top-level operations (through ak.operations), but then we'd want to make it an extension of an existing boolean-returning function, such as np.allclose. What would an Awkward extension of np.allclose look like?

np.allclose does not care about dtypes:

>>> np.allclose(np.array([[1, 2], [3, 4]]), np.array([[1.00001, 2.00001], [3.00001, 4.00001]]))
True

But it does care about shapes, in a raising-exceptions kind of way (not a returning-False kind of way):

>>> np.allclose(np.array([1, 2, 3, 4]), np.array([[1.00001, 2.00001], [3.00001, 4.00001]]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 180, in allclose
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/numpy/core/numeric.py", line 2265, in allclose
    res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
  File "<__array_function__ internals>", line 180, in isclose
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/numpy/core/numeric.py", line 2375, in isclose
    return within_tol(x, y, atol, rtol)
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/numpy/core/numeric.py", line 2356, in within_tol
    return less_equal(abs(x-y), atol + rtol * abs(y))
ValueError: operands could not be broadcast together with shapes (4,) (2,2) 

A generalization of that could be to raise exceptions if the Awkward types differ but return True or False depending only on the values.

The distance metric must be using abs because comparisons with complex numbers do what we'd expect:

>>> np.allclose(np.array([[1, 2], [3, 4]]), np.array([[1+0.00001j, 2+0.00001j], [3+0.00001j, 4+0.00001j]]))
True

but they seem to have not even implemented datetimes or timedeltas. I would think these should be considered separate types, but yet be comparable.

>>> np.allclose(np.array([1, 2, 3, 4], "M8"), np.array([1, 2, 3, 4], "M8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Converting an integer to a NumPy datetime requires a specified unit

and

>>> np.allclose(np.array([1, 2, 3, 4], "m8"), np.array([1, 2, 3, 4], "m8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 180, in allclose
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/numpy/core/numeric.py", line 2265, in allclose
    res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
  File "<__array_function__ internals>", line 180, in isclose
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/numpy/core/numeric.py", line 2375, in isclose
    return within_tol(x, y, atol, rtol)
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/numpy/core/numeric.py", line 2356, in within_tol
    return less_equal(abs(x-y), atol + rtol * abs(y))
numpy.core._exceptions._UFuncBinaryResolutionError: ufunc 'add' cannot use operands with types dtype('float64') and dtype('<m8')

That last one is surely a NumPy bug; I'll report it. numpy/numpy#23157

It considers booleans to be numerical, which is not typical of NumPy.

>>> np.allclose(np.array([False, True, False, True]), np.array([0.00000001, 1.00000001, 0.00000001, 1.00000001]))
True

@jpivarski jpivarski marked this pull request as draft February 3, 2023 17:58
@agoose77
Copy link
Collaborator Author

agoose77 commented Feb 3, 2023

(From the other discussion)

np.isclose isn't a ufunc.

Right, to be precise it's implemented mostly in terms of ufuncs, and it broadcasts.

But woah—it just works! How did that happen?

__array_function__ ;)

But it does care about shapes, in a raising-exceptions kind of way (not a returning-False kind of way):

Well, this is a generic broadcasting error. I'd not put that in the "this is a noisy function box", otherwise we'd have to do the same for all ufuncs, for example.

A generalization of that could be to raise exceptions if the Awkward types differ but return True or False depending only on the values.

So if we generalised NumPy, we'd only error on structural differences rather than type alone; if it broadcasts, then it's allowed.

The main benefit of those functions over boolean-returning functions (in my opinion) is that they can give helpful error messages.

We definitely could do something rich-like and make it easier to spot why an assert failed, sure. However, I think our L1 users are most likely to want a boolean, not an assert.

My view is that we have different functions here. We can implement an Awkward-specific isclose (following from the work on adding NEP18 shims everywhere), if we need. We can also add allclose and array_equal, which can follow directly from NumPy's meaning. Perhaps they should take additional Awkward-specific flags to test our datashape equality, like the current function in this PR does. However, allclose is not the same as array_equal: the former broadcasts. Our function would not want to do this in my view, so we probably want array_equal and array_approx_equal, where the former is a NumPy overload, too.

@jpivarski
Copy link
Member

Okay, our need is different enough here (types matter more to Awkward than to NumPy) that we can introduce a new function with a new name. But perhaps it should be in ak.operations, since it's not an asserting function? (Sorry that I said it should be in ak.testing without noticing that np.testing is all asserting functions.)

What would you think about making it a regular operation?

@agoose77
Copy link
Collaborator Author

agoose77 commented Feb 3, 2023

(Sorry that I said it should be in ak.testing without noticing that np.testing is all asserting functions.)

Just want to acknowledge that I appreciate the note, but that there's no need to apologise for this kind of stuff - we're all allowed to change our mind based upon new information!

What would you think about making it a regular operation?

Yes, if you're happy that we have our own not-like-NumPy operation, let's do that.

@jpivarski
Copy link
Member

Yeah, let's just do it as a regular operation: ak.arrays_approx_equal. There's no NumPy equivalent, but there's no NumPy equivalent of ak.num, either.

Someday, we might need assert_* functions, and we'll introduce the ak.testing submodule for that, at that time. (They wouldn't have to be the same as NumPy's functions, but just the general flow of putting asserting functions in ak.testing and normal functions in ak.operations is following the NumPy model.)

@agoose77 agoose77 force-pushed the agoose77/feat-arrays-equal branch from f8e39b8 to 5066df9 Compare February 3, 2023 20:23
@agoose77
Copy link
Collaborator Author

agoose77 commented Feb 3, 2023

@jpivarski how do you feel about ak.approx_equal? I took the line that array is already implied.

@agoose77 agoose77 changed the title feat: add ak.testing.arrays_approx_equal feat: add ak.approx_equal Feb 3, 2023
@agoose77 agoose77 temporarily deployed to docs-preview February 3, 2023 20:36 — with GitHub Actions Inactive
@jpivarski
Copy link
Member

@jpivarski how do you feel about ak.approx_equal? I took the line that array is already implied.

That's a good name. For all of the operations, it's implicit that some argument is going to be an array or array-like.

@agoose77 agoose77 marked this pull request as ready for review February 3, 2023 20:58
@agoose77 agoose77 force-pushed the agoose77/feat-arrays-equal branch from 73eb5b9 to 8e01d77 Compare February 3, 2023 20:59
@agoose77 agoose77 temporarily deployed to docs-preview February 3, 2023 21:08 — with GitHub Actions Inactive
@agoose77 agoose77 temporarily deployed to docs-preview February 3, 2023 21:31 — with GitHub Actions Inactive
@agoose77 agoose77 requested a review from jpivarski February 3, 2023 21:37
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks!

@jpivarski jpivarski enabled auto-merge (squash) February 3, 2023 21:57
@agoose77 agoose77 temporarily deployed to docs-preview February 3, 2023 22:12 — with GitHub Actions Inactive
@jpivarski jpivarski merged commit 85ddf0f into main Feb 3, 2023
@jpivarski jpivarski deleted the agoose77/feat-arrays-equal branch February 3, 2023 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ak.array_equal to override np.array_equal
2 participants