Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow some tolerance in test comparison #219

Merged
merged 2 commits into from
Jun 17, 2024

Conversation

castelao
Copy link
Member

Is it OK to downgrade the comparison from array_equal to all_close with equal_nan? In this particular case I don't see a reason why this wouldn't be necessarily identical, thus array_equal should work, but this is now failing on kestrel. Have anyone had the same issue?

Note that the default is relative tol = 1e-5 & absolute tol = 1e-8. So the question is if this is acceptable or we need to dive into to understand why this tiny difference.

Note that the default is relative tol = 1e-5 & absolute tol = 1e-8. I'm
also considering NaNs as equal values.
@castelao castelao requested review from grantbuster and bnb32 June 17, 2024 16:22
@castelao castelao self-assigned this Jun 17, 2024
@grantbuster
Copy link
Member

Is it OK to downgrade the comparison from array_equal to all_close with equal_nan? In this particular case I don't see a reason why this wouldn't be necessarily identical, thus array_equal should work, but this is now failing on kestrel. Have anyone had the same issue?

Note that the default is relative tol = 1e-5 & absolute tol = 1e-8. So the question is if this is acceptable or we need to dive into to understand why this tiny difference.

Can you answer:

  1. why are there NaNs in the output? What fraction of the outputs are NaN? I don't think fwp should ever result in NaN outputs?
  2. It looks like the outputs are u100m/v100m, what is the max difference?

@castelao
Copy link
Member Author

  • Removed nan_equal.
  • Max difference: 1.92e-6
  • allclose() pass with rtol=1e-05, atol=1e-10.

As suggested by @grantbuster since it was not supposed to have any NaN.
@grantbuster
Copy link
Member

  • Removed nan_equal.
  • Max difference: 1.92e-6
  • allclose() pass with rtol=1e-05, atol=1e-10.

Okay this all looks good. Just wanted to make sure there weren't nan's in the output... that would be problematic haha

@castelao
Copy link
Member Author

  • Removed nan_equal.
  • Max difference: 1.92e-6
  • allclose() pass with rtol=1e-05, atol=1e-10.

Okay this all looks good. Just wanted to make sure there weren't nan's in the output... that would be problematic haha

In that case, shall I place back the nan_equal, and add another check if is there any NaN? If this ever fails we would have a better hint on what is the problem.

Another question. I'm intrigued by why we didn't have this difference before. I know that something changed on CUDNN to approximate some values for speed, but I have no idea if that is the source. This difference seems small enough to ignore for now, but should I open an issue so we don't forget to maybe check this in the future?

@grantbuster
Copy link
Member

  • Removed nan_equal.
  • Max difference: 1.92e-6
  • allclose() pass with rtol=1e-05, atol=1e-10.

Okay this all looks good. Just wanted to make sure there weren't nan's in the output... that would be problematic haha

In that case, shall I place back the nan_equal, and add another check if is there any NaN? If this ever fails we would have a better hint on what is the problem.

Another question. I'm intrigued by why we didn't have this difference before. I know that something changed on CUDNN to approximate some values for speed, but I have no idea if that is the source. This difference seems small enough to ignore for now, but should I open an issue so we don't forget to maybe check this in the future?

don't add the nan_equal... Current test will fail if there are any nans which is appropriate. If you want to add another line that is assert not np.isnan(arr).any() that makes sense to me. Fine to open an issue if you think it will be useful but i dont want you spending a ton of time tracking this down as this really should not affect the sup3r outputs.

@castelao castelao merged commit 2f53a4c into main Jun 17, 2024
9 checks passed
@castelao castelao deleted the fix/forward_pass_test_tolerance branch June 17, 2024 18:39
github-actions bot pushed a commit that referenced this pull request Jun 17, 2024
* Allow some tolerance in test comparison

Note that the default is relative tol = 1e-5 & absolute tol = 1e-8. I'm
also considering NaNs as equal values.

* Removing nan_equal

As suggested by @grantbuster since it was not supposed to have any NaN.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants