-
-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run tests with miri #899
Closed
Closed
Run tests with miri #899
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Remove ftz
Avoid ci errors
Note that crossbeam is an indirect dependency and they are still to release a version that [passes](crossbeam-rs/crossbeam#996) on this miri check. So to temporarily get this out of the way, you can patch crossbeam on the workspace `dfdx/Cargo.toml`.
Thread main finished before other threads were still active. The only thing related to threads were the gemm usage of rayon, and disabling rayon usage for matmul seems to be sufficient to fix this.
- Moved the impl of `Cache::try_empty_cache()` to `TensorCache::clear()`. - This can be invoked both by `Cache::try_empty_cache()` and by `drop(TensorCache)`. - Moved the device cache ptr deallocation to `BytesPtr`, `CudaBytesPtr` (newtype over `CUdeviceptr`) and `Buffer`. - This is abstracted by the `CachePtr` trait. - Can be called by `TensorCache::clear()`. - This method may require some "extra" device information, such as in the cuda case. That information is held by `TensorCache`.
I'll be closing this as this was enough to indicate robustness on the library (at least in my opinion!), but nevertheless this can still be used as a reference whenever someone chooses to do a miri trial. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
dfdx/Cargo.toml
.To run a particular test, add it's rust path as a filter at the end of the miri run command:
Miri UB errors
dfdx
package (74 tests) and underdfdx-core
(293 tests).Memory/Thread leaks
nn::layers::add_into::tests::longer_network
tensor::cpu::tests::test_reuse_allocation_on_clone_tensor
TensorCache
drop seems to fix it. Commit: ae777d4dfdx
package (74 tests) and underdfdx-core
(293 tests).Smaller tensors
Some tests have been reduced in tensor size so miri runs faster. Those changes are specific to the following commits:
The tensor reduction is quite optional, they only makes it simpler to run miri "on everything". But if this change ends up persisting, this wiki page should be updated. Finally, Idk how mha/transformer work, so I can't tell if the reduction shouldn't happen. That is, each test reduction should be reviewed to ack their reduction doesn't lose anything that should be tested.
Note that Miri depends on the test suite to find something, so having a bigger test suite may help. This is to say that a Miri check could be re-run in the future.
I haven't verified the cpu test coverage over the cpu source code, but a good way to go would be to have a high coverage (every relevant line of code getting "tested"). If more assurance would be desired, there could be more tests aiming at different characteristics of the source code, such as those related to branch coverage etc.