-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add defensive checks to halide_buffer_copy_already_locked #6401
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Found while debugging crashes with performance_async_gpu for OpenGLCompute: the 'if' tree wasn't robust enough for malformed buffers being passed, and could attempt to deref and use a null src->device_interface or dst->device_interface in some cases. This patch just improves this function to return an error in these cases (rather than crashing); the fact that we are getting malformed buffers passed to us is likely a separate bug.
abadams
approved these changes
Nov 9, 2021
shoaibkamil
approved these changes
Nov 9, 2021
steven-johnson
added a commit
that referenced
this pull request
Nov 9, 2021
We currently assume that _halide_buffer_crop() will never fail. This is a bad assumption, as it can call device_crop(), which can fail due to unexpected runtime errors, or from a backend simply leaving the `device_crop` field at the default (unimplemented) case (as is currently the case for the OGLC backend). When this happens, the `dst` buffer was left in an inconsistent, invalid state (which was what led to the crashes fixed by #6401). This change modifies _halide_buffer_crop() to return nullptr in the event of an error, and to wrap all callers in a `require()` clause to check the result. (This is not optimal, of course, since the specific error returned by device_crop is getting dropped on the floor, but the existence of an error is no longer ignored.) This addresses at least some of the failure issues we are seeing in performance_async_gpu with the OpenGLCompute backend. (Also: drive-by whitespace fix in CodegenC)
Failure is unrelated. |
steven-johnson
added a commit
that referenced
this pull request
Nov 9, 2021
(Alternate to #6402) We currently assume that _halide_buffer_crop() will never fail. This is a bad assumption, as it can call device_crop(), which can fail due to unexpected runtime errors, or from a backend simply leaving the device_crop field at the default (unimplemented) case (as is currently the case for the OGLC backend). When this happens, the dst buffer was left in an inconsistent, invalid state (which was what led to the crashes fixed by #6401). This change modifies _halide_buffer_crop() to return nullptr in the event of an error, and ensure that all cropped buffers are checked for null at the right point. (This is not optimal, of course, since the specific error returned by device_crop is getting dropped on the floor, but the existence of an error is no longer ignored.) This addresses at least some of the failure issues we are seeing in performance_async_gpu with the OpenGLCompute backend. (Also: drive-by whitespace fix in CodegenC)
steven-johnson
added a commit
that referenced
this pull request
Nov 11, 2021
* _halide_buffer_crop() needs to check for runtime failures (v2) (Alternate to #6402) We currently assume that _halide_buffer_crop() will never fail. This is a bad assumption, as it can call device_crop(), which can fail due to unexpected runtime errors, or from a backend simply leaving the device_crop field at the default (unimplemented) case (as is currently the case for the OGLC backend). When this happens, the dst buffer was left in an inconsistent, invalid state (which was what led to the crashes fixed by #6401). This change modifies _halide_buffer_crop() to return nullptr in the event of an error, and ensure that all cropped buffers are checked for null at the right point. (This is not optimal, of course, since the specific error returned by device_crop is getting dropped on the floor, but the existence of an error is no longer ignored.) This addresses at least some of the failure issues we are seeing in performance_async_gpu with the OpenGLCompute backend. (Also: drive-by whitespace fix in CodegenC) * Oops
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Found while debugging crashes with performance_async_gpu for OpenGLCompute: the 'if' tree wasn't robust enough for malformed buffers being passed, and could attempt to deref and use a null src->device_interface or dst->device_interface in some cases.
This patch just improves this function to return an error in these cases (rather than crashing); the fact that we are getting malformed buffers passed to us is likely a separate bug.