[Runtime] Change default alignment to 64 bytes. #12586
Merged
One change made in #5252 (which added Hexagon support to the runtime) was increasing the default byte alignment from 64 to 128. This can cause problems when interoperating with DLPack. For example, tests/python/contrib/test_dlpack.py has a high chance of failing when run locally because torch returns tensors with 64-byte rather than 128-byte alignment. I'm not sure why it doesn't fail in CI; perhaps the consistency of that environment means torch always happens to return an appropriately aligned tensor.
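To illustrate the failure mode, here is a minimal Python sketch (not the actual test; the `data_ptr_alignment` helper is hypothetical) that checks the alignment of a torch tensor's data pointer, which is the quantity the runtime's alignment check is sensitive to:

```python
import torch

def data_ptr_alignment(t: torch.Tensor) -> int:
    """Largest power of two that divides the tensor's data pointer."""
    addr = t.data_ptr()
    return addr & -addr  # isolates the lowest set bit of the address

x = torch.rand(56, 56)
print(f"data pointer alignment: {data_ptr_alignment(x)} bytes")
# A runtime requiring 128-byte alignment would reject any allocation
# that is only 64-byte aligned when it is imported via DLPack.
```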
Changing the default alignment back to 64 bytes allows interoperability with both torch and newer versions of numpy that support DLPack. I've slightly modified the torch test to run multiple times to make sure its behavior is consistent.
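For reference, a sketch of the repeated round-trip idea (assuming the standard torch and TVM DLPack APIs; the loop count is illustrative, not necessarily the value used in the test):

```python
import numpy as np
import torch
import tvm
from torch.utils.dlpack import to_dlpack

# Repeat the torch -> TVM conversion several times so that an
# allocation-dependent alignment failure has a chance to surface.
for _ in range(5):
    x = torch.rand(32, 32)
    nd = tvm.nd.from_dlpack(to_dlpack(x))
    np.testing.assert_allclose(x.numpy(), nd.numpy())
```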
See previous discussion in #12564. I chatted with @vinx13, and it seems a default 64-byte alignment should be fine for CUDA, so this change won't break anything. I'm reopening this pull request (as a new PR, since I did a rebase and GitHub won't let me reopen the previous one). I think this change is still a net positive while we work out a long-term, target-based solution.
cc @areusch