[Adreno] Adapt reduction schedule for adreno #13100
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
The original CUDA schedule uses rfactor, which makes it 10x-50x slower on Adreno than a schedule without barriers.
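This page does not show the schedule code itself. The following is therefore only a conceptual sketch in plain Python, not TVM API code, of why barriers matter. `rfactor_style_reduce` and `per_thread_serial_reduce` are hypothetical names. The model assumes that an rfactor-based schedule splits one reduction across a thread group and merges the partial sums with a shared-memory tree (one barrier per level). It also assumes that the barrier-free alternative assigns each output to a single thread.

```python
def rfactor_style_reduce(values, num_threads):
    """Simulate a cross-thread (rfactor-like) sum reduction.

    Each of `num_threads` threads first sums a strided slice of `values`,
    then the partial sums are combined with a shared-memory tree reduction.
    Returns (result, barrier_count).
    """
    if num_threads <= 0 or num_threads & (num_threads - 1):
        raise ValueError("num_threads must be a power of two")
    # Step 1: every thread reduces its own strided slice serially.
    partials = [sum(values[t::num_threads]) for t in range(num_threads)]
    # Partials are written to shared memory; one barrier before reading them.
    barriers = 1
    # Step 2: tree reduction; each level halves the active threads and
    # ends with another barrier in this simplified model.
    stride = num_threads // 2
    while stride > 0:
        for t in range(stride):
            partials[t] += partials[t + stride]
        barriers += 1
        stride //= 2
    return partials[0], barriers


def per_thread_serial_reduce(rows):
    """One thread per output: each thread reduces a whole row by itself.

    No data is shared between threads, so no barriers are needed.
    Returns (results, barrier_count).
    """
    return [sum(row) for row in rows], 0


total, n_barriers = rfactor_style_reduce(list(range(1024)), num_threads=64)
print(total, n_barriers)  # 523776 7
print(per_thread_serial_reduce([[1, 2], [3, 4]]))  # ([3, 7], 0)
```

In this model, the cross-thread variant pays 1 + log2(group size) barriers per reduced output. The per-thread variant gives up parallelism inside a single reduction in exchange for no synchronization at all. That trade-off can pay off on a GPU where barriers are expensive.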
Branch updated from 4aa0dc6 to 5320706.
LGTM. One minor comment
LGTM. Thanks!
@masahi could you please review/merge?
* [Adreno] Adapt reduction schedule for Adreno. The original CUDA schedule uses rfactor, which makes it 10x-50x slower on Adreno than a schedule without barriers
* Address PR comments
* Remove copy-paste, start reusing the CUDA implementation
* Address pylint hits
* Extend comment for CUDA schedule_reduce_impl
For example, `mean` on a QHD image on a Snapdragon 888 takes 69 ms with the CUDA schedule and 6.2 ms with the proposed schedule. Likewise, `argmin` drops from 183 ms to 3.9 ms.
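As a quick sanity check, the timings quoted above fall within the 10x-50x range stated in the description. The snippet below only divides the reported before/after numbers. It does not re-measure anything.

```python
# Timings reported in the PR description (milliseconds), Snapdragon 888, QHD input.
timings_ms = {
    "mean": (69.0, 6.2),     # (CUDA schedule, proposed Adreno schedule)
    "argmin": (183.0, 3.9),
}


def speedup(before_ms, after_ms):
    """Speedup factor of the new schedule over the old one."""
    return before_ms / after_ms


speedups = {op: speedup(*t) for op, t in timings_ms.items()}
for op, s in speedups.items():
    print(f"{op}: {s:.1f}x")
# mean: 11.1x
# argmin: 46.9x
```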