[ARITH] Enhance Canonical Simplify for LE #15471
This PR enhances the canonical simplifier to support the following patterns:
`x0 * s0 + x1 * s1 + ... + xn + c < 0`, let `d = gcd(s0, s1, ..., s{n-1}, c)`
1. If we can prove `-d < xn < d`, then we can simplify the expression to `x0 * (s0/d) + x1 * (s1/d) + ... + x{n-1} * (s{n-1}/d) + c/d < 0`, e.g. `x * 8 + y < 16` where `y` \in [0, 8) simplifies to `x < 2`.
2. If `xn` has the form `yn % m`, where `m % d == 0`, convert it to `yn // d % (m/d)`, e.g. `x1 * 64 + (x2 * 8 + x3) % 64 < 120` with `x3` \in [0, 8) simplifies to `x1 * 8 + (x2 * 8 + x3) // 8 % 8 < 15` ==> `x1 * 8 + x2 % 8 < 15`.
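The two examples above can be sanity-checked by brute force. The sketch below is plain Python and does not call TVM; the helper names `rule1_holds` and `rule2_holds` are hypothetical, and it assumes Python's floor-based `//` and `%` match the semantics intended in the description. It only samples the non-negative range `[0, d)` used in the examples, so it is a spot check rather than a proof.

```python
from itertools import product
from math import gcd


def rule1_holds(s0, c, x_values, xn_values):
    """Compare x*s0 + xn + c < 0 with x*(s0/d) + c/d < 0 on every sample."""
    d = gcd(s0, c)  # math.gcd ignores the sign of its arguments
    return all(
        (x * s0 + xn + c < 0) == (x * (s0 // d) + c // d < 0)
        for x, xn in product(x_values, xn_values)
    )


def rule2_holds(x1_values, x2_values, x3_values):
    """Compare the three forms of the second example on every sample."""
    for x1, x2, x3 in product(x1_values, x2_values, x3_values):
        original = x1 * 64 + (x2 * 8 + x3) % 64 < 120
        divided = x1 * 8 + (x2 * 8 + x3) // 8 % 8 < 15
        final = x1 * 8 + x2 % 8 < 15
        if not (original == divided == final):
            return False
    return True


xs = range(-20, 21)
# `x * 8 + y < 16` is written as `x * 8 + y + (-16) < 0`, so c = -16 and d = 8.
print(rule1_holds(8, -16, xs, range(0, 8)))   # y in [0, 8)
print(rule1_holds(8, -16, xs, range(0, 9)))   # y may reach 8, outside the bound
print(rule2_holds(range(-4, 5), xs, range(0, 8)))
```

The second call widens `y` to include 8 to show why the bound on `xn` matters: at `x = 1, y = 8` the original predicate is false while `x < 2` is true, so the two forms disagree.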
@@ -64,7 +64,8 @@ def test_add_pipeline():
s[C].pragma(xo1, "parallel_launch_point")
s[C].pragma(xo2, "parallel_stride_pattern")
s[C].pragma(xo2, "parallel_barrier_when_finish")
s[C].vectorize(xi)
# FIXME(tvm-team): vector operators are not supported for codegen to C yet
This case fails because c_codegen has poor support for vectorization.
Before this PR, TVM failed to simplify the branch predicate and therefore skipped the vectorization. This PR enhances arith so that the loop is actually vectorized, but it then fails at the codegen stage.
In short:
- It is not related to this PR, as it is a codegen issue.
- It is not a regression; the vectorization step was skipped before this PR.
PR apache#15471 enhanced the simplification for LE but missed the case where the upper bound `kPosInf` is divisible by a factor. As a result, prior to this PR, simplifying `x * 1024 + y < z * 7168` fails with the error message
```
InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32
```
This happens because the upper bound 7 here divides `kPosInf`, the maximum value of int64, so an "if" condition introduced in apache#15471 passes unexpectedly. This PR fixes the issue.
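The arithmetic behind this report can be confirmed with a few lines of plain Python. This is only a sketch of the numbers quoted above, not of TVM's code path: `INT64_MAX` stands in for `kPosInf` as the description states, and `fits_signed` is a hypothetical helper that mirrors the quoted check `value < 1LL << (dtype.bits() - 1)`.

```python
INT64_MAX = (1 << 63) - 1  # the value the description gives for kPosInf


def fits_signed(value, bits):
    """Mirror the quoted check: value < 1 << (bits - 1)."""
    return value < (1 << (bits - 1))


print(7168 == 7 * 1024)             # where the factor 7 comes from
print(INT64_MAX % 7 == 0)           # 7 divides the int64 maximum
print(fits_signed(8589934591, 32))  # the literal from the error message
print(fits_signed(8589934591, 64))  # the same literal as an int64
```

The divisibility in the second line is what lets an "infinite" bound look like an ordinary multiple of the factor, which is the situation the fix guards against.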
cc @tqchen