Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TIR] Affine utility support iter lowerbound and diagnostics #9699

Merged
merged 7 commits into from
Dec 25, 2021

Conversation

wrongtest-intellif
Copy link
Contributor

Hi, there. The PR originate from the issue detected in #9527, where a tiled block with both lowerbound and upperbound predicates fail to infer the region_cover property. Some tracing show that the DetectIterMap fail on such cases.

for hh_0, ww_0 in T.grid(28, 28):
    for ax0, ax1 in T.grid(10, 10):
        with T.block("cache"):
            h = T.axis.spatial(224, hh_0 * 8 - 1 + ax0)
            w = T.axis.spatial(224, ww_0 * 8 - 1 + ax1)
            T.where(1 <= hh_0 * 8 + ax0 and hh_0 * 8 + ax0 < 225 and 1 <= ww_0 * 8 + ax1 and ww_0 * 8 + ax1 < 225)
            cache[h, w] = X[h, w]

The PR modify affine utility in aspects below :

  • Free vars
    For DetectIterMap, if the expression do not contain vars in domain map, it should be safe to not analyze affine form into it. eg, x*x + 8y + z could pass affine analysis if x is not an var of iter domains. We can just put x*x into base part of IterSumExpr. It benefit the above case since hh_0 * 8 + ax0 is not an affine form, but outer var might be free in certain analysis procedures.

  • Add min field to IterMark class
    It is zero almost all time. But when lowerbound predicate exists, it seems much hard to represent the lowerbound with only source and extent.

  • More flexible predicate support
    Support resolve all kinds of >, >=, <, <= integer comparisions, for the expr of form like i * 8 < 10 - j, try refactor the expr into domain var related part and domain var free parts.

  • Introduce DiagnosticContext for debug purpose
    Replace each fail point (++unresolved_cnt_) with an error message recorded into diagnostic context. Or else it would be too hard to detect what happend for beginners. However, I do not find a proper point to call Render() of diagnostics for now.

@Hzfengsy
Copy link
Member

cc @spectrometerHBH @junrushao1994

@wrongtest-intellif wrongtest-intellif force-pushed the enforce_affine_utility branch 2 times, most recently from 1ef2e7d to 1b1852e Compare December 10, 2021 09:33
@junrushao
Copy link
Member

Ping @spectrometerHBH

@spectrometerHBH
Copy link
Contributor

spectrometerHBH commented Dec 16, 2021

Here are some comments.

  1. Free vars
    I think currently the DetectIterMap already supports free var, as long as you don't include the var in the domain map
  2. Add min field to IterMark class
    Iters with lower bound constraints can be substituted with a zero-starting iter plus offset, I think. If we want to allow non-zero lower bound constraints in predicates, then we may want to loose the check.
    3 & 4 look good to me.

BTW, the case shown in your description is not the same as the case you added in the testcase. I thought you'd like to support the binding after #9527, so you may want to modify the test case, IIUC?

@wrongtest-intellif
Copy link
Contributor Author

Many thanks to your comments!

  1. Free vars
    I think currently the DetectIterMap already supports free var, as long as you don't include the var in the domain map

Yes, I found that is the case. It should be my mistake on reason of failed cases.

2. Iters with lower bound constraints can be substituted with a zero-starting iter plus offset

Previous trials for me seems it require TryFuseIters be aware of those introduced offsets during rewrites of actual iter expr. So a mark of non-zero start is not a good idea? (for simplicity and correctness). I will try update to follow the suggestion.

the case shown in your description is not the same as the case you added in the testcase

I will change the testcase to match #9527,

@wrongtest-intellif wrongtest-intellif changed the title [TIR] Affine utility support freevars, iter lowerbound and diagnostics [TIR] Affine utility support iter lowerbound and diagnostics Dec 17, 2021
@wrongtest-intellif
Copy link
Contributor Author

2. Iters with lower bound constraints can be substituted with a zero-starting iter plus offset

Hi~ @spectrometerHBH Could you kindly take another round of review for lowerbound feature?

# h3 == 0: region is [1, 9]
# 0 < h3 <= 26: region is [h3 * 8, h3 * 8 + 9]
# h3 > 26: region is [h3 * 8, 223]
do_test_point_access(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great if we can add 3 more test cases for each case of h3 :)

// Then, flattened form = IterSum(IterSplit(i, scale=9),
// IterSplit(j, scale=2),
// IterSplit(k, scale=1))
// normal form = IterSum(IterSplit(i, scale=9),
Copy link
Contributor

@spectrometerHBH spectrometerHBH Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to rewrite i*9 + j*2 + k into such a normal form (i*9 + (j*2 + k - 1) + 1).
The reason is that j*2 + k - 1 has extent 8, it's weird that the scale of i is 9. Actually i*9 + j*2 + k with predicate 1 <= j*2 + k < 9 doesn't correspond to an iter, since the value of it is not continuous. Its value is [1, 8] \union [10, 17] \union [19, 26], ....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @wrongtest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we change this example to i*8 + j*2 + k , j*2 + k >= 1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it looks good to me.

Copy link
Contributor

@spectrometerHBH spectrometerHBH Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think you might need to add more test cases to test the sanity.
Such as what if we try to fuse several iters all with lower bound constraints, and what if we split an iter with lower bound constraint.

@wrongtest-intellif
Copy link
Contributor Author

BTW, I notice that negative scale is possble #9776, would we add some cases like -8 * x + y into testcases? @spectrometerHBH

@spectrometerHBH
Copy link
Contributor

spectrometerHBH commented Dec 21, 2021

Try to add these test cases, which I think might be helpful to test the sanity.

  1. i*8 + j*2 + k, with 1 <= j*2 + k < 9, i < 11, j < 5, i < 2
  2. i*8 + j*2 + k, with 1 <= j*2 + k < 9, 3 <= i*8 + j*2 + k < 25, i < 11, j < 5, i < 2
  3. i0*45 + i1*45 + i2*9 + i3*4 + i4, with 3<= i1*5 + i2 < 8, 1 <= i3*4 + i4 < 10, i0 < 3, i1 < 2, i2 < 5, i3 < 3, i4 < 4

These all should be detected successfully (I think). Correct me if anything wrong :)

@wrongtest-intellif
Copy link
Contributor Author

Rebase, add more testcases and try to fix three detected issues:

  1. My previous code is buggy on duplicate lowerbounds on same iteration, I should not subtract min directly because it will break the assumption that mark->source be a structural form IterSumExpr. Instead, set -min to base as suggested.

  2. Previously sum with single iter would not go into TryFuseIters thus if we have the i in [0, 48) and predicate is i < 10, it seems ignore the condition totally, result to an iteration of extent=48. A new testcase with single var depict this.

res = tvm.arith.detect_iter_map([i], var_dom([(i, 48)]), tvm.tir.all(i < 10))
assert_iter_sum_pattern(res[0], 10, 0)
  1. If we fix (2), than the case below and several general unittests on tir will fail because TryFuseIters can not fully match the predicate constraint even all iters are visisted.
res = tvm.arith.detect_iter_map(
        [i, j, k],
        var_dom([(i, 128), (j, 128), (k, 128)]),
        tvm.tir.all(i * 16384 + j * 128 + k < 100),
    )

The workaround currently is that if we fail to fully match the predicate constraint, but a suffix of predicate is matched successfully, step into fallback behaviors and skip this constraint. For example, if target iter expr is k or j*128+k or i*128+j(fix me, this case will just not start a constraint match), the predicate i * 16384 + j * 128 + k < 100 will be skipped.

Unfortunately, the originated region_cover issue now seems can not be fully solved by this PR. I think the issue (2) make the corresponding case incorrectly passed previously. There is only one iteration except free vars in hh_0 * 8 - 1 + ax0 so the produced region is inferred regardless of predicate. A new case is added to intset test (previous case has two iters).

cc @spectrometerHBH

@wrongtest-intellif
Copy link
Contributor Author

For region cover problem, there are two notes:

  1. the consumer block has a conditional read accesses, so the T.reads() annotation of consumed region is larger than actual consumed.

  2. the region_cover check in state.cc actually check produce^buffer_shape can cover consume^buffer_shape, even though, the analyzer can not prove it. I can manually prove it but it depend on a partition of free variable's range.

@spectrometerHBH
Copy link
Contributor

Would be great if you can fix the CI :)

if (i == 0 && num_visited == visited.size()) {
// if match failed because of iterations are used out instead of scale mismatch,
// and all used iters are visited during current match round, fallback to skip the
// constraint. Example: exprs = [i * 2 + j, k], i in [0, 3), j in [0, 2), k in [0, 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should fail on such a case since there exists an unexpected predicate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @wrongtest

@spectrometerHBH
Copy link
Contributor

the region_cover check in state.cc actually check produce^buffer_shape can cover consume^buffer_shape, even though, the analyzer can not prove it. I can manually prove it but it depend on a partition of free variable's range.

Would you mind providing the produce^buffer_shape and consume^buffer_shape calculated by the region cover check? That would be a great help.

Copy link
Contributor

@spectrometerHBH spectrometerHBH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be merged if we fail on the predicate unmatch problem.

@spectrometerHBH
Copy link
Contributor

cc @Hzfengsy @junrushao1994

@Hzfengsy Hzfengsy merged commit e2dcba2 into apache:main Dec 25, 2021
@Hzfengsy
Copy link
Member

Thanks a lot @wrongtest and @spectrometerHBH

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
…9699)

* Enable freevars, iter lowerbound and diagnostics in affine utility

* fix lint issues and compare bug

* update to use iter shift instead of itermark min for lowerbound

* add testcase of fused iters sum with multiple lowerbounds

* add more affine check testcases, fix bug for single iter and duplicate constraints on iter

* add a newline to comment

* forbidden predicate unmatch

Co-authored-by: baoxinqi <[email protected]>
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…9699)

* Enable freevars, iter lowerbound and diagnostics in affine utility

* fix lint issues and compare bug

* update to use iter shift instead of itermark min for lowerbound

* add testcase of fused iters sum with multiple lowerbounds

* add more affine check testcases, fix bug for single iter and duplicate constraints on iter

* add a newline to comment

* forbidden predicate unmatch

Co-authored-by: baoxinqi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants