[StableHLO] Reduce op failure #848

Closed
mmanzoorTT opened this issue Sep 30, 2024 · 5 comments

@mmanzoorTT
Contributor

StableHLO reduce op is failing due to issues with the constant op.

A sample test case can be found in the tt-xla PR.

@nsmithtt
Contributor

nsmithtt commented Oct 9, 2024

I'm not sure what the issue is from the description, or which component to send it to. Can you add more context?

@mmanzoorTT
Contributor Author

jnp.max(array) is lowered to

    %0 = stablehlo.constant dense<0xFF800000> : tensor<f32>
    %1 = stablehlo.reduce(%arg0 init: %0) applies stablehlo.maximum across dimensions = [0, 1] : (tensor<128x784xf32>, tensor<f32>) -> tensor<f32>

Similarly, jnp.sum(array) is lowered to

    %cst = stablehlo.constant dense<0.000000e+00> : tensor<f32>
    %0 = stablehlo.reduce(%arg0 init: %cst) applies stablehlo.add across dimensions = [0, 1] : (tensor<2x1xf32>, tensor<f32>) -> tensor<f32>

stablehlo.reduce requires a scalar constant for initialization, and scalar constant support is still in progress, so stablehlo.reduce is blocked. We have a StableHLO-to-TTIR conversion for stablehlo.reduce, but end-to-end tests (using JAX) fail due to issues with the scalar constant.
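For reference, here is a minimal sketch of how these lowerings can be reproduced (assuming a recent JAX release; the array shape is only illustrative) by dumping the StableHLO that jnp.max and jnp.sum lower to:

```
import jax
import jax.numpy as jnp

x = jnp.zeros((128, 784), dtype=jnp.float32)

# jnp.max lowers to stablehlo.reduce with stablehlo.maximum and a -inf init constant.
print(jax.jit(jnp.max).lower(x).as_text())

# jnp.sum lowers to stablehlo.reduce with stablehlo.add and a 0.0 init constant.
print(jax.jit(jnp.sum).lower(x).as_text())
```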

@mmanzoorTT
Contributor Author

TTNN has ttnn.max and ttnn.sum ops, which correspond to the stablehlo.reduce op. The TTNN versions do not require an initialization constant, and it gets removed as dead code, so the scalar constant is not an issue here.
The stablehlo.reduce conversion has a few bugs, which are fixed in #899.

@uazizTT
Contributor

uazizTT commented Oct 11, 2024

This seems to be a duplicate of #805

mrakitaTT added a commit that referenced this issue Dec 20, 2024
This PR adds TTNN workarounds for these Metal issues:
- tenstorrent/tt-metal#13361 - By decomposing
`reduce(keepDim=false)` into `reduce(keepDim=true) + reshape`
- tenstorrent/tt-metal#16118 - By annulling
dimensions argument when all dims are being reduced

As part of this work I've also:
- Enabled conversion of `stablehlo.reduce` op with multiple reduce
dimensions
- Added reduce ops verifiers in TTIR
- Added a separate function in TTNNWorkarounds to run rewrite patterns
for decomposition and layout workarounds
- Added lots of unit tests for reduce ops to cover conversions and
verifiers
- Added lots of silicon tests for reduce ops

Opened issue #1624 on
myself to revert these workarounds once Metal issues are fixed.

Closes #805, #848

After implementing these workarounds and running tests, I've encountered
[another Metal
issue](tenstorrent/tt-metal#16104), this time
in the `reshape` op. I've debugged it and have a local fix; I will send a
PR to fix it in the Metal repo (confirmed with the reshape op owners). I've
opened issue #1640 for myself to enable the reduce ops
silicon tests after this fix is uplifted.

Another issue I've encountered while working on this: when the workaround
pass decompositions change the shapes of the ops' tensors, their layouts
need to change too, but the layout pass runs before the workaround pass.
I've managed to solve it by reusing the layout of the input tensor, but I'm
not sure that is a good solution; we may need to repeat some of the layout
logic after the workaround decompositions. FYI @sdjordjevicTT

Here is the example TTNN IR before the workarounds:
```
%3 = "ttnn.sum"(%2) <{dim_arg = [0: i32, 1 : i32, 2: i32], keep_dim = false}> : (tensor<128x32x4xf32, #ttnn_layout2>) -> tensor<1xf32, #ttnn_layout2>
```

and after the workarounds:
```
%3 = "ttnn.sum"(%2) <{keep_dim = true}> : (tensor<128x32x4xf32, #ttnn_layout2>) -> tensor<1x1x1xf32, #ttnn_layout2>
%4 = "ttnn.reshape"(%3) <{shape = [1 : i32]}> : (tensor<1x1x1xf32, #ttnn_layout2>) -> tensor<1xf32, #ttnn_layout3>
```
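For intuition, a small JAX sketch (shapes taken from the IR above, purely illustrative) of the `reduce(keepDim=false)` into `reduce(keepDim=true) + reshape` equivalence that the workaround relies on:

```
import jax.numpy as jnp

x = jnp.ones((128, 32, 4), dtype=jnp.float32)

# Reducing with keep_dim = false over dims [0, 1, 2] ...
a = jnp.sum(x, axis=(0, 1, 2), keepdims=False)

# ... equals reducing with keep_dim = true and then reshaping to the
# keep_dim = false output shape.
b = jnp.sum(x, axis=(0, 1, 2), keepdims=True).reshape(a.shape)

assert jnp.array_equal(a, b)
```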
@mrakitaTT
Contributor

Closing as a duplicate of #805
