
[Feature Request] Reduce ops with keepdim=False are not supported #13361

Open
mrakitaTT opened this issue Oct 2, 2024 · 14 comments
Assignees: bbradelTT
Labels: feature-request External feature request forge P2

@mrakitaTT (Contributor)

Currently, ttnn reduce ops throw an error when called with keepdim=False. There is a check in tt-metal/ttnn/cpp/ttnn/operations/reduction/generic/generic_reductions.cpp, in the reduce_impl function, that detects this case and throws.

This is an issue for us in the tt-mlir compiler, because we then always have to manually reshape the returned tensor after calling reduce ops with keepdim=True.
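
For illustration, here is roughly what that workaround looks like on the Python side (a minimal sketch, assuming `ttnn.sum` accepts `dim`/`keepdim` keyword arguments and that `ttnn.reshape` accepts a plain list of dims):

```python
import torch
import ttnn

# Sketch of the workaround: reduce with keepdim=True, then reshape away the kept dim.
device = ttnn.open_device(device_id=0)
x = ttnn.from_torch(torch.ones([128, 10], dtype=torch.bfloat16),
                    layout=ttnn.TILE_LAYOUT, device=device)

# Desired: y = ttnn.sum(x, dim=1, keepdim=False)  # -> shape [128], currently throws
y = ttnn.sum(x, dim=1, keepdim=True)   # shape [128, 1]
y = ttnn.reshape(y, [128])             # manual squeeze of the reduced dimension
ttnn.close_device(device)
```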

Do you have any plans to support this feature in the near future?

@tt-mpantic

P0 since it is blocking JAX MNIST.

@mrakitaTT (Contributor, Author)

> Do you have any plans to support this feature in the near future?

Hi @ntarafdar, could you please provide some info on this?

@ntarafdar (Contributor)

Hey @mrakitaTT, can you ask @bbradelTT? Currently the reduction ops have no owner, but I think Jasmina mentioned he might be able to take them.

@mrakitaTT (Contributor, Author)

> Do you have any plans to support this feature in the near future?

Hi @bbradelTT, could you please provide some info on this? :)

@bbradelTT self-assigned this Nov 4, 2024
@bbradelTT (Contributor)

I'll take a look.

@bbradelTT (Contributor)

@mrakitaTT can you please provide all of the calls that you are interested in (including the shapes of the input tensors, etc.)?

With recent changes, depending on the shape, sometimes there are downstream fatals, sometimes there aren't. Depending on the shapes you are interested in, it may be possible to unblock you sooner.

@mrakitaTT added P1 and removed P0 labels Nov 5, 2024
@mrakitaTT (Contributor, Author)

Reducing to P1: even though this is blocking the MNIST model, we have a way to work around the issue in the compiler if the fix will take a long time on the Metal side.

@mrakitaTT (Contributor, Author)

@bbradelTT Sure, here are the traces for the MNIST model:

| Reduce op | dimArg | Input tensor shape | Output tensor shape |
| --- | --- | --- | --- |
| `ttnn::max` | 1 | `tensor<128x10xf32>` | `tensor<128xf32>` |
| `ttnn::sum` | 1 | `tensor<128x10xf32>` | `tensor<128xf32>` |
| `ttnn::sum` | 1 | `tensor<128x1xf32>` | `tensor<128xf32>` |
| `ttnn::sum` | 0 | `tensor<128x10xf32>` | `tensor<10xf32>` |
| `ttnn::sum` | 0 | `tensor<1x10xf32>` | `tensor<10xf32>` |
| `ttnn::sum` | 0 | `tensor<128x512xf32>` | `tensor<512xf32>` |
| `ttnn::sum` | 0 | `tensor<1x512xf32>` | `tensor<512xf32>` |
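
For reference, the `keepdim=False` semantics we expect match PyTorch's; a quick sketch for a couple of rows of the table above, using plain PyTorch only:

```python
import torch

x = torch.ones(128, 10)

print(torch.max(x, dim=1, keepdim=False).values.shape)  # torch.Size([128])
print(torch.sum(x, dim=1, keepdim=False).shape)         # torch.Size([128])
print(torch.sum(x, dim=0, keepdim=False).shape)         # torch.Size([10])
```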

@bbradelTT (Contributor)

I'll try out the shapes tomorrow. From my own investigation, it looks like the problem is in reshape, which seems to be called when keepdim=False.

```python
import torch
import ttnn

device_id = 0
device = ttnn.open_device(device_id=device_id)
# Rank-4 tile-layout tensor, reshaped down to rank 1.
e = ttnn.from_torch(torch.ones([1, 1, 32, 1], dtype=torch.bfloat16), layout=ttnn.TILE_LAYOUT, device=device)
s = ttnn.Shape([32], [32])
f = ttnn.reshape(e, s)  # TT_FATAL is raised here
ttnn.close_device(device)
exit()
```

results in

```
RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:211: normalized_index >= 0 and normalized_index < rank
info:
Index is out of bounds for the rank, should be between 0 and 0 however is 18446744073709551615
backtrace:
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x1a5c7cb) [0x7f05fab907cb]
 --- tt::tt_metal::LegacyShape::get_normalized_index(long) const
 --- tt::tt_metal::LegacyShape::operator[](long) const
 --- ttnn::types::Shape::operator[](long) const
 --- ttnn::operations::data_movement::ReshapeViewOperation::invoke(tt::tt_metal::Tensor const&, ttnn::types::Shape const&)
...
```
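
For reference, 18446744073709551615 is 2**64 - 1, i.e. what -1 looks like when stored in an unsigned 64-bit integer, so it appears a -1 (or wrapped-around) index is being normalized against the rank-1 shape:

```python
# 18446744073709551615 == 2**64 - 1, i.e. -1 reinterpreted as uint64.
print(2**64 - 1)      # 18446744073709551615
print((-1) % 2**64)   # 18446744073709551615 -- how -1 wraps in unsigned 64-bit arithmetic
```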

I need to figure out a way to avoid this error.

@bbradelTT (Contributor)

Unfortunately there are a lot of issues with padding.

E.g. some internal shapes that are showing up in my tests are (where `1[32]` appears to denote a logical dimension of 1 padded out to 32):

[128, 1[32], 10[32], 32]
[128, 1, 10[32], 32]
[1[32], 128, 10[32], 32]

The new tensor shape work should make all of the padding irrelevant. We'll have to wait for that to be done, and then I'll revisit this issue.

cc @TT-BrianLiu @ayerofieiev-tt

@prajaramanTT

Keeping Priority label in sync with Priority field - P2

bbradelTT added a commit that referenced this issue Dec 20, 2024
### Ticket
Link to Github Issue
#12662
#14898
#13361
#12170

### Problem description
- padding caused issues for max
- keepdim=False errored out

### What's changed
- remove the erroring out on keepdim=False and adjust the code to handle keepdim=False properly
- adding padding within min/max to ensure that it's set up properly has
been pushed back to a future PR

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12432801168
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12423085751
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12423092106 same as
main
https://github.com/tenstorrent/tt-metal/actions/runs/12422179419/job/34683976776
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12423088573
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests pass
- [x] New/Existing tests provide coverage for changes
@bbradelTT (Contributor)

@mrakitaTT could you please test with the newest tt-metal main and let me know what you find?

You may need to test with a release build to avoid some asserts.
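
For a quick check on the newest main, something along these lines should be enough (a minimal sketch, assuming `ttnn.sum` accepts `dim`/`keepdim` keyword arguments):

```python
import torch
import ttnn

# Minimal keepdim=False smoke test (sketch only).
device = ttnn.open_device(device_id=0)
x = ttnn.from_torch(torch.ones([128, 10], dtype=torch.bfloat16),
                    layout=ttnn.TILE_LAYOUT, device=device)
y = ttnn.sum(x, dim=1, keepdim=False)  # previously raised TT_FATAL; expected shape [128]
print(y.shape)
ttnn.close_device(device)
```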

mrakitaTT added a commit to tenstorrent/tt-mlir that referenced this issue Dec 20, 2024
This PR adds TTNN workarounds for these Metal issues:
- tenstorrent/tt-metal#13361 - By decomposing
`reduce(keepDim=false)` into `reduce(keepDim=true) + reshape`
- tenstorrent/tt-metal#16118 - By nulling out the dimensions argument when all dims are being reduced

As part of this work I've also:
- Enabled conversion of `stablehlo.reduce` op with multiple reduce
dimensions
- Added reduce ops verifiers in TTIR
- Added a separate function in TTNNWorkarounds to run rewrite patterns
for decomposition and layout workarounds
- Added lots of unit tests for reduce ops to cover conversions and
verifiers
- Added lots of silicon tests for reduce ops

Opened issue #1624 on
myself to revert these workarounds once Metal issues are fixed.

Closes #805, #848

After implementing these workarounds and running tests, I encountered [another Metal issue](tenstorrent/tt-metal#16104), this time in the `reshape` op. I've debugged it and have a local fix; I will send a PR to fix it in the Metal repo, as confirmed with the reshape op owners. I've opened issue #1640 on myself to enable the reduce ops silicon tests after this fix is uplifted.

Another issue I encountered while working on this: when the workaround pass decompositions change the shapes of the ops' tensors, their layouts need to change too, but the layout pass runs before the workaround pass. I've managed to solve this by reusing the layout of the input tensor, but I'm not sure that is a good solution; maybe we need to repeat some of the layout logic after the workaround decompositions. FYI @sdjordjevicTT

Here is the example TTNN IR before the workarounds:
```
%3 = "ttnn.sum"(%2) <{dim_arg = [0: i32, 1 : i32, 2: i32], keep_dim = false}> : (tensor<128x32x4xf32, #ttnn_layout2>) -> tensor<1xf32, #ttnn_layout2>
```

and after the workarounds:
```
%3 = "ttnn.sum"(%2) <{keep_dim = true}> : (tensor<128x32x4xf32, #ttnn_layout2>) -> tensor<1x1x1xf32, #ttnn_layout2>
%4 = "ttnn.reshape"(%3) <{shape = [1 : i32]}> : (tensor<1x1x1xf32, #ttnn_layout2>) -> tensor<1xf32, #ttnn_layout3>
```
@mmanzoorTT

@bbradelTT Thanks for the patch. It solves the issue except for the case of reducing over the entire tensor. If we apply a reduction over an entire rank-4 tensor (e.g. ttnn.sum(input)), the output has shape 1x1x1x1xf32, which is inconsistent with PyTorch and other packages. The expected shape is 1xf32.

@bbradelTT (Contributor)

@mmanzoorTT I'm glad to hear the other cases are solved. I need to wait until #15416 is fixed before looking at the 1xf32 case.
