
[Bug Report] Reduce ops support reducing across more than 2 dims, but only if dims are not specified #16118

Closed
mrakitaTT opened this issue Dec 18, 2024 · 5 comments
Labels: bug (Something isn't working), op_cat: reduces, P1

Comments

@mrakitaTT
Contributor

Describe the bug
Reduce ops support reducing across all dims, but only if the dims are not specified. If you explicitly specify the reduce dims and include all of them, it won't work for 3D+ tensors.

When no dims are specified, this code path is hit, which supports reducing across all dims.

When dims are specified, this code path is hit, which only supports reductions across 1 or 2 dims.

To Reproduce
Try reducing a 3D tensor across all dims, explicitly specifying dims=[0, 1, 2].
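The failing case can be illustrated with a NumPy analogue (the actual repro uses `ttnn.sum`; this snippet only demonstrates that for a 3D tensor, specifying all dims explicitly should be equivalent to specifying no dims at all):

```python
import numpy as np

x = np.random.rand(128, 32, 4).astype(np.float32)

# Reducing with no dims specified collapses every dimension...
full = np.sum(x)

# ...and explicitly listing all dims should produce the same scalar.
explicit = np.sum(x, axis=(0, 1, 2))

assert np.isclose(full, explicit)
```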

Expected behavior
The code should check whether dims.size() == tensor.rank() and, if so, allow reducing across all dims, just as it is allowed when !dim_arg.has_value().
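A minimal sketch of the proposed check, written as Python pseudocode mirroring the C++ condition (the function and argument names here are illustrative, not the actual identifiers in generic_reductions.cpp):

```python
def should_reduce_all_dims(rank, dim_arg):
    # No dims specified: reduce across all dims (existing behavior,
    # i.e. the !dim_arg.has_value() path).
    if dim_arg is None:
        return True
    # Proposed fix: explicitly listing every dim should behave the same
    # as not specifying any dims at all.
    return len(dim_arg) == rank

# Examples of the intended dispatch for a rank-3 tensor:
assert should_reduce_all_dims(3, None)
assert should_reduce_all_dims(3, [0, 1, 2])
assert not should_reduce_all_dims(3, [0, 1])  # still the 1-2 dim path
```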

@bbradelTT
Contributor

Please test for None, [], and [0,...] (all dims)

@bbradelTT
Contributor

There will be subsequent work to handle more combinations #16233

@asandhupatlaTT
Contributor

asandhupatlaTT commented Dec 20, 2024

Please test for None, [], and [0,...] (all dims)

dim=[] is throwing an error. This PR will only address the cases where rank == dim.size().

FAILED tests/ttnn/unit_tests/operations/test_reduction.py::test_sum_4d_tensors[keepdim=True-dim=[]-w=32-h=32-c=10-batch_size=10] - RuntimeError: TT_THROW @ ../ttnn/cpp/ttnn/operations/reduction/generic/generic_reductions.cpp:130: tt::exception
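For context on why dim=[] needs an explicit decision rather than falling out of the rank == dim.size() fix: NumPy treats an empty axis tuple as "reduce over no axes" (the result equals the input), while some frameworks have historically treated an empty dim list as "reduce everything". A quick NumPy check of the former interpretation:

```python
import numpy as np

x = np.arange(6, dtype=np.float32).reshape(2, 3)

# axis=() reduces over no axes: the result has the same shape and
# values as the input.
untouched = np.sum(x, axis=())
assert untouched.shape == (2, 3)
assert np.array_equal(untouched, x)
```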

@bbradelTT
Contributor

I added dim=[] to #16233 then.

asandhupatlaTT added a commit that referenced this issue Dec 20, 2024
### Ticket
Link to Github Issue
#16118

### Problem description
The current code doesn't support dim.size() > 2.

### What's changed
Added support for reducing across more dims.
Note: this only works when input_tensor_rank == dim.size(); other
combinations are out of scope for this PR.

### Checklist
- [x] Post commit CI passes :
https://github.com/tenstorrent/tt-metal/actions/runs/12435618460
- [x] Blackhole Post commit (if applicable) :
https://github.com/tenstorrent/tt-metal/actions/runs/12435616950
- [x] Model regression CI testing passes (if applicable) :
https://github.com/tenstorrent/tt-metal/actions/runs/12435615910
- [x] Device performance regression CI testing passes (if applicable) :
https://github.com/tenstorrent/tt-metal/actions/runs/12435614685
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [x] New/Existing tests provide coverage for changes

---------

Signed-off-by: Amruth Sandhupatla <[email protected]>
@asandhupatlaTT
Contributor

Fixed at #16236.

mrakitaTT added a commit to tenstorrent/tt-mlir that referenced this issue Dec 20, 2024
This PR adds TTNN workarounds for these Metal issues:
- tenstorrent/tt-metal#13361 - By decomposing
`reduce(keepDim=false)` into `reduce(keepDim=true) + reshape`
- tenstorrent/tt-metal#16118 - By annulling
dimensions argument when all dims are being reduced

As part of this work I've also:
- Enabled conversion of `stablehlo.reduce` op with multiple reduce
dimensions
- Added reduce ops verifiers in TTIR
- Added a separate function in TTNNWorkarounds to run rewrite patterns
for decomposition and layout workarounds
- Added lots of unit tests for reduce ops to cover conversions and
verifiers
- Added lots of silicon tests for reduce ops

Opened issue #1624 on
myself to revert these workarounds once the Metal issues are fixed.

Closes #805, #848

After implementing these workarounds and running tests, I encountered
[another Metal
issue](tenstorrent/tt-metal#16104), this time
in the `reshape` op. I've debugged it and have a local fix; I will send a
PR to fix it in the Metal repo, confirmed with the `reshape` op owners. I've
opened issue #1640 on
myself to enable the Reduce ops silicon tests after this fix is uplifted.

Another issue I encountered while working on this: after the workaround
pass decompositions, if we change the shapes of the ops' tensors, their
layout needs to change too, but the layout pass runs before the
workaround pass. I managed to solve this by reusing the layout of the
input tensor, but I'm not sure that is a good solution; maybe we need to
repeat some of the layout logic after the workaround decompositions.
FYI @sdjordjevicTT

Here is the example TTNN IR before the workarounds:
```mlir
%3 = "ttnn.sum"(%2) <{dim_arg = [0: i32, 1 : i32, 2: i32], keep_dim = false}> : (tensor<128x32x4xf32, #ttnn_layout2>) -> tensor<1xf32, #ttnn_layout2>
```

and after the workarounds:
```mlir
%3 = "ttnn.sum"(%2) <{keep_dim = true}> : (tensor<128x32x4xf32, #ttnn_layout2>) -> tensor<1x1x1xf32, #ttnn_layout2>
%4 = "ttnn.reshape"(%3) <{shape = [1 : i32]}> : (tensor<1x1x1xf32, #ttnn_layout2>) -> tensor<1xf32, #ttnn_layout3>
```
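The two workarounds shown in the IR above can be sketched with a NumPy analogue: drop the dims argument entirely when every dim is being reduced (the #16118 workaround), and emulate keepDim=false as keepDim=true followed by a reshape (the #13361 workaround). The shapes below match the IR example; the variable names are illustrative only:

```python
import numpy as np

x = np.random.rand(128, 32, 4).astype(np.float32)

# Workaround for #16118: when dim_arg covers every dim, annul it so the
# "no dims specified" code path is taken.
dims = (0, 1, 2)
dim_arg = None if len(dims) == x.ndim else dims

# Workaround for #13361: reduce with keepdims=True, then reshape to the
# shape keepdims=False would have produced.
kept = np.sum(x, axis=dim_arg, keepdims=True)  # shape (1, 1, 1)
result = kept.reshape(1)                       # shape (1,)

assert result.shape == (1,)
assert np.isclose(result[0], np.sum(x))
```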