Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing test scenarios for ttnn.max #14898

Closed
bbradelTT opened this issue Nov 8, 2024 · 1 comment
Closed

Failing test scenarios for ttnn.max #14898

bbradelTT opened this issue Nov 8, 2024 · 1 comment
Assignees

Comments

@bbradelTT
Copy link
Contributor

bbradelTT commented Nov 8, 2024

Internal commits for #16163 have attempted fix that didn't work because of downstream issues that need to be addressed. Should be a basis for new PRs.

Need to look at 3 parts:

  1. add more shapes to tests besides what's listed below

  2. look at failures (with release build) for

FAILED tests/sweep_framework/sweeps/reduction/traces/max_traces.py::test_forge[layout1-dtype0-params2] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/layout/page_config.cpp:109: (heightAlignment % tile_.get_height()) == 0
  1. Fix the following.
    For
import pytest
import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc
from models.utility_functions import torch_random

@pytest.mark.parametrize("input_shape_and_dim", 
                         [
                             ((2, 32, 64, 64),-4),
                             ((2, 22, 37, 41),-4),
                             ((2, 32, 64, 64),-3),
                             ((2, 22, 37, 41),-3),
                             ((2, 32, 64),-3),
                             ((2, 22, 37),-3)
                         ])
def test_max_unsupported_dim(device, input_shape_and_dim):
    
    input_shape, max_dim = input_shape_and_dim
    
    torch_input_tensor = torch_random(input_shape, -100, 100, dtype=torch.bfloat16)
    torch_output_tensor, _ = torch.max(torch_input_tensor, dim=max_dim, keepdim=True)

    input_tensor = ttnn.from_torch(torch_input_tensor, layout=ttnn.TILE_LAYOUT, device=device)

    output_tensor = ttnn.max(input_tensor, dim=max_dim)
    output_tensor = ttnn.to_layout(output_tensor, ttnn.TILE_LAYOUT)
    output_tensor = ttnn.from_device(output_tensor)

    output_tensor = ttnn.to_torch(output_tensor)

    assert_with_pcc(torch_output_tensor, output_tensor)

All tests except the first fail even after indices are adjusted. Output is something like

PASSED test_mean.py::test_max_unsupported_dim[input_shape_and_dim2]
FAILED test_mean.py::test_max_unsupported_dim[input_shape_and_dim0] - AssertionError: 0.9424668632904696
FAILED test_mean.py::test_max_unsupported_dim[input_shape_and_dim1] - AssertionError: list(expected_pytorch_result.shape)=[1, 22, 37, 41] vs list(actual_pytorch_result.shape)=[1, 22, 64, 64]
FAILED test_mean.py::test_max_unsupported_dim[input_shape_and_dim3] - AssertionError: list(expected_pytorch_result.shape)=[2, 1, 37, 41] vs list(actual_pytorch_result.shape)=[2, 1, 64, 64]
FAILED test_mean.py::test_max_unsupported_dim[input_shape_and_dim4] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/operations/data_movement/pad/pad.cpp:41: rank == 4
FAILED test_mean.py::test_max_unsupported_dim[input_shape_and_dim5] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/operations/data_movement/pad/pad.cpp:41: rank == 4

These failing scenarios need to be fixed.

@bbradelTT bbradelTT self-assigned this Nov 8, 2024
bbradelTT added a commit that referenced this issue Dec 20, 2024
### Ticket
Link to Github Issue
#12662
#14898
#13361
#12170

### Problem description
- padding caused issues for max
- keepdim=False errored out

### What's changed
- remove the erroring out of keepdim=False and adjust code to handle
keepdim=False properly
- adding padding within min/max to ensure that it's set up properly has
been pushed back to a future PR

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12432801168
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12423085751
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12423092106 same as
main
https://github.com/tenstorrent/tt-metal/actions/runs/12422179419/job/34683976776
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12423088573
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [x] New/Existing tests provide coverage for changes
@bbradelTT
Copy link
Contributor Author

Should be fixed by #16989

bbradelTT added a commit that referenced this issue Jan 24, 2025
bbradelTT added a commit that referenced this issue Jan 25, 2025
…eric reduce (#16989)

### Ticket
Link to Github Issues #16720 and #14898

### Problem description
- when specify a dim, output tensor of argmax is not one rank smaller
than input
- transpose seems to insert it's own padding, which occurs for the early
tensor dimensions

### What's changed
- for argmax, change shape of output tensor to have the right rank
- for generic reduce, move pad filling to right before the reduce op is
called
- also update tests and add deprecated to another specialized reduce
function

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12956428377
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939182511
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939185477/job/36091291360
in line with main
https://github.com/tenstorrent/tt-metal/actions/runs/12937069874/job/36084729557
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939183976
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [x] New/Existing tests provide coverage for changes
tt-rkim pushed a commit that referenced this issue Jan 26, 2025
bbradelTT added a commit that referenced this issue Jan 27, 2025
### Ticket
Link to Github Issue #14898 

Subset of previous PR
(#16989) that caused a hang
in (Single-card) Demo tests and got reverted. Verified that this
pipeline passes for this subset of changes:
https://github.com/tenstorrent/tt-metal/actions/runs/12992459972

### Problem description
- transpose was filling in non-logical areas with default pad value when
called from reduce

### What's changed
- pass in an appropriate pad value for transpose to use
- also mark a method that should only be used by pool to be deprecated
to deter other uses

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12992465641
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [x] New/Existing tests provide coverage for changes
williamlyTT pushed a commit that referenced this issue Jan 30, 2025
…eric reduce (#16989)

### Ticket
Link to Github Issues #16720 and #14898

### Problem description
- when specify a dim, output tensor of argmax is not one rank smaller
than input
- transpose seems to insert it's own padding, which occurs for the early
tensor dimensions

### What's changed
- for argmax, change shape of output tensor to have the right rank
- for generic reduce, move pad filling to right before the reduce op is
called
- also update tests and add deprecated to another specialized reduce
function

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12956428377
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939182511
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939185477/job/36091291360
in line with main
https://github.com/tenstorrent/tt-metal/actions/runs/12937069874/job/36084729557
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939183976
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [x] New/Existing tests provide coverage for changes
williamlyTT pushed a commit that referenced this issue Jan 30, 2025
### Ticket
Link to Github Issue #14898 

Subset of previous PR
(#16989) that caused a hang
in (Single-card) Demo tests and got reverted. Verified that this
pipeline passes for this subset of changes:
https://github.com/tenstorrent/tt-metal/actions/runs/12992459972

### Problem description
- transpose was filling in non-logical areas with default pad value when
called from reduce

### What's changed
- pass in an appropriate pad value for transpose to use
- also mark a method that should only be used by pool to be deprecated
to deter other uses

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12992465641
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [x] New/Existing tests provide coverage for changes
yieldthought pushed a commit that referenced this issue Jan 31, 2025
…eric reduce (#16989)

### Ticket
Link to Github Issues #16720 and #14898

### Problem description
- when specify a dim, output tensor of argmax is not one rank smaller
than input
- transpose seems to insert it's own padding, which occurs for the early
tensor dimensions

### What's changed
- for argmax, change shape of output tensor to have the right rank
- for generic reduce, move pad filling to right before the reduce op is
called
- also update tests and add deprecated to another specialized reduce
function

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12956428377
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939182511
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939185477/job/36091291360
in line with main
https://github.com/tenstorrent/tt-metal/actions/runs/12937069874/job/36084729557
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939183976
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [x] New/Existing tests provide coverage for changes
yieldthought pushed a commit that referenced this issue Jan 31, 2025
### Ticket
Link to Github Issue #14898 

Subset of previous PR
(#16989) that caused a hang
in (Single-card) Demo tests and got reverted. Verified that this
pipeline passes for this subset of changes:
https://github.com/tenstorrent/tt-metal/actions/runs/12992459972

### Problem description
- transpose was filling in non-logical areas with default pad value when
called from reduce

### What's changed
- pass in an appropriate pad value for transpose to use
- also mark a method that should only be used by pool to be deprecated
to deter other uses

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12992465641
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [x] New/Existing tests provide coverage for changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant