Failing test scenarios for ttnn.max #14898

bbradelTT · 2024-11-08T15:17:57Z

Internal commits for #16163 have attempted fix that didn't work because of downstream issues that need to be addressed. Should be a basis for new PRs.

Need to look at 3 parts:

add more shapes to tests besides what's listed below
look at failures (with release build) for

FAILED tests/sweep_framework/sweeps/reduction/traces/max_traces.py::test_forge[layout1-dtype0-params2] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/layout/page_config.cpp:109: (heightAlignment % tile_.get_height()) == 0

Fix the following.
For

import pytest
import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc
from models.utility_functions import torch_random

@pytest.mark.parametrize("input_shape_and_dim", 
                         [
                             ((2, 32, 64, 64),-4),
                             ((2, 22, 37, 41),-4),
                             ((2, 32, 64, 64),-3),
                             ((2, 22, 37, 41),-3),
                             ((2, 32, 64),-3),
                             ((2, 22, 37),-3)
                         ])
def test_max_unsupported_dim(device, input_shape_and_dim):
    
    input_shape, max_dim = input_shape_and_dim
    
    torch_input_tensor = torch_random(input_shape, -100, 100, dtype=torch.bfloat16)
    torch_output_tensor, _ = torch.max(torch_input_tensor, dim=max_dim, keepdim=True)

    input_tensor = ttnn.from_torch(torch_input_tensor, layout=ttnn.TILE_LAYOUT, device=device)

    output_tensor = ttnn.max(input_tensor, dim=max_dim)
    output_tensor = ttnn.to_layout(output_tensor, ttnn.TILE_LAYOUT)
    output_tensor = ttnn.from_device(output_tensor)

    output_tensor = ttnn.to_torch(output_tensor)

    assert_with_pcc(torch_output_tensor, output_tensor)

All tests except the first fail even after indices are adjusted. Output is something like

PASSED test_mean.py::test_max_unsupported_dim[input_shape_and_dim2]
FAILED test_mean.py::test_max_unsupported_dim[input_shape_and_dim0] - AssertionError: 0.9424668632904696
FAILED test_mean.py::test_max_unsupported_dim[input_shape_and_dim1] - AssertionError: list(expected_pytorch_result.shape)=[1, 22, 37, 41] vs list(actual_pytorch_result.shape)=[1, 22, 64, 64]
FAILED test_mean.py::test_max_unsupported_dim[input_shape_and_dim3] - AssertionError: list(expected_pytorch_result.shape)=[2, 1, 37, 41] vs list(actual_pytorch_result.shape)=[2, 1, 64, 64]
FAILED test_mean.py::test_max_unsupported_dim[input_shape_and_dim4] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/operations/data_movement/pad/pad.cpp:41: rank == 4
FAILED test_mean.py::test_max_unsupported_dim[input_shape_and_dim5] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/operations/data_movement/pad/pad.cpp:41: rank == 4

These failing scenarios need to be fixed.

The text was updated successfully, but these errors were encountered:

### Ticket Link to Github Issue #12662 #14898 #13361 #12170 ### Problem description - padding caused issues for max - keepdim=False errored out ### What's changed - remove the erroring out of keepdim=False and adjust code to handle keepdim=False properly - adding padding within min/max to ensure that it's set up properly has been pushed back to a future PR ### Checklist - [x] Post commit CI passes https://github.com/tenstorrent/tt-metal/actions/runs/12432801168 - [x] Blackhole Post commit (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12423085751 - [x] Model regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12423092106 same as main https://github.com/tenstorrent/tt-metal/actions/runs/12422179419/job/34683976776 - [x] Device performance regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12423088573 - [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes - [x] New/Existing tests provide coverage for changes

bbradelTT · 2025-01-22T22:22:43Z

Should be fixed by #16989

…eric reduce (#16989) ### Ticket Link to Github Issues #16720 and #14898 ### Problem description - when specify a dim, output tensor of argmax is not one rank smaller than input - transpose seems to insert it's own padding, which occurs for the early tensor dimensions ### What's changed - for argmax, change shape of output tensor to have the right rank - for generic reduce, move pad filling to right before the reduce op is called - also update tests and add deprecated to another specialized reduce function ### Checklist - [x] Post commit CI passes https://github.com/tenstorrent/tt-metal/actions/runs/12956428377 - [x] Blackhole Post commit (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12939182511 - [x] Model regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12939185477/job/36091291360 in line with main https://github.com/tenstorrent/tt-metal/actions/runs/12937069874/job/36084729557 - [x] Device performance regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12939183976 - [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes - [x] New/Existing tests provide coverage for changes

… for generic reduce (#16989)" This reverts commit 83145d2.

### Ticket Link to Github Issue #14898 Subset of previous PR (#16989) that caused a hang in (Single-card) Demo tests and got reverted. Verified that this pipeline passes for this subset of changes: https://github.com/tenstorrent/tt-metal/actions/runs/12992459972 ### Problem description - transpose was filling in non-logical areas with default pad value when called from reduce ### What's changed - pass in an appropriate pad value for transpose to use - also mark a method that should only be used by pool to be deprecated to deter other uses ### Checklist - [x] Post commit CI passes https://github.com/tenstorrent/tt-metal/actions/runs/12992465641 - [ ] Blackhole Post commit (if applicable) - [ ] Model regression CI testing passes (if applicable) - [ ] Device performance regression CI testing passes (if applicable) - [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes - [x] New/Existing tests provide coverage for changes

…eric reduce (#16989) ### Ticket Link to Github Issues #16720 and #14898 ### Problem description - when specify a dim, output tensor of argmax is not one rank smaller than input - transpose seems to insert it's own padding, which occurs for the early tensor dimensions ### What's changed - for argmax, change shape of output tensor to have the right rank - for generic reduce, move pad filling to right before the reduce op is called - also update tests and add deprecated to another specialized reduce function ### Checklist - [x] Post commit CI passes https://github.com/tenstorrent/tt-metal/actions/runs/12956428377 - [x] Blackhole Post commit (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12939182511 - [x] Model regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12939185477/job/36091291360 in line with main https://github.com/tenstorrent/tt-metal/actions/runs/12937069874/job/36084729557 - [x] Device performance regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12939183976 - [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes - [x] New/Existing tests provide coverage for changes

### Ticket Link to Github Issue #14898 Subset of previous PR (#16989) that caused a hang in (Single-card) Demo tests and got reverted. Verified that this pipeline passes for this subset of changes: https://github.com/tenstorrent/tt-metal/actions/runs/12992459972 ### Problem description - transpose was filling in non-logical areas with default pad value when called from reduce ### What's changed - pass in an appropriate pad value for transpose to use - also mark a method that should only be used by pool to be deprecated to deter other uses ### Checklist - [x] Post commit CI passes https://github.com/tenstorrent/tt-metal/actions/runs/12992465641 - [ ] Blackhole Post commit (if applicable) - [ ] Model regression CI testing passes (if applicable) - [ ] Device performance regression CI testing passes (if applicable) - [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes - [x] New/Existing tests provide coverage for changes

…eric reduce (#16989) ### Ticket Link to Github Issues #16720 and #14898 ### Problem description - when specify a dim, output tensor of argmax is not one rank smaller than input - transpose seems to insert it's own padding, which occurs for the early tensor dimensions ### What's changed - for argmax, change shape of output tensor to have the right rank - for generic reduce, move pad filling to right before the reduce op is called - also update tests and add deprecated to another specialized reduce function ### Checklist - [x] Post commit CI passes https://github.com/tenstorrent/tt-metal/actions/runs/12956428377 - [x] Blackhole Post commit (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12939182511 - [x] Model regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12939185477/job/36091291360 in line with main https://github.com/tenstorrent/tt-metal/actions/runs/12937069874/job/36084729557 - [x] Device performance regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/12939183976 - [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes - [x] New/Existing tests provide coverage for changes

### Ticket Link to Github Issue #14898 Subset of previous PR (#16989) that caused a hang in (Single-card) Demo tests and got reverted. Verified that this pipeline passes for this subset of changes: https://github.com/tenstorrent/tt-metal/actions/runs/12992459972 ### Problem description - transpose was filling in non-logical areas with default pad value when called from reduce ### What's changed - pass in an appropriate pad value for transpose to use - also mark a method that should only be used by pool to be deprecated to deter other uses ### Checklist - [x] Post commit CI passes https://github.com/tenstorrent/tt-metal/actions/runs/12992465641 - [ ] Blackhole Post commit (if applicable) - [ ] Model regression CI testing passes (if applicable) - [ ] Device performance regression CI testing passes (if applicable) - [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes - [x] New/Existing tests provide coverage for changes

bbradelTT added op_cat: reduces P1 labels Nov 8, 2024

bbradelTT self-assigned this Nov 8, 2024

bbradelTT mentioned this issue Nov 8, 2024

[Bug Report] ttnn.max - Unsupported dim #13186

Closed

bbradelTT mentioned this issue Dec 19, 2024

#12662: add keepdim fixes to reduce #16163

Merged

6 tasks

bbradelTT mentioned this issue Jan 22, 2025

#16720: and #14898 update output dims for argmax and move pad for generic reduce #16989

Merged

6 tasks

bbradelTT added a commit that referenced this issue Jan 24, 2025

#14898: add max test case

f008ef7

tt-rkim pushed a commit that referenced this issue Jan 26, 2025

Revert "#16720: and #14898 update output dims for argmax and move pad…

8d8c7c2

… for generic reduce (#16989)" This reverts commit 83145d2.

bbradelTT added a commit that referenced this issue Jan 27, 2025

#14898: pass in pad value to transpose in reduce

228eff4

bbradelTT mentioned this issue Jan 27, 2025

#14898: pass in pad value to transpose in reduce #17142

Merged

6 tasks

bbradelTT added a commit that referenced this issue Jan 27, 2025

#14898: make pool_sum deprecated to deter usage

34e9347

bbradelTT closed this as completed Jan 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failing test scenarios for ttnn.max #14898

Failing test scenarios for ttnn.max #14898

bbradelTT commented Nov 8, 2024 •

edited

Loading

bbradelTT commented Jan 22, 2025

Failing test scenarios for ttnn.max #14898

Failing test scenarios for ttnn.max #14898

Comments

bbradelTT commented Nov 8, 2024 • edited Loading

bbradelTT commented Jan 22, 2025

bbradelTT commented Nov 8, 2024 •

edited

Loading