Remove padding along non height/width dims for transpose #14308

Closed · TT-BrianLiu opened this issue Oct 25, 2024 · 10 comments

@TT-BrianLiu (Contributor)

Need to update transpose hc (and other variants that break tiles apart) so that they natively output non-padded new batch dims.

@bbradelTT (Contributor)

@ntarafdar
This will block a lot of reduce work. Could you please triage this issue and provide an estimate for when it could be completed?

@sjameelTT (Contributor) commented Nov 7, 2024

Is reduce reliant on transpose at the moment? I am starting to look at this issue today.

Edit: also, is the expectation that the new padding is always all 0s?

@bbradelTT (Contributor)

@sjameelTT yes, reduce relies on transpose. There are calls to ttnn::transpose in ttnn/cpp/ttnn/operations/reduction/generic/generic_reductions.cpp

@sjameelTT (Contributor)

I see. Are the dimensions currently mismatching, or is PCC bad in reduce? I'm curious because a user of ttnn::transpose on main right now should not see any unnecessary padding. Unless there's a bug, they should see the correct shape on the output and shouldn't need to do any processing on the input.

Input shape for transpose_hc: [N, C, H[H_padded], W[W_padded]]
Output shape: [N, H, C[C_padded], W[W_padded]]

If this isn't true, then that's a bug.
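
As an illustration of the expected behaviour (a minimal sketch based on the shapes above; the tensor sizes and device handling are placeholders, not taken from this thread):

```python
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# Logical shape [N, C, H, W] = [1, 10, 5, 32]; in TILE_LAYOUT the last two
# dims are padded to multiples of 32, so the padded shape is [1, 10, 32, 32].
x = ttnn.from_torch(
    torch.randn(1, 10, 5, 32), dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device
)

# Swapping dims 1 and 2 (C and H) should give logical shape [1, 5, 10, 32],
# with padding regenerated only along the new height/width dims.
y = ttnn.transpose(x, 1, 2)
print(y.shape)

ttnn.close_device(device)
```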

@bbradelTT (Contributor)

I'm not sure of the exact problem.

I added logging:

--- a/ttnn/cpp/ttnn/operations/data_movement/transpose/transpose.cpp
+++ b/ttnn/cpp/ttnn/operations/data_movement/transpose/transpose.cpp
@@ -171,6 +171,7 @@ ttnn::Tensor ExecuteTranspose::invoke(
     const int64_t& dim1,
     const int64_t& dim2,
     const std::optional<MemoryConfig>& memory_config_arg) {
+    tt::log_debug(tt::LogOp, "tranpose it {} dim1 {} dim2 {}", input_tensor, dim1, dim2);

and the input parameters that are giving me errors are:

tranpose it Tensor(storage=DeviceStorage(memory_config=MemoryConfig(memory_layout=TensorMemoryLayout::INTERLEAVED,buffer_type=BufferType::DRAM,shard_spec=std::nullopt)),shape=ttnn.Shape([1[32], 128, 10[32], 32]),dtype=DataType::BFLOAT16,layout=Layout::TILE,tile=Tile(tile_shape={32, 32},face_shape={16, 16},num_faces=4)) dim1 0 dim2 1

It looks like transpose calls reshape:

                     Op | DEBUG    | Started   C++ ttnn operation: ttnn::reshape
                     Op | DEBUG    | it Tensor(storage=DeviceStorage(memory_config=MemoryConfig(memory_layout=TensorMemoryLayout::INTERLEAVED,buffer_type=BufferType::DRAM,shard_spec=std::nullopt)),shape=ttnn.Shape([128, 1[32], 10[32], 32]),dtype=DataType::BFLOAT16,layout=Layout::TILE,tile=Tile(tile_shape={32, 32},face_shape={16, 16},num_faces=4)) ns ttnn.Shape([128, 1, 10[32], 32])

The error and stack are:

                 Always | FATAL    | 4194304 != 131072
Traceback (most recent call last):
  File "/home/bbradel/reduce.py", line 18, in <module>
    y4=ttnn.sum(x1,0)
  File "/localdev/bbradel/tt-metal/ttnn/ttnn/decorators.py", line 329, in __call__
    return self.function(*function_args, **function_kwargs)
RuntimeError: TT_ASSERT @ ../ttnn/cpp/ttnn/tensor/tensor_ops.cpp:345: input_tensor.volume() == new_padded_shape.volume()
info:
4194304 != 131072
backtrace:
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x685e5b) [0x7fed91644e5b]
 --- void tt::assert::tt_assert<unsigned int, unsigned long>(char const*, int, char const*, bool, char const*, fmt::v11::basic_format_string<char, fmt::v11::type_identity<unsigned int const&>::type, fmt::v11::type_identity<unsigned long const&>::type>, unsigned int const&, unsigned long const&)
 --- tt::tt_metal::tensor_ops::tensor_reshape(tt::tt_metal::Tensor const&, ttnn::types::Shape const&)
 --- tt::tt_metal::Tensor::reshape(ttnn::types::Shape const&) const
 --- ttnn::operations::data_movement::ReshapeViewOperation::invoke(tt::tt_metal::Tensor const&, ttnn::types::Shape const&)
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x80940c) [0x7fed917c840c]
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x808ed0) [0x7fed917c7ed0]
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x8054b2) [0x7fed917c44b2]
 --- ttnn::operations::data_movement::ExecuteTranspose::invoke(unsigned char, tt::tt_metal::Tensor const&, long const&, long const&, std::__1::optional<tt::tt_metal::MemoryConfig> const&)
 --- ttnn::operations::data_movement::ExecuteTranspose::invoke(tt::tt_metal::Tensor const&, long const&, long const&, std::__1::optional<tt::tt_metal::MemoryConfig> const&)
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x93a3fe) [0x7fed918f93fe]
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x93a360) [0x7fed918f9360]
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x93a257) [0x7fed918f9257]
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x93a060) [0x7fed918f9060]
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x939fd7) [0x7fed918f8fd7]
 --- /localdev/bbradel/tt-metal/ttnn/ttnn/_ttnn.so(+0x90dd31) [0x7fed918ccd31]
 --- ttnn::operations::data_movement::detail::permute_impl(tt::tt_metal::Tensor const&, tt::tt_metal::SmallVector<unsigned int, 8ul> const&, tt::tt_metal::MemoryConfig const&)
...
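
For reference, the numbers in the assertion line up with the padded batch dim; a quick arithmetic check using the padded shapes from the log above:

```python
# Input to reshape: ttnn.Shape([1[32], 128, 10[32], 32]) -> padded dims (32, 128, 32, 32)
input_padded_volume = 32 * 128 * 32 * 32   # 4194304
# Reshape target:   ttnn.Shape([128, 1, 10[32], 32])     -> padded dims (128, 1, 32, 32)
target_padded_volume = 128 * 1 * 32 * 32   # 131072
# The batch dim is padded to 32 in the input but stays 1 in the target, so
# input_tensor.volume() != new_padded_shape.volume() and the TT_ASSERT fires.
print(input_padded_volume, target_padded_volume)  # 4194304 131072
```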

@sjameelTT (Contributor)

I think that's because the input N is padded, which we don't expect in transpose. That should be fixed independently of this change, though I'm curious about the use case for padded N.

@bbradelTT (Contributor)

Is there an issue where that would be fixed?

It's for a reduce call. I'm hoping most of the padding can go away with the new tensor layout work.

@sjameelTT (Contributor)

Do you have a Python test I could use to test this out? I am curious as to how we even get to a tensor with padded N. What is the op call that creates a padded N value?

@bbradelTT (Contributor)

> Do you have a Python test I could use to test this out? I am curious as to how we even get to a tensor with padded N. What is the op call that creates a padded N value?

I don't. I was trying to fix reduce with the inputs provided in #13361 (comment), but there were too many issues, and I postponed the effort until the new tensor layout work is far enough along.

sjameelTT added a commit that referenced this issue Nov 21, 2024
…ding value parameter (#15224)

### Ticket
#14308 
#14790 
#11650 

### Problem description

- Transpose along height and channel for tiled layout is currently padding-unaware, meaning that we need to pad channels to a multiple of TILE_HEIGHT, transpose, then slice off any excess on the new channels/height (a torch-level sketch of this dance follows the list)
- This results in excess copies along both channel and height that can be optimized out
- In addition, the choice of padding value depends on the subsequent ops: some ops can skip padding with specific values to save on performance, and others may want other values padded (-inf, +inf, 1)
- Blackhole transpose HC on tiled layout requires a scratch pad to deal with the 64B alignment requirement for noc_async_reads, which doesn't align with the size of each face line for normal tiles (32B)
- Transpose also depends on reshape to account for what is padding and what is actual data after padding and then transposing
- BFLOAT8 transpose is unnecessarily typecast even though BFLOAT8 is supported for both transpose CN and transpose WH
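
For illustration, the pad/transpose/slice dance from the first bullet, mimicked on a plain torch tensor (a sketch only, not the actual ttnn kernel flow):

```python
import torch

TILE_HEIGHT = 32

def transpose_hc_padding_unaware(x: torch.Tensor) -> torch.Tensor:
    """Sketch of the old flow on an [N, C, H, W] tensor."""
    n, c, h, w = x.shape
    padded_c = -(-c // TILE_HEIGHT) * TILE_HEIGHT  # round C up to a tile multiple
    # 1. Pad the channel dim so it is tile-aligned once it becomes the new height
    #    (one extra copy over padded-out rows).
    x = torch.nn.functional.pad(x, (0, 0, 0, 0, 0, padded_c - c))
    # 2. Transpose channel and height on the fully padded data.
    x = x.transpose(1, 2)
    # 3. Slice off the excess on the new channel dim (another extra copy).
    return x[:, :, :c, :]

out = transpose_hc_padding_unaware(torch.randn(1, 10, 5, 32))
print(out.shape)  # torch.Size([1, 5, 10, 32])
```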

### What's changed

- Add a new multicore transpose hc tiled interleaved kernel that is padding-aware. This skips extra copies on the padded H values and generates new padding for the channel dimension once it becomes the tile height.
- The new height padding defaults to 0, but you can use the new pad_value parameter to set other values. If you explicitly set it to None, it will not write a specific value and will just copy the real data into the buffer (a usage sketch follows this list).
- Shift the tile-creation workload to the writer kernel to avoid the need for a scratch pad on Blackhole (take each tile and write out each value to its new position, rather than read in a bunch of values to create each new tile that's written out)
- Remove direct reshape call in transpose
- Re-enable BFLOAT8 transpose WH/CN 
- Add some unit tests for community test cases (these are old issues
that got solved unintentionally a few weeks ago)
- Add tricky unit tests
- Make the same changes to permute
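
A rough usage sketch of the new parameter (assuming pad_value is exposed as a keyword argument on ttnn.transpose, as described in the list above; the shapes and device handling are placeholders):

```python
import torch
import ttnn

device = ttnn.open_device(device_id=0)

x = ttnn.from_torch(
    torch.randn(1, 10, 5, 32), dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device
)

# Default: any new padding generated by the transpose is filled with 0.
y_zero = ttnn.transpose(x, 1, 2)

# Fill the new padding with -inf instead, e.g. ahead of a max-style reduction.
y_neg_inf = ttnn.transpose(x, 1, 2, pad_value=float("-inf"))

# pad_value=None: don't write any particular value, just copy the real data.
y_none = ttnn.transpose(x, 1, 2, pad_value=None)

ttnn.close_device(device)
```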

### TODO

- In theory these changes should work with tiny tiles due to the way I've written them, but I haven't tested that yet.

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/11918161415
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/11916983288/job/33211841677
(failure matches main)
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/11902351038/job/33207292238
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/11918150453/job/33215940950
(matches failure on main)
- [x] New/Existing tests provide coverage for changes

@ntarafdar (Contributor)

@sjameelTT is this completed with your merge?
