-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove padding along non height/width dims for transpose #14308
Comments
@ntarafdar |
Is reduce reliant on transpose atm? I am starting to look at this issue today. Edit: as well, is the expectation that the new padding is always all 0s? |
@sjameelTT yes, reduce relies on transpose. There are calls to |
I see, are dimensions currently mismatching or is pcc bad in reduce? I'm curious because the user of input shape for transpose_hc: If this isn't true then that's a bug. |
I'm not sure of the exact problem. I added logging:
and the input parameters that are giving me errors are:
It looks like transpose calls reshape
The error and stack are:
|
I think that's because the input N is padded, which we don't expect in transpose. That should be fixed independent of this change, though I'm curious about the use-case for padded N. |
Is there an issue where that would be fixed? It's for a reduce call. I'm hoping most of the padding can go away with the new tensor layout work. |
do you have a python test I could use to test this out? I am curious as to how we even get to a tensor with padded N. What is the op call that creates a padded N value? |
I don't. I was trying to fix reduce with the inputs provided in #13361 (comment) but there were too many issues and I postponed the effort until the new tensor layout is far enough along. |
…ding value parameter (#15224) ### Ticket #14308 #14790 #11650 ### Problem description - Transpose along height and channel for tiled layout is currently padding unaware, meaning that we need to pad channels to a multiple of TILE_HEIGHT, transpose, then slice off any excess on the new channels/height - This results in excess copies along both channel and height that can be optimized out - As well, the choice of padding depends on the subsequent ops, some ops can skip padding with specific values to save on performance, and others may want other values padded (-inf, +inf, 1) - Blackhole transpose HC on tiled requires a scratch pad to be able to deal with the 64B alignment requirement for noc_async_reads, which don't align with the size of each face line for normal tiles (32B). - Tranpose is also dependent on reshape to be able to account for what is the padding and what is actual data after padding and then transposing - BFLOAT8 transpose is unnecessary typecasted when it's supported for both transpose CN and transpose WH ### What's changed - Add a new multicore transpose hc tiled interleaved kernel that is padding aware. This skips extra copies on the padded H values and generates new padding for the channel dimension once it becomes tile height. - The new height is padded by default to 0, but you can use the new pad_value parameter to set it to other values. If you explicitly set it to None, it will not set a specific value and just copy the real data into the buffer. - Shift tile creation workload to writer kernel to avoid the need for a scratch pad on Blackhole ((ake each tile and write out each value to its new position, rather than read in a bunch of values to create each new tile that's written out) - Remove direct reshape call in transpose - Re-enable BFLOAT8 transpose WH/CN - Add some unit tests for community test cases (these are old issues that got solved unintentionally a few weeks ago) - Add tricky unit tests - Make the same changes to permute ### TODO - In theory these should work with tiny tiles due to the way I've written it, but I haven't tested that yet. ### Checklist - [x] Post commit CI passes https://github.com/tenstorrent/tt-metal/actions/runs/11918161415 - [x] Blackhole Post commit (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/11916983288/job/33211841677 (failure matches main) - [x] Model regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/11902351038/job/33207292238 - [x] Device performance regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/11918150453/job/33215940950 (matches failure on main) - [x] New/Existing tests provide coverage for changes
@sjameelTT is this completed with your merge? |
Need to update transpose hc (and other variants that break apart tiles) to natively output non-padded new batch dims.
The text was updated successfully, but these errors were encountered: