[Runtime] Change default alignment to 64 bytes. #12564
Conversation
@tqchen it might be worth having you take a look to see whether there are any potential problems with this change.
python/tvm/tir/schedule/schedule.py
Outdated
-    A = T.match_buffer(a, (16, 16), align=128, offset_factor=1)
-    B = T.match_buffer(b, (16, 16), align=128, offset_factor=1)
-    C = T.match_buffer(c, (16, 16), align=128, offset_factor=1)
+    A = T.match_buffer(a, (16, 16), align=64, offset_factor=1)
+    B = T.match_buffer(b, (16, 16), align=64, offset_factor=1)
+    C = T.match_buffer(c, (16, 16), align=64, offset_factor=1)
mma does use align=128, so let's revert this change here :-)
Ah, makes sense, reverted. Is there anywhere else we'd want 128-byte alignment? I just did a global search and replace.
Not super familiar with the non-CUDA cases; probably @Hzfengsy @spectrometerHBH could help!
Not sure about other backends, but it's fine if we can pass the tests.
python/tvm/tir/tensor_intrin/cuda.py
Outdated
         offset_factor=16,
         scope="wmma.matrix_b",
     )
     C = T.match_buffer(
-        c, (m_dim, n_dim), out_dtype, align=128, offset_factor=16, scope="wmma.accumulator"
+        c, (m_dim, n_dim), out_dtype, align=64, offset_factor=16, scope="wmma.accumulator"
I think all CUDA code wants 128-bit alignment in practice. cc @vinx13
Yeah, I'll update this spot as well.
Nit: bits and bytes conflict between the commit message and the title. Is it 128-byte alignment or 128-bit alignment? 😄
Hexagon HVX vectors should be aligned at a 128-byte boundary; if not, we'd have to use unaligned loads/stores. I think we have some code that relies on the current alignment. This should really be a target option; maybe it's time to refactor this code...
@kparzysz-quic Fortunately, Hexagon does use its own target-specific alignment.
I guess it's ok then.
There are two parts of the perspective that come into the story:

P1 affects how the compiler generates code: e.g. if we assume that the data is aligned to 128 bytes, the compiler can generate aligned stores/loads to make efficient use of it. P0 affects the arrays we allocate: we always need to allocate memory whose alignment satisfies the P1 assumption made by the compiler. Right now the two are roughly tied together, with some exceptions.

The PyTorch issue was caused by P1: we are making an assumption of 128-byte alignment, while PyTorch arrays are commonly aligned to 64 bytes. Currently, P1 is supplied by the buffer construct, which does not (yet) depend on the target.

Agree with @kparzysz-quic that the minimum alignment requirement can be part of the target, and we need to find good ways to set it on the buffer. As a short-term workaround, we can detect the P1 settings (of buffers) in an early stage of TIR compilation and amend them for devices like HVX, or provide such assumptions in the HVX build. Note that our runtime already satisfies these assumptions (P0).
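The P0/P1 split above can be illustrated outside TVM with plain numpy: a P0-style over-aligned allocation guarantees that any P1-style compiler assumption up to that boundary holds. The helper names below are illustrative, not TVM API.

```python
import numpy as np

def is_aligned(arr: np.ndarray, alignment: int) -> bool:
    # P1-style check: does the data pointer sit on the assumed boundary?
    return arr.ctypes.data % alignment == 0

def aligned_empty(shape, dtype, alignment):
    # P0-style allocation: over-allocate by `alignment` bytes, then slice
    # to an aligned offset so the compiler's assumption is satisfied.
    itemsize = np.dtype(dtype).itemsize
    nbytes = int(np.prod(shape)) * itemsize
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

a = aligned_empty((16, 16), "float32", 64)
assert is_aligned(a, 64)
```

A kernel built under a 64-byte P1 assumption is then safe on any buffer produced this way, whereas an externally produced array offers no such guarantee.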
CC @ekalda, @lhutton1 - what's the impact of changing the alignment from 128 to 64 in the context of Ethos-U? Is this likely to cause a meaningful performance or correctness issue in this case? Ramana
Thanks for that context, @tqchen. Would you like to see any of your suggested improvements in this PR, or do you think this is an appropriate first step?
I don't think this will cause problems for Ethos-U, since its accesses are 16-byte aligned (but let's see if the tests pass). As to why this failure is not visible in CI, could it be related to this issue? #12529
The previous comment #3163 (comment) would make it work for temp data allocations (change src/tir/transforms/ir_utils.h, line 166 at 8146a9b).
For buffer declarations, the main issue is that Buffer creation is not target-dependent as of now; as a result, https://github.com/apache/tvm/blob/main/src/tir/ir/buffer.cc#L570 defaults to the global value. We can introduce a minimum_data_alignment attr on the target and modify buffer->data_alignment when the user runs things through build. Note that all of these only affect the default behavior; we can always explicitly declare the buffer with the correct alignment assumption for a target.
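The proposed lookup could behave like the sketch below. This is a hypothetical model of the idea, not actual TVM API: the attribute name, table, and target kinds are assumptions for illustration.

```python
# Hypothetical sketch (not TVM API): choose a default buffer alignment
# from the target, falling back to a global default when the target
# declares no minimum.
GLOBAL_DEFAULT_ALIGNMENT = 64  # bytes; the new default proposed in this PR

# Hypothetical per-target minimum-alignment table.
TARGET_MIN_ALIGNMENT = {
    "hexagon": 128,  # HVX vectors want 128-byte boundaries
    "cuda": 128,
}

def default_data_alignment(target_kind: str) -> int:
    # Use the target's declared minimum if present, else the global default.
    return TARGET_MIN_ALIGNMENT.get(target_kind, GLOBAL_DEFAULT_ALIGNMENT)

print(default_data_alignment("hexagon"), default_data_alignment("llvm"))
```

An explicit `align=` in the buffer declaration would still override this default, so only unannotated buffers are affected.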
One change made in #5252 (which added support for Hexagon to the runtime) was increasing the byte alignment from 64 to 128. This can cause problems when interacting with dlpack. For example, tests/python/contrib/test_dlpack.py has a high chance of failing when run locally due to torch returning tensors with 64- rather than 128-byte alignment. I'm not sure why it doesn't fail in CI; perhaps the consistency of the environment means it always returns an appropriately aligned tensor.

Changing the default alignment allows interoperability with both torch and newer versions of numpy that support dlpack. I've slightly modified the torch test to run multiple times to make sure its behavior is consistent.
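The intermittent nature of the failure can be probed without torch: measure what boundary an ordinary allocator actually hands back and compare it with the 128-byte assumption. The helper below is illustrative.

```python
import numpy as np

def alignment_of(arr: np.ndarray, max_check: int = 256) -> int:
    # Largest power-of-two byte boundary the data pointer falls on.
    ptr = arr.ctypes.data
    align = 1
    while align < max_check and ptr % (align * 2) == 0:
        align *= 2
    return align

# General-purpose allocators commonly return 16- or 64-byte aligned
# buffers; code compiled under a 128-byte assumption may therefore fail
# only sometimes, depending on what the allocator happened to return.
samples = [alignment_of(np.empty(1024, dtype="float32")) for _ in range(100)]
print(min(samples), max(samples))
```

This also explains why a repeated run of the torch test is a better check than a single run: one lucky allocation can mask the mismatch.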
@kparzysz-quic do you think there are any drawbacks to decreasing the alignment requirement?
cc @areusch