Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue]: standard math operations on vector types fail to build with GNU g++ #74

Closed
Oblomov opened this issue Apr 12, 2024 · 8 comments
Closed

Comments

@Oblomov
Copy link

Oblomov commented Apr 12, 2024

Problem Description

The standard math operators don't work on vector types as defined by amd_hip_vector_types.h when using g++ as host compiler.

Operating System

Debian trixie/sid

CPU

AMD Ryzen 7 PRO 6850H with Radeon Graphics

GPU

AMD Instinct MI300X, AMD Instinct MI300A, AMD Instinct MI250X, AMD Instinct MI250, AMD Instinct MI210, AMD Instinct MI100, AMD Radeon Pro W7900, AMD Radeon Pro W6800, AMD Radeon Pro V620, AMD Radeon Pro VII, AMD Radeon RX 7900 XTX, AMD Radeon RX 7900 XT, AMD Radeon VII

ROCm Version

ROCm 6.0.0

ROCm Component

HIP

Steps to Reproduce

A simple test case reproducing the issue is:

#include <hip/hip_vector_types.h>
#include <iostream>

int main() {
	const float4 v1{1.0f, 2.0f, 3.0f, 4.0f};
	const float4 v2{4.0f, 3.0f, 2.0f, 1.0f};
	float4 mul = v1*v2;
	std::cout << mul.x << " " << mul.y << " " << mul.z << " " << mul.w << std::endl;
}

This builds correctly with hipcc or clang++, but not with g++, where the following error is shown instead:

g++ -Wall -Wextra  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-6.0.3/include -I/opt/rocm-6.0.3/lib/llvm/lib/clang/17.0.0     vecmul.cc  -lm -o vecmul
In file included from /opt/rocm-6.0.3/include/hip/hip_vector_types.h:33,
                 from vecmul.cc:1:
/opt/rocm-6.0.3/include/hip/amd_detail/amd_hip_vector_types.h: In instantiation of ‘HIP_vector_type<T, rank>& HIP_vector_type<T, rank>::operator*=(const HIP_vector_type<T, rank>&) [with T = float; unsigned int rank = 4]’:
/opt/rocm-6.0.3/include/hip/amd_detail/amd_hip_vector_types.h:551:39:   required from ‘constexpr HIP_vector_type<float, 4> operator*(HIP_vector_type<float, 4>, const HIP_vector_type<float, 4>&)’
vecmul.cc:7:18:   required from here
/opt/rocm-6.0.3/include/hip/amd_detail/amd_hip_vector_types.h:544:18: error: invalid operands of types ‘HIP_vector_base<float, 4>::Native_vec_’ {aka ‘float [4]’} and ‘const float*’ to binary ‘operator*’
  544 |             data *= x.data;
      |             ~~~~~^~~~~~~~~
/opt/rocm-6.0.3/include/hip/amd_detail/amd_hip_vector_types.h:544:18: note:   in evaluation of ‘operator*=(using HIP_vector_base<float, 4>::Native_vec_ = float [4] {aka float [4]}, const float*)’
make: *** [Makefile:11: vecmul] Error 1

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

This is most likely due to the way __NATIVE_VECTOR__ gets defined, which in this case becomes

        #define __NATIVE_VECTOR__(n, T) T[n]

The operators need to be redefined to something like

        __HOST_DEVICE__
        HIP_vector_type& operator*=(const HIP_vector_type& x) noexcept
        {
            #if __has_attribute(ext_vector_type)
                data *= x.data;
            #else
                for (auto i = 0u; i < rank; ++i)
                    data[i] *= x.data[i];
            #endif
            return *this;
        }

and similarly for all other operators.

Some notes:

  • this issue has been present since HIP version 4.2 at least and is still present in 6.0.3

  • I had previously submitted a patch to fix this and a number of other issues to the old ROCm/hipamd repository.

  • this is independent from the specific GPU, it's a header-level issue.

@cjatin
Copy link
Contributor

cjatin commented Apr 17, 2024

can you share you g++ --version and g++ -dumpmachine

@Oblomov
Copy link
Author

Oblomov commented Apr 19, 2024

Version:

g++ (Debian 13.2.0-23) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Machine:

x86_64-linux-gnu

This supports __has_attribute (IIRC this was first introduced in GCC 5), but does not support ext_vector_type_attribute, as easily verified by

printf '#if defined(__has_attribute)\n#pragma message("has __has_attribute")\n#if __has_attribute(ext_vector_type)\n#pragma message("has ext_vector_type attribute")\n#else\n#pragma message("NO ext_vector_type_attribute")\n#endif\n#endif\n' | g++ -x c++ -c - -o /dev/null

A possible alternative to the #ifdef I mentioned, as a solution, would be to use something like __attribute__((vector_size (sizeof(T)*N)))
for GCC.

@ppanchad-amd
Copy link

Hi @Oblomov, internal ticket has been created to investigate your issue. Thanks!

@tcgu-amd
Copy link

Hi @Oblomov thank you for reaching out! Is there a specific use case you have in mind that would require g++ instead of clang? The solution you provided would work, but it would treat the vectors as simple vectors instead of SIMD vectors, which can result in significant performance gaps. Hence, it is hard for us to justify supporting g++ at all.

@Oblomov
Copy link
Author

Oblomov commented Sep 25, 2024

Hello @tcgu-amd, thanks for getting back to me on this issue.

I first came across the issue while adding support for AMD HIP to GPUSPH (you can find some of the preliminary work in that direction in the spheric2022 branch; there has been a lot of work done since, but the branch hasn't been published yet; I can give you access to the private repository where the development is ongoing if necessary; at the moment, we're forced to ship a patched version of the vector math header because of these and other issues).

GPUSH is a very complex project with several optional dependencies that aren't always well-supported by device compilers, so the host and device side of the software are normally built with separate compilers (the default host compiler, which is usually g++, is normally used for the host part, and nvcc or hipcc for the device part, although both these options can be overridden).

Some vector types are used on host too (even if not for computationally intensive operations), so the header needs to compile successfully regardless of compiler usage. I would argue that this should be the case regardless of my specific use case, and even if this leads to suboptimal performance on host. I would like to stress that I'm referring here specifically to the host side, since the device side would still be compiled with clang-based hipcc and thus still use the ext_vector_type definition.

That being said, I have just run some simple tests and gcc does autovectorize the loops, thereby treating the vector types as SIMD types. This is probably due to the extrema being constexpr on an array of correctly-aligned data.

A trivial example is this:

#include <hip/hip_vector_types.h>
#include <iostream>

__attribute__((noinline))
float4 doit(float4 const& v1, float4 const& v2)
{
	return v1*v2;
}

int main(int argc, char* argv[]) {
	const float c1 = argc > 1 ? atof(argv[1]) : 1.0f;
	const float c2 = argc > 2 ? atof(argv[2]) : 2.0f;
	const float c3 = argc > 3 ? atof(argv[3]) : 3.0f;
	const float c4 = argc > 4 ? atof(argv[4]) : 4.0f;
	const float4 v1{c1, c2, c3, c4};
	const float4 v2{c4, c3, c2, c1};
	float4 mul = doit(v1,v2);
	std::cout << mul.x << " " << mul.y << " " << mul.z << " " << mul.w << std::endl;
}

with the header patched as suggested in the issue description. Looking at the disassembly of the doit function, it is something like:

00000000000013e0 <_Z4doitRK15HIP_vector_typeIfLj4EES2_>:
    13e0:       66 0f 6f 07             movdqa (%rdi),%xmm0
    13e4:       0f 28 16                movaps (%rsi),%xmm2
    13e7:       0f 59 d0                mulps  %xmm0,%xmm2
    13ea:       0f 29 54 24 e8          movaps %xmm2,-0x18(%rsp)
    13ef:       f3 0f 7e 4c 24 f0       movq   -0x10(%rsp),%xmm1
    13f5:       f3 0f 7e 44 24 e8       movq   -0x18(%rsp),%xmm0
    13fb:       c3                      ret
    13fc:       0f 1f 40 00             nopl   0x0(%rax)

@tcgu-amd
Copy link

@Oblomov Thanks for adding the context! I was not aware that gcc autovectorizes loops; this is neat. I will create an internal ticket to investigate adding this patch to the source code.

@tcgu-amd
Copy link

@Oblomov Good news! Seems like the functionality you suggested is already implemented! It was part of this commit d4799b2 and has been present since rocm 6.1.x https://github.com/ROCm/clr/blob/rocm-6.1.x/hipamd/include/hip/amd_detail/amd_hip_vector_types.h#L553. I just ran your reproducer on 6.2 and it was working. Would you mind trying on 6.2 (or 6.1) and let us know the result? Thanks!

@Oblomov
Copy link
Author

Oblomov commented Sep 25, 2024

Hello @tcgu-amd thank you very much, I have tested 6.1.0 and 6.2.1 and the issue seems to be fixed indeed. Thank you very much, I'm closing the issue since I consider it solved for me.

@Oblomov Oblomov closed this as completed Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants