Skip to content
/ hipcl Public
forked from cpc/hipcl

OpenCL/SPIR-V implementation of HIP

License

Notifications You must be signed in to change notification settings

Kerilk/hipcl

This branch is 11 commits ahead of, 1 commit behind cpc/hipcl:master.

Folders and files

NameName
Last commit message
Last commit date

Latest commit

dd39656 · Sep 21, 2021
Feb 9, 2021
Feb 8, 2021
Jan 17, 2020
Apr 15, 2020
Sep 21, 2021
Sep 21, 2021
Jun 29, 2021
Sep 21, 2021
Jul 26, 2019
Jun 29, 2021
Jul 26, 2019
Apr 15, 2020
Feb 9, 2021

Repository files navigation

HIPCL library


What is HIP?

Heterogeneous-compute Interface for Portability, or HIP, is a C++ runtime API and kernel language that allows developers to write code that runs on both AMD and NVidia GPUs. CUDA applications can be converted to HIP in a largely automated fashion.

What is HIPCL

HIPCL is a library that allows applications using the HIP API to be run on devices which support OpenCL and SPIR-V, thus providing a portability path from CUDA to OpenCL. HIPCL development is led by Customized Parallel Computing group of Tampere University, Finland.

Building HIPCL


There are a few extra install/usage options documented in 'doc' directory.

HIPCL has some prerequisites to build:

  • LLVM + patched Clang
  • LLVM-SPIRV translator tool from Khronos
  • An OpenCL implementation with (at least partial) 2.x support; HIPCL requires Shared Virtual Memory and clCreateProgramWithIL() support

Clang + LLVM

You'll need to build a patched Clang that can compile HIP source code to ELF+SPIR-V fat binaries.

Download LLVM + Clang:

git clone https://github.com/llvm-mirror/llvm.git
cd llvm
git checkout -b release_80 origin/release_80
cd tools
git clone https://github.com/cpc/hipcl-clang.git clang
cd clang
git checkout -b release_80 origin/release_80

Build+install LLVM/Clang:

cmake -DCMAKE_INSTALL_PREFIX=<llvm_install_dir> [other cmake flags] llvm-git-directory
make
sudo make install

LLVM-SPIRV Translator

download, build+install the LLVM-SPIRV translator:

git clone https://github.com/KhronosGroup/SPIRV-LLVM-Translator.git
cd SPIRV-LLVM-Translator
git checkout -b release_80 origin/llvm_release_80
mkdir build; cd build
cmake -DLLVM_DIR=<llvm_install_dir>/lib/cmake/llvm ..
make llvm-spirv
sudo cp tools/llvm-spirv/llvm-spirv <llvm_install_dir>/bin/

Known supported OpenCL implementations

At least Intel's "NEO" OpenCL implementation supports 2.x and SPIR-V on Intel GPUs.

It's also possible to use a sufficiently recent (2019/07+) POCL, but it must be built with LLVM-SPIRV support:

git clone https://github.com/pocl/pocl.git
cd pocl
mkdir build; cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr \
      -DWITH_LLVM_CONFIG=<llvm_install_dir>/bin/llvm-config \
      -DLLVM_SPIRV=<llvm_install_dir>/bin/llvm-spirv \
      ..
make
sudo make install

The last step (sudo make install) is optional - it's possible to use Pocl from build directory (by exporting some env variables: POCL_BULDING=1 and OCL_ICD_VENDORS=<pocl-build-dir>/ocl-vendors). Note that -DCMAKE_INSTALL_PREFIX=/usr implies system-wide installation. See https://github.com/pocl/pocl/blob/master/doc/sphinx/source/install.rst for details.

Whatever you end up using, make sure that clinfo lists your chosen OpenCL implementation.

Build HIPCL library

build+install the HIPCL library:

git clone https://github.com/cpc/hipcl.git
cd hipcl
mkdir build ; cd build;
cmake -DCMAKE_INSTALL_PREFIX=<hipcl_install_dir> \
      -DCMAKE_CXX_COMPILER=<llvm_install_dir>/bin/clang++ \
      -DCMAKE_C_COMPILER=<llvm_install_dir>/bin/clang \
      ..
make

CMAKE_INSTALL_PREFIX defaults to /opt/hipcl. The samples directory contains some examples; these can be run from build directory, individually or via ctest.

make install will create <hipcl_install_dir>/{lib/libhipcl.so, share/kernellib.bc, include/hip} and copy the examples to <hipcl_install_dir>/bin/samples directory.

Note that CMake removes RPATH at make install time, which means that the samples installed into <hipcl_install_dir>/bin will look for libhipcl.so in the default system library paths (/usr/lib and such).

Using HIPCL library

HIPCL provides a CMake export target named hip::hipcl. Using it from CMake is therefore straightforward:

find_package(HIP REQUIRED CONFIG PATHS "${HIPCL_INSTALL_PREFIX}")
target_link_libraries(your-executable hip::hipcl)

This will automatically add all required flags. Note that you must compile your project with CMAKE_CXX_COMPILER set to the Clang built in the first step.

For using outside CMake, there is a ${HIPCL_INSTALL_PREFIX}/bin/hipcl_config binary which prints the required flags. Manually you can build using this command:

<llvm_install_dir>/bin/clang++ -pthread -fPIE -O2 -g -std=c++11 `hipcl_config -C` -o binary source.cc -Wl,-rpath,<hipcl_install_prefix>/lib -L<hipcl_install_prefix>/lib -lhipcl

To see what compilation commands are actually run, and get the intermediate files (including the SPIR-V), add -v --save-temps to the compilation flags. Intermediate files will be saved into the current working directory. The SPIR-V that ends up embedded in the ELF binary is in a file named "a.out-hip-spir64-unknown-unknown-sm_20".

CUDA conversion example

To convert a CUDA source to HIP source, use the hipify-clang tool from AMD's HIP repository: https://github.com/ROCm-Developer-Tools/HIP/tree/master/hipify-clang

Usage:

hipify-clang [hipify args] -- [clang cuda args]

E.g.

./hipify-clang -inplace -print-stats example.cu -- -x cuda --cuda-path=/usr/local/cuda-8.0 -I /usr/local/cuda-8.0/samples/common/inc

This should produce a source with CUDA API translated to HIP API calls. To build a HIPCL executable from this source, see above Using HIPCL library.

Frequently encountered issues


  • clEnqueueSVMMemCopy() failed with error -5 - this appears to be a driver bug on Intel GPUs, occurs when one tries to memcpy from read-only data stored in ELF to SVM memory. SVMMemCopy from other sources (stack / heap) works without issues.

  • programs may take a long time to start. This is because there Clang inserts startup hooks which register SPIR-V binaries; HIPCL at this point compiles each, and for each program built, creates all kernels. This can take a long time on some implementations.

  • HIPCL reports the global memory size from OpenCL as available memory, but unlike CUDA, it's not possible to allocate all of that memory in a single block; HIPCL is limited by CL_DEVICE_MAX_MEM_ALLOC_SIZE.

Known HIPCL-Clang issues


  • Using HIP_DYNAMIC_SHARED() macro outside a function scope is not yet supported. Doing so will likely result in error: Assertion FuncSet.size() <= 1 && "more than one function uses dynamic mem variable!"' failed.

  • There are unfortunately still some unresolved compiler bugs present in the HIPCL patches to Clang, so compilation may fail, especially when HIPCL is compiled with -O0 flag.

Known libhipcl issues


Some of these are simply not yet implemented, some are missing because they would require an OpenCL extension.

Device Side / Math Library

OpenCL Extension required:

  • __fsqrt_rd and various intrinsics for add/sub/div/mul with predefined rounding mode (currently these are mapped to OpenCL variants with default rounding mode)
  • __shfl and friends are only available on Intel via cl_intel_subgroups extension.

Host runtime API

Implemented with caveats:

  • hipEventElapsedTime() can return imprecise values

  • hipModuleLaunchKernel accepts the "extra" argument, but the size of pointed to memory (HIP_LAUNCH_PARAM_BUFFER_SIZE) must be exactly the sum of sizes of individual arguments - no padding is allowed. Otherwise it's impossible to figure out how to set the OpenCL kernel arguments.

  • A certain amount of Device properties are impossible to get via OpenCL API. Values reported by HIPCL are completely made up.

Not implemented and/or require extension to OpenCL:

  • hipError_t hipModuleLoadData(hipModule_t* module, const void* image);

  • hipError_t hipModuleLoadDataEx(hipModule_t *module, const void *image,...);

    This API is not possible to implement with SPIR-V binaries, because there is no size parameter (only a void* pointer), and SPIR-V binaries don't have their size embedded. It might be possible to implement with disassembled text format of SPIR-V.

  • hipSetDeviceFlags(unsigned flags)

    The flags change how the runtime waits for results (yield thread to OS or busy waiting / spinning)

  • hipError_t hipStreamCreateWithPriority(hipStream_t* stream, unsigned int flags, int priority);

  • hipError_t hipStreamCreateWithFlags(hipStream_t* stream, unsigned int flags);

  • hipError_t hipEventCreateWithFlags(hipEvent_t* event, unsigned flags);

  • hipError_t hipDeviceSetCacheConfig(hipFuncCache_t cacheConfig);

  • hipError_t hipDeviceGetCacheConfig(hipFuncCache_t* cacheConfig);

  • hipError_t hipDeviceGetSharedMemConfig(hipSharedMemConfig* pConfig);

  • hipError_t hipDeviceSetSharedMemConfig(hipSharedMemConfig config);

  • hipError_t hipFuncSetCacheConfig(const void* func, hipFuncCache_t config);

  • hipError_t hipCtxSetCacheConfig(hipFuncCache_t cacheConfig);

  • hipError_t hipCtxSetSharedMemConfig(hipSharedMemConfig config);

  • hipError_t hipPointerGetAttributes(hipPointerAttribute_t* attributes, const void* ptr);

  • hipError_t hipExtMallocWithFlags(void** ptr, size_t sizeBytes, unsigned int flags);

  • hipError_t hipGetDeviceProperties(hipDeviceProp_t *prop, int deviceId);

  • hipError_t hipStreamQuery(hipStream_t stream)

  • hipError_t hipModuleGetGlobal(hipDeviceptr_t* dptr, size_t* bytes, hipModule_t hmod, const char* name);

  • hipError_t hipHostGetFlags(unsigned int* flagsPtr, void* hostPtr);

  • hipError_t hipFuncGetAttributes(hipFuncAttributes* attr, const void* func);

  • hipError_t hipDeviceGetPCIBusId(char* pciBusId, int len, int device);

  • hipError_t hipDeviceGetByPCIBusId(int* device, const char* pciBusId);

  • hipError_t hipSetDeviceFlags(unsigned flags);

Symbol API Not Implemented
  • hipError_t hipMemcpyToSymbolAsync(void*, const void*, size_t, size_t, hipMemcpyKind, hipStream_t, const char*);
  • hipError_t hipMemcpyFromSymbol(void*, const void*, size_t, size_t, hipMemcpyKind, const char*);
  • hipError_t hipMemcpyFromSymbolAsync(void*, const void*, size_t, size_t, hipMemcpyKind, hipStream_t, const char*);
  • hipError_t hipGetSymbolAddress(void** devPtr, const void* symbolName);
  • hipError_t hipGetSymbolSize(size_t* size, const void* symbolName);
  • hipError_t hipMemcpyToSymbol(void*, const void*, size_t, size_t, hipMemcpyKind, const char*);
  • hipError_t hipMemcpyToSymbol(const void* symbolName, const void* src, size_t sizeBytes, size_t offset __dparm(0), hipMemcpyKind kind __dparm(hipMemcpyHostToDevice));
Peer2Peer Functions Are Not Implemented Yet
  • hipError_t hipDeviceCanAccessPeer(int* canAccessPeer, int deviceId, int peerDeviceId);
  • hipError_t hipDeviceEnablePeerAccess(int peerDeviceId, unsigned int flags);
  • hipError_t hipDeviceDisablePeerAccess(int peerDeviceId);
  • hipError_t hipMemcpyPeer(void* dst, int dstDeviceId, const void* src, int srcDeviceId, size_t sizeBytes);
  • hipError_t hipMemcpyPeerAsync(void* dst, int dstDeviceId, const void* src, int srcDevice, size_t sizeBytes, hipStream_t stream __dparm(0));
PROFILER Not implemented
  • hipError_t hipProfilerStart();
  • hipError_t hipProfilerStop();
API CALLBACKs Not implemented
  • hipError_t hipRegisterApiCallback(uint32_t id, void* fun, void* arg);
  • hipError_t hipRemoveApiCallback(uint32_t id);
  • hipError_t hipRegisterActivityCallback(uint32_t id, void* fun, void* arg);
  • hipError_t hipRemoveActivityCallback(uint32_t id);
TEXTURES not implemented

About

OpenCL/SPIR-V implementation of HIP

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C++ 79.5%
  • C 17.8%
  • CMake 2.7%