ChrisCummins/cldrive

cldrive - Run arbitrary OpenCL kernels

cldrive is a tool for running arbitrary OpenCL kernels to record their runtimes and outputs. It reads OpenCL kernels from an input file, and for each, generates random inputs (parameterized by a given size), runs the kernel and records its execution time and outputs. It was developed as part of my work on Deep Learning benchmark synthesis, and has been used in the following publications:

  1. Cummins, C., Petoumenos, P., Wang, Z., & Leather, H. (2017). Synthesizing Benchmarks for Predictive Modeling. CGO. IEEE.
  2. Cummins, C., Petoumenos, P., Wang, Z., & Leather, H. (2017). End-to-end Deep Learning of Optimization Heuristics. PACT. IEEE.
  3. Ben-Nun, T., Jakobovits, A. S., & Hoefler, T. (2018). Neural Code Comprehension: A Learnable Representation of Code Semantics. NeurIPS.
  4. Cummins, C., Petoumenos, P., Murray, A., & Leather, H. (2018). Compiler Fuzzing through Deep Learning. ISSTA.
  5. Goens, A., Brauckmann, A., Ertel, S., Cummins, C., Leather, H., & Castrillon, J. (2019). A Case Study on Machine Learning for Synthesizing Benchmarks. MAPL.
  6. Cummins, C. (2020). Deep Learning for Compilers. University of Edinburgh.

Build

See INSTALL.md for instructions on setting up the build environment.

Then build cldrive using:

$ bazel build -c opt //gpu/cldrive

This will build an optimized cldrive binary and print its path.

Usage

$ cldrive --srcs=<opencl_sources> --envs=<opencl_devices>

Here <opencl_sources> is a comma-separated list of absolute paths to OpenCL source files, and <opencl_devices> is a comma-separated list of fully-qualified OpenCL device names. To list the available device names, use --clinfo. Use --help to see the full list of options.
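When cldrive is launched from a script, the comma-separated flag values can be assembled programmatically. The following is a minimal sketch, not part of cldrive: the helper name build_cldrive_argv is hypothetical, and it uses only the flags shown in this README (--srcs, --envs, --num_runs, --gsize, --lsize). It assumes the cldrive binary is on the PATH.

```python
import os
from typing import List, Optional, Sequence


def build_cldrive_argv(
    srcs: Sequence[str],
    envs: Sequence[str],
    num_runs: Optional[int] = None,
    gsize: Optional[int] = None,
    lsize: Optional[int] = None,
    binary: str = "cldrive",  # assumption: the binary is reachable under this name
) -> List[str]:
    """Assemble a cldrive command line as an argv list (hypothetical helper)."""
    if not srcs or not envs:
        raise ValueError("at least one source file and one device are required")
    for path in srcs:
        # The usage text asks for absolute paths to the OpenCL sources.
        if not os.path.isabs(path):
            raise ValueError(f"not an absolute path: {path}")
    argv = [binary, "--srcs=" + ",".join(srcs), "--envs=" + ",".join(envs)]
    # Optional flags are added only when set, so cldrive's own defaults apply otherwise.
    for flag, value in (("num_runs", num_runs), ("gsize", gsize), ("lsize", lsize)):
        if value is not None:
            argv.append(f"--{flag}={value}")
    return argv
```

Passing the resulting list to subprocess.run without shell=True avoids having to quote the | characters that appear in device names.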

Example

For example, given a file:

$ cat kernel.cl
kernel void my_kernel(global int* a, global int* b) {
    int tid = get_global_id(0);
    a[tid] += 1;
    b[tid] = a[tid] * 2;
}

and available OpenCL devices:

$ cldrive --clinfo
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0
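If only some of the listed devices are wanted, the --clinfo output can be filtered before it is passed to --envs. The sketch below assumes one device name per line, as in the listing above, and matches on the first |-separated field, which in this example looks like the device type (GPU or CPU). That reading of the field is an assumption drawn from the example, not documented behaviour, and select_devices is a hypothetical helper.

```python
from typing import List


def select_devices(clinfo_output: str, first_field: str) -> List[str]:
    """Return device names whose first '|'-separated field equals first_field."""
    # One device name per non-empty line, as in the --clinfo listing above.
    names = [line.strip() for line in clinfo_output.splitlines() if line.strip()]
    return [name for name in names if name.split("|")[0] == first_field]


sample = """\
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0
"""
# Join the selected names with commas to form a value for --envs.
print(",".join(select_devices(sample, "GPU")))
```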

To run the kernel 5 times on both devices using 4096 work items divided into work groups of size 1024:

$ cldrive --srcs=$PWD/kernel.cl --num_runs=5 \
    --gsize=4096 --lsize=1024 \
    --envs='GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2','CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0'
OpenCL Device, Kernel Name, Global Size, Local Size, Transferred Bytes, Runtime (ns)
I 2019-02-26 09:54:10 [gpu/cldrive/libcldrive.cc:59] clBuildProgram() with options '-cl-kernel-arg-info' completed in 1851 ms
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2, my_kernel, 4096, 1024, 65536, 113344
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2, my_kernel, 4096, 1024, 65536, 57984
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2, my_kernel, 4096, 1024, 65536, 64096
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2, my_kernel, 4096, 1024, 65536, 73696
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2, my_kernel, 4096, 1024, 65536, 73632
I 2019-02-26 09:54:11 [gpu/cldrive/libcldrive.cc:59] clBuildProgram() with options '-cl-kernel-arg-info' completed in 76 ms
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0, my_kernel, 4096, 1024, 65536, 105440
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0, my_kernel, 4096, 1024, 65536, 55936
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0, my_kernel, 4096, 1024, 65536, 63296
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0, my_kernel, 4096, 1024, 65536, 56192
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0, my_kernel, 4096, 1024, 65536, 55680

By default, cldrive prints a CSV summary of kernel stats and runtimes to stdout, and logging information to stderr. The raw information produced by cldrive is described by the protocol buffers in //gpu/cldrive/proto:cldrive.proto. To print cldrive.Instances protos to stdout instead, pass --output_format=pbtxt for text format protos, or --output_format=pb for binary format.
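Because the CSV goes to stdout and the log lines go to stderr, the captured stdout can be read directly with a CSV parser. The following is a minimal sketch that groups runtimes by device and takes the median of each group. The column names are taken from the header line in the example output, and median_runtimes is a hypothetical helper, not part of cldrive. It assumes only stdout was captured, so no log lines are mixed into the text.

```python
import csv
import io
import statistics
from collections import defaultdict
from typing import Dict, List


def median_runtimes(csv_text: str) -> Dict[str, float]:
    """Map each OpenCL device name to the median of its 'Runtime (ns)' values."""
    # The example header separates columns with ", ", so skip the space after each comma.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    runtimes: Dict[str, List[int]] = defaultdict(list)
    for row in reader:
        runtimes[row["OpenCL Device"]].append(int(row["Runtime (ns)"]))
    return {device: statistics.median(values) for device, values in runtimes.items()}
```

Applied to the ten CSV rows of the example above, this gives a median of 73632 ns for the GPU and 56192 ns for the CPU. The median is used here rather than the mean because the first run on each device is noticeably slower than the rest.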

License

Copyright 2016-2020 Chris Cummins <chrisc.101@gmail.com>.

Released under the terms of the GPLv3 license. See LICENSE for details.