The latest benchmark was executed at 19:04:09 on Sunday, April 2, 2023 (GMT+9).
First, as a matrix arithmetic library, I benchmarked cl-waffe against NumPy as impartially as possible.
cl-waffe is also a deep learning framework, so benchmarks against PyTorch are available as well.
The main purpose of this benchmark is to measure the difference between cl-waffe and NumPy/PyTorch, not to declare either one superior or inferior. For example, it shows at which matrix sizes cl-waffe is practical and at which it is slow.
From these results, we can learn the following:

- For smaller matrices, cl-waffe is slow because of the overhead of its library calls; there is still room for optimisation here.
- For larger matrices, cl-waffe performs almost the same as NumPy. The call overhead becomes negligible there, and the remaining differences stem from the underlying backends rather than from the libraries themselves.
| machine-type | machine-version | software-version | software-type |
|---|---|---|---|
| X86-64 | Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz | 21.5.0 | Darwin |
All benchmarks run in single-float precision (np.float32).
- Working on SBCL 2.3.3
- cl-waffe (latest as of 19:04:09, Sunday, April 2, 2023, GMT+9). Its BLAS backend is OpenBLAS 0.3.22, configured via cl-user::*lla-configuration*, whose value is (LIBRARIES (/usr/local/Cellar/openblas/0.3.22/lib/libblas.dylib)); a minimal sketch of this setting follows this list.
- Working on Python 3.9.12
- NumPy 1.23.5, running on MKL (see the np.show_config() output below)
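
As a minimal sketch, the LLA setting above could be produced as shown below. This is an assumption, not the actual setup script: it presumes LLA's plist-style configuration with a :libraries key, read from the CL-USER package before LLA (and therefore cl-waffe) is loaded, and it reuses the OpenBLAS path from this machine.

```lisp
;; Minimal sketch (assumption: LLA reads a plist-style *lla-configuration*
;; with a :libraries key from CL-USER before it is loaded).
;; The OpenBLAS path is machine-specific; adjust it for your installation.
(defvar cl-user::*lla-configuration*
  '(:libraries ("/usr/local/Cellar/openblas/0.3.22/lib/libblas.dylib")))
```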

    import numpy as np
    np.show_config()

    blas_armpl_info:
      NOT AVAILABLE
    blas_mkl_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/Users/hikettei/opt/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/Users/hikettei/opt/anaconda3/include']
    blas_opt_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/Users/hikettei/opt/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/Users/hikettei/opt/anaconda3/include']
    lapack_armpl_info:
      NOT AVAILABLE
    lapack_mkl_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/Users/hikettei/opt/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/Users/hikettei/opt/anaconda3/include']
    lapack_opt_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/Users/hikettei/opt/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/Users/hikettei/opt/anaconda3/include']
    Supported SIMD extensions in this NumPy install:
        baseline = SSE,SSE2,SSE3
        found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
        not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL

The number of threads is limited to four for both BLAS backends:

    export OPENBLAS_NUM_THREADS=4
    export MKL_NUM_THREADS=4
Multiplying K*K matrices N times.
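
As a rough illustration (not the actual benchmark script), the measured operation looks like the sketch below. The values of k and n and the use of time/dotimes are my own choices; here and in the sketches that follow, the cl-waffe package is assumed to be in scope.

```lisp
;; Hypothetical sketch of the matmul benchmark: multiply two K*K random
;; matrices N times and time the whole loop.
(let* ((k 256)   ; matrix size (hypothetical value)
       (n 100)   ; number of repetitions (hypothetical value)
       (a (!randn `(,k ,k)))
       (b (!randn `(,k ,k))))
  (time
   (dotimes (i n)
     (!matmul a b))))
```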
Applying a broadcasting add to A[K, K, 1] and B[1, K, K], N times.
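
A corresponding sketch for this case; the sizes are again hypothetical, and it assumes !randn accepts 3D shapes so that !add can broadcast the two operands as described above.

```lisp
;; Hypothetical sketch of the broadcasting-add benchmark: A has shape
;; (K K 1), B has shape (1 K K), and !add broadcasts them N times.
(let* ((k 64)
       (n 100)
       (a (!randn `(,k ,k 1)))
       (b (!randn `(1 ,k ,k))))
  (time
   (dotimes (i n)
     (!add a b))))
```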
Computing (!aref (!randn `(,K ,K)) '(200 400) t) N times.
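
A sketch of the same measurement, with hypothetical values for K and N (K must be at least 400 for the '(200 400) slice):

```lisp
;; Hypothetical sketch of the slicing benchmark: build a fresh K*K random
;; tensor and take rows 200-400 of it, N times.
(let ((k 1000)
      (n 100))
  (time
   (dotimes (i n)
     (!aref (!randn `(,k ,k)) '(200 400) t))))
```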
Computing a dense layer, defined as out = (!relu (!add (!matmul weight x) bias)), N times.
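
A sketch of the dense-layer measurement; the formula is the one given above, while the shapes of weight, x, and bias are hypothetical.

```lisp
;; Hypothetical sketch of the dense-layer benchmark:
;; out = (!relu (!add (!matmul weight x) bias)), repeated N times.
;; With x and bias both of shape (K 1), the addition needs no broadcasting.
(let* ((k 256)
       (n 100)
       (weight (!randn `(,k ,k)))
       (x      (!randn `(,k 1)))
       (bias   (!randn `(,k 1))))
  (time
   (dotimes (i n)
     (!relu (!add (!matmul weight x) bias)))))
```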
coming soon...