Separable Convolution by CUDA and TVM
CUDA and TVM code for separable convolution (depthwise convolution + 1x1 convolution).
TVM scheduler for normal convolution with the tensor layout in the form of NHWC, currently missing in the TVM repository.
Improved runtime of TVM’s depthwise convolution scheduler by replacing filterreuse with input reuse.