Support Quantized Fully Connected #5
base: master2
Conversation
…loc/delete in each run
namespace mxnet {
namespace op {

namespace qfc {
Please use a more descriptive name instead of qfc.
fixed
struct QuantizedSumInitKernelWithBias {
  // init sum data with bias for matrix b (n)
  MSHADOW_XINLINE static void Map(int i, int32_t *out,
Previously, in the optimization for embedding, we thought the Map function would have low efficiency with OMP. Do you have any profiling data here?
In unit testing, the Map approach is 2-3x faster than OMP.
Interesting. I thought the kernel launch also uses OMP internally.
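For readers following this thread, a minimal sketch of the two patterns being compared. LaunchCPU, ShiftMap, and ShiftOmp are illustrative names standing in for mxnet_op::Kernel&lt;OP, cpu&gt;::Launch and a hand-written loop; this is not the PR's code:

#include <cstdint>

// Stand-in for mxnet_op::Kernel<OP, cpu>::Launch: the framework
// parallelizes calls to OP::Map over the index range [0, N).
template <typename OP, typename... Args>
void LaunchCPU(int N, Args... args) {
#pragma omp parallel for
  for (int i = 0; i < N; ++i) {
    OP::Map(i, args...);
  }
}

// A Map-style kernel body: handles one element per index.
struct ShiftMap {
  static inline void Map(int i, uint8_t *out, const int8_t *in) {
    out[i] = static_cast<uint8_t>(in[i] + 128);
  }
};

// The "plain OMP" alternative: the parallel loop written inline.
void ShiftOmp(int N, uint8_t *out, const int8_t *in) {
#pragma omp parallel for
  for (int i = 0; i < N; ++i) {
    out[i] = static_cast<uint8_t>(in[i] + 128);
  }
}

The real Launch also applies grain-size heuristics to batch work per thread, which could account for the reported 2-3x; that explanation is my inference, not the PR's.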
// shift data from int8 (from -128 to 127) to uint8 (from 0 to 255)
int shift = 128;
Tensor<cpu, 1, uint8_t> shiftdata =
    ctx.requested[qfc::kTempSpace].get_space_typed<cpu, 1, uint8_t>(
Any profiling data for this line?
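For context on the line being asked about: ctx.requested[...] is only populated if the op declared a temp-space resource when it was registered. A sketch of the standard MXNet registration, assuming this PR does the equivalent somewhere not shown in the thread:

// Declare the temp-space resource so the executor provides
// ctx.requested[qfc::kTempSpace]; the op can then reuse pooled scratch
// memory instead of allocating and freeing a buffer on every forward pass.
NNVM_REGISTER_OP(_contrib_quantized_fully_connected)
.set_attr<FResourceRequest>("FResourceRequest",
  [](const NodeAttrs& attrs) {
    return std::vector<ResourceRequest>{ResourceRequest::kTempSpace};
  });

Reusing the pooled scratch space is presumably what the commit message's "…loc/delete in each run" refers to.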
Tensor<cpu, 1, uint8_t> shiftdata =
    ctx.requested[qfc::kTempSpace].get_space_typed<cpu, 1, uint8_t>(
        Shape1(m * k), s);
Kernel<QuantizedShiftKernel, cpu>::Launch(s, m * k, data.data().dptr<int8_t>(),
int8_t -> SrcType?
fixed
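The acknowledged fix is presumably along these lines, with the launch keyed on the enclosing function's SrcType template parameter instead of a hard-coded int8_t (a fragment sketch; the remaining launch arguments stay as in the PR):

// Sketch of the fix: deduce the element type from SrcType.
Kernel<QuantizedShiftKernel, cpu>::Launch(s, m * k,
    data.data().dptr<SrcType>(), /* ...remaining args as before... */);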
NNVM_REGISTER_OP(_contrib_quantized_fully_connected)
.set_attr<FComputeEx>("FComputeEx<cpu>",
                      MKLDNNQuantizedFullyConnectedForward<int8_t>)
Does this only support int8? What about quantized FC for GPU?
.set_attr("FCompute", QuantizedFullyConnectedForwardGPU<int8_t, int32_t, int32_t>); in quantized_fully_connected.cu
@@ -79,6 +79,22 @@ bool QuantizedFullyConnectedType(const nnvm::NodeAttrs& attrs,
  return true;
}

bool QuantizedFullyConnectedStorageType(const nnvm::NodeAttrs& attrs,
    const int dev_mask,
indent.
fixed
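For reference, a storage-type inference function of this shape conventionally marks every input and output as dense and selects the dispatch mode. The body below is a plausible sketch, not the PR's actual implementation, which isn't visible in this thread:

// Sketch only: all tensors use default (dense) storage; dispatch to
// FComputeEx so the MKL-DNN forward is selected on CPU.
bool QuantizedFullyConnectedStorageType(const nnvm::NodeAttrs& attrs,
                                        const int dev_mask,
                                        DispatchMode* dispatch_mode,
                                        std::vector<int>* in_attrs,
                                        std::vector<int>* out_attrs) {
  *dispatch_mode = DispatchMode::kFComputeEx;
  for (int& stype : *in_attrs)  stype = mxnet::kDefaultStorage;
  for (int& stype : *out_attrs) stype = mxnet::kDefaultStorage;
  return true;
}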
namespace mxnet {
namespace op {

namespace quantilizedfc {
To align with the definitions in the fp32 fully_connected-inl.h, I suggest using quantized_fullc here.
fixed
namespace op {

namespace quantilizedfc {
enum QuantilizedfcOpResource {kTempSpace};
QuantizedFullyConnectedOpResource
fixed
        out[i] = bias[i] * float_for_one_bias_quant /
                 float_for_one_out_quant;
      } else {
        LOG(INFO) << "WARNING: QuantizedBiasAddKernel float_for_one_out_quant is 0 !";
what's QuantizedBiasAddKernel?
fixed
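For context, what this kernel is doing: the bias was quantized on its own scale, so before it can seed the int32 GEMM accumulator it must be rescaled into the output scale. A standalone sketch of the arithmetic, with names simplified from the snippet (the zero check corresponds to the warning branch):

#include <cstdint>

// Rescale one quantized bias value from the bias scale to the output
// (accumulator) scale. float_for_one_*_quant is the real value that one
// quantization level represents on each side.
inline int32_t RescaleBias(int8_t bias,
                           float float_for_one_bias_quant,
                           float float_for_one_out_quant) {
  if (float_for_one_out_quant == 0.0f) {
    return 0;  // mirrors the LOG(INFO) warning branch in the snippet
  }
  return static_cast<int32_t>(
      bias * float_for_one_bias_quant / float_for_one_out_quant);
}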
@@ -139,6 +141,9 @@ void MKLDNNQuantizedFullyConnectedForward(const nnvm::NodeAttrs& attrs,
      out.data().dptr<int32_t>(),
      n,
      &oc);
#else
  LOG(FATAL) << "s8u8s32 is not supported by the BLAS library";
Maybe this should be changed to "s8u8s32 is only supported by the MKL BLAS library"?
Description
This PR adds a quantized fully connected op implemented with int8 GEMM.
@pengzhao-intel, @TaoLv, @ciyongch
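For background on the int8 GEMM: on CPU this lands on MKL's cblas_gemm_s8u8s32, which multiplies an unsigned 8-bit A operand by a signed 8-bit B operand into an int32 C, which is why the review above discusses shifting the input data from int8 to uint8. A hedged sketch of a bare call; dimensions, layout, and the zero offsets are illustrative, and unlike the PR it does not pre-seed C with the bias (the PR seeds C via QuantizedSumInitKernelWithBias and uses beta = 1.0):

#include <cstdint>
#include <mkl.h>

// Sketch: C(m x n, int32) = A(m x k, uint8) * B(k x n, int8), row-major.
void Int8GemmSketch(int m, int n, int k,
                    const uint8_t *a, const int8_t *b, int32_t *c) {
  const MKL_INT32 oc = 0;  // per-matrix output offset (CblasFixOffset)
  cblas_gemm_s8u8s32(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                     CblasFixOffset, m, n, k,
                     /*alpha=*/1.0f, a, /*lda=*/k, /*ao=*/0,
                     b, /*ldb=*/n, /*bo=*/0,
                     /*beta=*/0.0f, c, /*ldc=*/n, &oc);
}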
Feature changes
New features
Unit-test changes
Checklist