
Support Quantilized Fully Connected #5

Open · wants to merge 14 commits into master2

Conversation

lihaofd (Owner) commented Oct 19, 2018

Description

This PR adds a quantized fully connected op implemented with int8 GEMM.
@pengzhao-intel, @TaoLv, @ciyongch

Feature changes

New features

  • Support a quantized fully connected op using int8 GEMM
  • Support int8 bias by using the GEMM beta offset (a hedged sketch of the idea follows this list)
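
As a reading aid, here is a minimal sketch of the beta-offset idea, assuming MKL's cblas_gemm_s8u8s32 as the int8 GEMM; the function is real MKL, but the layout choices and names below are illustrative, not the exact PR code. The int32 output is pre-filled with the rescaled bias, and calling the GEMM with beta = 1 accumulates the u8*s8 product on top of it, avoiding a separate bias-add pass over the m x n output:

#include <mkl.h>
#include <cstdint>

// Illustrative quantized FC: out(m x n) = data(m x k) * weight(n x k)^T + bias.
// data has already been shifted to uint8; ao = -128 undoes the shift in the GEMM.
void QuantizedFCSketch(const uint8_t *data, const int8_t *weight,
                       const int32_t *bias, int32_t *out,
                       MKL_INT m, MKL_INT n, MKL_INT k) {
  // Pre-initialize the int32 output with the bias, broadcast over the batch dim.
  for (MKL_INT i = 0; i < m; ++i)
    for (MKL_INT j = 0; j < n; ++j)
      out[i * n + j] = bias[j];

  const MKL_INT32 oc = 0;  // no extra output offset
  cblas_gemm_s8u8s32(CblasRowMajor, CblasNoTrans, CblasTrans, CblasFixOffset,
                     m, n, k,
                     1.0f, data, k, -128,  // A (uint8), lda, ao: undo the +128 shift
                     weight, k, 0,         // B (int8), ldb, bo
                     1.0f, out, n, &oc);   // beta = 1 keeps the bias already in C
}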

Unit-test changes

  • Update the test case test_quantized_fc in tests/python/quantization/test_quantization.py
  • Check consistency with the original mx.sym.FullyConnected implementation.

Checklist

  • Passed code style checking (make lint).
  • All changes have test coverage.
  • Code is well-documented.

namespace mxnet {
namespace op {

namespace qfc {
Reviewer: Please use a more descriptive name instead of qfc.

lihaofd (Owner Author): Fixed.


struct QuantizedSumInitKernelWithBias {
  // init sum data with bias for matrix b (n)
  MSHADOW_XINLINE static void Map(int i, int32_t *out,
Reviewer: Previously, in the optimization for embedding, we found that the Map function has low efficiency with OMP. Do you have any profiling data here?

lihaofd (Owner Author) commented Oct 19, 2018: In unit-test benchmarking, the Map approach is 2-3x faster than the OMP loop.

Reviewer: Interesting. I thought kernel launch also uses OMP.
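
Some context for this exchange, as a hedged sketch (the kernel name and the bias-add example are hypothetical, not from this PR): mxnet_op::Kernel<Op, cpu>::Launch distributes the per-element Map calls across OpenMP threads itself, so the comparison is really between a struct-based kernel and a hand-written OMP loop over the same work:

#include "../mxnet_op.h"  // assumption: relative include, as in other MXNet ops

namespace mxnet {
namespace op {

// Per-element kernel: Launch invokes Map for every i in [0, m * n),
// and the cpu specialization parallelizes that range with OpenMP.
struct AddBiasRowKernel {
  MSHADOW_XINLINE static void Map(int i, int32_t *out,
                                  const int32_t *bias, const int n) {
    out[i] += bias[i % n];
  }
};

inline void AddBiasViaKernel(mshadow::Stream<cpu> *s, int32_t *out,
                             const int32_t *bias, int m, int n) {
  mxnet_op::Kernel<AddBiasRowKernel, cpu>::Launch(s, m * n, out, bias, n);
}

// The hand-written alternative the reviewer is contrasting with:
inline void AddBiasViaOmp(int32_t *out, const int32_t *bias, int m, int n) {
  #pragma omp parallel for
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
      out[i * n + j] += bias[j];
}

}  // namespace op
}  // namespace mxnet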

// shift data from int8 (from -128 to 127) to uint8 (from 0 to 255)
int shift = 128;
Tensor<cpu, 1, uint8_t> shiftdata =
    ctx.requested[qfc::kTempSpace].get_space_typed<cpu, 1, uint8_t>(
Reviewer: Any profiling data for this line?
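
For reference, a plausible shape for QuantizedShiftKernel (an illustrative sketch, not the PR source): each int8 element is moved into uint8 range by adding 128. The shift is not free, though: (a + 128) * w adds 128 * sum(w) to every output, so that term has to be compensated elsewhere, for example through the GEMM's input-offset arguments or the pre-initialized bias.

struct QuantizedShiftKernel {
  MSHADOW_XINLINE static void Map(int i, uint8_t *out,
                                  const int8_t *in, const int shift) {
    // in[i] is in [-128, 127]; adding shift = 128 maps it onto [0, 255].
    out[i] = static_cast<uint8_t>(in[i] + shift);
  }
};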

Tensor<cpu, 1, uint8_t> shiftdata =
    ctx.requested[qfc::kTempSpace].get_space_typed<cpu, 1, uint8_t>(
        Shape1(m * k), s);
Kernel<QuantizedShiftKernel, cpu>::Launch(s, m * k, data.data().dptr<int8_t>(),
Reviewer: int8_t -> SrcType?

lihaofd (Owner Author): Fixed.


NNVM_REGISTER_OP(_contrib_quantized_fully_connected)
.set_attr<FComputeEx>("FComputeEx<cpu>",
                      MKLDNNQuantizedFullyConnectedForward<int8_t>)
Reviewer: Only supports int8? How about the quantized FC for GPU?

lihaofd (Owner Author): The GPU path is registered in quantized_fully_connected.cu:
.set_attr("FCompute", QuantizedFullyConnectedForwardGPU<int8_t, int32_t, int32_t>);

@@ -79,6 +79,22 @@ bool QuantizedFullyConnectedType(const nnvm::NodeAttrs& attrs,
return true;
}

bool QuantizedFullyConnectedStorageType(const nnvm::NodeAttrs& attrs,
const int dev_mask,
Reviewer: Indent.

lihaofd (Owner Author): Fixed.

namespace mxnet {
namespace op {

namespace quantilizedfc {
Reviewer: To align with the definitions in the fp32 fully_connected-inl.h, suggest using quantized_fullc here.

lihaofd (Owner Author): Fixed.

namespace op {

namespace quantilizedfc {
enum QuantilizedfcOpResource {kTempSpace};
Reviewer: Suggest QuantizedFullyConnectedOpResource.

lihaofd (Owner Author): Fixed.



out[i] = bias[i] * float_for_one_bias_quant /
         float_for_one_out_quant;
} else {
  LOG(INFO) << "WARNING: QuantizedBiasAddKernel float_for_one_out_quant is 0 !";
Reviewer: What's QuantizedBiasAddKernel?

lihaofd (Owner Author): Fixed.
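
To spell out the rescaling in the snippet above (a hedged sketch; the scale sources are assumptions about the quantization scheme, not PR code): the int8 bias carries its own scale, while one LSB of the int32 GEMM accumulator represents data_scale * weight_scale, so the bias has to be converted into accumulator units before it is added.

#include <cmath>
#include <cstdint>

// Illustrative rescaling of one int8 bias value into int32-accumulator units.
//   bias_scale   = real value represented by one bias LSB
//   data_scale   = real value represented by one data LSB
//   weight_scale = real value represented by one weight LSB
int32_t RescaleBias(int8_t bias_q, float bias_scale,
                    float data_scale, float weight_scale) {
  const float float_for_one_bias_quant = bias_scale;
  const float float_for_one_out_quant = data_scale * weight_scale;
  // Guard against a zero output scale, as in the snippet above.
  if (float_for_one_out_quant == 0.0f) return 0;
  return static_cast<int32_t>(std::round(
      bias_q * float_for_one_bias_quant / float_for_one_out_quant));
}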

@@ -139,6 +141,9 @@ void MKLDNNQuantizedFullyConnectedForward(const nnvm::NodeAttrs& attrs,
out.data().dptr<int32_t>(),
n,
&oc);
#else
LOG(FATAL) << "s8u8s32 is not supported by the BLAS library";
Reviewer: Maybe this should be changed to "s8u8s32 is only supported by the MKL BLAS library"?
