Support Quantized Fully Connected #5
base: master2
Conversation
…loc/delete in each run
namespace mxnet {
namespace op {

namespace qfc {
Please use a more descriptive name instead of qfc.
fixed
struct QuantizedSumInitKernelWithBias {
  // init sum data with bias for matrix b (n)
  MSHADOW_XINLINE static void Map(int i, int32_t *out,
Previously, in the optimization for embedding, we thought the Map function would have low efficiency with OMP. Do you have any profiling data here?
In unit testing, the Map approach is 2-3x faster than OMP.
Interesting. I thought the kernel launch also uses OMP internally.
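For readers following this thread, a minimal sketch of the two patterns being compared. LaunchCPU, ShiftMap, and ShiftOmp are illustrative names standing in for mxnet_op::Kernel&lt;OP, cpu&gt;::Launch and a hand-written loop; this is not the PR's code:

#include <cstdint>

// Stand-in for mxnet_op::Kernel<OP, cpu>::Launch: the framework
// parallelizes calls to OP::Map over the index range [0, N).
template <typename OP, typename... Args>
void LaunchCPU(int N, Args... args) {
#pragma omp parallel for
  for (int i = 0; i < N; ++i) {
    OP::Map(i, args...);
  }
}

// A Map-style kernel body: handles one element per index.
struct ShiftMap {
  static inline void Map(int i, uint8_t *out, const int8_t *in) {
    out[i] = static_cast<uint8_t>(in[i] + 128);
  }
};

// The "plain OMP" alternative: the parallel loop written inline.
void ShiftOmp(int N, uint8_t *out, const int8_t *in) {
#pragma omp parallel for
  for (int i = 0; i < N; ++i) {
    out[i] = static_cast<uint8_t>(in[i] + 128);
  }
}

The real Launch also applies grain-size heuristics to batch work per thread, which could account for the reported 2-3x; that explanation is my inference, not the PR's.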
// shift data from int8 (from -128 to 127) to uint8 (from 0 to 255)
int shift = 128;
Tensor<cpu, 1, uint8_t> shiftdata =
    ctx.requested[qfc::kTempSpace].get_space_typed<cpu, 1, uint8_t>(
Any profiling data for this line?
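For context on the line being asked about: ctx.requested[...] is only populated if the op declared a temp-space resource when it was registered. A sketch of the standard MXNet registration, assuming this PR does the equivalent somewhere not shown in the thread:

// Declare the temp-space resource so the executor provides
// ctx.requested[qfc::kTempSpace]; the op can then reuse pooled scratch
// memory instead of allocating and freeing a buffer on every forward pass.
NNVM_REGISTER_OP(_contrib_quantized_fully_connected)
.set_attr<FResourceRequest>("FResourceRequest",
  [](const NodeAttrs& attrs) {
    return std::vector<ResourceRequest>{ResourceRequest::kTempSpace};
  });

Reusing the pooled scratch space is presumably what the commit message's "…loc/delete in each run" refers to.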
Tensor<cpu, 1, uint8_t> shiftdata =
    ctx.requested[qfc::kTempSpace].get_space_typed<cpu, 1, uint8_t>(
        Shape1(m * k), s);
Kernel<QuantizedShiftKernel, cpu>::Launch(s, m * k, data.data().dptr<int8_t>(),
int8_t -> SrcType?
fixed
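The acknowledged fix is presumably along these lines, with the launch keyed on the enclosing function's SrcType template parameter instead of a hard-coded int8_t (a fragment sketch; the remaining launch arguments stay as in the PR):

// Sketch of the fix: deduce the element type from SrcType.
Kernel<QuantizedShiftKernel, cpu>::Launch(s, m * k,
    data.data().dptr<SrcType>(), /* ...remaining args as before... */);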
NNVM_REGISTER_OP(_contrib_quantized_fully_connected)
.set_attr<FComputeEx>("FComputeEx<cpu>",
                      MKLDNNQuantizedFullyConnectedForward<int8_t>)
Does this only support int8? What about quantized FC for GPU?
.set_attr("FCompute", QuantizedFullyConnectedForwardGPU<int8_t, int32_t, int32_t>); in quantized_fully_connected.cu
@@ -79,6 +79,22 @@ bool QuantizedFullyConnectedType(const nnvm::NodeAttrs& attrs,
  return true;
}

bool QuantizedFullyConnectedStorageType(const nnvm::NodeAttrs& attrs,
    const int dev_mask,
indent.
fixed
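For reference, a storage-type inference function of this shape conventionally marks every input and output as dense and selects the dispatch mode. The body below is a plausible sketch, not the PR's actual implementation, which isn't visible in this thread:

// Sketch only: all tensors use default (dense) storage; dispatch to
// FComputeEx so the MKL-DNN forward is selected on CPU.
bool QuantizedFullyConnectedStorageType(const nnvm::NodeAttrs& attrs,
                                        const int dev_mask,
                                        DispatchMode* dispatch_mode,
                                        std::vector<int>* in_attrs,
                                        std::vector<int>* out_attrs) {
  *dispatch_mode = DispatchMode::kFComputeEx;
  for (int& stype : *in_attrs)  stype = mxnet::kDefaultStorage;
  for (int& stype : *out_attrs) stype = mxnet::kDefaultStorage;
  return true;
}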
namespace mxnet {
namespace op {

namespace quantilizedfc {
To align with the definitions in the fp32 fully_connected-inl.h, I suggest using quantized_fullc here.
fixed
namespace op {

namespace quantilizedfc {
enum QuantilizedfcOpResource {kTempSpace};
QuantizedFullyConnectedOpResource
fixed
        out[i] = bias[i] * float_for_one_bias_quant /
                 float_for_one_out_quant;
      } else {
        LOG(INFO) << "WARNING: QuantizedBiasAddKernel float_for_one_out_quant is 0 !";
what's QuantizedBiasAddKernel?
fixed
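For context, what this kernel is doing: the bias was quantized on its own scale, so before it can seed the int32 GEMM accumulator it must be rescaled into the output scale. A standalone sketch of the arithmetic, with names simplified from the snippet (the zero check corresponds to the warning branch):

#include <cstdint>

// Rescale one quantized bias value from the bias scale to the output
// (accumulator) scale. float_for_one_*_quant is the real value that one
// quantization level represents on each side.
inline int32_t RescaleBias(int8_t bias,
                           float float_for_one_bias_quant,
                           float float_for_one_out_quant) {
  if (float_for_one_out_quant == 0.0f) {
    return 0;  // mirrors the LOG(INFO) warning branch in the snippet
  }
  return static_cast<int32_t>(
      bias * float_for_one_bias_quant / float_for_one_out_quant);
}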
@@ -139,6 +141,9 @@ void MKLDNNQuantizedFullyConnectedForward(const nnvm::NodeAttrs& attrs,
      out.data().dptr<int32_t>(),
      n,
      &oc);
#else
  LOG(FATAL) << "s8u8s32 is not supported by the BLAS library";
Maybe this should be changed to "s8u8s32 is only supported by the MKL BLAS library"?
Description
This PR adds a quantized fully connected op implemented with int8 GEMM.
@pengzhao-intel, @TaoLv, @ciyongch
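For background on the int8 GEMM: on CPU this lands on MKL's cblas_gemm_s8u8s32, which multiplies an unsigned 8-bit A operand by a signed 8-bit B operand into an int32 C, which is why the review above discusses shifting the input data from int8 to uint8. A hedged sketch of a bare call; dimensions, layout, and the zero offsets are illustrative, and unlike the PR it does not pre-seed C with the bias (the PR seeds C via QuantizedSumInitKernelWithBias and uses beta = 1.0):

#include <cstdint>
#include <mkl.h>

// Sketch: C(m x n, int32) = A(m x k, uint8) * B(k x n, int8), row-major.
void Int8GemmSketch(int m, int n, int k,
                    const uint8_t *a, const int8_t *b, int32_t *c) {
  const MKL_INT32 oc = 0;  // per-matrix output offset (CblasFixOffset)
  cblas_gemm_s8u8s32(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                     CblasFixOffset, m, n, k,
                     /*alpha=*/1.0f, a, /*lda=*/k, /*ao=*/0,
                     b, /*ldb=*/n, /*bo=*/0,
                     /*beta=*/0.0f, c, /*ldc=*/n, &oc);
}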
Feature changes
New features
Unit-test changes
Checklist