Nnvm RNN op with icc and mkldnn memory cache #10

Open

wants to merge 4 commits into base: master2
Conversation

@lihaofd (Owner) commented Jan 22, 2019

Description

(Brief description on what this PR is about)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@TaoLv left a comment

Reviewed part of the change. Will review the rest once I get time. Ping @ciyongch.

namespace op {

namespace mkldnn_rnn_enum {
enum RNNModeType {kRnnRelu, kRnnTanh, kLstm, kGru};
It's already defined in operator/rnn-inl.h.

lihaofd (Owner, Author): fixed

}

template <typename DType>
mkldnn::memory::data_type GetMKLDNNDataType() {
Looks strange and unsafe to me. Please refer to the example at https://en.cppreference.com/w/cpp/language/typeid.

lihaofd (Owner, Author): fixed

There's already a function "get_mkldnn_type()" in mkldnn_base-inl.h, can we just reuse that function instead of creating a similar one?

lihaofd (Owner, Author): fixed
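
A minimal sketch of the suggested reuse, assuming get_mkldnn_type() in mkldnn_base-inl.h takes an mshadow type flag, as it does elsewhere in MXNet:

  // Map the template type through the existing helper instead of
  // defining a second GetMKLDNNDataType<DType>() lookup.
  mkldnn::memory::data_type dtype = get_mkldnn_type(mshadow::DataType<DType>::kFlag);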

return algo;
}

void ReorderForWeight(mkldnn::memory src,
Why ReorderForWeight? Looks like a general reorder function, not only for weight.

lihaofd (Owner, Author): fixed

void ReorderForWeight(mkldnn::memory src,
mkldnn::memory dst) {
auto r = mkldnn::reorder(src, dst);
stream(stream::kind::eager).submit({r}).wait();
What's stream here? If it's mkldnn::stream, you need to call MKLDNNStream::Get().

lihaofd (Owner, Author): fixed
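
A hedged sketch of what the reviewer suggests, assuming the MKLDNNStream helper from mkldnn_base-inl.h with the RegisterPrim()/Submit() interface other MXNet MKLDNN operators use:

  // Queue the reorder on MXNet's shared MKLDNN stream rather than
  // building an eager mkldnn::stream inline.
  auto r = mkldnn::reorder(src, dst);
  MKLDNNStream::Get()->RegisterPrim(r);
  MKLDNNStream::Get()->Submit();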


auto dst_desc = mkldnn::memory::desc(dst_cds, mkldnn_dtype, dst_format);
auto concat_pd = mkldnn::concat::primitive_desc(dst_desc, concat_dimension, srcs_pd);
auto dst = mkldnn::memory(concat_pd.dst_primitive_desc());
You will allocate temporary memory here...

lihaofd (Owner, Author): fixed
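
One way to avoid the temporary, sketched under the assumption that the caller already owns a buffer of the right size (dst_ptr is a hypothetical name):

  // The MKLDNN 0.x memory(primitive_desc, handle) constructor wraps an
  // existing buffer instead of allocating a new one.
  auto dst = mkldnn::memory(concat_pd.dst_primitive_desc(), dst_ptr);
  auto c = mkldnn::concat(concat_pd, inputs, dst);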

mkldnn::memory::dims dst_cds,
mkldnn::memory::data_type mkldnn_dtype,
int concat_dimension,
std::vector<DType*> srcs_data) {
It's not necessary to have DType here. Use void*, then you needn't make this a template function.

lihaofd (Owner, Author): fixed
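
A sketch of how the non-template signature could look, with the parameter list inferred from the call sites in this diff:

  // void* erases the element type; the mkldnn_dtype argument already
  // carries the type information MKLDNN needs.
  void ConcatData(mkldnn::memory::format src_format,
                  mkldnn::memory::format dst_format,
                  std::vector<mkldnn::memory::dims> srcs_cds,
                  mkldnn::memory::dims dst_cds,
                  mkldnn::memory::data_type mkldnn_dtype,
                  int concat_dimension,
                  std::vector<void*> srcs_data);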

for (size_t i = 0; i < srcs_cds.size(); i++) {
auto desc = memory::desc(srcs_cds[i], mkldnn_dtype, src_format);
auto mpd = memory::primitive_desc(desc, cpu_engine);
auto src_memory = mkldnn::memory({desc, cpu_engine}, srcs_data[i]);
auto src_memory = mkldnn::memory(mpd, srcs_data[i]);

lihaofd (Owner, Author): fixed

const int T,
const int N,
int I,
const int H,
Why are T, N, H const but L, D, I not const? Any special consideration?

lihaofd (Owner, Author): fixed

@@ -39,6 +39,7 @@
#include "./math_functions-inl.h"
#include "./operator_common.h"
#include "./rnn_impl.h"
#include "./nn/mkldnn/mkldnn_rnn_impl.h"

Add "#if MXNET_USE_MKLDNN == 1" around the MKLDNN-related header file.

lihaofd (Owner, Author): fixed
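
A minimal sketch of the guarded include:

  #if MXNET_USE_MKLDNN == 1
  #include "./nn/mkldnn/mkldnn_rnn_impl.h"
  #endif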

param_.Init(kwargs);
#if MXNET_USE_MKLDNN == 1
template<typename DType>
static RNNOp<DType> &GetMKLDNNRNNOp(const RNNParam &param,

Move the MKLDNN functions to the MKLDNN-related files.

lihaofd (Owner, Author): Many of the classes and structs are shared here; if we move everything to the MKLDNN-related file, it will create a lot of duplicated code.

Agree to move the MKLDNN code to mkldnn_rnn_impl.h. Please refer to what we did for other operators.

std::shared_ptr<RNNOp<DType>> op(new RNNOp<DType>(param));
auto ins_ret = ops.insert(std::pair<RNNSignature, std::shared_ptr<RNNOp<DType> > >(key, op));
CHECK(ins_ret.second);
it = ins_ret.first;

Refer to mkldnn_convolution.cc and use the AddToCache() function here. Why not add data/output here?
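
A hedged sketch of the signature-based caching pattern in mkldnn_convolution.cc; MKLDNNRNNSignature, the AddToCache() helper's exact signature, and the AddSign() calls are assumptions modeled on that file:

  // Hash the input/output arrays into the key so a cached op is never
  // reused for mismatched shapes, and let the shared helper do the insert.
  MKLDNNRNNSignature key(param);
  key.AddSign(data);     // input NDArray
  key.AddSign(output);   // output NDArray, per the review question
  auto it = ops.find(key);
  if (it == ops.end()) {
    auto op = std::make_shared<RNNOp<DType>>(param);
    it = AddToCache(&ops, key, op);  // replaces the hand-rolled insert()
  }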

if (param_.mode == rnn_enum::kLstm) {
CHECK_EQ(in_shape->size(), 4U) << "Input:[data, parameters, state, cell_state]";
} else {
CHECK_EQ(in_shape->size(), 3U) << "Input:[data, parameters, state]";

Better to make the error message more comprehensible, such as "expected 4 inputs ... but got x inputs"?
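
A sketch of what that message could look like (the exact wording is an assumption):

  CHECK_EQ(in_shape->size(), 4U)
      << "Expected 4 inputs [data, parameters, state, cell_state] for LSTM, "
      << "but got " << in_shape->size() << " inputs";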

ConcatData(mkldnn::memory::format::ldgoi, mkldnn::memory::format::ldgoi,
{weights_iter_r_tz_0, weights_iter_r_tz_0}, weights_iter_tz_0,
mkldnn_dtype, 1, srcs_data1);
}

Combine these two if blocks into one.

lihaofd (Owner, Author): fixed


auto user_src_layer_memory_0 = mkldnn::memory(
{ user_src_layer_md_0, cpu_engine }, x_0);

Can we use SetNewMem here the way the current MKLDNN ops do?

}
}
// go to next L - 1 layers.
// If D = 2, do it layer by layer. If D = 1, fused L - 1 layers

For the case of I != H, I think the two-stage approach is reasonable given the MKLDNN API restriction.
For the case of I == H, do you think the two stages can be combined into one? If so, only one primitive is needed.

memory::dims dst_layer_tz_0 = {T, N, D * H};
memory::dims src_iter_tz_0 = {1, D, nstates, N, H}; // ldsnc
memory::dims dst_iter_tz_0 = {1, D, nstates, N, H}; // ldsnc
int offset1 = 0, offset2 = 0;

Minor comment on variable naming: please use more meaningful names. Maybe change back_xxx to l2r_xxx or r2l_xxx?

lihaofd (Owner, Author): No, in the non-MKLDNN path the GRU/LSTM/vRNN naming is based on back_xxx; it is better to keep them the same.

back_b_ptr = b_ptr + single_b_size * 2;
}
DType* back_wx_0 = back_w_ptr;
DType* back_wh_0 = back_w_ptr + I * H * ngates;

Move all the back_xxx assignments into the if (D == 2) block; default values here are enough.

lihaofd (Owner, Author): Will try it, but some of them have ordered definitions, so not all of them can be moved.

@TaoLv commented Jan 25, 2019

Cannot see your fixes. Please push them here. @lihaofd

@lihaofd (Owner, Author) commented Jan 25, 2019

@TaoLv I am still fixing the code. If I pushed it now, the code related to many of the comments couldn't be found easily; I will push the fixes once I have addressed them all.

@TaoLv commented Jan 25, 2019

I see. So it's "will fix" rather than "fixed".

if ((*in_type)[i] == -1) {
(*in_type)[i] = dtype;
} else {
UNIFORM_TYPE_CHECK((*in_type)[i], dtype, ListArguments(param_)[i]);
This may exceed the boundary of the list if i > 3.
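
One possible guard, sketched under the assumption that ListArguments(param_) only names the declared inputs:

  // Fall back to a generic name once i passes the end of the argument list.
  const auto args = ListArguments(param_);
  if (i < args.size()) {
    UNIFORM_TYPE_CHECK((*in_type)[i], dtype, args[i]);
  } else {
    UNIFORM_TYPE_CHECK((*in_type)[i], dtype, "input_" + std::to_string(i));
  }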

DispatchMode wanted_mode = DispatchMode::kFCompute;
#if MXNET_USE_MKLDNN == 1
wanted_mode = DispatchMode::kFComputeEx;
#endif
This doesn't work for GPU.
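
A hedged sketch of the usual fix, following the dev_mask check other MKLDNN operators apply in storage-type inference:

  // Only take the FComputeEx (MKLDNN) path on CPU; GPU stays on FCompute.
  DispatchMode wanted_mode = DispatchMode::kFCompute;
  #if MXNET_USE_MKLDNN == 1
  if (dev_mask == mshadow::cpu::kDevMask)
    wanted_mode = DispatchMode::kFComputeEx;
  #endif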

const std::vector<NDArray> &inputs,
const std::vector<OpReqType> &req,
const std::vector<NDArray> &outputs) {
RNNParam& param = (RNNParam&)nnvm::get<RNNParam>(attrs.parsed);
Much of the code in this function should be moved to MKLDNNRNNForward.


auto prim_desc_0 = mkldnn::rnn_forward::primitive_desc(layer_desc_0, cpu_engine);
auto dst_layer_memory_0 = mkldnn::memory(prim_desc_0.dst_layer_primitive_desc());
auto dst_iter_memory_0 = mkldnn::memory(prim_desc_0.dst_iter_primitive_desc());
It will allocate memory.

auto prim_desc = mkldnn::rnn_forward::primitive_desc(layer_desc, cpu_engine);
auto dst_layer_memory = mkldnn::memory(prim_desc.dst_layer_primitive_desc());
dst_layer_memory.set_data_handle(y);
auto dst_iter_memory = mkldnn::memory(prim_desc.dst_iter_primitive_desc());
It will allocate memory.

auto prim_desc = mkldnn::rnn_forward::primitive_desc(layer_desc, cpu_engine);
auto dst_layer_memory = mkldnn::memory(prim_desc.dst_layer_primitive_desc());
dst_layer_memory.set_data_handle(y);
auto dst_iter_memory = mkldnn::memory(prim_desc.dst_iter_primitive_desc());
Memory allocation again.
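
A sketch of how the extra allocation could be avoided, assuming the caller owns the destination state buffer (hy is a hypothetical pointer):

  // Like dst_layer_memory above, hand the primitive an existing buffer
  // instead of letting the memory object allocate one.
  auto dst_iter_memory = mkldnn::memory(prim_desc.dst_iter_primitive_desc(), hy);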

user_src_iter_memory, (*weight_bias_memory)[3],
(*weight_bias_memory)[4], (*weight_bias_memory)[5],
dst_layer_memory, dst_iter_memory, null_memory_));
stream(stream::kind::eager).submit(rnn_net2).wait();
Use MKLDNNStream here as well.

}

template <typename DType>
void MKLDNNRNNForward(bool state_outputs,
This function is too long to review and maintain. Can you split it into several functions and combine them here? I guess some blocks can also be shared when we implement the backward pass.

auto dst = mkldnn::memory(concat_pd.dst_primitive_desc());

auto c = mkldnn::concat(concat_pd, inputs, dst);
stream(stream::kind::eager).submit({c}).wait();
Use MKLDNNStream here as well.

@TaoLv commented Jan 25, 2019

No need to wait until all comments are addressed. GitHub will maintain the review history.
