-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Conversation
benchmark/python/sparse/dot.py
Outdated
if rhs_density > 1 or rhs_density < 0: | ||
raise ValueError("rhs_density has to be between 0 and 1") | ||
raise ValueError("Value other than csr for lhs not supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check and the error statement don't seem to match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Fixed.
src/operator/tensor/dot-inl.h
Outdated
* \brief CPU Kernel of PopulateCsrForNNC | ||
* Parallelization by individual rows | ||
*/ | ||
struct PopulateCsrForNNC { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add brief description on what this kernel is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
src/operator/tensor/dot-inl.h
Outdated
@@ -231,6 +231,12 @@ inline bool DotForwardInferStorageType(const nnvm::NodeAttrs& attrs, | |||
dispatched = storage_type_assign(&out_stype, kDefaultStorage, | |||
dispatch_mode, DispatchMode::kFComputeEx); | |||
} | |||
if (!dispatched && lhs_stype == kDefaultStorage && rhs_stype == kCSRStorage && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the implementation only available on CPU? No fallback on GPU ctx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a check for CPU. will fallback to default storage for gpu
src/operator/tensor/dot-inl.h
Outdated
const OpReqType req, NDArray* ret) { | ||
if (kNullOp == req) return; | ||
CHECK_EQ(rhs.storage_type(), kCSRStorage); | ||
if (!rhs.storage_initialized()) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we set the result to be ZerosCsrImpl
before return
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
src/operator/tensor/dot-inl.h
Outdated
inline void DotDnsCsrCsrImpl(const OpContext& ctx, const cpu& cpu_dev, | ||
const TBlob& lhs, const NDArray& rhs, | ||
const OpReqType req, NDArray* ret) { | ||
if (kNullOp == req) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is kAddTo
and kWriteInplace
not checked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this. Fixed.
src/operator/tensor/dot-inl.h
Outdated
return; | ||
} | ||
|
||
dim_t num_threads = mxnet_op::get_num_threads<cpu>(num_rows_l); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const for both
src/operator/tensor/dot-inl.h
Outdated
s, num_rows_l, nnc_idx, indptr_out, col_idx_out, nnc, num_rows_l); | ||
mxnet_op::Kernel<mxnet_op::set_zero, cpu>::Launch(s, nnz, data_out); | ||
|
||
if (nnc == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't nnc
never be 0 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should nnc never be 0 ? This is possible when number of non zero columns are zero(matrix with all zeros) in the rhs. In this case we return the output correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you already checked rhs.storage_initialized()
in line 922?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the if and also added some documentation for storage_initialized
src/operator/tensor/dot-inl.h
Outdated
s, num_rows_l, nnc_idx, indptr_out, col_idx_out, nnc, num_rows_l); | ||
mxnet_op::Kernel<mxnet_op::set_zero, cpu>::Launch(s, nnz, data_out); | ||
|
||
if (nnc == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you already checked rhs.storage_initialized()
in line 922?
src/operator/tensor/dot-inl.h
Outdated
// dns, csr -> csr | ||
if (dev_mask == mshadow::cpu::kDevMask) { | ||
dispatched = storage_type_assign(&out_stype, kCSRStorage, dispatch_mode, | ||
DispatchMode::kFComputeEx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is output stype consistent on cpu and gpu? The output stype should be consistent to avoid confusion to users (see https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/tensor/matrix_op-inl.h#L400-L418)
The only difference is that on GPU, it performs fallback. If the output stype infers sparse, then it first produce dense output, then cast it to sparse. The fallback is handled in executor already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general. A few minor comments
include/mxnet/ndarray.h
Outdated
@@ -305,7 +305,10 @@ class NDArray { | |||
bool fresh_out_grad() const; | |||
/*! \return updated grad state in entry_ */ | |||
void set_fresh_out_grad(bool state) const; | |||
// returns true if a sparse ndarray's aux_data and storage are initialized | |||
/*! \brief Returns true if a sparse ndarray's aux_data and storage are initialized | |||
* Returns false if the indices array shape is inconsistent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Returns false if the indices array shape is inconsistent" -> it actually throws an exception without returning false
const auto dispatch_ex = invalid_ctx ? DispatchMode::kFComputeFallback | ||
: DispatchMode::kFComputeEx; | ||
dispatched = storage_type_assign(&out_stype, kCSRStorage, dispatch_mode, | ||
dispatch_ex); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. we should log storage fallback as long as dispatch mode is dispatch_fallback:
https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/elemwise_op_common.h#L79-L81
Maybe I should move this logic to the common path instead of letting developers specify that in operators
https://github.com/apache/incubator-mxnet/blob/master/src/executor/infer_graph_attr_pass.cc#L45-L54
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes moving the logic to common path would be nice. I see multiple places where we don't have this check. For example, https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/tensor/elemwise_unary_op_basic.cc#L65 and https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/operator/tensor/elemwise_binary_scalar_op_basic.cc#L68 . These also need to be fixed right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We can fix that in a separate PR.
@@ -1248,10 +1273,12 @@ def test_sparse_dot_zero_output(lhs_shape, trans_lhs, rhs_num_cols): | |||
test_dot_csr(lhs_shape, (lhs_shape[0], 1), 'default', True, lhs_d, rhs_d) # (vector kernel) | |||
test_dot_csr(lhs_shape, (lhs_shape[1], rnd.randint(5, 10)), 'default', False, lhs_d, rhs_d) # test gpu SpMM | |||
test_dot_csr(lhs_shape, (lhs_shape[0], rnd.randint(5, 10)), 'default', True, lhs_d, rhs_d) # (scalar kernel) | |||
test_dot_dns_csr(lhs_shape, (lhs_shape[1], rnd.randint(500, 1000)), lhs_d, lhs_d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
randint(50,200) is large (and slow) enough for testing. No need to increase the dim to 1000.
src/operator/tensor/dot-inl.h
Outdated
using namespace mshadow::expr; | ||
using nnvm::dim_t; | ||
|
||
/*Initialize data structures*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after /*
src/operator/tensor/dot-inl.h
Outdated
const CType start_idx = i * nnc; | ||
nnvm::dim_t cur = 0; | ||
indptr_out[i] = start_idx; | ||
if (i == static_cast<int>(num_rows_l - 1)) indptr_out[i + 1] = indptr_out[i] + nnc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are adding large array support in the future, it's more appropriate to cast i up to dim_t
instead of cast num_rows_l down to int
.
@eric-haibin-lin Thank you for reviewing! I have made the necessary changes. |
Is the operator documentation not updated? |
Added dot(dns, csr) = csr to operator doc |
* Add operator for dot(dns, csr) = csr * Fix whitespace * Add comments * Add comments and fix error message * Fixes for dot dns csr * Fixes * Remove non required statements * Add fallback for GPU * Remove unused if * Fix comments and casting * Add operator to the documentation
* Add operator for dot(dns, csr) = csr * Fix whitespace * Add comments * Add comments and fix error message * Fixes for dot dns csr * Fixes * Remove non required statements * Add fallback for GPU * Remove unused if * Fix comments and casting * Add operator to the documentation
* Add operator for dot(dns, csr) = csr * Fix whitespace * Add comments * Add comments and fix error message * Fixes for dot dns csr * Fixes * Remove non required statements * Add fallback for GPU * Remove unused if * Fix comments and casting * Add operator to the documentation
Description
Adds operator for dot(dns, csr) = csr. Backward pass will fallback to default implementations.
The performance is better than dot(dns, dns) for sparsity less than 0.5% (c4.8xlarge). Below are the results for tests on c4.8xlarge with OMP_NUM_THREADS set to 32.
Checklist
Essentials
make lint
)Changes