Conversation
moved library import order to after ndarray/symbol
re-registered ops from mx.nd.op to mx.nd
cef93e8 to 8e12588
@wkcn while this PR is not quite done yet, it would be great to get some early feedback since the design/implementation has changed since our initial discussion. Let me know what you think, thanks!
@wkcn The 1.6 code freeze is tomorrow, so are you OK with this one not going into the 1.6 release? None of us will have time to maintain it until late November. After the code freeze we can merge it on Friday, and users can use the nightly build to access this feature.
@rondogency No problem : )
Hi @samskalicky and @rondogency, is this PR ready to merge after CI passes?
Yes! We're soooooo ready to merge :) Thanks @zachgk for rerunning the unix_cpu job!
I will merge this PR after the CI passes.
@wkcn Big thanks to Jackie for the merging work!
def check_platform():
    return platform.machine() not in ['x86_64', 'AMD64']

@unittest.skipIf(check_platform(), "not all machine types supported")
@unittest.skipIf(is_cd_run(), "continuous delivery run - ignoring test")
def test_library_loading():
def test_custom_op():
This test assumes it is run from the MXNet root folder. Otherwise, libsample_lib.so will not be found:
$ cd tests/python/unittest/
$ nosetests -v test_extensions:test_custom_op
test_extensions.test_custom_op ... ERROR
======================================================================
ERROR: test_extensions.test_custom_op
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lvtao/miniconda3/envs/mxnet/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/lvtao/Workspace/mxnet-official/tests/python/unittest/test_extensions.py", line 41, in test_custom_op
    raise MXNetError("library %s not found " % lib)
mxnet.base.MXNetError: library libsample_lib.so not found
----------------------------------------------------------------------
Ran 1 test in 0.005s
FAILED (errors=1)
Is it possible to use mx.libinfo.find_lib_path to find the library?
https://github.com/apache/incubator-mxnet/blob/93228649340bcacb8056d47d8f6f8a78a9805ae4/python/mxnet/libinfo.py#L26
Gotcha, I will fix it in the next PR.
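One way to remove the dependence on the working directory, sketched here with the standard library only, is to build the search path from the location of the test file itself. The helper names `find_library` and `default_search_dirs`, and the assumed directory layout (repository root three levels above the test file, with an optional `build` folder), are illustrative assumptions and not part of MXNet; the `mx.libinfo.find_lib_path` function suggested above is not used here.

```python
import os


def find_library(lib_name, search_dirs):
    """Return the absolute path of the first directory entry that holds lib_name."""
    for directory in search_dirs:
        candidate = os.path.abspath(os.path.join(directory, lib_name))
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError("library %s not found in %s" % (lib_name, search_dirs))


def default_search_dirs(test_file):
    """Build candidate directories anchored at the test file, not the current directory."""
    here = os.path.dirname(os.path.abspath(test_file))
    # Assumed layout: the repository root is three levels above tests/python/unittest.
    root = os.path.abspath(os.path.join(here, "..", "..", ".."))
    return [here, root, os.path.join(root, "build")]
```

A test could then call `find_library("libsample_lib.so", default_search_dirs(__file__))` and get the same result whether it is launched from the root folder or from `tests/python/unittest/`.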
Add random number generator support for custom operator libraries.

Design: MXNet passes its initialized and seeded states, located on CPU and GPU, to the custom library, so users can use those states to generate deterministic values from a given seed passed to MXNet. The basic workflow is:

    mx.random.seed(128)
    r1 = mx.nd.some_custom_random_op(data)
    mx.random.seed(128)
    r2 = mx.nd.some_custom_random_op(data)
    assert (r1 == r2)

This PR does not make a custom library generate exactly the same sequence of random numbers as MXNet.

This is a continuation of the custom operator project #15921 and #17270
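The reseed-and-compare workflow can be illustrated without MXNet. The sketch below uses Python's `random.Random` as a stand-in for the seeded state that the framework would hand to a library; `SeededState` and `custom_random_op` are hypothetical names chosen for this illustration, not MXNet APIs.

```python
import random


class SeededState:
    """Stand-in for a seeded generator state owned by the framework."""

    def __init__(self):
        self._rng = random.Random()

    def seed(self, value):
        # Reseeding resets the state, so the following draws repeat exactly.
        self._rng.seed(value)

    def next_uniform(self):
        return self._rng.random()


STATE = SeededState()


def custom_random_op(data, state=STATE):
    """Hypothetical custom op: adds noise drawn from the state it is given."""
    return [x + state.next_uniform() for x in data]


data = [1.0, 2.0, 3.0]
STATE.seed(128)
r1 = custom_random_op(data)
STATE.seed(128)
r2 = custom_random_op(data)
assert r1 == r2  # same seed, same values
```

The point of the design is that the operator never creates its own generator: it only consumes the state passed in, so determinism is controlled entirely by the caller's seed.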
…18069)

* Dynamic subgraph compile support (#17623)

This PR adds support for passing the NDArrays from the existing optimize_for API down to the reviewSubgraph function in an external library. It also adds a new API for HybridBlock called optimize_for that can partition the model without running a forward pass.

Feature changes
Adds new API to HybridBlock optimize_for that partitions the model but does not call the cachedOp
Modifies the subgraph library example to optionally require args to be provided
Adds annotation on subgraph inputs for the name of the original param so that inputs can be mapped and passes annotations to input nodes of subgraphs
Adds support for tensors in MKLDNN format, calls Reorder2Default

New tests
Adds a new test to partition operators that directly consume params
add a new model to test where ops to be partitioned have args/params

Bug Fixes
fixes bug in passing ids vector by value instead of by reference
fixes bug in passing copies of attributes instead of by reference
fixes bug where _cached_graph was not updated after partitioning
fixes memory leak where user-specified attributes on subgraph ops were not freed if subgraph was rejected
fixes problem incorrectly indexing into shape/dtype maps when annotating the graph

Docs
Updates the README doc with the latest changes described above

* Adding sparse support to MXTensor for custom operators (#17569)

* Added enum for sparse storage
* Add structure for Dense and Sparse
* redesign the data structure for MXSparse
* pull out aux data from sparse NDArray
* Added more sparse arguments to API interface
* Passed sparse from c_api to lib_api.h and set in MXTensor
* Fix indent
* fix segfault
* Fix NDArray to MXTensor errors
* Add a sample of sparse(CSR) transpose
* Make CSR transpose temporarily work by hardcoding
* Fixed sparse output size(Refined)
* Add tests for symbolic and stateful ops
* Added a sample for row sparse transpose
* Added real row sparse transpose
* Fix output size issue by adding lambda for CheckAndAlloc()
* Fix mixed storage formats error
* Added infer storage type function
* resolve comments
* Set inferSType as optional function
* Resolve comments
* Add error messages
* Resolve comments
* verify transpose ops results
* fix sanity check
* update MX_LIBRARY_VERSION to 5

* Custom Operator Random Number Generator Support (#17762)

Add random number generator support for custom operator libraries.

Design: We pass from MXNet the initialized and seeded states, located on CPU and GPU, to custom library. So user could use those seeds to generate deterministic values from a given seed passed to MXNet. Basically this workflow:

    mx.random.seed(128)
    r1 = mx.nd.some_custom_random_op(data)
    mx.random.seed(128)
    r2 = mx.nd.some_custom_random_op(data)
    assert (r1 == r2)

This PR does not let custom library generate exactly the same sequence of random numbers comparing to MXNet

This is a continuation of the custom operator project #15921 and #17270

Co-authored-by: guanxinq <[email protected]>
Co-authored-by: Ziyi Mu <[email protected]>
Description
Enhancements to dynamic library loading to support custom operators in libraries.
Initially, this project was proposed on the CWiki; however, the design has evolved since the initial proposal. The current design is described below.
Design
The goal of this PR is to enable operators to be implemented in separate libraries and loaded at runtime.
The main constraint is to maintain a low-level, C-types-only boundary between MXNet and the library to simplify the building and compiling of external libraries.
Working backwards from the user, users register operators with easy-to-use function prototypes like:
Users' Forward (i.e. FCompute) functions are called by a helper function _opCallFCompute that converts the C-types passed across the library boundary to the familiar STL types. This function is implemented in the lib_api.h header file that users compile with their library.

In MXNet's C API, the _opCallFCompute function is found in the library. A lambda function fcomp_conv is created for each operator loaded from the library to convert from MXNet-types to C-types. Then these C-types are passed to _opCallFCompute.

The same design is used for: parseAttrs, inferShape, inferType, etc.
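The two-sided conversion described above can be sketched as a Python analogy using `ctypes`. This is not the code from lib_api.h or the C API: the function bodies, argument lists, and the `scale` attribute are assumptions made for illustration, and only the names `fcomp_conv` and `_opCallFCompute` (written here as `op_call_fcompute`) echo the description. The framework side flattens rich objects into C arrays, and the library-side helper rebuilds friendly types before calling the user's function.

```python
import ctypes


def forward(attrs, inputs):
    """User-facing operator: works on a dict of attributes and a list of floats."""
    scale = float(attrs.get("scale", "1"))
    return [x * scale for x in inputs]


def op_call_fcompute(keys, vals, num_attrs, in_data, in_len, out_data):
    """Library-side helper: receives C types only and rebuilds friendly types."""
    attrs = {keys[i].decode(): vals[i].decode() for i in range(num_attrs)}
    inputs = [in_data[i] for i in range(in_len)]
    outputs = forward(attrs, inputs)
    for i, value in enumerate(outputs):
        out_data[i] = value
    return 1  # non-zero signals success in this sketch


def fcomp_conv(attrs, inputs):
    """Framework-side wrapper: converts rich types to C types, then calls the helper."""
    n = len(attrs)
    keys = (ctypes.c_char_p * n)(*[k.encode() for k in attrs])
    vals = (ctypes.c_char_p * n)(*[v.encode() for v in attrs.values()])
    in_arr = (ctypes.c_float * len(inputs))(*inputs)
    out_arr = (ctypes.c_float * len(inputs))()
    if not op_call_fcompute(keys, vals, n, in_arr, len(inputs), out_arr):
        raise RuntimeError("operator reported failure")
    return list(out_arr)
```

Only strings, floats, integers, and arrays of them cross the call to `op_call_fcompute`, which mirrors the constraint that the boundary carries C types and nothing else.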
Finally, an operator is re-registered in MXNet with the lambda function like:
Once the C API returns back to Python in the load function in library.py, we regenerate the Python bindings and re-register the operator shortcuts to mx.nd and mx.sym.

After the load function returns back to the user's Python code, they can then use their operators just like any other operator:

Current Features
Future/Next-steps
(to be done in a separate PR)