This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Partitioning Gluon HybridBlocks #15969

Merged
merged 20 commits into from
Feb 6, 2020

Contributor

@samskalicky samskalicky commented Aug 22, 2019

Description

Adds partitioning support for Gluon HybridBlocks. This is a continuation of the partitioning support for Symbol #15886

Design

In Gluon, a HybridBlock contains a Symbol after hybridizing and executing a forward pass. The Symbol is contained and managed within the block. The partitioning logic will be integrated into the hybridize flow.

There are many ways to create a Gluon HybridBlock. Once the block is created, users call hybridize() to start the flow. We add two new arguments to support partitioning: backend, a string naming the subgraph_backend, and opt_args, a map of arguments passed to the subgraph_property during partitioning. These values are stored until they are used during the first inference call. Here's an example specifying these new arguments:

net = create()
net.hybridize(backend='default', opt_args={'excluded_ops': ['BatchNorm']})

Notice that in the above example, the new arguments have the same value as the example in #15886. These arguments will ultimately be passed to a call to the optimize_for API.
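
To illustrate the "stored until used" behavior described above, here is a minimal sketch in plain Python. It does not use MXNet: SketchBlock, partition_calls, and the string placeholders for the graph are hypothetical names made up for this illustration, and the real logic lives in HybridBlock.

```python
class SketchBlock:
    """Hypothetical stand-in for a HybridBlock; not MXNet code."""

    def __init__(self):
        self._backend = None
        self._opt_args = {}
        self._cached_graph = None
        self.partition_calls = []  # records each (backend, opt_args) used

    def hybridize(self, backend=None, opt_args=None):
        # Only remember the settings; no partitioning happens here.
        self._backend = backend
        self._opt_args = dict(opt_args or {})
        self._cached_graph = None

    def _build_cache(self):
        graph = "symbol"  # placeholder for the traced Symbol
        if self._backend:
            # Placeholder for out.optimize_for(backend, **opt_args).
            self.partition_calls.append((self._backend, self._opt_args))
            graph = "partitioned:" + self._backend
        self._cached_graph = graph

    def forward(self):
        if self._cached_graph is None:
            self._build_cache()
        return self._cached_graph


net = SketchBlock()
net.hybridize(backend='default', opt_args={'excluded_ops': ['BatchNorm']})
assert net.partition_calls == []  # nothing has been partitioned yet
first = net.forward()             # first call triggers partitioning
second = net.forward()            # later calls reuse the cached graph
```

In this sketch the settings passed to hybridize() are only consumed inside _build_cache, which is why nothing is recorded until the first forward call.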

In Gluon, the hybridize flow starts before the first inference. The Symbol object is created in the _build_cache function:
https://github.com/apache/incubator-mxnet/blob/bd67723da96e6d36e72c9a42535a4fe68f234a71/python/mxnet/gluon/block.py#L933-L934
We'll add a new line of code to partition it and pass the new arguments from the hybridize call:

def _build_cache(self, *args):
    data, out = self._get_graph(*args)
    if self.backend:
        out = out.optimize_for(self.backend, **self.opt_args)

This supports the partitioning flow without shape/type propagation. Some backends do not need shapes and types so there is no reason to require it for all backends. Other backends will require shapes and types in order to partition the model correctly (examples being backends that only support float16 and not float32, or only support small shapes and not large ones).

For partitioning with shape/type propagation, we can get the args to the model from the parameters in the Gluon block. By default, the initialization of Gluon parameters may be delayed. If the parameters are not initialized yet, we'll continue with the flow shown in the code snippet above, which does not infer shapes/types.

In Gluon, users can force initialization (see this guide). If all parameters are initialized after calling hybridize and setting the backend name, we will pass the arguments from the Gluon parameters into the optimize_for API to infer shapes/types before partitioning. This gives users the same control over partitioning that they have with the Symbol API. Here's a code snippet that produces the arg array and passes it to optimize_for:

arg_array = []
try:
    for name in out.list_arguments():
        if name in data_names:
            # Model input: take it from the arguments of the forward call.
            arg_array.append(args[data_names[name]])
        else:
            # Model parameter: take it from the Gluon parameters.
            arg_array.append(params.get(name))
except (DeferredInitializationError, RuntimeError):
    # Some parameters are not initialized yet: partition without shapes/types.
    arg_array = None
out = out.optimize_for(self._backend, arg_array, ctx, **self._backend_args)

The context will be gathered from the inputs to the model like this:

ctx = args[0].context

Context is required to infer storage types.
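
The fallback described above (build the argument list when every parameter is ready, otherwise pass None) can be sketched in plain Python without MXNet. Everything here is a hypothetical stand-in: the local DeferredInitializationError only mimics the MXNet exception of the same name, and ParamStub and collect_args are names invented for this illustration.

```python
class DeferredInitializationError(Exception):
    """Local stand-in for the MXNet exception of the same name."""


class ParamStub:
    """Hypothetical parameter that may or may not be initialized."""

    def __init__(self, value=None):
        self._value = value

    def data(self):
        if self._value is None:
            raise DeferredInitializationError("parameter not initialized")
        return self._value


def collect_args(arg_names, data_names, inputs, params):
    """Return values in arg_names order, or None if any parameter is deferred."""
    arg_array = []
    try:
        for name in arg_names:
            if name in data_names:
                # Model input: look it up by its position in the forward call.
                arg_array.append(inputs[data_names[name]])
            else:
                # Model parameter: raises if initialization is still deferred.
                arg_array.append(params[name].data())
    except DeferredInitializationError:
        return None
    return arg_array


arg_names = ['data', 'weight', 'bias']
data_names = {'data': 0}
inputs = ['x']
ready = {'weight': ParamStub('w'), 'bias': ParamStub('b')}
pending = {'weight': ParamStub('w'), 'bias': ParamStub()}

ready_args = collect_args(arg_names, data_names, inputs, ready)
pending_args = collect_args(arg_names, data_names, inputs, pending)
```

With all parameters initialized the helper returns the full list in argument order; with one deferred parameter it returns None, which corresponds to the flow that partitions without shape/type inference.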

Note

Partitioning is done as part of the hybridize flow, when building the CachedOp. So if input shapes change between inference calls, the graph is not re-partitioned.
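
The consequence of this note can be sketched without MXNet. CachedPartitionSketch and partitioned_for are hypothetical names for this illustration, and the sketch assumes that calling hybridize again discards the cached op so that the next call partitions afresh.

```python
class CachedPartitionSketch:
    """Hypothetical stand-in for a hybridized block; not MXNet code."""

    def __init__(self, backend):
        self._backend = backend
        self._cached_op = None
        self.partitioned_for = []  # input shapes seen when partitioning ran

    def hybridize(self, backend):
        # Assumption of this sketch: re-hybridizing discards the cached op.
        self._backend = backend
        self._cached_op = None

    def __call__(self, shape):
        if self._cached_op is None:
            # Partitioning only ever sees the shape of this first call.
            self.partitioned_for.append(shape)
            self._cached_op = "cached_op:" + self._backend
        return self._cached_op


net = CachedPartitionSketch('default')
net((1, 3, 224, 224))
net((8, 3, 224, 224))        # new shape, but the cached op is reused as is
shapes_before = list(net.partitioned_for)

net.hybridize('default')     # drop the cache so partitioning runs again
net((8, 3, 224, 224))
shapes_after = list(net.partitioned_for)
```

A backend whose partitioning decision depends on shapes would therefore keep the decision made for the first input until the cache is rebuilt.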

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Refactor subgraph tests to return input names and shapes (needed for Gluon). This improves the test architecture to be clearer about what we're testing (subgraph API, optimize_for, Gluon).

@samskalicky samskalicky requested a review from szha as a code owner August 22, 2019 01:43
@leezu
Contributor

leezu commented Aug 22, 2019

Should this be run prior to training or prior to exporting the HybridBlock? Could/Should it be run automatically?

Edit: Based on offline discussion, automatic optimization could be run if the backend can be detected automatically. We would not want to automatically export an optimized symbol.

@samskalicky
Contributor Author

Waiting on #15886 to be merged to re-use optimize_for API call on symbol

@anirudhacharya
Member

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Aug 26, 2019
@samskalicky samskalicky mentioned this pull request Jan 7, 2020
4 tasks
@guanxinq guanxinq force-pushed the cached_op_partition branch from abea125 to f90b9ab Compare January 17, 2020 23:32
@guanxinq guanxinq force-pushed the cached_op_partition branch from f90b9ab to d286167 Compare January 21, 2020 18:47
@guanxinq guanxinq force-pushed the cached_op_partition branch from d286167 to 4b3d076 Compare January 21, 2020 18:50
@samskalicky
Contributor Author

Thanks @guanxinq for the latest update! I think we need to call optimize_for again here too when we create a SymbolBlock, otherwise the partitioning won't happen:
https://github.com/apache/incubator-mxnet/blob/3d18974fdc990b7def2401fae8e46fb0b030442f/python/mxnet/gluon/block.py#L1340
Because once self._cached_graph is set, the previous code won't be executed in the _get_graph function.

@guanxinq guanxinq force-pushed the cached_op_partition branch 2 times, most recently from b6636da to 7228343 Compare January 29, 2020 22:55
@guanxinq guanxinq force-pushed the cached_op_partition branch from 90daf39 to 80dfaed Compare February 4, 2020 21:54
Contributor

@mseth10 mseth10 left a comment

Looks good to me. Good job @guanxinq @samskalicky

@mseth10
Contributor

mseth10 commented Feb 5, 2020

@leezu your comments have been addressed. can you please review again?

Contributor

@leezu leezu left a comment

Changes in python/mxnet/gluon/block.py LGTM

Contributor

@guanxinq guanxinq left a comment

Looks good to me.

Member

@eric-haibin-lin eric-haibin-lin left a comment

Request for documentation. Otherwise looks good to me

@@ -1040,7 +1052,12 @@ def register_child(self, block, name=None):
         super(HybridBlock, self).register_child(block, name)
         self._clear_cached_op()

-    def hybridize(self, active=True, **kwargs):
+    def hybridize(self, active=True, backend=None, backend_args=None, **kwargs):
Member

hmmm. This is specific for hybridblock? Can we add documentation?

Contributor

Based on prior discussion with @samskalicky, documentation should describe what happens if input shapes change in subsequent forward calls (i.e., currently no repartitioning is triggered).

Contributor

Added the description for hybridblock hybridize().

Member

@eric-haibin-lin eric-haibin-lin Feb 5, 2020

Don't see it - did you push?

Contributor

@guanxinq guanxinq Feb 5, 2020

Just pushed. Could you help review the description?

Member

The concepts of SubgraphBackendRegistry, PostPartition, etc. are new and not very straightforward for users. Is it possible to also add a link to a tutorial that teaches users how to register a subgraph backend?

Contributor

As discussed offline, we plan to add a tutorial as part of our next PR and link it to the example. I have put together the TODO list for the next PR in GitHub issue #17532.

if backend_args is None:
    self._backend_args = {}
else:
    self._backend_args = backend_args
Contributor

We need to require users to pass a dictionary (a user may pass a string instead), so we should add a check before assigning it to _backend_args.

Contributor Author

Let's add something like

if isinstance(backend_args, dict):

Contributor

Fixed. Thanks.

Contributor

we still need an else block.

else:
    self._backend_args = {}

Contributor

Okay, we don't need else as it is initialized to {}.
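
The pattern this thread converges on (start from an empty dict and only accept a dict) can be sketched as a standalone helper. The function names are hypothetical, the code is not taken from block.py, and the stricter variant that raises is an alternative added here for comparison rather than something proposed in the PR.

```python
def resolve_backend_args(backend_args):
    """Pattern from the thread: default to {} and only accept a dict."""
    resolved = {}
    if isinstance(backend_args, dict):
        resolved = backend_args
    return resolved


def resolve_backend_args_strict(backend_args):
    """Stricter variant (an assumption, not from the PR): reject non-dicts."""
    if backend_args is None:
        return {}
    if not isinstance(backend_args, dict):
        raise TypeError(
            "backend_args must be a dict, got %s" % type(backend_args).__name__)
    return backend_args


lenient = resolve_backend_args("excluded_ops=BatchNorm")
kept = resolve_backend_args({'excluded_ops': ['BatchNorm']})
try:
    resolve_backend_args_strict("excluded_ops=BatchNorm")
    strict_error = None
except TypeError as err:
    strict_error = str(err)
```

The lenient form silently ignores a string argument, which is why no else branch is needed, while the strict form surfaces the mistake to the caller.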

@guanxinq guanxinq force-pushed the cached_op_partition branch from 71c88a7 to 5aaff0b Compare February 5, 2020 22:37
    Whether to turn hybrid on or off.
backend : str
    The name of backend, as registered in `SubgraphBackendRegistry`, default None
backend_args : dict of optional arguments, optional
Contributor

nit: optional twice

Contributor

Fixed.

@guanxinq guanxinq force-pushed the cached_op_partition branch 2 times, most recently from b332edd to 966a383 Compare February 5, 2020 23:08
but slower.
"""
""" Please refer description of HybridBlock hybridize().
"""
Contributor

can you get rid of trailing whitespaces

@guanxinq guanxinq force-pushed the cached_op_partition branch from 966a383 to 14dcc14 Compare February 5, 2020 23:47
@eric-haibin-lin eric-haibin-lin dismissed their stale review February 6, 2020 18:37

agree to address doc issue in a future PR

@eric-haibin-lin eric-haibin-lin merged commit 9993738 into apache:master Feb 6, 2020
@samskalicky samskalicky mentioned this pull request Feb 13, 2020
4 tasks
zheyuye pushed a commit to zheyuye/incubator-mxnet that referenced this pull request Feb 19, 2020
* stub for optimizing Gluon block

* Init commit for Gluon hybridblocks partition(sample test included)

* Added tests for Gluon and refactored tests

* call optimize_for in _build_cache

* Pass in 4 paras for gluon optimize_for

* Fixed auxiliary state issue, args issue and added 2 tests.

* Fixed auxiliary state issue, args issue and added 2 tests.

* changed parameter check

* refactored param init since needed for partitioning

* fixed whitespace

* fixed flattened args

* fixed sanity & updated tests

* fixed whitespace

* added context support in tests

* Fix python2 errors

* clean code remove cargs

* Add hybridblock hybridize() description

Co-authored-by: guanxinq <[email protected]>
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020
Labels
pr-awaiting-review PR is waiting for code review
9 participants