This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add a contrib operator for Constant #15993

Open · vandanavk wants to merge 8 commits into master

Conversation

@vandanavk (Contributor) commented Aug 24, 2019

Description

MXNet doesn't have a constant op for the sym API (similar to gluon.constant). The closest workaround is mapping it to BlockGrad, as discussed in #6087. A constant operator would be useful in general, and it is also required to convert PyTorch models to MXNet through ONNX.

Adding support for this operator here.
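
For context, here is a minimal sketch of the BlockGrad workaround from #6087 that this PR aims to replace; the variable name `const_val` is illustrative, not from this PR:

```python
import mxnet as mx

# Emulate a constant: declare a variable, block its gradient,
# then bind it to a fixed value so it never changes during training.
c = mx.sym.BlockGrad(mx.sym.Variable('const_val'))
executor = c.simple_bind(ctx=mx.cpu(), const_val=(2,))
executor.arg_dict['const_val'][:] = mx.nd.array([-1., 2.])
print(executor.forward()[0].asnumpy())  # [-1.  2.]
```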

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Add a constant operator in contrib, with relevant tests
  • Add ONNX conversion for contrib.constant, with relevant tests (a minimal sketch of the emitted ONNX node follows this list)
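
As a rough illustration of the ONNX side, the conversion ultimately needs to emit an ONNX Constant node carrying the tensor value. Here is a hedged sketch using the standard onnx.helper API; node and tensor names are illustrative, not the PR's actual implementation:

```python
import numpy as np
from onnx import helper, TensorProto

values = np.array([-1., 2.], dtype=np.float32)

# An ONNX Constant node takes no inputs; the value lives in its tensor attribute.
const_node = helper.make_node(
    'Constant',
    inputs=[],
    outputs=['const_out'],
    value=helper.make_tensor(
        name='const_tensor',
        data_type=TensorProto.FLOAT,
        dims=values.shape,
        vals=values.flatten().tolist(),
    ),
)
```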

Comments

@vandanavk vandanavk requested a review from szha as a code owner August 24, 2019 18:56
@vandanavk (Contributor, Author):

@mxnet-label-bot add [Operator, pr-awaiting-review, ONNX]

@ZhennanQin (Contributor) left a comment


For performance optimization purposes, I think it's better to have a constant counterpart to variable. Making constant an operator is an option, but the question is how to set its value, especially when the value is not a broadcast value.

@vandanavk (Contributor, Author) replied:

> For performance optimization purposes, I think it's better to have a constant counterpart to variable. Making constant an operator is an option, but the question is how to set its value, especially when the value is not a broadcast value.

One of the main reasons for adding this as an operator was ONNX conversion. I tried BlockGrad with a constant initializer, but it didn't convert well.
Could you give an example for the second part of your question, the one about when the value is not broadcast? For reference, the operator in this PR works like this:

import mxnet as mx

v1 = [-1., 2.]  # the value the constant holds
constant_op = mx.sym.contrib.constant(value=v1)  # operator added in this PR; exact kwarg name may differ
executor = constant_op.simple_bind(ctx=mx.cpu())
executor.forward(is_train=True)
print(executor.outputs[0].asnumpy())  # [-1.  2.]
@ZhennanQin (Contributor) commented:

Here's the most confusing part. I assumed v1 was the shape of the constant's output, but where do we specify the output value? In other words, why is the output [-1, 2]?

@vandanavk (Contributor, Author) replied:

v1 is the value we want to store. Since this operator is a constant, the output is equal to the input ([-1, 2]).

@ZhennanQin (Contributor):

I mean, if we do some constant-folding optimization in the graph and generate a new tensor that is not filled with the same value everywhere. For example:

scaled_weight = weight * scale
out = convolution(weight=scaled_weight)

With this code, the scale multiplication executes on every inference iteration. Because weight and scale are both constant, we can fold them before inference and use a constant node in the graph to hold the folded result. The values held by that constant node vary from element to element, so they can't be generated with a simple fill pattern.
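
To make the scenario concrete, here is a hedged NumPy sketch of the folding described above (all names and values are illustrative):

```python
import numpy as np

weight = np.array([[0.5, -1.0], [2.0, 3.0]], dtype=np.float32)
scale = np.float32(0.125)

# Fold once, offline, instead of multiplying on every inference iteration.
scaled_weight = weight * scale

# The folded tensor holds arbitrary per-element values, so a constant node
# replacing it must store the full tensor rather than a simple fill pattern.
print(scaled_weight)
```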

@anirudhacharya (Member):

Can you reference this issue in the PR description: #6087?

@vandanavk (Contributor, Author) commented Jan 17, 2020:

> I mean, if we do some constant-folding optimization in the graph and generate a new tensor that is not filled with the same value everywhere. For example:
>
>     scaled_weight = weight * scale
>     out = convolution(weight=scaled_weight)
>
> With this code, the scale multiplication executes on every inference iteration. Because weight and scale are both constant, we can fold them before inference and use a constant node in the graph to hold the folded result. The values held by that constant node vary from element to element, so they can't be generated with a simple fill pattern.

@ZhennanQin which operator is currently used in this scenario?

Labels: ONNX, Operator, pr-awaiting-review (PR is waiting for code review)