This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Numpy] Add sampling method for bernoulli #16638

Merged: 9 commits, Nov 12, 2019

xidulu
Contributor

@xidulu xidulu commented Oct 26, 2019

Description

Native NumPy does not support sampling directly from a Bernoulli distribution. To do so, users have to call np.random.binomial and set n = 1 manually.
However, I think a separate implementation for Bernoulli is worth considering, for the following reasons:

  1. Sampling from a Bernoulli distribution, unlike a binomial, is very easy to implement.
  2. random.bernoulli could serve as a temporary workaround for binomial sampling; see https://github.com/pytorch/pytorch/blob/071971476d7431a24e527bdc181981678055a95d/torch/distributions/binomial.py#L102 for details. (For related discussion, see "torch.distributions.Binomial.sample() uses a massive amount of memory", pytorch/pytorch#20343.)
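
To make the two points above concrete, here is a minimal sketch in plain NumPy (not the MXNet operator added by this PR). It assumes a fixed probability p and uses NumPy's Generator API; the variable names are illustrative only. It shows the existing n = 1 idiom, direct Bernoulli sampling by thresholding a uniform draw, and one way a binomial sample could be built by summing Bernoulli draws.

```python
import numpy as np

# Assumed values for illustration; not taken from the PR.
rng = np.random.default_rng(0)
p = 0.3
num_samples = 100_000

# Bernoulli(p) expressed as Binomial(n=1, p): the idiom NumPy users need today.
via_binomial = rng.binomial(n=1, p=p, size=num_samples)

# Direct Bernoulli sampling: a uniform draw on [0, 1) falls below p with probability p.
via_uniform = (rng.random(num_samples) < p).astype(np.int64)

# Binomial(n, p) as a sum of n independent Bernoulli(p) draws.
# Note the intermediate array has shape (num_samples, n), so memory grows with n.
n = 10
binomial_from_bernoulli = (rng.random((num_samples, n)) < p).sum(axis=1)
```

The sum-of-Bernoulli approach is simple but materializes n draws per sample, which is presumably why it is only suitable as a temporary workaround.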

This pull request also fixes an existing bug in np.random.uniform, where a scalar tensor would cause shape inference to fail.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
      • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
      • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
      • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
      • For user-facing API changes, API doc string has been updated.
      • For new C++ functions in header files, their functionalities and arguments are documented.
      • For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
      • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@xidulu xidulu requested a review from szha as a code owner October 26, 2019 13:13
@haojin2 haojin2 assigned haojin2 and sxjscience and unassigned haojin2 Oct 28, 2019
@haojin2 haojin2 requested a review from sxjscience October 28, 2019 03:39
@haojin2 haojin2 added the Numpy label Oct 28, 2019
@xidulu
Contributor Author

xidulu commented Nov 4, 2019

@reminisce
For some reason, operators under npx.random cannot access the npi ops. Could you help me resolve this issue?

@xidulu
Contributor Author

xidulu commented Nov 5, 2019

The correctness of the sampled distribution has been briefly verified by hand with the following code:

In [9]: (npx.random.bernoulli(prob=prob, size=(1000000, 10, 10)).mean(0) - prob).mean()
Out[9]: array(-3.0148094e-05, ctx=gpu(0))

In [10]: (npx.random.bernoulli(prob=prob, size=(1000000, 10, 10)).var(0) - prob * (1 - prob)).mean()
Out[10]: array(-9.472111e-06, ctx=gpu(0))

In [11]: logit = np.log(prob) - np.log(1 - prob)

In [12]: (npx.random.bernoulli(logit=logit, size=(1000000, 10, 10)).mean(0) - prob).mean()
Out[12]: array(-3.4588957e-05, ctx=gpu(0))

In [13]: (npx.random.bernoulli(logit=logit, size=(1000000, 10, 10)).var(0) - prob * (1 - prob)).mean()
Out[13]: array(1.1445168e-05, ctx=gpu(0))
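
For readers without an MXNet build, the same kind of check can be sketched with the Python standard library alone. This is an illustrative stand-in, not the code used in this PR: sample_bernoulli and sigmoid are hypothetical helpers, prob is an assumed scalar, and sampling is done by thresholding uniform draws. It compares the empirical mean and variance against prob and prob * (1 - prob), and repeats the mean check for the logit parameterization.

```python
import math
import random


def sample_bernoulli(prob, n, rng):
    """Hypothetical helper: draw n Bernoulli(prob) samples by thresholding uniform draws."""
    return [1 if rng.random() < prob else 0 for _ in range(n)]


def sigmoid(x):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


rng = random.Random(0)
prob = 0.3       # assumed value for illustration
n = 200_000

# Empirical mean and variance should approach prob and prob * (1 - prob).
samples = sample_bernoulli(prob, n, rng)
mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n
mean_error = mean - prob
var_error = var - prob * (1 - prob)

# Same check through the logit parameterization: logit = log(prob) - log(1 - prob).
logit = math.log(prob) - math.log(1 - prob)
logit_samples = sample_bernoulli(sigmoid(logit), n, rng)
logit_mean_error = sum(logit_samples) / n - prob

print(mean_error, var_error, logit_mean_error)
```

As in the session above, all three errors should be close to zero, shrinking roughly with the square root of the number of samples.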

@codecov-io

Codecov Report

Merging #16638 into master will decrease coverage by 0.05%.
The diff coverage is 32.75%.

@@            Coverage Diff             @@
##           master   #16638      +/-   ##
==========================================
- Coverage      67%   66.94%   -0.06%     
==========================================
  Files         264      271       +7     
  Lines       29507    29980     +473     
  Branches     4357     4440      +83     
==========================================
+ Hits        19770    20071     +301     
- Misses       8490     8637     +147     
- Partials     1247     1272      +25
Impacted Files Coverage Δ
python/mxnet/base.py 69.77% <100%> (-0.16%) ⬇️
python/mxnet/numpy_extension/__init__.py 100% <100%> (ø) ⬆️
python/mxnet/symbol/numpy_extension/__init__.py 100% <100%> (ø) ⬆️
python/mxnet/ndarray/numpy_extension/__init__.py 100% <100%> (ø) ⬆️
python/mxnet/symbol/numpy_extension/random.py 24% <24%> (ø)
python/mxnet/ndarray/numpy_extension/random.py 24% <24%> (ø)
python/mxnet/numpy_extension/random.py 88.88% <75%> (-11.12%) ⬇️
python/mxnet/gluon/contrib/cnn/conv_layers.py 41.6% <0%> (-25.49%) ⬇️
python/mxnet/contrib/quantization.py 61.39% <0%> (-0.49%) ⬇️
...hon/mxnet/gluon/contrib/estimator/event_handler.py 73.56% <0%> (-0.35%) ⬇️
... and 34 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 29e467b...70ae946.

@xidulu xidulu changed the title [WIP] [Numpy] Add sampling method for bernoulli [Numpy] Add sampling method for bernoulli Nov 6, 2019
@xidulu xidulu changed the title [Numpy] Add sampling method for bernoulli [Numpy] Sampling method for bernoulli Nov 7, 2019
@xidulu xidulu changed the title [Numpy] Sampling method for bernoulli [Numpy] Add sampling method for bernoulli Nov 7, 2019
@xidulu
Contributor Author

xidulu commented Nov 9, 2019

@sxjscience
Could you take a look at my implementation? Thx :)

Contributor

@haojin2 haojin2 left a comment

LGTM except for 1 minor style problem. @sxjscience Any more comments?

@sxjscience sxjscience merged commit 02f4f05 into apache:master Nov 12, 2019
5 participants