This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-744] Sphinx error reduction #11916

Closed


aaronmarkham
Contributor

@aaronmarkham aaronmarkham commented Jul 27, 2018

Description

This PR reduces the number of Sphinx errors and warnings from 1628 to 251 on a full Sphinx build.
It also reduces the warnings about documents missing from a Sphinx toctree from 170 to 0 on incremental Sphinx builds.
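
One way to reproduce counts like these is to tally the message levels in a saved build log. The sketch below is illustrative only: it assumes the log is plain text in which Sphinx labels messages with `WARNING:` or `ERROR:`, and the function name `count_sphinx_messages` is hypothetical rather than part of the MXNet docs tooling.

```python
from collections import Counter


def count_sphinx_messages(lines):
    """Tally lines that contain 'ERROR:' or 'WARNING:' (assumed Sphinx labels)."""
    counts = Counter()
    for line in lines:
        for level in ("ERROR", "WARNING"):
            if level + ":" in line:
                counts[level] += 1
                break  # count each line once, preferring ERROR
    return counts
```

Passing an open log file (any iterable of lines) returns a `Counter` keyed by `"ERROR"` and `"WARNING"`, which makes before/after comparisons between builds straightforward.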

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)

  • Changes are complete (i.e. I finished coding on this PR)
    This PR helps clean up some of the problems to pave the way for a better understanding of how the parts work together, and how they might be updated or replaced.

  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
    Note: while this might provide a simple preview and help test the left nav and toctree features, I'd advise looking at a full site build with the versions dropdown.

Changes

  1. docs/get_started folder is deleted: it only held a redirect which is already being handled by .htaccess. Removing it reduces warnings/errors.
  2. docs/architecture/release_note_0_9.md is deleted: it was the only legacy release note. Release notes are held on the releases page. Removing it reduces warnings/errors.
  3. Sphinx's docs/conf.py updates:
    • exclude_patterns: added these to reduce warnings/errors: ['build_version_doc' (working dir), 'api/python/model.md' (deprecated), 'README.md' (not needed for Sphinx's indexing)]
    • update config to the working Sphinx version 1.5.6
    • update config to 2018 copyright
    • update config to have correct project URL
  4. I disabled/removed some functionality in docs/_static/js/sidebar.js. I was able to find some info about it by digging through old commits and PRs, but many of the PRs and commits have no description or commit message. I can see that the functionality may have been of some benefit before, when it magically modified the output of Sphinx. However, since it wasn't documented, it caused me hours of frustration when Sphinx's documented behavior was being nullified by this client-side activity.
    • Replaying the previous sidebar layout and merging it with the existing one plays havoc with the UI after one has gone through and configured the toctree for each folder. This is why I removed it. It is possible that the original authors could update it if they took a look, but for reasons I outline later, I think it would be better to remove it.
    • Adding Clojure to the nav can come in a separate PR. But since I was in the sidebar already, I added it.
  5. Added new index pages where ones were missing or modified existing ones. This is generally a good practice for SEO, and drastically improves the warnings/errors situation.
    • I used the :hidden: directive for tutorials and faq as they already have custom indexes, but were the source of many warnings/errors.
    • I skipped modifying index pages that did not result in warnings/errors.
    • I used the :glob: pattern where I could to auto-index a folder. This works best when there are no subfolders to index. You can see this mostly in the tutorials section:
.. toctree::
   :glob:

   *
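
The conf.py updates listed in item 3 might look roughly like the excerpt below. This is a sketch, not the actual file: it assumes that the "working Sphinx version" is expressed through Sphinx's `needs_sphinx` setting, the copyright holder name is a placeholder, and the project URL is omitted because the PR does not state it.

```python
# Hypothetical excerpt in the style of docs/conf.py, not the actual file.

# Minimum Sphinx version the build expects (assumed to be the setting the PR
# refers to as the "working Sphinx version").
needs_sphinx = '1.5.6'

# Copyright year from the PR; the holder name is a placeholder.
copyright = u'2018, Example Project Contributors'

# Paths, relative to the docs source directory, that Sphinx should skip.
exclude_patterns = [
    'build_version_doc',    # working directory for versioned builds
    'api/python/model.md',  # deprecated page
    'README.md',            # not needed for Sphinx's indexing
]
```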

Comments

  • As we gear up for foreign translations of the site, I really can't see how undocumented, custom client-side overrides are going to be maintainable. We should work within the features of the site generator. I recommend we try to fix the site at its core and as part of a regular build pipeline, and remove these kinds of "magical" features.
  • There is additional complexity with the versions dropdown that seems to be covered in the sidebar logic. Disabling this client-side hack might result in uncovering more issues and entanglements. I'd appreciate help from any SMEs in figuring out solutions that can simplify things and get us ready for internationalization of the site.

Usage

These changes get triggered in a regular make docs, but you can run just the Sphinx section of the docs pipeline by commenting out these lines in docs/mxdoc.py:

#app.connect("builder-inited", generate_doxygen)
#app.connect("builder-inited", build_scala_docs)
#app.connect("builder-inited", build_clojure_docs)

Then run (from /docs):

make html

@aaronmarkham aaronmarkham requested a review from szha as a code owner July 27, 2018 17:00
@aaronmarkham
Contributor Author

@ThomasDelteil @thomelane @sandeep-krishnamurthy @kevinthesun @piiswrong @mli @nswamy @marcoabreu - you all use this, or have used it... please let me know if you have any suggestions.

@szha
Member

szha commented Jul 27, 2018

AWESOME!

@aaronmarkham aaronmarkham force-pushed the sphinx_error_reduction branch from 7b3aa5e to 0b4c45d Compare July 27, 2018 17:43
@aaronmarkham aaronmarkham changed the title from [MXNET-371] Sphinx error reduction to [MXNET-744] Sphinx error reduction Jul 27, 2018
@ThomasDelteil
Contributor

Thanks for cleaning this up @aaronmarkham

@aaronmarkham
Contributor Author

I found an issue with the versions and the sidebar.js. The sidebar is missing in the API pages with older versions (non-master). Let's hold off merging until I can figure out what magic it was doing.

@aaronmarkham
Contributor Author

UPDATE:
Part of the problem is that changes I make here are not viewable in the full site build pipeline, as that checks out from specific branches on the main repo and not my fork. Then each legacy version only uses the configurations it has and not any updates I make here.
So... while I can fix the current build, the builds for every other version are still buggy as hell. 😖

I'm working on the tooling now so that you can supply logic from your fork/branch and supply that as an arbitrary version in the build. This process is also slowed down by building every version and every API, which makes for test cycles of about 40 minutes. I'm updating the build tools to turn documentation sets on/off via a config file instead of how they're hard-coded in the Sphinx plugin file, mxdoc.py. I've got it down to 18 minutes for all versions.
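
To illustrate the idea of switching documentation sets on and off from a config file, here is a minimal sketch using Python's standard `configparser`. The section names, option names, and the `doc_sets_for` helper are all assumptions made for this example; the actual settings file format used by the build tools may differ.

```python
import configparser

# Hypothetical settings; section and option names are made up for this sketch.
SAMPLE_SETTINGS = """\
[defaults]
doxygen_docs = true
scala_docs = true
clojure_docs = true

[v1.1.0]
scala_docs = false
clojure_docs = false
"""


def doc_sets_for(version, ini_text=SAMPLE_SETTINGS):
    """Return {doc_set: enabled} for a version, falling back to [defaults]."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = version if parser.has_section(version) else "defaults"
    return {
        name: parser.getboolean(
            section, name, fallback=parser.getboolean("defaults", name)
        )
        for name in parser.options("defaults")
    }
```

A version without its own section inherits every default, while an older version can disable only the doc sets it cannot build, which keeps per-version differences out of the plugin code.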

This has led to certain realizations.
💡 We're probably going to have to refactor the docs build process, so that we have better control.
💡 Each version should be built separately and maintained separately, rather than an overlapping pipeline.
💡 Each version should be built with configurations held in master, or...
💡 We're probably going to have to make some commits to the old version branches to fix legacy bugs, so that we're not doing hot patches during every build.
💡 Maybe we can cache old versions and not have to rebuild every time. Then builds can be done in a few minutes.
💡 And finally, maybe I can actually debug whatever is going on with this Sphinx theme! Or if anyone really knows Sphinx theme customization and how this code works, they can help address why the left navigation behaves unexpectedly now that index files and table of contents are being supplied as per Sphinx's documentation. Basically, it seems to copy state from previous views and when you toggle between versions --> shenanigans.

Any help would be appreciated. 🙏

@aaronmarkham aaronmarkham force-pushed the sphinx_error_reduction branch from 7c4dd09 to e5b3c41 Compare August 6, 2018 19:04
@aaronmarkham
Contributor Author

I rebased this on top of #11990, so I get the benefit of the docs build tools refactor.
Also, the Jenkinsfile in this PR will need to be reset to production settings once everything is approved.

@aaronmarkham aaronmarkham force-pushed the sphinx_error_reduction branch from 068481e to 396a009 Compare August 7, 2018 15:47
@aaronmarkham
Contributor Author

Using the new build pipeline, I created a preview website with the versions dropdown to show what happens with the toctree updates. Note that the dropdown shows my fork's branch, which was used to create one of the versions of the website, and as the default version. Here's a screenshot showing that the left side nav is unexpectedly different. Not bad really. Just not what we had before. It now lists the APIs, which come from the toctree.

[Screenshot: 2018-08-07_13-37-39]

Here's the view if you switch to the 1.2.1 version:
[Screenshot: 2018-08-07_13-40-55]

I'm not sure how to make the updated version, where all of the toctrees are in place, look like the old version. Or maybe the old version was always buggy, but we didn't realize it since it worked despite all of the Sphinx errors.

If no one is opposed to this new behavior for the left navigation, then I would say this PR is ready.

@aaronmarkham aaronmarkham force-pushed the sphinx_error_reduction branch from 3f079da to 89fd3be Compare August 7, 2018 21:13
@anirudh2290 anirudh2290 added the Website and pr-awaiting-review (PR is waiting for code review) labels Aug 9, 2018
@szha szha removed their request for review August 9, 2018 18:57
@lebeg
Contributor

lebeg commented Aug 15, 2018

Looks good

@aaronmarkham
Contributor Author

Need to rebase and test after recent updates! Let me do that before merging. Thanks!

@aaronmarkham aaronmarkham force-pushed the sphinx_error_reduction branch 2 times, most recently from 181faa2 to 8692481 Compare August 16, 2018 21:12
@aaronmarkham
Contributor Author

This is viewable here: http://34.201.8.176/
Looks good to me.
I should have noted earlier that this only fixes the errors in master and versions going forward. There are still a ton of errors when we build the old versions of the docs and website. I think this patch could be applied to each old branch with good success if that's of interest.

add title, fix render errors, fix inaccurate text
adding tutorial index pages to whitelist

added custom fork feature

adding settings to turn off/on doc sets

using custom fork directory for artifacts

automate upstream branch refresh

switched to boolean types and added debug messaging

build will copy current config files to each version build

build will copy current config files to each version build

stashing config files before checking out new version

put mxnet.css as artifact to be copied during build

fix formatting issues in h tags

refactored to build each version in a different folder

grab latest README from local fork

using settings.ini for document sets per version

fix R doc config for mxnet root

matching conf.py updates to current and excluding 3rdparty folder

align R doc gen bug fix with other PR 11970

pass the current tag in the make args and set to default if empty

fix bug for default version and add BUILD_VER to make html call

turning off scala docs for versions less than 1.2.0

turning off r docs until CI can handle it

enabling new docs build capability in CI

failover to fetching remote branch

Remove stale Keras-MXNet tests from MXNet repo (apache#11902)

Disable flaky cpp test (apache#12056)

Adjusting tolerance level and removing fixed seed for tests: test_ifft, test_fft (apache#12010)

* adjusting tolerance level and removing fixed seed

* CI retrigger

* removing status

[MXNET-774] Flaky test in test_executor.py:test_bind (apache#12016)

* fix test bind, remove fixed seed

* add tracking info

* remove tracking info

fix flaky test_quantization.test_get_optimal_thresholds (apache#12004)

removed fixed seed 1234 (apache#12072)

tested with 100k runs, no failures

improve error message of cudnn operators (apache#11886)

Fix for undefined variable errors (apache#12037)

* Undefined name in initializer

* Fix undefined name in test_mkldnn

* Fix for undefined names in examples

Fix undefined_variable lint errors in examples (apache#12052)

* Fix lint errors in dqn example

* Fix lint error in gluon example

* Fix undefined error in autoencoder example

MXNET-776 [Perl] Better documentation/bug fixes. (apache#12038)

* MXNET-776
1) Several new metric classes.
2) Improved documentation.
3) Bugfixes.

* added links and fixed a typo.

Redesign Jenkinsfiles (apache#12000)

* Rework Jenkinsfile

* Add functionality to assign node labels dynamically

* Extract functions into util file

* Change all Jenkinsfiles to use utils

* Make a new commit...

* Address review comments 1

* Address review comments 2

fix unidirectional model's parameter format (apache#12055)

* fix unidirectional model's parameter format

* Update rnn_layer.py

Fix syntax errors in Jenkinsfiles (apache#12095)

[MXAPPS-581] Straight Dope nightly fixes. (apache#11934)

Enable 3 notebooks that were failing tests after making updates to the
Straight Dope book. We also add pandas required by one of these
notebooks.

Fix jenkinsfile syntax errors (apache#12096)

remove fixed seed for test_triplet_loss (apache#12011)

got rid of fixed seed for test_optimizer/test_operator_gpu.test_ftml (apache#12003)

[MXNET-696] Fix undefined variable errors (apache#11982)

* Fix undefined error in image segmentation

ctx is used undefined. Setting the default ctx to cpu and
editing the comment to let the user know that it can be
changed to GPU as required.

* Fix undefined names in SSD example

maskUtils is disabled. Remove code referencing it.
Initializing start_offset.

got rid of fixed seed for test_optimizer/test_operator_gpu.test_nag (apache#11981)

Fix flaky test for elementwise_sum (apache#11959)

Re-enabling test_operator.test_binary_math_operators (apache#11712) (apache#12053)

Test passes on CPU and GPU (10000 runs)

update docs to explain CPU incompatibilities (apache#11931)

removed fixed from test_optimizer.test_signum (apache#12088)

Add missing object to tests/nightly/model_backwards_compatibility_check/JenkinsfileForMBCC (apache#12108)

Add GetName function in Symbol class for cpp pack (apache#12076)

Add unique number of parameters to summary output in Gluon Block (apache#12077)

* add unique parameters in summary output

* rebuild

Update fully_connected.cc documentation (apache#12097)

[MXNET-244] Update RaspberryPI instructions (apache#11562)

* Update RaspberryPI instructions

[MXNET-749] Correct usages of `CutSubgraph` in 3 control flow operators (apache#12078)

* Fix cut graph

* Copy only when necessary

* Add unittest for while_loop

* Add unittest for foreach

* Add unittest for cond

* Avoid magic number: 0 => kUndefinedStorage

[MXNET-703] TensorRT runtime integration (apache#11325)

* [MXNET-703] TensorRT runtime integration

Co-authored-by: Clement Fuji-Tsang <[email protected]>
Co-authored-by: Kellen Sunderland <[email protected]>

* correctly assign self._optimized_symbol in executor

* declare GetTrtCompatibleSubsets and ReplaceSubgraph only if MXNET_USE_TENSORRT

* add comments in ReplaceSubgraph

* Addressing Haibin's code review points

* Check that shared_buffer is not empty when USE_TENSORRT is set

* Added check that TensorRT binding is for inference only

* Removed redundant decl.

* WIP Refactored TRT integration and tests

* Add more build guards, remove unused code

* Remove ccache report

* Remove redundant const in declaration

* Clean Cmake TRT files

* Remove TensorRT env var usage

We don't want to use environment variables with TensorRT yet, the
logic being that we want to try and have as much fwd compatiblity as
possible when working on an experimental feature.  Were we to add
env vars they would have to be gaurenteed to work in the future until
a major version change.  Moving the functionality to a contrib call
reduces this risk.

* Use contrib optimize_graph instaed of bind

* Clean up cycle detector

* Convert lenet test to contrib optimize

* Protect interface with trt build flag

* Fix whitespace issues

* Add another build guard to c_api

* Move get_optimized_symbol to contrib area

* Ignore gz files in test folder

* Make trt optimization implicit

* Remove unused declaration

* Replace build guards with runtime errors

* Change default value of TensorRT to off

This is change applies to both TensorRT and non-TensorRT builds.

* Warn user when TRT not active at runtime

* Move TensorRTBind declaration, add descriptive errors

* Test TensorRT graph execution, fix bugs

* Fix lint and whitespace issues

* Fix typo

* Removed default value for set_use_tensorrt

* Improved documentation and fixed spacing issues

* Move static exec funcs to util files

* Update comments to match util style

* Apply const to loop element

* Fix a few namespace issues

* Make static funcs inline to avoid compiler warning

* Remove unused inference code from lenet5_train

* Add explicit trt contrib bind, update tests to use it

* Rename trt bind call

* Remove documentation that is not needed for trt

* Reorder arguments, allow position calling

Decrease success rate to make test more stable (apache#12092)

I have added this test back to unit test coverage and decreased success rate even more, to make sure that fails would happen even more rare

Add Clojure to website nav (apache#12075)

* adding clojure to API navigation

* adding clojure to the sidebar

* switched order

Fix flaky tests for quantize and requantize (apache#12040)

[MXNET-703] Use relative path for symbol import (apache#12124)

Fix shared memory with gluon dataloader, add option pin_memory (apache#11908)

* use threading for mp dataloader fetching, allow pin_memory option

* allow pin tuple of data into cpu_pinned

* fix as_in_context if not cpu_pinned

* fix cpu_pinned

* fix unittest for windows, update doc that windows mp is available

* fix pin_memory

* fix lint

* always use simplequeue for data queue

* remove main thread clearing for data_queue

* do not use outside folder as pythonpath but run nosetests inside

* use :MXNET_LIBRARY_PATH= to locate dll

* fix dll path

* correct dll path

reduce a copy for rowsparse parameter.reduce (apache#12039)

GPU Memory Query to C API (apache#12083)

* add support for GPU memory query

* remove lint

take custom dataset into consideration (apache#12093)

[MXNET-782] Fix Custom Metric Creation in R tutorial (apache#12117)

* fix tutorial

* install instructions

* fix typo

[MXAPPS-805] Notebook execution failures in CI. (apache#12068)

* [MXAPPS-805] Notebook execution failures in CI.

* Add a retry policy when starting a notebook executor to handle the failure to
 start a notebook executor (due to a port collision, kernel taking too
 long to start, etc.).

* Change logging level for tests to INFO so that we have more
 informative test output.

* Make retry logic for Jupyter notebook execution specific to the error
message we are looking for to prevent false positives in the retry logic.

rm wrong infertype for AdaptiveAvgPool and BilinearReisze2D (apache#12098)

Document MXNET_LIBRARY_PATH environment variable which was not documented explicitly. (apache#12074)

Generalized reshape_like operator (apache#11928)

* first commit

* fix documentation

* changed static_cast<bool>(end) to end.has_value()
fixed documentation issues

* change begin from int to optional

* test None as lhs

fix cython nnvm include path (apache#12133)

CI scripts refinements. Separate Py2 and Py3 installs cripts. Fix perms. (apache#12125)

 zipfian random sampler without replacement  (apache#12113)

* code compiles

* update doc

* fix bug and add test

* fix lint

update dmlc-core (apache#12129)

Fix quantized graphpass bug (apache#11937)

* fix quantized graphpass bug

* add residual quantization testcase

* handle dtype and backend issues

support selu activation function (apache#12059)

Fix flaky test test_operator_gpu:deformable_conv and deformable_psroi_pooling (apache#12070)

[MXNET-767] Fix flaky test for kl_loss (apache#11963)

* Fix flaky test for kl_loss

* remove comment.

[MXNET-788] Fix for issue apache#11733 pooling op test (apache#12067)

* added support to check_consistency function to generate random numbers for a specific datatype (ie. fp16)
this ensures that for tests that compare results among different precisions, that data is generated in the least precise type and casted to the most precise

changed test_pooling_with_type test case to specify fp16 precision for random input data
renamed the 2nd test_pooling_with_type function to test_pooling_with_type2 so it doesnt redefine the first and both are tested

fixed equation formatting issue in pooling operator description

Added myself to the contributors readme file

* updated from latest in master (had old version of the file)

* shortened lines per lint spec

* renamed default_type argument to rand_type for clarity
updated function docstring with argument description

removed rand_type setting for non-max pooling tests

* cleaned up check_consistency function docstring

Do not show "needs to register block" warning for registered blocks. (apache#12130)

Fix precision issue of test case test_rnnrelu_bidirectional (apache#12099)

* adjust tolerance only for relu for fixing test case bug

* only adjust torence for test_rnnrelu_bidirectional and adjust back on test_rnnrelu_sym

Accelerate the performance of topk for CPU side (apache#12085)

* Accelerate the performance of topk for CPU side

* Add comments for the code changes

Remove unused TensorRT code (apache#12147)

Removing some python code that isn't in the current TensorRT execution paths.
This should make the code more readable and avoid potential linting errors.

Thanks to @vandanavk for pointing out the dead code and @cclauss for a quick
alternative fix.

Co-authored-by: Vandana Kannan <[email protected]>
Co-authored-by: cclauss <[email protected]>

Disable test_io.test_CSVIter (apache#12146)

Fix RAT license checker which is broken in trunk (apache#12148)

Remove obsolete CI folder

set bind flag after bind completes (apache#12155)

Fix MXPredReshape in the c_predict_api (apache#11493)

* Fix MXPredReshape in the c_predict_api.

* Add unittest for the C predict API.

* Fix path in the test.

* Fix for Windows.

* Try again to fix for Windows.

* One more try to fix test on Windows.

* Try again with CI.

* Try importing from mxnet first if cannot find the amalgamation lib.

* Add a log message when libmxnet_predict.so is not found.

* Set specific rtol and atol values.

* Fix missing rtol and atol values.

* Empty commit.

* Try again with CI.

* One more try with CI.

* Retry CI.

[Flaky Test] Fix test_gluon_model_zoo.test_models when MXNET_MKLDNN_DEBUG=1  (apache#12069)

* reorder inputs

* use function flatten vs build in method

* update similar array atoi to 0.01

* fix reorder

* enable MXNET_MKLDNN_DEBUG in CI

* add exclude debug flag

* fix lint

* add warning log for excluded op

* retrigger

RAT check readme updated (apache#12170)

update ndarray stack Doc for apache#11925 (apache#12015)

* update ndarray stack Doc

Add worker_fn argument to multiworker function (apache#12177)

* add worker_fn argument to multiworker function

* fix pylin

Remove fixed seed for test_huber tests (apache#12169)

Removed fixed seed and increased learning rate and tolerance for test_nadam (apache#12164)

documentation changes. added full reference (apache#12153)

* documentation changes. added full reference

* fixing lint

* fixing more lint

* jenkins

* adding the coding line utf-8

Partially enable flaky test for norm operator (apache#12027)

add examples for slicing option (apache#11918)

Module predict API can accept NDArray as input (apache#12166)

* forward and predict can accept nd.array np.array

[MXNET-744] Docs build tools update (apache#11990)

[MXNET-744] Docs build tools update (apache#11990)

[MXNET-696] Fix undefined name errors (apache#12137)

* Fix undefined name error in neural style example

* Fix import exception error

* Fix undefined name in AUCMetric

* Fix undefined name in a3c example

Fix profiler executer when memonger is used (apache#12152)

add handling for grad req type other than kNullOp for indices (apache#11983)

Fix a minor bug in deformable_im2col.cuh (apache#12060)

Function `deformable_col2im_coord ` called deformable_col2im_coord_gpu_kernel but check the deformable_col2im_gpu_kernel.

[MXNet-744] Fix website build pipeline Python 3 issues (apache#12195)

* Fix website build pipeline Python 3 issues (apache#12195)

Fix MKLDNNSum cpp test failure (apache#12080)

bump timeout on Jenkins for docs/website to 120 min (apache#12199)

* bump timeout on Jenkins to 120 min

* add branches to settings using v notation; apply appropiate settings

Fixing typo in python/mxnet/symbol/image.py (apache#12194)

Fixing typo in python/mxnet/symbol/image.py

Fix the topk regression issue (apache#12197) (apache#12202)

* Fix the topk regression issue (apache#12197)

* Add comments

pull changes in from master
Labels
pr-awaiting-review (PR is waiting for code review), Website
5 participants