fix parallel read error and add test #112
Conversation
This is super cool! Thanks for taking the time to research this and work on a solution! 😄
@@ -27,6 +28,10 @@ def remove_sphinx_build_output():
        shutil.rmtree(build_path)


@pytest.mark.sphinx(srcdir=srcdir)
def test_parallel_build():
    subprocess.check_call('sphinx-build -j 2 -W -b html tests/examples/parallel-build build', shell=True)
Can't we call app.build here instead, passing a parallel argument in some way?
I mean, similar to what we are doing in the rest of the tests.
The other tests rely on SphinxTestApp, which does not have a way to pass parallel=2 through to the Sphinx constructor. I submitted a PR to add it, but we would have to wait for the next Sphinx release, and it would not work for your test runs on older Sphinx versions. You can see a discussion of the problem in a PR for another extension: executablebooks/sphinx-book-theme#225 (comment)
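For context, here is a rough sketch of what a direct-driver test could look like once the constructor's parallel argument is reachable; the helper name and paths are hypothetical and not part of this PR:

# Sketch only: calling the Sphinx application directly, which accepts a
# `parallel` argument, unlike SphinxTestApp at the time of this discussion.
from sphinx.application import Sphinx

def build_parallel(srcdir, outdir):
    app = Sphinx(
        srcdir=srcdir,
        confdir=srcdir,
        outdir=outdir,
        doctreedir=outdir + '/.doctrees',
        buildername='html',
        parallel=2,           # read/write documents with 2 worker processes
        warningiserror=True,  # mirror the -W flag used in the subprocess test
    )
    app.build()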
I see. Makes sense. So, should we check stderr or the exit code or something similar here, to be sure that the subprocess didn't fail?
Oh, check_call raises an exception.
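For anyone reading along, a small illustration of that behaviour (a sketch, not code from this PR): subprocess.check_call raises CalledProcessError when the command exits non-zero, so the test fails without any explicit exit-code check.

import subprocess

try:
    # Any failing sphinx-build invocation would do; this source path is made up.
    subprocess.check_call('sphinx-build -j 2 -W -b html no-such-dir build', shell=True)
except subprocess.CalledProcessError as exc:
    print('sphinx-build failed with exit code', exc.returncode)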
Right. And I added the test first and verified that it caught the problem before adding the fix.
def merge_other(self, app, env, docnames, other):
    """Merge in data about docnames from a different `BuildEnvironment`
    object coming from a subprocess in a parallel build."""
    env.metadata.update(other.metadata)
Do you have an example of another collector that works in parallel and implements this method? It would be good to include it here as a reference as well.
I could not find an example, so I read the documentation and printed out values while processing documents. Maybe @jakobandersen would be willing to look at it, because he diagnosed the problem here: sphinx-doc/sphinx#8256
I'm not familiar with what this extension is doing, so perhaps the following is irrelevant: technically you should not copy all data from other, but only the data related to the documents in docnames. However, I don't think I have ever run into a case where this in practice didn't mean "copy everything from other", so maybe it really is a guarantee that other at most contains data about those documents.
It will be easier to add the filter than to prove whether or not it is needed. I will submit another PR.
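One possible shape for that filter, restricting the merge to the documents handled by the subprocess (a sketch only; the actual follow-up PR may differ):

def merge_other(self, app, env, docnames, other):
    """Merge metadata only for the docnames processed by the subprocess."""
    for docname in docnames:
        if docname in other.metadata:
            env.metadata[docname] = other.metadata[docname]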
Thanks again!
I added some small TODO comments so we can come back to this in the future with some context.
Thanks for making the effort to share the notfound extension. We use it here: https://spec.oneapi.com/versions/latest/index.html. The PDF is 1800 pages long and parallel read reduces processing time from 4.5 to 3.5 minutes, which helps when we are editing the doc.
Hi @humitos - do you have a target date for when this fix will be in an official release? We're hitting this now on the Ansible documentation builds (with 0.5), and running the same doc builds with master here seems to fix it. I'm not a coder so can't give much help in that regard, but if there's some non-coding help you need for this, let me know. Glad to help out on something we depend on :-)
@samccann I just released 0.6, which includes this change. Thanks for contacting us. Let me know if everything works as expected.
@humitos thanks, yes it all works now!
Resolves #111