fix parallel read error and add test #112
Conversation
This is super cool! Thanks for taking the time to research this and work on a solution! 😄
@@ -27,6 +28,10 @@ def remove_sphinx_build_output():
        shutil.rmtree(build_path)


@pytest.mark.sphinx(srcdir=srcdir)
def test_parallel_build():
    subprocess.check_call('sphinx-build -j 2 -W -b html tests/examples/parallel-build build', shell=True)
Can't we call app.build here instead, passing a parallel argument in some way?
I mean, similar to what we are doing in the rest of the tests.
The other tests rely on SphinxTestApp, which does not have a way to pass parallel=2 through to the Sphinx constructor. I submitted a PR to add it, but we would have to wait for the next Sphinx release, and it would not work for your test runs on older Sphinx versions. You can see a discussion of the problem in a PR for another extension: executablebooks/sphinx-book-theme#225 (comment)
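For context, here is a rough sketch of what a direct-driver test could look like once the constructor's parallel argument is reachable; the helper name and paths are hypothetical and not part of this PR:

# Sketch only: calling the Sphinx application directly, which accepts a
# `parallel` argument, unlike SphinxTestApp at the time of this discussion.
from sphinx.application import Sphinx

def build_parallel(srcdir, outdir):
    app = Sphinx(
        srcdir=srcdir,
        confdir=srcdir,
        outdir=outdir,
        doctreedir=outdir + '/.doctrees',
        buildername='html',
        parallel=2,           # read/write documents with 2 worker processes
        warningiserror=True,  # mirror the -W flag used in the subprocess test
    )
    app.build()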
I see. Makes sense. So, should we check stderr or the exit code or something similar here, to be sure that the subprocess didn't fail?
Oh, check_call raises an exception.
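For anyone reading along, a small illustration of that behaviour (a sketch, not code from this PR): subprocess.check_call raises CalledProcessError when the command exits non-zero, so the test fails without any explicit exit-code check.

import subprocess

try:
    # Any failing sphinx-build invocation would do; this source path is made up.
    subprocess.check_call('sphinx-build -j 2 -W -b html no-such-dir build', shell=True)
except subprocess.CalledProcessError as exc:
    print('sphinx-build failed with exit code', exc.returncode)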
Right. And I added the test first and verified that it caught the problem before adding the fix.
def merge_other(self, app, env, docnames, other):
    """Merge in data about docnames from a different `BuildEnvironment`
    object coming from a subprocess in a parallel build."""
    env.metadata.update(other.metadata)
Do you have an example of another collector that works in parallel and implements this method? It would be good to include it here as a reference as well.
I could not find an example, so I read the documentation and printed out values while processing documents. Maybe @jakobandersen would be willing to look at it, because he diagnosed the problem here: sphinx-doc/sphinx#8256
I'm not familiar with what this extension is doing, so perhaps the following is irrelevant: technically you should not copy all data from other, but only the data related to the documents in docnames. However, I don't think I have ever run into a case where this in practice didn't mean "copy everything from other", so maybe it really is a guarantee that other at most contains data about those documents.
It will be easier to add the filter than to prove whether or not it is needed. I will submit another PR.
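One possible shape for that filter, restricting the merge to the documents handled by the subprocess (a sketch only; the actual follow-up PR may differ):

def merge_other(self, app, env, docnames, other):
    """Merge metadata only for the docnames processed by the subprocess."""
    for docname in docnames:
        if docname in other.metadata:
            env.metadata[docname] = other.metadata[docname]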
Thanks again!
I added some small TODO comments so we can come back to this in the future with some context.
Thanks for making the effort to share the notfound extension. We use it here: https://spec.oneapi.com/versions/latest/index.html. The PDF is 1800 pages long and parallel read reduces processing time from 4.5 to 3.5 minutes, which helps when we are editing the doc.
Hi @humitos - do you have a target date for when this fix will be in an official release? We're hitting this now on the Ansible documentation builds (with 0.5), and running the same doc builds with master here seems to fix it. I'm not a coder so can't give much help in that regard, but if there's some non-coding help you need for this, let me know. Glad to help out on something we depend on :-)
@samccann I just released 0.6, which includes this change. Thanks for contacting us. Let me know if everything works as expected.
@humitos thanks, yes it all works now!
Resolves #111