-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mergeable rustdoc cross-crate info #3662
Conversation
cc @rust-lang/rustdoc |
I'm very in favor of this! In addition to the parallel building use case, I think it will make it much more feasible to support documentation of namespaced crates as a group. We discussed this a bit on Zulip last month. And merging the parts after doc generation is generally a nicer design. For implementation details: I'd like for the cross crate info to be written in JSON, not JS, and for the merge step to generate the JS. The JS as it stands is just a very thin callback wrapper around JSON. We do this so docs can be loaded from file:// URLs (where fetching JSON files is not supported). But we don't need the merge step to understand the JS. Some bikeshedding about the flags: I don't like the names It seems to me that the merge step is effectively a subcommand, except that since rustdoc doesn't have subcommands it's triggered by a combination of Could you add a section to the RFC about compatibility? For compatibility on disk, I'm assuming we won't support merging cross crate info that was generated by disparate versions of rustdoc. As an alternate design, what about this:
|
I would prefer not to put large files in the Instead, I'd design the CLI with two parameters:
|
@jsha, I love that there are more use cases than we initially anticipated! I agree that the intermediate cross-crate info should be written in JSON, and for the merge step to generate JS (and HTML where applicable)! The current proposal / WIP writes intermediate files as JSON ( Here are what the combinations of
I initially conceptualized merging as a subcommand, but you could also see it as an incremental enhancement: if you're calling rustdoc on one of your binaries or libraries, you still might want to link-in the It seems like Based off of @notriddle and @jsha's feedback, I will update the RFC + WIP with these items:
I'll have to work on these suggestions on Monday, but definitely give me more feedback. Thank you for engaging with this proposal! |
Updated the RFC with the items mentioned in my previous comment. I chose the name |
rustdoc -Z unstable-options --crate-name=i --crate-type=lib --edition=2021 --enable-index-page --out-dir=i/target/doc --extern-html-root-url s=$(MERGED) --extern-html-root-url t=$(MERGED) --extern-html-root-url i=$(MERGED) --merge=write-only --include-info-json t=t/target/doc.parts --include-info-json s=s/target/doc.parts --extern t=t/target/libt.rmeta --extern s=s/target/libs.rmeta -L t/target i/src/lib.rs | ||
``` | ||
|
||
Merge the docs with `cp`. This can be avoided if `--out-dir=$(MERGED)` is used for all of the rustdoc calls. We copy here to illustrate that documenting `s` is independent of documenting `t`, and could happen on separate machines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense for the rustdoc merge invocation to do all copying too? This would allow the per-crate rustdoc invocations to avoid copying all shared resources in their own doc dirs, saving disk space. And in general it seems like it would make things easier for the build system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cp
is the stand-in for the step that takes docs from external crates and collects them in a single web root. Depending on the location of the external crates, this step may look very different:
- It may do nothing;
Cargo
generates the docs in a singledoc/
folder. cp
ormv
files from one location on the filesystem to another. This is shown in the example in the RFC.scp
,rsync
,wget --recursive
, a proprietary protocol, etc. This RFC is intended to enable CCI in crates that have been documented on separate machines.
Because of the wide variety of locations that external docs may be, I believe that the build system is best positioned to collect them into a single location.
When you pass the --merge=none
flag to document the external crates, the shared resources are not generated. Only crate-info.json
(--write-info-json
) and the item docs will be generated for these crates. The shared resources are only generated when you pass --merge=auto
(default) or --merge=write-only
, which is only provided when documenting the index or root crate. There are not any redundant files to bloat the disk space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then please update so that it is mentioned that rustdoc will handle the copying internally as we don't want to rely on external tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will update it so that rustdoc does the copying.
I updated the RFC + WIP implementation so that The existing I hacked compiletest to pass |
If there are no further blockers for this, can we put this RFC into FCP? |
|
||
# Guide-level explanation | ||
|
||
In this example, there is a crate `t` which defines a trait `T`, and a crate `s` which defines a struct `S` that implements `T`. Our goal in this demo is for `S` to appear as in implementer in `T`'s docs, even if `s` and `t` are documented independently. This guide will be assuming that we want a crate `i` that serves as our documentation index. See the Unresolved questions section for ideas that do not require an index crate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this example, there is a crate `t` which defines a trait `T`, and a crate `s` which defines a struct `S` that implements `T`. Our goal in this demo is for `S` to appear as in implementer in `T`'s docs, even if `s` and `t` are documented independently. This guide will be assuming that we want a crate `i` that serves as our documentation index. See the Unresolved questions section for ideas that do not require an index crate. | |
In this example, there is a crate `t` which defines a trait `T`, and a crate `s` which defines a struct `S` that implements `T`. Our goal in this demo is for `S` to appear as an implementer in `T`'s docs, even if `s` and `t` are documented independently. This guide will be assuming that we want a crate `i` that serves as our documentation index. See the Unresolved questions section for ideas that do not require an index crate. |
│ └── lib.rs.html | ||
├── doc.parts | ||
│ └── t | ||
│ │ └── crate-info.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it bugs me. 😆
│ │ └── crate-info.json | |
│ └── crate-info.json |
|
||
When `read_rendered_cci` is active, rustdoc will look in the `--out-dir` for rendered cross-crate info files. These files will be used as the base. Any new parts that rustdoc generates with its current invocation and any parts fetched with `include-info-json` will be appended to these base files. When it is disabled, the cross-crate info files start empty and are populated with the current crate's info and any crates fetched with `--include-info-json`. | ||
|
||
* `--merge=auto` (`read_rendered_cci && write_rendered_cci`) is the default, and reflects the current behavior of rustdoc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of auto
. Why not read-write
instead? Would be more explicit (took me a while to get why it was different to write-only
😅).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good!
|
||
This RFC does not alter previous compatibility guarantees made about the output of rustdoc. In particular it does not stabilize the presence of the rendered cross-crate information files, their content, or the HTML generated by rustdoc. | ||
|
||
In the same way that the [rustdoc HTML output is unstable](https://rust-lang.github.io/rfcs/2963-rustdoc-json.html#:~:text=The%20HTML%20output%20of%20rustdoc,into%20a%20different%20format%20impractical), the contents of the `crate-info.json` will be considered unstable. Between versions of rustdoc, breaking changes to the content of `crate-info.json` should be expected. Only the presence of a `crate-info.json` file is promised, under `--write-info-json`. Merging cross-crate information generated by disparate versions of rustdoc is not supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the same way that the [rustdoc HTML output is unstable](https://rust-lang.github.io/rfcs/2963-rustdoc-json.html#:~:text=The%20HTML%20output%20of%20rustdoc,into%20a%20different%20format%20impractical), the contents of the `crate-info.json` will be considered unstable. Between versions of rustdoc, breaking changes to the content of `crate-info.json` should be expected. Only the presence of a `crate-info.json` file is promised, under `--write-info-json`. Merging cross-crate information generated by disparate versions of rustdoc is not supported. | |
In the same way that the [rustdoc HTML output is unstable](https://rust-lang.github.io/rfcs/2963-rustdoc-json.html#:~:text=The%20HTML%20output%20of%20rustdoc,into%20a%20different%20format%20impractical), the content of the `crate-info.json` will be considered unstable. Between versions of rustdoc, breaking changes to the content of `crate-info.json` should be expected. Only the presence of a `crate-info.json` file is promised, under `--write-info-json`. Merging cross-crate information generated by disparate versions of rustdoc is not supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You say:
Merging cross-crate information generated by disparate versions of rustdoc is not supported.
But in the description of the content of the file, you don't mention that the version of rustdoc is stored or how it's stored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a suggestion about how to version the crate-info.json
files? We could put the semver as a key in the JSON and accept an --info-json-version
flag. Rustdoc would fail if it is passed an --include-info-json
with an incompatible file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a tricky question. Currently the rustdoc JSON output has a simple counter but it's something we're thinking about changing, but we didn't find a new versioning system yet (you can read about it here).
So really not sure what to pick, but this is definitely a blocker for this RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that an incremental single-digit version number makes the most sense. The RFC only discusses the case where crate-info.json files are written by rustdoc, and consumed by the same version rustdoc, so it doesn't seem important to detail what constitutes a breaking change. Also the content of the files is tied to the internal details of rustdoc (including the details of the search index). I'll make rustdoc write an incremental single-digit version to the crate-info.json file, and rustdoc will fail if it's called with an --include-info-json
pointing to a crate-info.json
with a version that does not match its internal version.
* remove 'Unresolved: index crate?' from the unresolved questions section * remove the requirement on the index crate from various sections
I updated the RFC to resolve in favor of not needing an index crate to merge docs in a workspace, as explained above. |
* Adds `--merge=shared|none|finalize` flags * Adds `--parts-out-dir=<crate specific directory>` for `--merge=none` to write cross-crate info file for a single crate * Adds `--include-parts-dir=<previously specified directory>` for `--merge=finalize` to write cross-crate info files * Longer crate names in previous cross-crate-info tests * Unit tests for above
That's the merged search index, though - not each crate's search index.
This is a good point. It would perhaps make sense to consider all of the regular rustdoc outputs to be intermediates, which are merged in the merge step. But maybe that introduces an excessive number of filesystem operations? At any rate, in the interests of unblocking the work I'm willing to concede this point and say the extra flag is okay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added my approval on the FCP comment. Also one editorial nit below.
@@ -33,26 +35,20 @@ More details are in the Reference-level explanation. | |||
* `--include-parts-dir=path/to/doc.parts/<crate-name>`: Include cross-crate information from this previously written `doc.parts` directories into a collection that will be written by the current invocation of rustdoc. May only be provided with `--merge=finalize`. May be provided any number of times. | |||
* `--merge=none`: Do not write cross-crate information to the `--out-dir`. The flag `--parts-out-dir` may instead be provided with the destination of the current crate's cross-crate information parts. | |||
* `--merge=shared` (default): Append information from the current crate to any info files found in the `--out-dir`. | |||
* `--merge=finalize`: Write cross-crate information from the current crate and any crates included via `--include-parts-dir` to the `--out-dir`, overwriting conflicting files. | |||
* `--merge=finalize`: Write cross-crate information from the current crate and any crates included via `--include-parts-dir` to the `--out-dir`, overwriting conflicting files. This flag may be used with or without an input crate root, as it can links existing docs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* `--merge=finalize`: Write cross-crate information from the current crate and any crates included via `--include-parts-dir` to the `--out-dir`, overwriting conflicting files. This flag may be used with or without an input crate root, as it can links existing docs. | |
* `--merge=finalize`: Write cross-crate information from the current crate and any crates included via `--include-parts-dir` to the `--out-dir`, overwriting conflicting files. This flag may be used with or without an input crate root, as it can link existing docs. |
typo in --merge=finalize description: can links
modularize rustdoc's write_shared Refactor src/librustdoc/html/render/write_shared.rs to reduce code duplication, adding unit tests * Extract + unit test code for sorting and rendering JSON, which is duplicated 9 times in the current impl * Extract + unit test code for encoding JSON as single quoted strings, which is duplicated twice in the current impl * Unit tests for cross-crate information file formats * Generic interface to add new kinds of cross-crate information files in the future * Intended to match current behavior exactly, except for a merge info comment it adds to the bottom of cci files * This PR is intended to reduce the review burden from my [mergeable rustdoc rfc](rust-lang/rfcs#3662) implementation PR, which is a [small commit based on this branch](https://github.com/EtomicBomb/rust/tree/rfc). This code is agnostic to the RFC and does not include any of the flags discussed there, but cleanly enables the addition of these flags in the future because it is more modular
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked my FCP box, but have a few editorial comments and requests for clarification (where the answer doesn't make a difference to weather I'm happy with this, as long as it get's written down in the RFC itself)
|
||
In the same way that the [rustdoc HTML output is unstable](https://rust-lang.github.io/rfcs/2963-rustdoc-json.html#:~:text=The%20HTML%20output%20of%20rustdoc,into%20a%20different%20format%20impractical), the content of `doc.parts` will be considered unstable. Between versions of rustdoc, breaking changes to the content of `doc.parts` should be expected. Only the presence of a `doc.parts` directory is promised, under `--parts-out-dir`. Merging cross-crate information generated by disparate versions of rustdoc is not supported. To detect whether `doc.parts` is compatible, rustdoc includes a version number in these files (see New directory: `doc.parts`). | ||
|
||
The implementation of the RFC itself is designed to produce only minimal changes to cross-crate info files and the HTML output of rustdoc. Exhaustively, the implementation is allowed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Editorial: What is this talking about? There was a linked draft implementation in the PR, but that was on the master branch of your fork, which now doesn't have it. It's worth clairifying that this is your WIP implementation (unless it isn't), and linking to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the reference. It was to a different RFC (#2963), but I don't think the reference is relevant anymore.
* full sentences in intro * Clarify Cargo's expected use of `--merge=none|shared|finalize` * Explain why you would use multiple rustdoc invocations with the same --out-dir and merge=none * Remove justification for why rustdoc output can be unstable in general * Permalink specific fuchsia commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* `--merge=none`: Do not write cross-crate information to the `--out-dir`. The flag `--parts-out-dir` may instead be provided with the destination of the current crate's cross-crate information parts. | ||
* `--parts-out-dir=path/to/doc.parts/<crate-name>`: Write cross-crate linking information to the given directory (only usable with the `--merge=none` mode). This information allows linking the current crate's documentation with other documentation at a later rustdoc invocation. | ||
* `--include-parts-dir=path/to/doc.parts/<crate-name>`: Include cross-crate information from this previously written `doc.parts` directories into a collection that will be written by the current invocation of rustdoc. May only be provided with `--merge=finalize`. May be provided any number of times. | ||
* `--merge=shared` (default): Append information from the current crate to any info files found in the `--out-dir`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we plan to deprecate --merge=shared
anyway, what do you think about not adding it in the first place? --merge
already has to be an optional flag to avoid breaking changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uniformity and ease of documentation. It's the same reason you're allowed to write edition = 2015
, even though that's the default if you don't specify anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I guess we won't ever be able to remove the shared implementation anyway, so it doesn't matter. And if we somehow did find a way to remove it while maintaining the outward behavior, that wouldn't be a breaking change anyway since it's an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you say we won't ever be able to remove the shared implementation? I've been assuming that after updating Cargo and docs.rs, and providing a decently long deprecation period we could remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm trying to remember what I was thinking ^^. I think I was worried that there might be existing build systems or other tools that were depending on the shared representation, but we never guaranteed stability of it. So perhaps it would be fine to remove after all.
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Change sentence reading "The WIP may change the sorting order..." to "The implementation may change the sorting order..." Co-authored-by: Noah Lev <[email protected]>
modularize rustdoc's write_shared Refactor src/librustdoc/html/render/write_shared.rs to reduce code duplication, adding unit tests * Extract + unit test code for sorting and rendering JSON, which is duplicated 9 times in the current impl * Extract + unit test code for encoding JSON as single quoted strings, which is duplicated twice in the current impl * Unit tests for cross-crate information file formats * Generic interface to add new kinds of cross-crate information files in the future * Intended to match current behavior exactly, except for a merge info comment it adds to the bottom of cci files * This PR is intended to reduce the review burden from my [mergeable rustdoc rfc](rust-lang/rfcs#3662) implementation PR, which is a [small commit based on this branch](https://github.com/EtomicBomb/rust/tree/rfc). This code is agnostic to the RFC and does not include any of the flags discussed there, but cleanly enables the addition of these flags in the future because it is more modular
modularize rustdoc's write_shared Refactor src/librustdoc/html/render/write_shared.rs to reduce code duplication, adding unit tests * Extract + unit test code for sorting and rendering JSON, which is duplicated 9 times in the current impl * Extract + unit test code for encoding JSON as single quoted strings, which is duplicated twice in the current impl * Unit tests for cross-crate information file formats * Generic interface to add new kinds of cross-crate information files in the future * Intended to match current behavior exactly, except for a merge info comment it adds to the bottom of cci files * This PR is intended to reduce the review burden from my [mergeable rustdoc rfc](rust-lang/rfcs#3662) implementation PR, which is a [small commit based on this branch](https://github.com/EtomicBomb/rust/tree/rfc). This code is agnostic to the RFC and does not include any of the flags discussed there, but cleanly enables the addition of these flags in the future because it is more modular
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. |
We should discuss this before stabilization. In the meantime, this RFC needs to get actually merged, so I'll do that now. :) |
Huzzah! The @rust-lang/rustdoc team has decided to accept this RFC. To track further discussion, subscribe to the tracking issue here: |
Rendered
This RFC is to enable merging cross-crate information (like the search index, index.html, source files index) from separate output directories, so that rustdoc can run in parallel in non-cargo build systems.
Currently, rustdoc updates these cross-crate files with the information from a new crate by reading the old version, adding the information for the current crate, and writing the same file back. This causes issues in build systems that disallow multiple build actions targeting the same output files. Cargo supports cross-crate information because it doesn't have this restriction, Buck2 and Bazel do not support cross-crate information, and Fuchsia has a doc step that runs rustdoc serially on 2700 crates.
Supporting parallel rustdoc with CCI as a native step in Buck2, Bazel, and GN (Fuchsia + Chrome) would require mergable cross-crate information, which is what this proposal is about. It adds new flags,
--include-info-json
,--write-info-json
,--merge=none|read-write|write-only
and--include-rendered-docs
, to opt-in to these new features.