-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix some issues with cargo doc
and the new feature resolver.
#9077
Conversation
There are some cases where `cargo doc` will try to document two things with the same crate_name. This attempts to automatically remove some of those duplicates based on some rules: - Prefers dependencies for the target over dependencies for the host (such as proc-macros). - Prefers the "newest" version if it comes from the same source. There are still plenty of situations where there can be collisions, but I'm uncertain on the best way to handle those.
(rust-highfive has picked a reviewer for you, use r? to override) |
Oof. I was gonna review this before heading out for today, but I think it might be best if I left this for next week. FWIW I'll be out until next Tuesday, but I'll get to this then! If this is urgent to fix in nightly, however, feel free to land and I'll review after-the-fact. A quick skim and seems fine to me, but I'd like to review the pruning of the unit graph more closely. |
No worries, and no rush, next week is plenty fast. There is a comment at the start of This feels a bit hacky, but I couldn't really think of a cleaner solution (other than redesigning rust, cargo, and rustdoc from scratch 😜 ). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh dear this is indeed quite subtle. I do think the best solution here is some extra flags to rustdoc, but that's more of a long-term issue than one to solve right now.
Otherwise can you beef up the documentation for each of the cases where duplicates are removed? Explaining in words a small example of what we're deduplicating is sort of what I'm thinking.
This is definitely going to be buggy in the sense that by selecting the newest version that means that broken links could be generated if links are supposed to reference an older version. Again though this is all just an artifact of rustdoc's current design which wasn't intended to cover cases like this (it was designed in a simpler time...)
/// - Different sources. See `collision_doc_sources` test. | ||
/// | ||
/// Ideally this would not be necessary. | ||
fn remove_duplicate_doc(build_config: &BuildConfig, unit_graph: &mut UnitGraph) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW long-term I don't think that this function should exist. Ideally rustdoc would have a feature that we could specify the directory/lookup name and we could pass that in to avoid all duplication issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. However, it is not clear to me how that should work. Because the directories are exposed as URLs, I think it would be unpleasant for those to have Cargo-style hashes in them. There could be some pattern like $NAME-$VERSION and then have extra disambiguators when necessary like $NAME-$VERSION-bin-foo. But there are some collisions that are harder (like when they only differ in the features).
What do you think about having a hash in the URL?
This feels like a bit of a mess, and I'm uncertain how to climb out of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least for cargo doc
locally I feel like the hash doesn't matter too much since you'll just be clicking on links anyway (assuming we generate some well-known landing page which we don't already do today). For everything else though (like doc.rust-lang.org or docs.rs) I wouldn't want Cargo-style hashes in URLs.
This function is likely gonna be here for a long long time which is by no means the end of the world. I think it'll take some more thinking to figure out what it might look like to not have this function.
for unit_deps in unit_graph.values_mut() { | ||
unit_deps.retain(|unit_dep| !units.iter().any(|unit| *unit == unit_dep.unit)); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly worried about the performance of this since this could be in the worst case O(n^2). Would it be possible to maintain a set of units to remove and they're all processed together at the very end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, yea. Pushed a change to filter them once.
} | ||
} | ||
// Prefer newer versions over older. | ||
let mut source_map: HashMap<(InternedString, SourceId, CompileKind), Vec<Unit>> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the key here perhaps be BTreeMap<Version, Unit>
? (to avoid the extra sort later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so (per the other comment, there can be multiple entries with the same version).
let newest_version = units.last().unwrap().pkg.version().clone(); | ||
let (to_remove, keep_units): (Vec<Unit>, Vec<Unit>) = units | ||
.into_iter() | ||
.partition(|unit| unit.pkg.version() < &newest_version); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given a source it's not possible to have multiple crates of the same name at the same version, right?
Is it possible for keep_units
to have length greater than 1? (I'm probably missing something...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be very rare, but it is possible.
One example is cargo doc --bins --lib
. This will try to document both the library and the binary, which have the same name and thus end up colliding.
I'm not sure how that should behave. Should it be an error? Should it ignore the binary? Should rustdoc put those in separate directories?
There are a bunch of collision opportunities, and this isn't intended to cover all of them, just a few that seemed reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm nah that's a good point. I don't know of a better thing to do here other than warn really at this time unfortunately.
@bors: r+ |
📌 Commit c5e3f17 has been approved by |
☀️ Test successful - checks-actions |
Fix panic with doc collision orphan. As I feared, the collision removal added in #9077 caused problems due to orphans left in the unit graph. Ironically, it was the collision warning detection code which failed, as it iterates over all the keys of the graph. The solution is to remove all orphans from the graph after collisions have been removed. Fixes #9136
Fix panic with doc collision orphan. As I feared, the collision removal added in rust-lang#9077 caused problems due to orphans left in the unit graph. Ironically, it was the collision warning detection code which failed, as it iterates over all the keys of the graph. The solution is to remove all orphans from the graph after collisions have been removed. Fixes rust-lang#9136
This contains two related commits fixing some issues with
cargo doc
and the new feature resolver.The first commit fixes a panic. The old code was using
UnitFor::new_normal
when computing doc units, but this was wrong. That essentially circumvents the new feature resolver, and breaks determining the correct features to use. I don't remember exactly what I was thinking when I wrote that, other than "rustdoc shouldn't care", but it does matter.Unfortunately changing that makes collisions worse because it aggravates #6313, so the second commit adds some logic for removing some collisions automatically. Specifically:
There are still plenty of situations where there can be collisions, but I'm uncertain on the best way to handle those.
Fixes #9064