-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coerce major wildcards #13535
base: master
Are you sure you want to change the base?
Coerce major wildcards #13535
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @epage (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
|
r? @Eh2406 |
I will try and make time to think about this more carefully as soon as I can. In the meantime I'm gonna quote something Ed routinely says:
Specifically in this case I'm pretty sure some of these tests would already pass before this PR. Which is fine, more tests for the existing functionality are always helpful. But it's not clear as a reviewer if you expected there to be a behavior change. |
Understood, I will force-push the commit tree to update it as soon as I'm done brainstorming more tests. |
Yes, |
4a6bc65
to
bf9754b
Compare
Ok I fixed the commit tree. Here is a summary of the tests New tests with current implementation
New tests with proposed implementation
There is a regression however:
This is because we actually fixed an unneccessary duplicate dependency within that test. To address this, I changed the assertion to remove the duplicate dependency and then added a new test that replicates the old test by explicitelly pinning the requirements to generate duplicate dependencies (see this commit):
|
@Nikita240 to clarify what @Eh2406 asked, each commit should be passing tests. This helps to test the tests and makes changes in behavior clearer to reviewers. From my quick look at the commits and with your last message, it sounds like this wasn't done. #13505 is an example of this |
Ahh, I misunderstood. Thanks |
bf9754b
to
ecd68bb
Compare
Rebuilt the tree again 👍🏻 |
I got a chance to look this over. My brings a little bit exploding with different ways I want to respond to this PR. Sorry if this comment ends up all over the place. So before I forget, thank you for working on this very tricky issue! Edit: Got sucked into a meeting and so this comment is unaware of the past two hours of progress, although I don't think it affects things that much. I'm going to try and respond from most detailed out toward most general. The most detailed feedback is about how things are named and where comments go, which I'm largely going to save till after we discussed the bigger questions. Except to thank you for working on tests. This stuff is tricky and hard to get right, I deeply appreciate your use of tests. At the next level the critical question here is what property is this trying to uphold? If we can define it strictly enough we can add a proptest and discover all kinds of fun counterexamples. Even without that level of strictness we can add an Specifically things get tricky with regard to what order operations are evaluated in. Most concretely this code moves checking Similarly in these small examples the wildcard requirements will get evaluated after the specific requirements, because we have a heuristic that requirements that match fewer versions get processed first. But this is not always true. We could have a situation the direct dependency is wildcard, only after we've evaluated that to be discovered that a transitive dependency adds a specific requirement. So I suspect that whatever property were trying to uphold this will only make it "heuristically" true and not "mathematically" true. Nothing wrong with that, the resolver involves a lot of load bearing heuristics, but a different kind of discussion. In the last level of abstraction is a question of design, is this "Doesn't unnecessarily include newer versions" a property we even want to cargos resolver to have?
|
I appreciate that. We can always work that out once there is consensus on the direction to take this.
We should indeed define this. Here is my attempt: "When a dependency is explicitly requested to have a non-SemVer-compatible range, it's compatibility across SemVer boundaries should be resolved in the same exact way that a SemVer compatible range would be. For example:
I share all of these concerns with you. I think it would be wise for me to refactor the code to ensure that these heuristics are strictly upheld (and to make it difficult to break in the future). My goal with the current implementation was to get working code with a minimal change-set to make it simpler to grok what the proposal does. Refactoring the code to be more strict would require a lot of code changes, probably a few levels up from the current function, so it would have been a huge diff to try to process.
This is the main question we have to answer. My opinion:You have two possible conditions to optimize for:
Which one of these a user actually wants is likely based on whether or not the dependency is private or public, which as far as I understand one of the goals of the public-private-dependencies RFC. However, until we have that, here are some assertions we can make:
Hence, I think the following logic is a fair compromise:
I basically view |
This seems pretty convincing to me, but I started a conversation with the rest of the cargo team on zulip.
New better abstractions would certainly help hear. The existing ones are not great. That being said if we find a way to make sure this "always" happens then we have proven
I think this is a necessary but insufficient condition for the properties we care about. As both of those examples are handled "the same exact way" currently, they match the largest version that makes the system work. (On a side note I will be out for a few days and may not be able to respond promptly until next week.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just sharing some thoughts before you dig into polishing the implementation.
@@ -260,16 +260,8 @@ fn activate_deps_loop( | |||
.conflicting(&resolver_ctx, &dep) | |||
.is_some(); | |||
|
|||
let mut remaining_candidates = RemainingCandidates::new(&candidates); | |||
|
|||
// `conflicting_activations` stores all the reasons we were unable to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment has a lot more to do with how conflicting_activations
influences the entire loop, so I think it should stay And not move with the ConflictMap::new();
.
/// `Candidate` is the candidate to attempt to activate, and the `bool` is | ||
/// an indicator of whether there are remaining candidates to try of if | ||
/// we've reached the end of iteration. | ||
fn next(&mut self) -> Option<(Summary, bool)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if next does not take additional arguments, then it should probably be a proper iterator. ( Or document why it can't implement that trait)
let mut conflicting_activations = ConflictMap::new(); | ||
|
||
let mut activated_candidates: Vec<Summary> = Vec::with_capacity(candidates.len()); | ||
let mut non_activated_candidates: Vec<Summary> = Vec::with_capacity(candidates.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused why we need two vectors here (that get concatenated) instead of just sorting one vec. and if it's just a sort, can it be combined with the existing sort at
pub fn sort_summaries( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not quite a sort. We're actually just splitting the vector into two buckets, while preserving the order of the elements.
While this can technically be done in place, you'd still need a tmp array to copy out segments. Here is the theoretical performance comparison:
N = activated candidates
M = total candidates
Current implementation with split arrays
item memcopies = 2 * M
array mallocs = 3
Current implementation if optimized
item memcopies = 2 * M - N
array mallocs = 1
In-place implementation
item memcopies = N * (M + 2)
array mallocs = 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what the optimal looks like imo:
pub trait PrioritizeVecExt<T> {
fn prioritize_by_predicate(&mut self, predicate: impl Fn(&T) -> bool) -> &mut Self;
}
impl<T: Clone> PrioritizeVecExt<T> for Vec<T> {
fn prioritize_by_predicate(&mut self, predicate: impl Fn(&T) -> bool) -> &mut Self {
let mut non_priority = Vec::with_capacity(self.len());
let mut i = 0;
for j in 0..self.len() {
if predicate(&self[j]) {
self[i] = self[j].clone();
i += 1;
} else {
non_priority.push(self[j].clone());
}
}
for k in 0..non_priority.len() {
self[i + k] = non_priority[k].clone();
}
self
}
}
#[test]
fn test_prioritize_by_predicate() {
let mut v = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
v.prioritize_by_predicate(|&x| x % 2 == 0);
assert_eq!(v, vec![2, 4, 6, 8, 1, 3, 5, 7, 9]);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but:
candidates.sort_by_key(|c| cx.activated.contains(c))
is a lot shorter to read.- It takes a lot of pointer (
Summary
) copying to compete with the cost of two allocations. - We already have code
sort_summaries
sorting this array.
Overall we should merge whichever solution is the easiest to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok that's neat. I didn't realize sort_by_key
would maintain segment order when used with a boolean predicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the sort functions maintain insertion order anytime the predicates are equal. They each come with a _unstable
variant that is a little faster but does not uphold this property. (And yes, it's unfortunate that unstable
has a specific meaning in rust and a different specific meaning in sorting algorithms.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok after poking around a bit I realized we cannot filter & sort in-place. The outer loop relies on the original candidates
vector to generate the Error here:
debug!("no candidates found");
Err(errors::activation_error(
&resolver_ctx,
registry.registry,
&parent,
&dep,
&conflicting_activations,
&candidates,
gctx,
))
That means if we're pre-filtering the array, we must alloc a new one and copy the pointers over.
On the topic of using sort_by_key()
, after playing around with it I actually think the way it's currently done is better because we are able to both sort and filter based on the result of one comparison. If we did it as filter().sort_by_key()
we'd be doing an unnecessary extra comparison on each item instead of relying on the else condition.
As to how to remove the extra allocation I'm currently doing; I could make the container be something like a VecDeque, where the "prioritized" candidates are pushed to the back and the "non-prioritized" candidates are pushed to the front, then use a chained iterator that iterates in reverse until it reaches the "partition point", after which it will iterate over the front elements normally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resolver is pretty fast these days. On one hand that took a lot of focus on micro-optimizations in these hot loops. On the other hand that means we can afford to write things "slower but more readable". Let's figure out what we want before geeking out too much about how to get all the performance out of it?
@@ -774,18 +757,50 @@ impl RemainingCandidates { | |||
// semver-compatible versions of a crate. For example we can't | |||
// simultaneously activate `foo 1.0.2` and `foo 1.2.0`. We can, | |||
// however, activate `1.0.2` and `2.0.0`. | |||
// | |||
// Here we throw out our candidate if it's *compatible*, yet not | |||
// equal, to all previously activated versions. | |||
if let Some((a, _)) = cx.activations.get(&b_id.as_activations_key()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I brought this up in prose, but I want to market is a concern. What ensures that a relevant package is not added to activations between new
and next
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing preventing it, but it seemingly does not matter.
The old implementation would take a &mut ConflictMap
in the next()
method, but it would ONLY append new conflicts. It would not perform checks against or remove elements. As a result, it does not matter if we move the appending of conflicts to new()
and do them preemptively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not worried about ConflictMap
getting corrupted, although maybe I should have been. If an item is missing from ConflictMap
that did in fact prevent a candidate version from being tried that would cause big problems. Similarly, if we backtrack and construct a new ConflictMap
it needs to know about all of the conflicts that were identified during the call to new
.
The situation I was worried about is next
returning a candidate that conflicts with a candidate in activations
?
Something like:
new
is called for<=1.2
. Because there is no version inactivations
self.prioritized_candidates
is[1.2.0, 1.1.1, 1.1.0, 1.0.0, 0.9.0 .... ]
- Any number of things happen including discovering new dependencies and backtracking and whatever.
new
is called for>=1, <2
. Because there is no version inactivations
self.prioritized_candidates
is[1.999.999, 1.999.998, ... , 1.1.0, 1.0.0]
next
is called on the second one suggesting candidate1.999.999
- since there are no problems with it that version is selected and added to
activations
. - Any number of things happen including discovering new dependencies and backtracking and whatever.
next
is called on the first one incorrectly suggesting candidate1.2.0
💥
I am back. Let me know how I can help. |
☔ The latest upstream changes (presumably #14583) made this pull request unmergeable. Please resolve the merge conflicts. |
☔ The latest upstream changes (possibly 081d7ba) made this pull request unmergeable. Please resolve the merge conflicts. |
What does this PR try to resolve?
Fixes
#13534#9029This PR implements a change that will make major wildcards in sub-dependencies be coerced by a more restrictive dependency to avoid unnecessary duplicate dependencies.
It does this by presorting the
remaining_candidates
array to prioritize candidates that already have activations (if you'd like an easier diff to understand the change, see the first commit.How should we test and review this PR?
There are unit tests for the new behavior.
Additional information
There is a bit of a special case here that needs to be talked about. When there are multiple possible activated candidates that could be used to coerce the wildcard, a choice needs to be made.
I did the simplest and most intuitive implementation, which is to always choose the candidate with the highest SemVer.
An argument could be made for an alternative: If the root crate has a direct dependency on the package we are coercing, always pick whatever the root activates. This is nice because it seemingly allows for maximum compatibility to the end user, but feels very inconsistent as it will inevitably favor binaries over libraries.