-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source::{fuzzy_}query{_vec} can say "try again" #8985
Conversation
(rust-highfive has picked a reviewer for you, use r? to override) |
d2d905c
to
b9e814d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks for this!
FWIW to get the full value out of this change I think we will want to remove Source::update
. That should no longer be needed at all since sources will naturally report that they need to be updated if, during a query, they discover that blocking work needs to be done. This would involve refactoring the registry, for example, to report that if any unlocked dependency was queried and an update hasn't been done yet that an update needs to be done. The thinking then is that for the HTTP inded source it would be more fine-grained about whether a package is ready or not.
As to some of your questions:
CargoResult<Poll<_>> vs Poll<CargoResult<()>> neither is clearly more ergonomic to work with?
I like CargoResult<Poll<_>>
because the usage of ?
is still "obviously correct"
Is there some use case that makes query pull its weight, or should we only have the query_vec version?
This was added for performance reasons at some point in the past to avoid allocating lots of intermediate vectors. I'm not sure if this is still the case but it didn't seem like it was too onerous to handle here?
What to do when we get a Poll::Pending? Currently we bizzy weight, but that is not grate. What can we do short of async and await.
I wrote down a few comments here and there, but the main gist of what I'm thinking is that we go back to the source and say "hey you said pending earlier, please resolve that pending status right now in a blocking fashion".
There are a lot of places that want to "just weight till it is ready", should we find a way to have that code in only one or two places? Should we have a query_weight that is defaulted to call query in a loop?
I was thinking the same thing about possibly having a helper method on the trait which does the wait for you. I think it'd be fine to add that, although we would want to use it sparingly within Cargo.
Should we move some of the loops up the stack for better parallelism?
I only found one location (patches) that I think should get moved up, but other than that the loops seemed reasonable to me (to either add blocking or ignore the pending status).
Should we move some of the loops down the stack to do less redundant work?
I didn't find anything along these lines
Can we break some of the caches up, so that there is more sharing between runs? (Yes, but maybe it can be left for later.)
Yeah I was initially hoping we could preserve the entire resolver and benefit from lots of prepopulated non-pending caches, but that may not end up being the case. I don't think this is a huge worry though given the speed of the resolver and the typical time it takes to fetch something from the network.
There are 2 places where we want expect for Poll, should we start an extension trait to add a method?
Yeah seems fine to me!
}) | ||
.collect::<Vec<_>>(); | ||
} else { | ||
// TODO: dont hot loop for it to be Ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I imagine this bottoms out in something like self.wait_for_sources_to_be_ready()
. Each source would already be registered in some internal map of PackageRegistry
if we're blocked on it, and then we'd ask each source, in sequence, "do your blocking thing now".
source.query(dep, callback)?; | ||
source.query(dep, callback)? | ||
}; | ||
if pend.is_pending() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where I'd imagine that a record of this dependency's source id is recorded in an internal map for us to later iterate over and ask the source to block on things.
@@ -32,7 +33,7 @@ pub struct RegistryQueryer<'a> { | |||
/// specify minimum dependency versions to be used. | |||
minimal_versions: bool, | |||
/// a cache of `Candidate`s that fulfil a `Dependency` | |||
registry_cache: HashMap<Dependency, Rc<Vec<Summary>>>, | |||
registry_cache: HashMap<Dependency, Poll<Rc<Vec<Summary>>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit surprised by this where it's caching not ready signals. I was hoping that we could reuse the resolver across iterations perhaps and benefit from mostly prepopulated caches?
if registry.all_ready() { | ||
break (registry, cx); | ||
} else { | ||
// TODO: dont hot loop for it to be Ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I'm imagining a new method on Registry
which is something like "block on things" which is called here.
break deps; | ||
} | ||
Poll::Pending => { | ||
// TODO: dont hot loop for it to be Ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget the context in which this function is called, but this could call the hypothetical new Source::block_on_stuff
method
I don't have too much to add to this except that |
@jonhoo yeah I'm imagining that after this PR we'll want to refactor the internal index implementation of the registry like you mention. We'll want to tweak the current implementations as well to match the new interface, and the http index should then fit quite cleanly into the implementation. |
I just added a commit to push the |
src/cargo/sources/registry/index.rs
Outdated
.summaries(pkg.name(), &req, load)? | ||
.any(|summary| summary.yanked); | ||
Ok(found) | ||
loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just also be modified to return Poll<bool>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One level is straightforward and worth it. Continuing up the stack Source::is_yanked
; I am not seeing where we would want to call a lot of it in parallel nor where it ends in an existing loop. Seams like it is called where there is not an opportunity for parallelism (install
code) or where we have already updated that pkg (writing a lockfile
). What have I missed?
I think my general thinking here is that we should just continue to propagate |
break; | ||
} | ||
Poll::Pending => { | ||
// TODO: dont hot loop for it to be Ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will want the ability to return up Poll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to do the work, I am just not seeing where up the stack we would want to call this in parallel with other work? What have I missed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's true, although I think we'll want special handling for this since I don't think we want to add a round-trip time to get this file all the time in Cargo. We'd presumably want to cache this file separately for a fixed length of time and while in that window of time Cargo doesn't re-fetch the file.
405d92b
to
4c13c26
Compare
b2d3468
to
f6e9f8c
Compare
☔ The latest upstream changes (presumably #9125) made this pull request unmergeable. Please resolve the merge conflicts. |
f6e9f8c
to
44012fd
Compare
☔ The latest upstream changes (presumably #9133) made this pull request unmergeable. Please resolve the merge conflicts. |
Any updates on this? I keep hoping that this will be unblocked so #8890 will become a reality. |
That is my hope as well. My plan was to get back to this, but I have been spending my time understanding the design space around the Cargo Auth RFC. When that is done, I will get back to this. (Best laid plans and all, but I will try) The Cargo team may close this PR as stale in the meantime, but it will still be "next" on my todo list. |
Thank you. |
Registry functions return Poll to enable parallel fetching of index data Adds `Poll` as a return type for several registry functions to enable parallel fetching of crate metadata with a future http-based registry. Work is scheduled by calling the `query` and related functions, then waited on with `block_until_ready`. This PR is based on the draft PR started by eh2406 here [#8985](#8985). r? `@Eh2406` cc `@alexcrichton` cc `@jonhoo`
This can be closed since it landed via #10064! |
This adds
Poll
to the return type ofSource
. This is prep work for @jonhoo's #8890 as suggested by @alexcrichton.There is definitely a lot of polishing work to do before merge, but it is good enough to start a conversation and see what CI has to say.
Some questions as it stands now:
CargoResult<Poll<_>>
vsPoll<CargoResult<()>>
neither is clearly more ergonomic to work with?query
pull its weight, or should we only have thequery_vec
version?Poll::Pending
? Currently we bizzy weight, but that is not grate. What can we do short of async and await.query_weight
that is defaulted to callquery
in a loop?expect
forPoll
, should we start an extension trait to add a method?