-
Notifications
You must be signed in to change notification settings - Fork 999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid collecting all RecordStore records into a Vec in the behaviour #3021
Conversation
Filtering is now done one by one
e834d6b
to
6711981
Compare
@mxinden ping, improving this area is pretty blocking for us, would be great to get some feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋
One thing that is going to cause problems
Have you been able to reproduce this in benchmarks?
So this PR removes the Cow that is not really being used (all places that consume the Cow immediately call into_owned() on it) and makes the iterator standalone so it can be stored in the behaviour.
Apart from the RecordStore::records
method, right?
so I have just shifted the problem.
Can we introduce this change with the future pull request that actually does the improvement then?
E.g. in my draft persistent RecordStore I could create a snapshot of the database and pull records out of that snapshot one by one to avoid having a giant Vec.
In case this is an in-memory database, there would be no need for a snapshot, given that the NetworkBehaviour
event loop is single threaded, correct?
@mxinden ping, improving this area is pretty blocking for us, would be great to get some feedback
Happy to help on improving the interface, but I am having difficulties understanding (a) what the current performance issue is (missing benchmarks) and (b) how these changes would improve the performance issues. See questions above.
All that said, happy to see more interest in improving the RecordStore
`trait!
@@ -64,36 +66,33 @@ pub enum Error { | |||
/// content. Just like a regular record, a provider record is distributed | |||
/// to the closest nodes to the key. | |||
/// | |||
pub trait RecordStore<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't 'a
have been on the individual methods instead of the entire trait
definition?
std::mem::swap(&mut skipped, &mut self.skipped); | ||
let record_ttl = self.record_ttl; | ||
let local_id = self.local_id; | ||
let records = Box::new(store.records().filter_map(move |r| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store.records
now returns an owned Iter
. With the filter_map
right after we might be cloning many items in vain. Isn't that an issue?
This pull request has merge conflicts. Could you please resolve them @rklaehn? 🙏 |
Closing this as stale. There is an open discussion about re-designing this interface here: #3035 |
Context:
I have recently joined https://n0.computer/ and am working on https://github.com/n0-computer/iroh .
One of the first issues with iroh I have discovered was that it won't allow adding large files because the default size of the kad RecordStore is too small. We just increased the size of the mem store for now. But I am looking into what it would take to provide a lot of records.
One thing that is going to cause problems is that currently all records are materialized into a
Vec<ProviderRecord>
in the behaviour. I think that needs to be changed to be able to provide a large number of records. So this PR removes theCow
that is not really being used (all places that consume the Cow immediately call into_owned() on it) and makes the iterator standalone so it can be stored in the behaviour. A nice benefit of this is that the lifetime parameter can also be removed.Currently the way I obtain a standalone iterator is to just materialize into a Vec inside the MemoryStore impl, so I have just shifted the problem. This obviously would have to be changed in the next step. E.g. in my draft persistent RecordStore I could create a snapshot of the database and pull records out of that snapshot one by one to avoid having a giant
Vec<ProviderRecord>
.