-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate replication manager with networking stack #387
Integrate replication manager with networking stack #387
Conversation
…plication messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We finally made it 🤣 !!! Thanks for these final changes, it's really clear what's going on now and I'm observing very consistent behaviour when testing locally. Much better experience observing node behaviour now that the reaction to disconnects is immediate.
This final approach to peers/connections should make it easy to transition to scheduling dialing peers from a behaviour, rather than in the sync manager.
I believe I'm still experiencing one issue which occurs when new schema are being materialized. It looks like a materialization task crashes and so the schema provider is never updated, if i restart the node, the provider is updated and replication can continue. Logs are here: https://laub.liebechaos.org/WQ3SzzmtSPyfESfLfNEgFA don't know if this is a concern of the current PR though, I can open an issue once I understand the problem better.
I renamed Naive
strategy to LogHeight
. It's not perfect, but seems mean to call the current setup naive 😄
Would be good to figure out the cause of the issue (above) but apart from that I'm very happy with this PR now.
} | ||
} | ||
|
||
fn connection_keep_alive(&self) -> KeepAlive { | ||
self.keep_alive | ||
if self.critical_error { | ||
return KeepAlive::No; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check my understanding, right now we still expect this to trigger the connection to be gracefully closed as no other behaviours will keep it open. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When all behaviours flip KeepAlive to false then the connection gets removed (time out)
Yes, I observed the same! It's caused by entries arriving in unexpected order (or at least some race condition). Also think it's unrelated to the networking stack |
Wahoo!! MERGED! |
Juhuuu!!! |
* development: (23 commits) Implement `dialer` behaviour (#444) Sort expected results in strategy tests Update CHANGELOG Replicate operations in topo order (#442) Maintain sorted operation indexes (#438) Use fork of `asynchronous-codec` (#440) Ingest check for duplicate entries (#439) Reverse lookup for pinned relations in dependency task (#434) Remove unnecessary exact version pinning in Cargo.toml Make `TaskInput` an enum and other minor clean ups in materialiser (#429) Use `libp2p` `v0.52.0` (#425) Fix race condition when check for existing view ids was too early (#420) Reduce logging verbosity CI: Temporary workaround for Rust compiler bug (#417) Fix early document view insertion (#413) Handle duplicate document view insertions (#410) Decouple p2panda's authentication data types from libp2p's (#408) Remove dead_code attribute in lib Integrate replication manager with networking stack (#387) Implement naive replication protocol (#380) ...
This PR introduces a new replication service which represents the replication protocol layer on top of the networking stack. It uses the service bus to communicate with the networking service, gets informed about established and closed connection with peers and new incoming replication messages.
The replication service keeps track of all currently available peers, can initiate new replication sessions with a simple scheduler logic and maintains the sync manager which is the state machine dealing with the actual replication protocol logic.
Closes: #373 and #396
📋 Checklist
CHANGELOG.md