Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - syncv2: pairwise set reconciliation #6350

Closed
wants to merge 19 commits into from

Conversation

ivan4th
Copy link
Contributor

@ivan4th ivan4th commented Sep 25, 2024

Motivation

Existing sync is causing a lot of resource usage, including memory consumption and network traffic, and can't replace gossip which can also be unreliable at times.

Description

This implements pairwise set reconciliation protocol based on "Range-Based Set Reconciliation" by Aljoscha Meyer https://arxiv.org/pdf/2212.13567.pdf
This supersedes a part of #5769 and does not contain the optimized data structures for range fingerprinting and does not implement multipeer reconciliation algorithm.

One part of this change that may see further improvement: the reconciled IDs themselves are sent as "chunks" contained in SyncMessages. This might be suboptimal but makes implementation of the interactive reconciliation protocol somewhat simpler. It will be replaced by non-chunked streaming after some optimizations of the range fingerprinting data structures, so that they can be copied in O(1) time and the ID stream can be consumed immediately not waiting for the current messages to be processed.

See package documentation in doc.go and fundamental sync types in sync2/types/types.go for the description of the protocol and its fundamental primitives.

Prerequisite: spacemeshos/go-scale#98

This implements pairwise set reconciliation protocol based on
"Range-Based Set Reconciliation" by Aljoscha Meyer
https://arxiv.org/pdf/2212.13567.pdf
@ivan4th ivan4th mentioned this pull request Sep 25, 2024
35 tasks
Copy link

codecov bot commented Sep 25, 2024

Codecov Report

Attention: Patch coverage is 79.29249% with 240 lines in your changes missing coverage. Please review.

Project coverage is 81.6%. Comparing base (118ee7f) to head (8c50bd4).
Report is 61 commits behind head on develop.

Files with missing lines Patch % Lines
sync2/rangesync/rangesync.go 68.2% 80 Missing and 44 partials ⚠️
sync2/rangesync/wire_helpers.go 63.7% 17 Missing and 12 partials ⚠️
sync2/rangesync/dumbset.go 86.7% 11 Missing and 9 partials ⚠️
sync2/rangesync/wire_types.go 71.9% 16 Missing ⚠️
sync2/rangesync/seq.go 80.0% 11 Missing and 1 partial ⚠️
sync2/rangesync/fingerprint.go 74.1% 4 Missing and 4 partials ⚠️
sync2/rangesync/p2p.go 88.2% 5 Missing and 3 partials ⚠️
sync2/rangesync/wire_conduit.go 93.2% 6 Missing and 2 partials ⚠️
sync2/rangesync/message.go 91.3% 5 Missing and 2 partials ⚠️
sync2/rangesync/keybytes.go 88.5% 2 Missing and 2 partials ⚠️
... and 2 more
Additional details and impacted files
@@            Coverage Diff            @@
##           develop   #6350     +/-   ##
=========================================
- Coverage     81.7%   81.6%   -0.1%     
=========================================
  Files          313     328     +15     
  Lines        34972   36317   +1345     
=========================================
+ Hits         28573   29652   +1079     
- Misses        4557    4742    +185     
- Partials      1842    1923     +81     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

sync2/rangesync/dumbset.go Outdated Show resolved Hide resolved
sync2/rangesync/wire_conduit.go Show resolved Hide resolved
Also, add a test that verifies that not adding received items to the
set immediately doesn't affect set reconciliation operation.
Copy link
Contributor

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @ivan4th. You've spent a great deal of effort on this and it is noticeable. I spent a couple of hours trying to review this and have put in some comments. In general a few issues stand out:

  • there's a lot of panics in the code which I'm not sure if they're planned to be fixed or left as is. If left as-is, they could pose as attack surfaces by using crafted messages. Leaving so many of them at this early stage could mean a lot of signature changes and all sorts of edge cases occuring due to (so-far) unhandled errors becoming handled.
  • there's a few significant files that have no noteworthy documentation.
  • to me, the protocol stands out in its complexity - a p2p protocol that has more than a few messages becomes difficult to reason about in various terms. it would be useful, at least, to understand the different messages, their relationships and possible state transitions. a simple sequence diagram(s) will do.
  • while some general concepts of the design of the protocol are described in the doc.go file, there's no mentioning of how the different components interact. i believe that this would be useful for the review process, and for posterity
  • due to complexity and importance of some of the undocumented files, i could not review the whole thing (as i'd just be rubber-stamping this). once these issues are discussed/addressed, i could continue this.

Thanks!

sync2/rangesync/dispatcher.go Show resolved Hide resolved
sync2/types/types.go Outdated Show resolved Hide resolved
sync2/rangesync/wire_helpers.go Show resolved Hide resolved
sync2/types/types.go Outdated Show resolved Hide resolved
sync2/types/types.go Outdated Show resolved Hide resolved
sync2/rangesync/minhash.go Outdated Show resolved Hide resolved
sync2/rangesync/p2p.go Show resolved Hide resolved
sync2/rangesync/wire_conduit.go Show resolved Hide resolved
sync2/rangesync/message.go Show resolved Hide resolved
return sb.String()
}

type sender struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why does a message need to be aware of a sender? In the context of this file, I'd expect it to handle stuff related to messages only - instantiation/(de)serialization, why "mix joy with joy" and blend in stuff related to message sending?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sender is a convenience message constructor. As every message, once constructed, is immediately sent and nothing else is ever done with it, it's no point in separating send call from that constructor.
Might try to think about better name for it...
I could also remove sender completely, but that will make the related code somewhat less readable IMO, in particular due to type casting and "compact hash" construction.

@ivan4th
Copy link
Contributor Author

ivan4th commented Oct 7, 2024

Thanks for this @ivan4th. You've spent a great deal of effort on this and it is noticeable. I spent a couple of hours trying to review this and have put in some comments. In general a few issues stand out:

* there's a lot of panics in the code which I'm not sure if they're planned to be fixed or left as is. If left as-is, they could pose as attack surfaces by using crafted messages. Leaving so many of them at this early stage could mean _a lot_ of signature changes and all sorts of edge cases occuring due to (so-far) unhandled errors becoming handled.

The panics used in the PR are assertions.
If the code panics, the program is already broken, and will very likely crash soon one way or another, possibly giving opportunities to an attacker. With panic in the logs, we can at least have a clear understanding of the issue if the panic happens.
We could use complete defensive programming approach everywhere, treating every unexpected inconsistency of the internal state as an error instead as a fatal assertion failure, and then try to somehow recover from the said state, but done properly, that could lead to a lot of extra effort. If we pursue this approach to the end, we must note that there are a lot of implicit assertions in Go itself, e.g. your program may panic every time it accesses the slice b/c of a boundary error. Does it make sense to add check that panics with a clue on why the boundary error could happen in the first place? Probably, depending on the context.
Does it make sense to do a check returning an error upon each slice access? Probably not.
What I think would be more efficient is adding fuzz tests for the protocol. I probably need to do that, with fuzzing both on byte and message level.

* there's a few significant files that have no noteworthy documentation.

As far as I understand, that primarily concerns dumbSet which is a simplified implementation of OrderedSet for tests and wireConduit which is the real impl of the Conduit interface.
I will fix this.
It is worth noting that both OrderedSet and Conduit interfaces are documented (see replies to the code comments for name discussion)

* to me, the protocol stands out in its complexity - a p2p protocol that has more than a few messages becomes difficult to reason about in various terms. it would be useful, at least, to understand the different messages, their relationships and possible state transitions. a simple sequence diagram(s) will do.

Will add diagrams as part of a separate set-reconciliation.md document.

* while some general concepts of the design of the protocol are described in the `doc.go` file, there's no mentioning of how the different _components_ interact. i believe that this would be useful for the review process, and for posterity

Will try to make things more clear in the comments/doc.

* due to complexity and importance of some of the undocumented files, i could not review the whole thing (as i'd just be rubber-stamping this). once these issues are discussed/addressed, i could continue this.

As noted above, I described some of the undocumented code briefly (implementations of documented OrderedSet and Conduit interfaces), but also will do so in the code, and will check what other docs I did forget.

@acud
Copy link
Contributor

acud commented Oct 7, 2024

If the code panics, the program is already broken, and will very likely crash soon one way or another, possibly giving opportunities to an attacker. With panic in the logs, we can at least have a clear understanding of the issue if the panic happens.
We could use complete defensive programming approach everywhere, treating every unexpected inconsistency of the internal state as an error instead as a fatal assertion failure, and then try to somehow recover from the said state, but done properly, that could lead to a lot of extra effort. If we pursue this approach to the end, we must note that there are a lot of implicit assertions in Go itself, e.g. your program may panic every time it accesses the slice b/c of a boundary error. Does it make sense to add check that panics with a clue on why the boundary error could happen in the first place? Probably, depending on the context.

So here I partially agree.

If an attacker can craft a message that can make a remote node panic - we have a serious problem (as the entire network can just be shut-down by any half-baked script-kiddy).

Now, since I am not particularly familiar with the call-stack depth in which these dependencies will be used, or the likelihood of an attacker-crafted data reaching a point in the code where there's absolutely no chance of the code panicking (simply because it is already checked/type translated in a way which mitigates the risk completely), then consider my comment as retracted.

Generally, inconsistent data received from one peer should never cause a node to panic (unless we really screwed up, which sometimes happens - nil pointer exceptions and out of bound exceptions, etc, however, they are almost unlikely and are invoked by the std lib).

Use black box testing
Add mising KeyBytes and Fingerprint tests
Document dumbSet and related stuff
Document wireConduit
@ivan4th
Copy link
Contributor Author

ivan4th commented Oct 9, 2024

Generally, inconsistent data received from one peer should never cause a node to panic (unless we really screwed up, which sometimes happens - nil pointer exceptions and out of bound exceptions, etc, however, they are almost unlikely and are invoked by the std lib).

Of course the code should never panic if the user-provided data is invalid, that would be too amateur of an approach ;)
The panics are only for cases where the internal state of the program is inconsistent and cannot yield predictable results anymore.

@ivan4th
Copy link
Contributor Author

ivan4th commented Oct 9, 2024

Copy link
Contributor

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have a few more comments

sync2/rangesync/doc.go Outdated Show resolved Hide resolved
sync2/rangesync/dumbset.go Outdated Show resolved Hide resolved
sync2/rangesync/dumbset.go Outdated Show resolved Hide resolved
sync2/rangesync/export_test.go Outdated Show resolved Hide resolved
sync2/rangesync/fingerprint.go Show resolved Hide resolved
sync2/rangesync/wire_conduit.go Show resolved Hide resolved
sync2/rangesync/rangesync.go Show resolved Hide resolved
sync2/rangesync/rangesync.go Outdated Show resolved Hide resolved
sync2/rangesync/rangesync_test.go Outdated Show resolved Hide resolved
sync2/rangesync/rangesync_test.go Show resolved Hide resolved
Copy link
Contributor

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your patience :)

@ivan4th
Copy link
Contributor Author

ivan4th commented Oct 17, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Oct 17, 2024
## Motivation

Existing sync is causing a lot of resource usage, including memory consumption and network traffic, and can't replace gossip which can also be unreliable at times.
@ivan4th
Copy link
Contributor Author

ivan4th commented Oct 17, 2024

bors cancel

@spacemesh-bors
Copy link

Canceled.

@ivan4th
Copy link
Contributor Author

ivan4th commented Oct 17, 2024

gzipped excalidraw files in the docs before merging to reduce LoC footprint

@ivan4th
Copy link
Contributor Author

ivan4th commented Oct 17, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Oct 17, 2024
## Motivation

Existing sync is causing a lot of resource usage, including memory consumption and network traffic, and can't replace gossip which can also be unreliable at times.
@spacemesh-bors
Copy link

Pull request successfully merged into develop.

Build succeeded:

@spacemesh-bors spacemesh-bors bot changed the title syncv2: pairwise set reconciliation [Merged by Bors] - syncv2: pairwise set reconciliation Oct 17, 2024
@spacemesh-bors spacemesh-bors bot closed this Oct 17, 2024
@spacemesh-bors spacemesh-bors bot deleted the syncv2/pairwise branch October 17, 2024 05:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants