-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fail holding-cell AddHTLCs on Channel deser to match disconnection #754
Fail holding-cell AddHTLCs on Channel deser to match disconnection #754
Conversation
Note that I only think we need to take this for 0.0.12 because of the risk of forwarding something when we don't have a current chain - the assertion is only a debug assertion, and as far as I can tell the behavior even when hitting it is correct. |
As Channel::write says in the comment at the top: "we write out as if remove_uncommitted_htlcs_and_mark_paused had just been called", except that we previously deliberately included holding-cell AddHTLC events in the serialization. On the flip side, in remove_uncommitted_htlcs_and_mark_paused, we removed pending AddHTLC events under the assumption that, if we can't forward something ASAP, its better to fail it back to the origin than to sit on it for a while. Given there's likely to be just as large a time-lag between ser/deserialization as between when a peer dis/reconnects, there isn't much of a reason for this difference. Worse, we debug_assert that there are no pending AddHTLC holding cell events when doing a reconnect, so any tests or fuzzers which deserialized a ChannelManager with AddHTLC events would panic. We resolve this by adding logic to fail any holding-cell AddHTLC events upon deserialization, in part because trying to forward it before we're sure we have an up-to-date chain is somewhat risky - the sender may have already gone to chain while our upstream has not.
4956ad6
to
1ecee17
Compare
Codecov Report
@@ Coverage Diff @@
## main #754 +/- ##
==========================================
+ Coverage 91.45% 91.49% +0.04%
==========================================
Files 37 37
Lines 22249 22340 +91
==========================================
+ Hits 20347 20440 +93
+ Misses 1902 1900 -2
Continue to review full report at Codecov.
|
In terms of "concept," the one question I have is whether this could lead to unexpected behavior for users. It's not obvious that reading state from disk would lead to failing HTLCs back. For example, if a user has a method that reads all of RL's info from disk, and they batch Maybe this PR speaks to a need for (edit: possibly relevant tweet: https://twitter.com/jaredpalmer/status/1171415929865064449?s=20) |
I'm a little confused by this - you have to provide the
Hopefully not. I tend to subscribe to the "having clean shutdown logic lulls you into a false sense of security" - user apps crash, especially on mobile, and getting a |
Discussed it offline for a while with @valentinewallace, in short I think this at least doesn't make sense because we should be going a different direction - solve #661 and the debug-assert failure by not failing-back holding cell HTLCs just because monitor updating has paused or we went to disk, and think harder about how to handle pre-chain-sync load-time HTLC relaying more generally (probably for now just say that you shouldn't connect to peers until chain is synced). |
Note that this bug was found thanks to the changes in #753, but its reasonable to take this by itself for 0.0.12 without 753.
As Channel::write says in the comment at the top: "we write out as
if remove_uncommitted_htlcs_and_mark_paused had just been called",
except that we previously deliberately included holding-cell
AddHTLC events in the serialization. On the flip side, in
remove_uncommitted_htlcs_and_mark_paused, we removed pending
AddHTLC events under the assumption that, if we can't forward
something ASAP, its better to fail it back to the origin than to
sit on it for a while.
Given there's likely to be just as large a time-lag between
ser/deserialization as between when a peer dis/reconnects, there
isn't much of a reason for this difference. Worse, we debug_assert
that there are no pending AddHTLC holding cell events when doing a
reconnect, so any tests or fuzzers which deserialized a
ChannelManager with AddHTLC events would panic.
We resolve this by adding logic to fail any holding-cell AddHTLC
events upon deserialization, in part because trying to forward it
before we're sure we have an up-to-date chain is somewhat risky -
the sender may have already gone to chain while our upstream has
not.