-
Notifications
You must be signed in to change notification settings - Fork 913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid cascading failure: give up on incoming HTLCs in time if outgoing is stuck. #6378
Avoid cascading failure: give up on incoming HTLCs in time if outgoing is stuck. #6378
Conversation
13a190f
to
0103299
Compare
I shut down bitcoind during a test, and bcli leak reports flooded in. They're all temporary, but this fixes them. Signed-off-by: Rusty Russell <[email protected]>
Caught by leak detection, we just re-assigned this when we retried: sure, it's temporary, but it's technically a leak. Signed-off-by: Rusty Russell <[email protected]>
0103299
to
91aca34
Compare
Trivial rebase. |
91aca34
to
bb3a4b3
Compare
…losing on us. Signed-off-by: Rusty Russell <[email protected]>
The test actually triggers this: 1. We don't get our commitment tx mined at all (we block it). 2. By the time the peer does, the HTLC is expired. 3. We have the preimage but we don't even try, since it's expired. We should at least *try* to collect the HTLC in this case. Signed-off-by: Rusty Russell <[email protected]>
This cause of cascading failure was pointed out by @t-bast: if fees spike and you don't timeout an outgoing onchain HTLC, you should nonetheless fail the incoming htlc because otherwise the incoming peer will close on you. Of course, there's a risk of losing funds, but this only happens if you weren't going to get the HTLC spend in time anyway. And it would also catch any other reason that the downstream onchain goes wrong, containing the damage. Signed-off-by: Rusty Russell <[email protected]> Reported-by: @t-bast Changelog-Fixed: Protocol: We will close incoming HTLCs early if the outgoing HTLC is stuck onchain long enough, to avoid cascating failure.
bb3a4b3
to
fd4b778
Compare
if (outs[i]->resolved->tx_type != SELF) { | ||
status_broken("HTLC already resolved by %s" | ||
" when we found preimage", | ||
tx_type_name(outs[i]->resolved->tx_type)); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove the {..}
due the single stmt inside the if?
&& !hout->in->preimage) { | ||
local_fail_in_htlc(hout->in, | ||
take(towire_permanent_channel_failure(NULL))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some tiny comments on it but for me we are ready to go
@t-bast described this issue in general, and here's the fix, with some cleanups on the way.
Our solution is to monitor for this:
This is a risk, of course: the outgoing HTLC could be claimed by the peer. But that's no worse to us than it not getting mined at all, which we were prepared for.