Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid cascading failure: give up on incoming HTLCs in time if outgoing is stuck. #6378

Merged

Conversation

rustyrussell
Copy link
Contributor

@rustyrussell rustyrussell commented Jul 6, 2023

@t-bast described this issue in general, and here's the fix, with some cleanups on the way.

  1. We forward an HTLC.
  2. The outgoing channel goes onchain
  3. We try to spend the outgoing HTLC (onchain) to time it out.
  4. But we either don't pay enough because fees spiked, or with anchors, we decide it's not worthwhile to pay enough fees.
  5. HTLC stays open and incoming side gets upset, closing channel.

Our solution is to monitor for this:

  1. If an htlc is incoming, and <= 3 blocks away from timing out
  2. And there's a corresponding outgoing
  3. And it's onchain (it will be by default, since cltv_delta is 6 (testnet) or 34 (mainnet).
  4. Fail the incoming immediately.

This is a risk, of course: the outgoing HTLC could be claimed by the peer. But that's no worse to us than it not getting mined at all, which we were prepared for.

@rustyrussell rustyrussell added the protocol These issues are protocol level issues that should be discussed on the protocol spec repo label Jul 6, 2023
@rustyrussell rustyrussell added this to the v23.08 milestone Jul 6, 2023
@rustyrussell rustyrussell added the Optech Make Me Famous! Look! Look! Look! COOL NEW FEATURE! label Jul 6, 2023
@rustyrussell rustyrussell force-pushed the guilt/give-up-onb-htlcs branch from 13a190f to 0103299 Compare July 6, 2023 06:57
I shut down bitcoind during a test, and bcli leak reports flooded in.
They're all temporary, but this fixes them.

Signed-off-by: Rusty Russell <[email protected]>
Caught by leak detection, we just re-assigned this when we retried: sure,
it's temporary, but it's technically a leak.

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell force-pushed the guilt/give-up-onb-htlcs branch from 0103299 to 91aca34 Compare July 22, 2023 10:00
@rustyrussell
Copy link
Contributor Author

Trivial rebase.

@rustyrussell rustyrussell force-pushed the guilt/give-up-onb-htlcs branch from 91aca34 to bb3a4b3 Compare July 23, 2023 04:12
The test actually triggers this:
1. We don't get our commitment tx mined at all (we block it).
2. By the time the peer does, the HTLC is expired.
3. We have the preimage but we don't even try, since it's expired.

We should at least *try* to collect the HTLC in this case.

Signed-off-by: Rusty Russell <[email protected]>
This cause of cascading failure was pointed out by @t-bast: if fees spike and
you don't timeout an outgoing onchain HTLC, you should nonetheless fail the incoming htlc
because otherwise the incoming peer will close on you.

Of course, there's a risk of losing funds, but this only happens if you weren't going to get the HTLC spend in time anyway.  And it would also catch any other reason that the downstream onchain goes wrong, containing the damage.

Signed-off-by: Rusty Russell <[email protected]>
Reported-by: @t-bast
Changelog-Fixed: Protocol: We will close incoming HTLCs early if the outgoing HTLC is stuck onchain long enough, to avoid cascating failure.
@rustyrussell rustyrussell force-pushed the guilt/give-up-onb-htlcs branch from bb3a4b3 to fd4b778 Compare July 24, 2023 03:09
Comment on lines +1506 to +1511
if (outs[i]->resolved->tx_type != SELF) {
status_broken("HTLC already resolved by %s"
" when we found preimage",
tx_type_name(outs[i]->resolved->tx_type));
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove the {..} due the single stmt inside the if?

Comment on lines +1761 to +1764
&& !hout->in->preimage) {
local_fail_in_htlc(hout->in,
take(towire_permanent_channel_failure(NULL)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here?

Copy link
Contributor

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some tiny comments on it but for me we are ready to go

@rustyrussell rustyrussell merged commit 978c169 into ElementsProject:master Jul 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Optech Make Me Famous! Look! Look! Look! COOL NEW FEATURE! protocol These issues are protocol level issues that should be discussed on the protocol spec repo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants