Avoid cascading failure: give up on incoming HTLCs in time if outgoing is stuck. #6378

rustyrussell · 2023-07-06T02:25:14Z

@t-bast described this issue in general, and here's the fix, with some cleanups on the way.

1. We forward an HTLC.
2. The outgoing channel goes onchain.
3. We try to spend the outgoing HTLC (onchain) to time it out.
4. But we either don't pay enough because fees spiked, or, with anchors, we decide it's not worthwhile to pay enough fees.
5. The HTLC stays open, and the incoming peer gets upset and closes the channel.

Our solution is to monitor for this:

If an HTLC is incoming, and <= 3 blocks away from timing out,
and there's a corresponding outgoing HTLC,
and it's onchain (it will be by default, since cltv_delta is 6 (testnet) or 34 (mainnet)),
fail the incoming HTLC immediately.
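
The conditions above can be sketched as a single predicate. This is a minimal illustration only: the struct and function names are hypothetical stand-ins, not the real lightningd structures, and the preimage check is an assumption drawn from the `!hout->in->preimage` test in the reviewed diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for illustration only;
 * these are NOT the real lightningd structures. */
struct htlc_out_sketch {
	bool onchain;		/* outgoing channel has gone onchain */
};

struct htlc_in_sketch {
	uint32_t cltv_expiry;	/* height at which the incoming HTLC times out */
	bool have_preimage;	/* we already know the preimage */
	const struct htlc_out_sketch *out; /* corresponding outgoing, or NULL */
};

/* From the description: act when <= 3 blocks from timing out. */
#define GIVE_UP_BLOCKS 3

/* Returns true if the incoming HTLC should be failed right now. */
bool should_fail_incoming(const struct htlc_in_sketch *in,
			  uint32_t blockheight)
{
	assert(in != NULL);

	/* Assumption: if we hold the preimage we can fulfill instead. */
	if (in->have_preimage)
		return false;

	/* Only applies when a corresponding outgoing HTLC is onchain. */
	if (in->out == NULL || !in->out->onchain)
		return false;

	/* 64-bit arithmetic so blockheight + 3 cannot wrap around. */
	return (uint64_t)in->cltv_expiry
		<= (uint64_t)blockheight + GIVE_UP_BLOCKS;
}
```

With an incoming expiry of 110, this sketch starts returning true at block height 107 and keeps doing so after expiry.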

This is a risk, of course: the outgoing HTLC could be claimed by the peer. But that's no worse to us than it not getting mined at all, which we were prepared for.

rustyrussell · 2023-07-22T10:01:23Z

Trivial rebase.

The test actually triggers this:

1. We don't get our commitment tx mined at all (we block it).
2. By the time the peer does, the HTLC is expired.
3. We have the preimage but we don't even try, since it's expired.

We should at least *try* to collect the HTLC in this case.

Signed-off-by: Rusty Russell <[email protected]>
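
The change in behaviour described by this commit message can be contrasted in a tiny sketch. Both function names are hypothetical and do not correspond to onchaind's real code; the sketch only models the decision of whether to attempt a preimage spend.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration; not onchaind's real functions. */

/* Behaviour described as the problem: once the HTLC has expired,
 * we never attempt to claim it, even though we know the preimage. */
bool try_claim_before(bool have_preimage, bool expired)
{
	return have_preimage && !expired;
}

/* Behaviour argued for: knowing the preimage is enough to try,
 * since the peer's timeout spend may not have confirmed yet. */
bool try_claim_after(bool have_preimage, bool expired)
{
	(void)expired;	/* expiry no longer stops the attempt */
	return have_preimage;
}
```

The only case where the two differ is an expired HTLC whose preimage we hold, which is the case the test triggers.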

This cause of cascading failure was pointed out by @t-bast: if fees spike and you don't time out an outgoing onchain HTLC, you should nonetheless fail the incoming HTLC, because otherwise the incoming peer will close on you.

Of course, there's a risk of losing funds, but this only happens if you weren't going to get the HTLC spend in time anyway. It would also catch any other reason that the downstream onchain goes wrong, containing the damage.

Signed-off-by: Rusty Russell <[email protected]>
Reported-by: @t-bast
Changelog-Fixed: Protocol: We will close incoming HTLCs early if the outgoing HTLC is stuck onchain long enough, to avoid cascading failure.

vincenzopalazzo · 2023-07-25T06:59:33Z

onchaind/onchaind.c

+			if (outs[i]->resolved->tx_type != SELF) {
+				status_broken("HTLC already resolved by %s"
+					      " when we found preimage",
+					      tx_type_name(outs[i]->resolved->tx_type));
+				return;
+			}


Can we remove the {..}, since there is a single statement inside the if?

vincenzopalazzo · 2023-07-25T07:00:14Z

lightningd/peer_htlcs.c

+		    && !hout->in->preimage) {
+			local_fail_in_htlc(hout->in,
+					   take(towire_permanent_channel_failure(NULL)));
+		}


vincenzopalazzo

LGTM, just some tiny comments, but for me this is ready to go.

rustyrussell added the "protocol" label (These issues are protocol level issues that should be discussed on the protocol spec repo) Jul 6, 2023

rustyrussell added this to the v23.08 milestone Jul 6, 2023

rustyrussell added the "Optech Make Me Famous!" label (Look! Look! Look! COOL NEW FEATURE!) Jul 6, 2023

rustyrussell force-pushed the guilt/give-up-onb-htlcs branch from 13a190f to 0103299 July 6, 2023 06:57

rustyrussell added 2 commits July 22, 2023 13:56

plugins/bcli: fix leak report when bitcoind goes away.

cfaf638

I shut down bitcoind during a test, and bcli leak reports flooded in. They're all temporary, but this fixes them. Signed-off-by: Rusty Russell <[email protected]>

plugins/bcli: plug temporary leak on retry.

f6f3a66

Caught by leak detection, we just re-assigned this when we retried: sure, it's temporary, but it's technically a leak. Signed-off-by: Rusty Russell <[email protected]>

rustyrussell force-pushed the guilt/give-up-onb-htlcs branch from 0103299 to 91aca34 July 22, 2023 10:00

rustyrussell force-pushed the guilt/give-up-onb-htlcs branch from 91aca34 to bb3a4b3 July 23, 2023 04:12

rustyrussell added 3 commits July 24, 2023 12:38

pytest: test that we proactively close incoming HTLCs to avoid them closing on us.

5d401a4

Signed-off-by: Rusty Russell <[email protected]>

rustyrussell force-pushed the guilt/give-up-onb-htlcs branch from bb3a4b3 to fd4b778 July 24, 2023 03:09

vincenzopalazzo reviewed Jul 25, 2023

lightningd/peer_htlcs.c (resolved)

vincenzopalazzo reviewed Jul 25, 2023

rustyrussell merged commit 978c169 into ElementsProject:master Jul 25, 2023

rustyrussell mentioned this pull request Jul 26, 2023

Abandon downstream htlcs if they don't confirm #6284

Closed
