Node crashes on startup, failing RBF in incomplete state #4511

Closed
niftynei opened this issue May 5, 2021 · 6 comments · Fixed by #4521

niftynei commented May 5, 2021

@jsarenik and I got most of the way through an RBF attempt before he desisted; now it's crashing my node.

We saved the 'inflight' attempt because we got as far as exchanging commitment sigs, but now the "most recent inflight" doesn't match the channel's funding_txid (the sigs should have been exchanged, but the tx failed to broadcast..?)

Worth noting that the initial feerate bump wasn't high enough to pass bitcoind's RBF rules, so broadcasting the tx failed.
Jan decided he didn't want to try again, which is valid. The process is entirely manual at the moment; we could definitely use a nicer, more concise command to bump a channel.
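
To illustrate why a too-small bump is rejected, here is a minimal sketch of the fee checks from BIP 125 (rules 3 and 4 only). The function name is hypothetical, and the incremental relay feerate of 1000 sat per 1000 vbytes is assumed to be bitcoind's default; bitcoind applies further replacement rules that are not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bitcoind's default incremental relay feerate,
 * 1000 satoshi per 1000 virtual bytes. */
#define INCREMENTAL_RELAY_FEE_PER_KVB 1000ULL

/* Simplified BIP 125 fee checks (rules 3 and 4 only).
 * Hypothetical helper, not part of lightningd or bitcoind. */
bool rbf_fee_bump_ok(uint64_t replaced_fee_sat,
		     uint64_t replacement_fee_sat,
		     uint64_t replacement_vsize)
{
	uint64_t min_extra;

	/* Rule 3: the replacement must pay at least the absolute fee
	 * of the transactions it replaces. */
	if (replacement_fee_sat < replaced_fee_sat)
		return false;

	/* Rule 4: the additional fee must also pay for the
	 * replacement's own bandwidth at the incremental feerate. */
	min_extra = replacement_vsize * INCREMENTAL_RELAY_FEE_PER_KVB / 1000;
	return replacement_fee_sat - replaced_fee_sat >= min_extra;
}
```

Under these assumptions, a 200-vbyte replacement of a tx that paid 1000 sat needs to pay at least 1200 sat in total to pass both checks.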

2021-05-05T23:09:52.142Z **BROKEN** lightningd: FATAL SIGNAL 6 (version v0.10.0-130-g929a418-modded)
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: common/daemon.c:44 (send_backtrace) 0x562a27aa874b
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: common/daemon.c:52 (crashdump) 0x562a27aa879b
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fe62a6c0fcf
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fe62a6c0f47
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fe62a6c28b0
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fe62a6b2429
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fe62a6b24a1
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/channel.c:735 (channel_current_inflight) 0x562a27a341fc
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/channel.c:743 (channel_last_funding_feerate) 0x562a27a3421a
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/peer_control.c:753 (json_add_channel) 0x562a27a69614
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/peer_control.c:1424 (json_add_peer) 0x562a27a6b80e
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/peer_control.c:1457 (json_listpeers) 0x562a27a6b9e0
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:643 (command_exec) 0x562a27a4e63c
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:767 (rpc_command_hook_final) 0x562a27a4ebd2
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/plugin_hook.c:275 (plugin_hook_call_) 0x562a27a7dde0
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:855 (plugin_hook_call_rpc_command) 0x562a27a4ef99
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:942 (parse_request) 0x562a27a4f3e8
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:1033 (read_json) 0x562a27a4f74f
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x562a27b00b47
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x562a27b016c4
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x562a27b01702
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x562a27b038c8
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/io_loop_with_timers.c:24 (io_loop_with_timers) 0x562a27a4c71c
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: lightningd/lightningd.c:1111 (main) 0x562a27a524fc
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fe62a6a3b96
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x562a27a2c629
2021-05-05T23:09:52.142Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0xffffffffffffffff

Temp patch to fix:

diff --git a/lightningd/channel.c b/lightningd/channel.c
index 094a66eb3..b60a7ef8f 100644
--- a/lightningd/channel.c
+++ b/lightningd/channel.c
@@ -726,15 +726,10 @@ void channel_fail_forget(struct channel *channel, const char *fmt, ...)
 struct channel_inflight *
 channel_current_inflight(const struct channel *channel)
 {
-       struct channel_inflight *inflight;
        /* The last inflight should always be the one in progress */
-       inflight = list_tail(&channel->inflights,
-                            struct channel_inflight,
-                            list);
-       if (inflight)
-               assert(bitcoin_txid_eq(&channel->funding_txid,
-                                      &inflight->funding->txid));
-       return inflight;
+       return list_tail(&channel->inflights,
+                        struct channel_inflight,
+                        list);
 }
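
For readers who don't know this code path: the removed assert aborts the whole daemon whenever the newest inflight's txid differs from the channel's funding_txid, which is the state described above. A minimal sketch of the same check done without aborting follows. The struct and function names are hypothetical, simplified stand-ins, not lightningd's real types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for lightningd's structures. */
struct txid {
	unsigned char bytes[32];
};

struct inflight {
	struct txid funding_txid;
	struct inflight *next;	/* oldest first, newest last */
};

struct chan {
	struct txid funding_txid;
	struct inflight *inflights;	/* head of the list, may be NULL */
};

/* Return the last (most recently added) inflight, or NULL if none. */
struct inflight *chan_last_inflight(const struct chan *c)
{
	struct inflight *i = c->inflights;

	while (i && i->next)
		i = i->next;
	return i;
}

/* True if there is no inflight, or the newest one matches the channel's
 * funding txid. The caller decides what to do on a mismatch instead of
 * the process aborting. */
bool chan_inflight_consistent(const struct chan *c)
{
	const struct inflight *last = chan_last_inflight(c);

	return !last || memcmp(&last->funding_txid, &c->funding_txid,
			       sizeof(struct txid)) == 0;
}
```

A caller such as a listing command could then log the mismatch and carry on, rather than crash while rendering channel info.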
@niftynei niftynei self-assigned this May 5, 2021

niftynei commented May 5, 2021

@jsarenik if this is crashing your node, the included patch should fix it. Fix incoming...

jsarenik commented May 6, 2021

Yes, it is also crashing my node; see jsarenik/clightning-dual-crash-logs#1

jsarenik commented May 6, 2021

Recompiling 9825f32 with the patch now...

diff --git a/lightningd/channel.c b/lightningd/channel.c
index 094a66eb3..b60a7ef8f 100644
--- a/lightningd/channel.c
+++ b/lightningd/channel.c
@@ -726,15 +726,10 @@ void channel_fail_forget(struct channel *channel, const char *fmt, ...)
 struct channel_inflight *
 channel_current_inflight(const struct channel *channel)
 {
-	struct channel_inflight *inflight;
 	/* The last inflight should always be the one in progress */
-	inflight = list_tail(&channel->inflights,
-			     struct channel_inflight,
-			     list);
-	if (inflight)
-		assert(bitcoin_txid_eq(&channel->funding_txid,
-				       &inflight->funding->txid));
-	return inflight;
+	return list_tail(&channel->inflights,
+			 struct channel_inflight,
+			 list);
 }
 
 u32 channel_last_funding_feerate(const struct channel *channel)

jsarenik commented May 6, 2021

With the patch it is humming merrily. Thanks!

niftynei added a commit to niftynei/lightning that referenced this issue May 7, 2021
When we re-populate from disk, we need to know what order to recreate the
inflights list in.

Fixes ElementsProject#4511
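
The commit message says only that the reload order matters. As a minimal sketch of the idea, the rows loaded from disk could be sorted so that the newest attempt ends up at the tail of the list. Sorting by feerate is an assumption made here for illustration (each RBF attempt has to raise the feerate); the row struct and function names are hypothetical and not taken from the actual fix.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical row as it might be read back from the database. */
struct inflight_row {
	uint32_t funding_feerate;
	unsigned char funding_txid[32];
};

/* Order rows by ascending feerate, so the highest (assumed newest)
 * attempt sorts last. */
int cmp_inflight_feerate(const void *a, const void *b)
{
	const struct inflight_row *x = a, *y = b;

	return (x->funding_feerate > y->funding_feerate)
		- (x->funding_feerate < y->funding_feerate);
}

/* Sort in place before rebuilding the in-memory inflights list. */
void sort_inflights(struct inflight_row *rows, size_t n)
{
	qsort(rows, n, sizeof(*rows), cmp_inflight_feerate);
}
```

With the rows in a known order, taking the list tail as "the one in progress" no longer depends on the order the database happens to return.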
niftynei added a commit to niftynei/lightning that referenced this issue May 11, 2021
When we re-populate from disk, we need to know what order to recreate the
inflights list in.

Fixes ElementsProject#4511
rustyrussell pushed a commit to niftynei/lightning that referenced this issue May 12, 2021
When we re-populate from disk, we need to know what order to recreate the
inflights list in.

Fixes ElementsProject#4511
Changelog-Experimental: Protocol: multiple fixes for dual-funding and rbf crashes.
jsarenik commented

Running current master v0.10.0-151-gce1e5bd38 and everything works fine (without the patch that previously helped).

jsarenik commented

Although it is not crashing on current master, I see this in the log repeating every few seconds (even though it says it will try to reconnect in 60 seconds):

...
DEBUG   03e2408a49f07d2f4083a47344138ef89e7617e63919202c92aa8d49b574a560ae-channeld-chan#90: peer_in WIRE_UPDATE_FEE
DEBUG   03e2408a49f07d2f4083a47344138ef89e7617e63919202c92aa8d49b574a560ae-channeld-chan#90: update_fee 336, range 708-208870
DEBUG   03e2408a49f07d2f4083a47344138ef89e7617e63919202c92aa8d49b574a560ae-channeld-chan#90: peer_out WIRE_WARNING
INFO    03e2408a49f07d2f4083a47344138ef89e7617e63919202c92aa8d49b574a560ae-chan#90: Peer transient failure in CHANNELD_NORMAL: channeld WARNING: update_fee 336 outside range 708-208870
DEBUG   03e2408a49f07d2f4083a47344138ef89e7617e63919202c92aa8d49b574a560ae-chan#90: Will try reconnect in 60 seconds
UNUSUAL 03e2408a49f07d2f4083a47344138ef89e7617e63919202c92aa8d49b574a560ae-channeld-chan#90: Status closed, but waitpid 20900 says No child process
DEBUG   03e2408a49f07d2f4083a47344138ef89e7617e63919202c92aa8d49b574a560ae-connectd: disconnect
DEBUG   plugin-funder: Cleaning up inflights for peer id 03e2408a49f07d2f4083a47344138ef89e7617e63919202c92aa8d49b574a560ae

I would be happy to increase the fee, but nothing I know of so far works.
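
For context, the warning in the log comes from a range check on the feerate the peer proposes. A minimal sketch of such a check is below, assuming inclusive bounds and the same units as the numbers in the log; the function name is hypothetical, not channeld's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is the proposed feerate within the range we
 * accept? Bounds are assumed to be inclusive. */
bool update_fee_in_range(uint32_t feerate, uint32_t min, uint32_t max)
{
	return feerate >= min && feerate <= max;
}
```

With the values from the log, 336 is below the minimum of 708, so the check fails and the peer gets a warning each time it reconnects and resends the same update_fee.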

niftynei added a commit to niftynei/lightning that referenced this issue May 18, 2021
When we re-populate from disk, we need to know what order to recreate the
inflights list in.

Fixes ElementsProject#4511
niftynei added a commit to niftynei/lightning that referenced this issue May 20, 2021
When we re-populate from disk, we need to know what order to recreate the
inflights list in.

Fixes ElementsProject#4511
rustyrussell pushed a commit that referenced this issue May 24, 2021
When we re-populate from disk, we need to know what order to recreate the
inflights list in.

Fixes #4511