Free holding cell on monitor-updating-restored when there's no upd #755
Conversation
If there are no pending channel update messages when monitor updating is restored (though there may be an RAA to send), and we're connected to our peer and not awaiting a remote RAA, we need to free anything in our holding cell. Without this, chanmon_fail_consistency was able to find a stuck condition where we sit on an HTLC failure in our holding cell and never handle it (at least until we have other actions to take which empty the holding cell).

Still, this approach sucks - it introduces reentrancy in a particularly dangerous form:
a) we re-enter user code around monitor updates while being called from user code around monitor updates, making deadlocks very likely (in fact, our current tests have a bug here!), and
b) the reentrancy only occurs in a very rare case, making it likely users will not hit it in testing, only deadlocking in production.

I'm not entirely sure what the alternative is, however - we could move to a world where we poll for holding cell events that can be freed on our 1-minute timer, but we'd still have a super-rare reentrancy case, just in timer_chan_freshness_every_min() instead.

Because I'm not sure this is the right fix, and the case should be incredibly rare, and it's not really a critical bug per se, and this is a more involved change than everything else that's left, I'm not going to bother tagging this 0.0.12 and suggest we hold off on fixing it.
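The condition described above can be sketched as follows. This is a minimal, hypothetical model, not rust-lightning's actual `Channel` internals: the method name mirrors the real `monitor_updating_restored`, but every field and type here is invented for illustration.

```rust
// Illustrative sketch only -- all names and fields are hypothetical,
// not rust-lightning's real data structures.

#[derive(Debug, PartialEq)]
enum HoldingCellHtlc {
    // An HTLC failure sitting in the holding cell, by (hypothetical) id.
    Fail(u64),
}

struct Channel {
    awaiting_remote_raa: bool,
    peer_connected: bool,
    // Queued channel update messages we still owe the peer.
    pending_update_msgs: Vec<String>,
    holding_cell: Vec<HoldingCellHtlc>,
}

impl Channel {
    /// Called when a previously-failed monitor update completes.
    /// Returns any holding-cell entries that can now be processed;
    /// without this drain, a lone HTLC failure could sit here forever.
    fn monitor_updating_restored(&mut self) -> Vec<HoldingCellHtlc> {
        // Free the holding cell only when there are no pending channel
        // update messages, we're connected, and we're not awaiting a
        // remote RAA -- the condition described in the PR text above.
        if self.pending_update_msgs.is_empty()
            && self.peer_connected
            && !self.awaiting_remote_raa
        {
            std::mem::take(&mut self.holding_cell)
        } else {
            Vec::new()
        }
    }
}

fn main() {
    let mut chan = Channel {
        awaiting_remote_raa: false,
        peer_connected: true,
        pending_update_msgs: Vec::new(),
        holding_cell: vec![HoldingCellHtlc::Fail(42)],
    };
    let freed = chan.monitor_updating_restored();
    println!("freed {} holding-cell entries", freed.len()); // prints "freed 1 holding-cell entries"
}
```

Note that the reentrancy hazard isn't visible in this sketch: it arises because the real caller is itself user code reacting to a monitor update, and processing the freed entries re-enters user code from there.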
Oh, I should mention this was found by the updates in #753.
Codecov Report

@@            Coverage Diff             @@
##             main     #755      +/-   ##
==========================================
- Coverage   91.45%   91.38%   -0.07%
==========================================
  Files          37       37
  Lines       22249    22263      +14
==========================================
- Hits        20347    20346       -1
- Misses       1902     1917      +15
==========================================

Continue to review full report at Codecov.
Could I see the fuzz failure that this is solving? I'm having a hard time wrapping my mind around #756, and I think it's because it seems to be solving multiple distinct problems that could be separated out. So, gonna try to understand them one by one.
This input to chanmon_consistency fails on current upstream.
For me, this fuzz input^ fails with the same error. If this PR can stand on its own -- and it's not too inconvenient -- I'd rather reopen, review, and get this mergeable separately from addressing the other commits in #756.
Right - honestly it's been a while and I don't remember exactly the contours of what is/isn't in this PR, but that fuzz input does pass on #756. Most of the other commits on #756 just merge codepaths that were near-identical so that they are now identical, and then fix a few minor further bugs in the fuzztarget. I'm happy to try to dig deeper into it, but I'm not sure just this version is really an option, so much as an intended demonstration of an idea seeking concept ACKs.
I didn't bother writing a test because I'm not entirely sure this is the right approach - mostly I'd like concept feedback.