-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Beat itest [1/3]: remove the usage of standby nodes #9306
Conversation
Important Review skippedAuto reviews are limited to specific labels. 🏷️ Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Pull reviewers statsStats of the last 30 days for lnd:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! I like the idea of shuffling the tests. And lots of other goodies in this PR as well.
The PR body mentions splitting the sub tests into standalone tests to speed up. I assume that's been moved to a different part of the saga?
Looks like one of the tests fails everywhere but not sure if it's related to this PR or fixed in a later one. So non-blocking since this is being merged into a side branch.
--- FAIL: TestLightningNetworkDaemon/tranche02/53-of-192/bitcoind/open_channel_reorg_test (38.98s)
harness_node.go:403: Starting node (name=Alice) with PID=5580
harness_node.go:403: Starting node (name=Bob) with PID=5600
miner.go:631:
Error Trace: /home/runner/work/lnd/lnd/lntest/miner/miner.go:631
/home/runner/work/lnd/lnd/itest/lnd_open_channel_test.go:70
/home/runner/work/lnd/lnd/lntest/harness.go:297
/home/runner/work/lnd/lnd/itest/lnd_test.go:130
Error: Received unexpected error:
expected new miner(555) to be 5 blocks ahead of original miner(551)
Test: TestLightningNetworkDaemon/tranche02/53-of-192/bitcoind/open_channel_reorg_test
Messages: failed to assert block height delta
0725cb5
to
c9af5f9
Compare
c6819c0
to
3c60868
Compare
c9af5f9
to
2763c5a
Compare
3c60868
to
abda3ce
Compare
2763c5a
to
087b7d1
Compare
abda3ce
to
21082c5
Compare
087b7d1
to
07ae75c
Compare
21082c5
to
9374711
Compare
07ae75c
to
0fc74ad
Compare
9374711
to
e7beb0a
Compare
0fc74ad
to
e5ed4c9
Compare
e7beb0a
to
d03a8b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice itest maintenance work, code LGTM, just left nits and a question regarding how we achieve a clean plate after each itest now where you removed tearing down the stand-by nodes.
Also the itests don't run yet, which might be fixed by a rebase?
itest/lnd_custom_features.go
Outdated
@@ -13,6 +13,8 @@ import ( | |||
// sets. For completeness, it also asserts that features aren't set in places | |||
// where they aren't intended to be. | |||
func testCustomFeatures(ht *lntest.HarnessTest) { | |||
alice, bob := ht.Alice, ht.Bob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that adds nicely to readability 👍
@@ -109,7 +109,7 @@ func testEstimateRouteFee(ht *lntest.HarnessTest) { | |||
}, | |||
) | |||
|
|||
bobsPrivChannels := ht.Bob.RPC.ListChannels(&lnrpc.ListChannelsRequest{ | |||
bobsPrivChannels := mts.bob.RPC.ListChannels(&lnrpc.ListChannelsRequest{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
itest/lnd_psbt_test.go
Outdated
@@ -1596,6 +1598,9 @@ func sendAllCoinsToAddrType(ht *lntest.HarnessTest, | |||
// the channel opening. The psbt funding flow is used to simulate this behavior | |||
// because we can easily let the remote peer run into the timeout. | |||
func testPsbtChanFundingFailFlow(ht *lntest.HarnessTest) { | |||
alice := ht.Alice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe stick to alice, bob := ht.Alice, ht.Bob
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is later changed tho
itest/lnd_route_blinding_test.go
Outdated
@@ -349,11 +349,24 @@ func newBlindedForwardTest(ht *lntest.HarnessTest) (context.Context, | |||
func (b *blindedForwardTest) setupNetwork(ctx context.Context, | |||
withInterceptor bool) { | |||
|
|||
const chanAmt = btcutil.Amount(100000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I always like the 100_000
notation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool changed
@@ -63,10 +63,6 @@ func testChannelBalance(ht *lntest.HarnessTest) { | |||
|
|||
// Ensure Bob currently has no available balance within the channel. | |||
checkChannelBalance(bob, 0, amount-lntest.CalcStaticFee(cType, 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are channels and nodes now closed/shutdown to get a fresh state for the next test? Maybe I missed that in the review so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So from now on all the nodes are created in each test independently, and are destroyed when the test ends, which means no more standby nodes that will carry their states into other tests.
e5ed4c9
to
d6be04c
Compare
This commit adds a new flag to shuffle all the test cases before running them so tests which require lots of blocks to be mined are less likely to be run in the same tranch. The other benefit is this approach provides a more efficient way to figure which tests are broken since all the differnet backends are running different tranches in their builds, we can identify more failed tests in one push.
To make sure each run is shuffled, we use the action ID as the seed.
Prepares the upcoming refactor. We now never call `ht.Alice` directly, instead, we always init `alice := ht.Alice` so it's easier to see how they are removed in a following commit.
This commit removes the usage of the standby nodes and uses `CreateSimpleNetwork` when applicable. Also introduces a helper method `NewNodeWithCoins` to quickly start a node with funds.
This commit removes the standby nodes Alice and Bob.
Since we don't have standby nodes anymore, we don't need to close the channels when the test finishes. Previously we would do so to make sure the standby nodes have a clean state for the next test case, which is no longer relevant.
Soemtimes the node may be hanging for a while without being noticed, causing failures in its following tests, thus making the debugging extrememly difficult. We now assert the node has been shut down from the logs to assert the shutdown process behaves as expected.
This commit fixes a misuse of `ht.Subtest`, where the nodes should always be created inside the subtest.
This bug was hidden because we used standby nodes before, which always have more-than-necessary wallet utxos.
d03a8b2
to
dfa75eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying my question, LGTM 💾
This PR is resulted from fixing the itest failures in
blockbeat
. Since it's a critical system, we need to make sure the CI is indicative of reporting failures.Check #9260 for the final result.
Improvements
This PR does the following,
btcd
itest, previously it takes 45m and now it takes around 18m.Changes
standby nodes are removed. On one hand, this setup contributes to some of the flakes we've previously seen, e.g., checking num of edges. It creates quite a burden as all the tests are supposed to be independent, and this setup has created unnecessary states that need to be managed. On the other hand, this setup hides bugs, especially when it comes to how the wallet manages UTXOs since it always gives each standby node more than enough UTXOs to use, which is shown in one of the test failures found, in which the
ListCoins
gives incorrect result and is currently under investigation.tests are shuffled deterministically before running to maximize the tranche scheduler. Out of the 8 tranches in the CI, one of them takes much longer to run due to more blocks mined in this tranche. Since the time it takes to run the test is almost linear to the num of blocks mined, we now shuffle the test cases so it's more likely each tranche mines a similar num of blocks. This setup also gives a faster feedback loop during development, which was why I did it originally. Suppose one change caused 10 tests to fail, with shuffling, we are likely to catch all of them in one CI run, whereas we'd run the CI multiple times previously.
most of the table-driven tests are now broken into independent tests, mostly because we want them to be more maintainable, but also contribute to speeding up the CI as the more tests we have, the more likely we get even tranches.
Aside from the above structural changes, all the flakes are now documented, including the ones in windows (windows is less stable, unfortunately). We also assert the shutdown is clean (which contributes to some of the nasty flakes).
Dependent PRs
Issues fixed from this new itest setup, which are dependent PRs,
htlcswitch+routing: handle nil pointer dereference properly #9303
Reapply #8644 #9242
lnrpc: sort
Invoice.HTLCs
based onHtlcIndex
#9338lnd: stop
graphBuilder
during shutdown #9292lnwallet: log the amounts in the same unit #9291
peer+lnd: fix peer blocking on node shutdown #9275
multi: fix rpcclient shutdown #9261
trivial: prepare itest for
blockbeat
#9259chainntnfs: fix missing notifications #9258
routing: fix missionControlStore blocks on shutting down #9249
lntest: fix edge assertion and reset min relay fee #9248
itest: fix flake in
payment_failure_reason_canceled
#9228