Break grabLockOrStop into two pieces to facilitate investigating deadlocks #17187

ncabatoff · 2022-09-19T13:37:26Z

Without this change, the "grab" locking goroutine looks the same regardless of who was calling grabLockOrStop, so there's no way to identify one of the deadlock parties.

Example of the confusion we're trying to eliminate:

=== CONT  TestSecret_TokenAccessor/auth
POTENTIAL DEADLOCK:
Previous place where the lock was grabbed
goroutine 8799 lock 0xc000ccd298
../../init.go:230 vault.(*Core).Initialize { c.stateLock.Lock() } <<<<<
../../testing.go:341 vault.TestCoreInitClusterWrapperSetup { result, err := core.Initialize(context.Background(), initParams) }
../../testing.go:2037 vault.(*TestCluster).initCores { bKeys, rKeys, root := TestCoreInitClusterWrapperSetup(t, leader.Core, leader.Handler) }
../../testing.go:1759 vault.NewTestCluster { testCluster.initCores(t, opts, addAuditBackend) }
api_integration_test.go:61 api.testVaultServerCoreConfig { cluster := vault.NewTestCluster(benchhelpers.TBtoT(t), coreConfig, &vault.TestClusterOptions{ }
api_integration_test.go:36 api.testVaultServerUnseal { return testVaultServerCoreConfig(t, &vault.CoreConfig{ }
api_integration_test.go:27 api.testVaultServer { client, _, closer := testVaultServerUnseal(t) }
secret_test.go:1005 api.TestSecret_TokenPolicies.func3 { client, closer := testVaultServer(t) }

Have been trying to lock it again for more than 30s
goroutine 10172 lock 0xc000ccd298
../../ha.go:670 vault.grabLockOrStop.func1 { lockFunc() } <<<<<

raskchanky · 2022-09-19T16:59:25Z

It looks like there's a test using the old way of checking grabLockOrStop() here. Does that need to be updated to be called the new way?

ncabatoff · 2022-09-20T12:37:35Z

> It looks like there's a test using the old way of checking grabLockOrStop() here. Does that need to be updated to be called the new way?

No, I kept the old func around for now, in part so I don't break ent when we merge to it. I might get rid of it in another pass later.

hghaf099

Looks good to me. I just posted a nit suggestion.

hghaf099 · 2022-09-20T14:11:16Z

vault/ha.go

+	defer close(l.doneCh)
+	l.lockFunc()
+
+	// The parent goroutine may or may not be waiting.


Would it be worth adding godoc for this function explaining how it should be used? Although all the example usages show that, it may not be clear to a future developer why this function should be run in a separate goroutine.

Good idea, added.

raskchanky · 2022-09-20T16:00:05Z

> > It looks like there's a test using the old way of checking grabLockOrStop() here. Does that need to be updated to be called the new way?
>
> No, I kept the old func around for now, in part so I don't break ent when we merge to it. I might get rid of it in another pass later.

Right, I noticed that the function itself is still there, but the way the function is used has changed, no? Previously it was:

stopped := grabLockOrStop(c.stateLock.RLock, c.stateLock.RUnlock, stopCh)
if stopped {
    // something
}

And now it's:

l := newLockGrabber(c.stateLock.RLock, c.stateLock.RUnlock, stopCh)
go l.grab()
if stopped := l.lockOrStop(); stopped {
    // something
}

Everything in this PR uses the new way and that test still uses the old way.

ncabatoff added 3 commits September 19, 2022 09:34

Break grabLockOrStop into two pieces to facilitate investigating dead…

3d29f54

…locks. Without this change, the "grab" goroutine looks the same regardless of who was calling grabLockOrStop, so there's no way to identify one of the deadlock parties.

Add CL

4e560bd

Add stopCh

5c0a12b

mpalmi added the core label (Issues and Pull-Requests specific to Vault Core) Sep 19, 2022

mpalmi requested a review from a team September 20, 2022 13:06

mpalmi added the enhancement label Sep 20, 2022

ncabatoff removed the enhancement label Sep 20, 2022

hghaf099 approved these changes Sep 20, 2022

Add godoc

1be9b24

ncabatoff enabled auto-merge (squash) September 20, 2022 14:41

ncabatoff disabled auto-merge September 20, 2022 15:02

ncabatoff merged commit cbbf1a5 into main Sep 20, 2022

ncabatoff deleted the split-grabLockOrStop branch September 20, 2022 15:03
