-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
template: lock task handle before trying script check #23917
Conversation
4f0f72e
to
e09048c
Compare
// remove the reference to the task handle and update the template contents | ||
// again | ||
harness.manager.SetDriverHandle(nil) | ||
harness.consul.SetKV(t, key1, []byte(content1_3)) | ||
timeout = time.After(time.Duration(5*testutil.TestMultiplier()) * time.Second) | ||
|
||
OUTER2: | ||
for { | ||
select { | ||
case <-harness.mockHooks.RestartCh: | ||
t.Fatal(t, "restart not expected") | ||
case ev := <-harness.mockHooks.EmitEventCh: | ||
if strings.Contains( | ||
ev.DisplayMessage, "task driver handle is not available") { | ||
break OUTER2 | ||
} | ||
case <-harness.mockHooks.SignalCh: | ||
t.Fatal(t, "signal not expected") | ||
case <-timeout: | ||
t.Fatal(t, "should have received an event that task driver handle is unavailable") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: unfortunately this test doesn't really exercise the bug that must have happened in #23875, because we'd have to concurrently make the change between the nil check in processScript
and the subsequent Exec
call, which is a very short race. Locally this test doesn't add anything meaningful to the time it takes the test to complete (always around ~3.2s on my machine), but I simultaneously hate to have a test that doesn't really test anything and a bugfix without a new test. I could go either way on removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the missing lock for sure. I say leave the test, but I'm not a 100% sure I understand how this bug happens.
When we introduced change_mode=script to templates, we passed the driver handle down into the template manager so we could call its `Exec` method directly. But the lifecycle of the driver handle is managed by the taskrunner and isn't available when the template manager is first created. This has led to a series of patches trying to fixup the behavior (#15915, #15192, #23663, #23917). Part of the challenge in getting this right is using an interface to avoid the circular import of the driver handle. But the taskrunner already has a way to deal with this problem using a "lazy handle". The other template change modes already use this indirectly through the `Lifecycle` interface. Change the driver handle `Exec` call in the template manager to a new `Lifecycle.Exec` call that reuses the existing behavior. This eliminates the need for the template manager to know anything at all about the handle state. Fixes: #24051
When we introduced change_mode=script to templates, we passed the driver handle down into the template manager so we could call its `Exec` method directly. But the lifecycle of the driver handle is managed by the taskrunner and isn't available when the template manager is first created. This has led to a series of patches trying to fixup the behavior (#15915, #15192, #23663, #23917). Part of the challenge in getting this right is using an interface to avoid the circular import of the driver handle. But the taskrunner already has a way to deal with this problem using a "lazy handle". The other template change modes already use this indirectly through the `Lifecycle` interface. Change the driver handle `Exec` call in the template manager to a new `Lifecycle.Exec` call that reuses the existing behavior. This eliminates the need for the template manager to know anything at all about the handle state. Fixes: #24051
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
In #23663 we fixed the template hook so that
change_mode="script"
didn't lose track of the task handle during restores. But this revealed a second bug which is that access to the handle is not locked while in use, which can allow it to be removed concurrently.Fixes: #23875
Note for reviewers: we're still waiting on we can get some reproduction hints from the reporter, but the missing lock is clearly a bug anyways.