Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to fix synchronization in SemaphoreStep #258

Merged
merged 1 commit into from
Mar 29, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -189,13 +189,16 @@ public static class Execution extends AbstractStepExecutionImpl {
Object returnValue = null;
Throwable error = null;
boolean success = false, failure = false, sync = true;
String c = Jenkins.XSTREAM.toXML(getContext());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keeping this out of the synchronized block as before.

synchronized (s) {
if (s.returnValues.containsKey(k)) {
success = true;
returnValue = s.returnValues.get(k);
} else if (s.errors.containsKey(k)) {
failure = true;
error = s.errors.get(k);
} else {
s.contexts.put(k, c);
Copy link
Member Author

@dwnusbaum dwnusbaum Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix. Other threads now only ever see State in one of two modes: either contexts does not contain the relevant key, and returnValues and errors should be used and will be checked here, or contexts does contain the key, and the StepContext should be used instead. Previously, there was a case where contexts did not contain the key yet but returnValues and errors had already been checked.

}
}
if (success) {
Expand All @@ -206,10 +209,6 @@ public static class Execution extends AbstractStepExecutionImpl {
getContext().onFailure(error);
} else {
LOGGER.info(() -> "Blocking " + k);
String c = Jenkins.XSTREAM.toXML(getContext());
synchronized (s) {
s.contexts.put(k, c);
}
sync = false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think there may be other pre-existing bugs here. What happens if another thread calls success or failure after the first synchronized block above but before this completes? Those threads will call StepContext.onSuccess/StepContext.onFailure before StepExecution.start has completed, which seems unusual, but maybe it's fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should change the overall approach, drop support for the synchronous completion cases, and have the step just poll continuously on a background thread looking for results. It would be slower and less efficient, but it also seems less prone to complex synchronization issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should change the overall approach

FWIW, I filed #259 with an example of what that could look like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those threads will call StepContext.onSuccess/StepContext.onFailure before StepExecution.start has completed

That would be normal if false is returned. It would be weird if start later returns true, but I do not think it is harmful. TBH I do not recall the logic here in workflow-cps very well. Anyway this seems unlikely in a normal test, because it should call wait and then do some stuff before calling success.

}
synchronized (s) {
Expand Down