-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agent: bounce session after failed status update #1539
agent: bounce session after failed status update #1539
Conversation
Current coverage is 53.72% (diff: 0.00%)@@ master #1539 diff @@
==========================================
Files 82 82
Lines 13432 13430 -2
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 7221 7215 -6
- Misses 5224 5227 +3
- Partials 987 988 +1
|
Confirmed fix with reproduction that always returns errors from
Before this patch, we get the following in the logs:
After applying this PR we get the following:
While it will spam the logs with this simple recreation because we keep reestablishing the session, note that we are clearly in the backoff loop ("agent: rebuild session"). |
@@ -342,11 +344,15 @@ func (s *session) close() error { | |||
case <-s.closed: | |||
return errSessionClosed | |||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the select
with the addition of this sync.Once
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Previously, failed calls on an agent session to UpdateTaskStatus weren't causing the agent to fall back into the session creation retry loop. Without the retry loop, the agent ends up pounding the dispatcher with status updates that will never drain. This PR closes the session on a fatal error, ensuring that the agent will fallback into a retry loop. Signed-off-by: Stephen J Day <[email protected]>
4f045f8
to
353423b
Compare
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Previously, failed calls on an agent session to UpdateTaskStatus weren't
causing the agent to fall back into the session creation retry loop.
Without the retry loop, the agent ends up pounding the dispatcher with
status updates that will never drain.
This PR closes the session on a fatal error, ensuring that the agent
will fallback into a retry loop.
Consider backporting this change to 1.12.2.
Signed-off-by: Stephen J Day [email protected]
cc @aaronlehmann
Closes #1533