-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cmd/opampsupervisor]: Persist the instance ID between restarts #32618
[cmd/opampsupervisor]: Persist the instance ID between restarts #32618
Conversation
// DirectoryOrDefault returns the configured storage directory if it was configured, | ||
// otherwise it returns the system default. | ||
func (s Storage) DirectoryOrDefault() string { | ||
if s.Directory == "" { | ||
switch runtime.GOOS { | ||
case "windows": | ||
// Windows default is "%ProgramData%\Otelcol\Supervisor" | ||
// If the ProgramData environment variable is not set, | ||
// it falls back to C:\ProgramData | ||
programDataDir := os.Getenv("ProgramData") | ||
if programDataDir == "" { | ||
programDataDir = `C:\ProgramData` | ||
} | ||
return filepath.Join(programDataDir, "Otelcol", "Supervisor") | ||
default: | ||
// Default for non-windows systems | ||
return "/var/lib/otelcol/supervisor" | ||
} | ||
} | ||
|
||
return s.Directory | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might make sense to move this to a separate PR, I think we'll need to create these directories if they don't exist too.
case <-s.doneChan: | ||
err := s.commander.Stop(context.Background()) | ||
if err != nil { | ||
s.logger.Error("Could not stop agent process", zap.Error(err)) | ||
} | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is due to a race condition I was bumping into with the e2e tests. Basically, the shutdown would happen while the command was in the middle of starting, so this goroutine would never exit (the command never exited), then Supervisor.Shutdown would never exit as a consequence.
Instead, we just handle the full lifecycle of the command all in the same goroutine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I agree with handling the full lifecycle in a single goroutine.
s.supervisorWG.Wait() | ||
|
||
if s.healthCheckTicker != nil { | ||
s.healthCheckTicker.Stop() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was another race condition, the health check ticker is recreated by the agent command goroutine, so we need to make sure that goroutine is exited before we try to stop it (otherwise we may leak a non-stopped ticker)
Thanks for opening this and fixing a few race conditions. Do you want a review while this is still a draft, or would you like to wait until it's marked "ready for review"? Do you think there's any overlap with #30807? |
d9cad2c
to
f1508a0
Compare
@evan-bradley It's ready for review now, I tend to open things in draft and come back to them with a fresh mind to look back over the PR, turns out it was a good idea because I caught a few weird naming things here 😆 I do think there's a small bit of overlap in #30807 - but I think it's only in the configuration. I think it makes sense to persist those statuses in different files, just because they are raw byte payloads that probably wouldn't be so useful in a structured yaml file like this. |
32ae098
to
b6b5829
Compare
case <-s.doneChan: | ||
err := s.commander.Stop(context.Background()) | ||
if err != nil { | ||
s.logger.Error("Could not stop agent process", zap.Error(err)) | ||
} | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I agree with handling the full lifecycle in a single goroutine.
4e70be0
to
c891001
Compare
2672219
to
b314426
Compare
@tigrannajaryan is this ready to be merged from your perspective? |
User-visible functionality LGTM. I didn't review the code, but @evan-bradley did so I believe it should be good to merge. |
Description:
Link to tracking Issue: Closes #21073
Testing:
Documentation: N/A