-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Heartbeat] Monitor Retries #36147
[Heartbeat] Monitor Retries #36147
Conversation
Pinging @elastic/uptime (Team:Uptime) |
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
@emilioalvap thanks for the review feedback, I think I've addressed your concerns. Now if I can make the linter happy 🤞 I think the build may go green soon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One last concern, changes to enrich.go
we discussed the other day about emitting summaries on cmd/status
only don't seem to have been included yet:
return je.createSummary(event) |
@@ -17,7 +17,6 @@ import ( | |||
type loaderDB struct { | |||
keysToState map[string]*monitorstate.State | |||
mtx *sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this needs to be a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm far from an expert in Heartbeat / Synthetics so I didn't review the logic in this PR. But I made a few general Golang code review comments.
@ycombinator thanks for the review, I think I've addressed all of your comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -91,7 +91,7 @@ func (s *Summarizer) Wrap(j jobs.Job) jobs.Job { | |||
return func(event *beat.Event) ([]jobs.Job, error) { | |||
conts, jobErr := j(event) | |||
|
|||
_, _ = event.PutValue("monitor.check_group", s.checkGroup) | |||
_, _ = event.PutValue("monitor.check_group", fmt.Sprintf("%s-%d", s.checkGroup, s.jobSummary.Attempt)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewvc Sorry for the delayed note on this PR, Would this break the existing uptime UI in any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the UI treats the UUID as opaque values, as does ES (it's a keyword type, there's no dedicated UUID type).
Adds retries to Heartbeat monitors. Part of elastic/synthetics#792 This refactors a ton of code around summarizing events, and cleans up a lot of tech debt as well.
After experimenting around a bit, this gets retries working, using code at the monitor level. This also really neatly refactors the summary and state wrappers into much easier to understand code. This is an experiment only. Code is still pretty ugly and we should reorganize parameters and objects into something neater than what's here.
The main question this seeks to answer is where we should put this logic.
Sample output of:
Why is it important?
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs