Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added prober package to generalize readiness probes #1242

Merged
merged 5 commits into from
Jul 1, 2019

Conversation

FUSAKLA
Copy link
Member

@FUSAKLA FUSAKLA commented Jun 10, 2019

This PR adds prober package/component which will unify readiness and health checks of all thanos components.

For more info please refer to the PR #656

@FUSAKLA FUSAKLA force-pushed the fus-add-readiness-prober branch from cb9b27f to a58e510 Compare June 10, 2019 22:13
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for splitting. This would much much faster then in review!

Some initial comments, need to run home (:

@@ -6,6 +6,11 @@ import (
"github.com/improbable-eng/thanos/pkg/store/storepb"
)

// Component is a generic component interface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing trailing period

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, should be added

pkg/prober/prober.go Outdated Show resolved Hide resolved
pkg/prober/prober.go Outdated Show resolved Hide resolved
}

// SetLogger sets logger used by the Prober.
func (p *Prober) SetLogger(logger log.Logger) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was needed formerly when there was mutex for logger as well but you suggested to drop it and I just didn't notice there is no need for the getter/setter now without it.
Thanks for pointing out, will remove.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

p.logger = logger
}

func (p *Prober) getLogger() log.Logger {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, do we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed, more info above

func NewProber(component component.Component, logger log.Logger) *Prober {
initialErr := fmt.Errorf(initialErrorText, component)
prober := &Prober{}
prober.SetComponent(component)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prober should not use public method TBH (:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just create prober with filled struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, just out of curiosity why should be this avoided? Is that generic thing in Golang not to use own public methods?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed also calls to SetNotReady and SetNotHealthy which caused that the mutex locking is gone but since it's the instantiation of the struct it should be ok I hope?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there is no need for this. Literally we will never in our code require dynamic reload of component. We don't have components that transforms into another components right?

Every bit of genercity is introducing complexity. Since we don't need that now (YAGNI rule), we can avoid that complexity, right? (:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks for for the explanation :)

@FUSAKLA FUSAKLA force-pushed the fus-add-readiness-prober branch from b054f62 to c2961f1 Compare June 11, 2019 20:46
@FUSAKLA
Copy link
Member Author

FUSAKLA commented Jun 11, 2019

@bwplotka thanks fot the comments! Should be all resolved now I hope.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better (:

Still some comments! Let's aim for perfection.

// NewProber returns Prober representing readiness and healthiness of given component.
func NewProber(component component.Component, logger log.Logger) *Prober {
initialErr := fmt.Errorf(initialErrorFmt, component)
prober := &Prober{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this variable, IMO just return &Prober

return prober
}

// RegisterInRouter registers readiness and liveness probes to mux.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// RegisterInRouter registers readiness and liveness probes to mux.
// RegisterInRouter registers readiness and liveness probes to router.


// IsReady returns error if component is not ready and nil if it is.
func (p *Prober) IsReady() error {
p.readyMtx.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's Move to sync.RWMutex and do Rlock and RUnlock here (: There will be more readers than writerrs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks I didn't know there is also RWMutex, TIL :)

}
}

// IsReady returns error if component is not ready and nil if it is.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// IsReady returns error if component is not ready and nil if it is.
// IsReady returns error if component is not ready and nil otherwise


// SetReady sets components status to ready.
func (p *Prober) SetReady() {
if p.IsReady() != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong I think as those two methods are locked so you can have a race. IF you need this log line, please add this to SetNotReady

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, just so I understand where the race is. The IsReady locks but once finished it again unlocks it. And only after it returns and it does not return nil the SetNotReady is called which again locks and unlocks. Which part could go wrong here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, now I get it. Meanwhile I unlock it the status of prober could change.


// SetNotReady sets components status to not ready with given error as a cause.
func (p *Prober) SetNotReady(err error) {
if err != nil && p.IsReady() == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong from the same reason above, there can be a race.

Please lock here and then check p.readiness instead to check what to log. What do you think? (:


// SetHealthy sets components status to healthy.
func (p *Prober) SetHealthy() {
if p.IsHealthy() != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly the same issue as for Readiness.

Let's move to RWLock and log under lock as well (it's only on write so should be fine)

// HandleIfReady if probe is ready calls the function otherwise returns 503.
func (p *Prober) HandleIfReady(f http.HandlerFunc) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
p.readyMtx.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not reusing p.IsReady? (: In this case it's fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially that we don't know how long f is and what it does from this function perspecitve.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This could cause serious issues.

@FUSAKLA FUSAKLA force-pushed the fus-add-readiness-prober branch from c2961f1 to f4c6453 Compare June 16, 2019 18:52
@FUSAKLA
Copy link
Member Author

FUSAKLA commented Jun 16, 2019

@bwplotka thank you for all the valuable comments/suggestions and for answering all my stupid questions so I can learn from it, I really appreciate it.

Is it ok with you this way?

@bwplotka
Copy link
Member

No worries, just tell if you are rdy for next review iteration (: I think it's rdy now?

@FUSAKLA
Copy link
Member Author

FUSAKLA commented Jun 20, 2019

Yes, it should be ready I hope.

@FUSAKLA FUSAKLA force-pushed the fus-add-readiness-prober branch from f4c6453 to 06ed8e1 Compare July 1, 2019 17:43
@FUSAKLA
Copy link
Member Author

FUSAKLA commented Jul 1, 2019

@bwplotka if you find a while and could take a look at this?

It's rebased and I added metrics for observing the Prober state if that is ok with you

  • thanos_<component>_ready
  • thanos_<component>_healthy (wasn't sure about this one since it should be 1 whenever e serve metrics but on the other hand it felt weird not to expose it either)

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FUSAKLA 🎉 Amazing code right now 👍

Super clean and with attention to details - I love it.

I have some more suggestions let me know what you think. Otherwise LGTM! Thanks.

const (
healthyEndpointPath = "/-/healthy"
readyEndpointPath = "/-/ready"
okProbeFmt = "thanos %v is %v"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we inline all constants that are used only once? (: There is no point of those if they are not references more than once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, inlined them probably will be easier to read 👍

}

func (p *Prober) writeResponse(w http.ResponseWriter, probeFunc func() error, probeType string) {
err := probeFunc()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do if err := probeFn(); err != nil here (: Also we usually do Fn suffix for function vars so probeFn would be nice? (: What do you think?

http.Error(w, fmt.Sprintf(errorProbeFmt, p.component, probeType, err), probeErrorHTTPStatus)
return
}
if _, e := io.WriteString(w, fmt.Sprintf(okProbeFmt, p.component, probeType)); e == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please let's keep err as error variable name! (:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Becuse of introducing the new name there is a bug here -> you log "err", err not "err", e as it should be (:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for noticing, changed to err

healthiness: initialErr,
readiness: initialErr,
readyStateMetric: prometheus.NewGauge(prometheus.GaugeOpts{
Name: fmt.Sprintf("thanos_%s_ready", component),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH instead of that I would just do `prober_ready", and then:

  • have label component. This will allow for better, potential aggregations in future.
  • you can have just prober_ready and do not specify thanos prefix. This allows to reuse this package outside of Thanos even (why not)!

Thanos can then on caller side do prometheus.WrapRegistererWithPrefix("thanos" (:

What do you think?

same for prober_healthy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the naming a lot bud than looked to the rest of packages and found everywhere hardcoded the thanos prefix so I wanted to stay consistent but I agree your suggestion is more variable.

Applied as suggested

Copy link
Member

@bwplotka bwplotka Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we did not think about that in initial implementation, but now I would aim for no thanos_ prefix inside packages.

Ideally only main adds those

@bwplotka
Copy link
Member

bwplotka commented Jul 1, 2019

BTW nice idea with metric 👍

@FUSAKLA
Copy link
Member Author

FUSAKLA commented Jul 1, 2019

@bwplotka thanks for those comments! All should be resolved as you suggested.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks for this 👍

@bwplotka bwplotka merged commit 3d30a90 into thanos-io:master Jul 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants