-
Notifications
You must be signed in to change notification settings - Fork 569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ingester: split push and read circuit breakers #8315
Conversation
Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
e3a12e2
to
e5bedfd
Compare
Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
…it() Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
3037b08
to
887bd11
Compare
Signed-off-by: Yuri Nikolic <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see much value for adding *ingesterCircuitBreaker
, it makes everything more cumbersome.
I’d suggest to drop ingesterCircuitBreaker
type, and simply use two *circuitBreaker
implementations directly from ingester. If you want to use callback to release permit, then do it at (cb *circuitBreaker) tryAcquirePermit()
method, not in (cb *ingesterCircuitBreaker) tryPushAcquirePermit()
.
I’d also make newCircuitBreakerMetrics
work with single circuit breaker only (by adding ConstLabel request_type
to all metrics — we can register multiple metrics with same name/help/labels, but different const label. example.
WDYT?
pkg/ingester/circuitbreaker.go
Outdated
) | ||
|
||
type circuitBreakerMetrics struct { | ||
circuitBreakerTransitions *prometheus.CounterVec | ||
circuitBreakerResults *prometheus.CounterVec | ||
} | ||
|
||
func newCircuitBreakerMetrics(r prometheus.Registerer, currentStateFn func() circuitbreaker.State) *circuitBreakerMetrics { | ||
func newCircuitBreakerMetrics(r prometheus.Registerer, currentStateFn func(string) circuitbreaker.State, requestTypes []string) *circuitBreakerMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
func newCircuitBreakerMetrics(r prometheus.Registerer, currentStateFn func(string) circuitbreaker.State, requestTypes []string) *circuitBreakerMetrics { | |
func newCircuitBreakerMetrics(r prometheus.Registerer, currentState func(string) circuitbreaker.State, requestTypes []string) *circuitBreakerMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed all the occurrences of Fn in this source
pkg/ingester/circuitbreaker.go
Outdated
} | ||
|
||
circuitBreakerTransitionsCounterFn := func(metrics *circuitBreakerMetrics, state circuitbreaker.State) prometheus.Counter { | ||
return metrics.circuitBreakerTransitions.WithLabelValues(state.String()) | ||
circuitBreakerTransitionsCounterFn := func(metrics *circuitBreakerMetrics, requestType string, state circuitbreaker.State) prometheus.Counter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
circuitBreakerTransitionsCounterFn := func(metrics *circuitBreakerMetrics, requestType string, state circuitbreaker.State) prometheus.Counter { | |
circuitBreakerTransitionsCounter := func(metrics *circuitBreakerMetrics, requestType string, state circuitbreaker.State) prometheus.Counter { |
In a slack conversation @pstibrany and I agreed to keep the |
pkg/ingester/circuitbreaker.go
Outdated
} | ||
circuitBreakerCurrentStateGaugeFn := func(state circuitbreaker.State) prometheus.GaugeFunc { | ||
circuitBreakerCurrentStateGaugeFn := func(requestType string, state circuitbreaker.State) prometheus.GaugeFunc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
circuitBreakerCurrentStateGaugeFn := func(requestType string, state circuitbreaker.State) prometheus.GaugeFunc { | |
circuitBreakerCurrentStateGauge := func(requestType string, state circuitbreaker.State) prometheus.GaugeFunc { |
or
circuitBreakerCurrentStateGaugeFn := func(requestType string, state circuitbreaker.State) prometheus.GaugeFunc { | |
circuitBreakerCurrentStateGaugeFunc := func(requestType string, state circuitbreaker.State) prometheus.GaugeFunc { |
If we want to be consistent with the type name.
pkg/ingester/circuitbreaker.go
Outdated
} | ||
return func(duration time.Duration, err error) { | ||
if pushAcquiredPermit { | ||
_ = cb.push.finishRequest(duration, cb.push.cfg.RequestTimeout, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to pass to cb.push.foo()
a config from cb.push.cfg
?
I think we can refactor finishRequest
to read it's own config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good catch. Thank you
pkg/ingester/ingester.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
|
||
finishReadRequest := func(err error) { | ||
if acquiredCircuitBreakerPermit { | ||
i.circuitBreaker.finishReadRequest(time.Since(start), err) | ||
if finish != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finish
can't be nil, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right
pkg/ingester/ingester.go
Outdated
@@ -1004,8 +1006,8 @@ func (i *Ingester) FinishPushRequest(ctx context.Context) { | |||
if st.requestSize > 0 { | |||
i.inflightPushRequestsBytes.Sub(st.requestSize) | |||
} | |||
if st.acquiredCircuitBreakerPermit { | |||
i.circuitBreaker.finishPushRequest(st.requestDuration, st.pushErr) | |||
if st.requestFinish != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are, again, right. It cannot be nil.
Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't checked tests yet, but code changes look good to me.
pkg/ingester/circuitbreaker.go
Outdated
// tryPushAcquirePermit tries to acquire a permit to use the push circuit breaker and returns whether a permit was acquired. | ||
// If it was possible, tryPushAcquirePermit returns a function that should be called to release the acquired permit. | ||
// If it was not possible, the causing error is returned. | ||
func (cb *ingesterCircuitBreaker) tryPushAcquirePermit() (func(time.Duration, error), error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:do we need the comment here?
// If it was not possible, the causing error is returned. | ||
func (cb *ingesterCircuitBreaker) tryReadAcquirePermit() (func(time.Duration, error), error) { | ||
// If the read circuit breaker is not active, we don't try to acquire a permit. | ||
if !cb.read.isActive() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is now checked twice, right? once here, second time in tryAcquirePermit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's true
pkg/ingester/circuitbreaker.go
Outdated
// tryReadAcquirePermit tries to acquire a permit to use the read circuit breaker and returns whether a permit was acquired. | ||
// If it was possible, tryReadAcquirePermit returns a function that should be called to release the acquired permit. | ||
// If it was not possible, the causing error is returned. | ||
func (cb *ingesterCircuitBreaker) tryReadAcquirePermit() (func(time.Duration, error), error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we simplify comment and just say that it's also checking push CB (but doesn't get permit)
pkg/ingester/circuitbreaker.go
Outdated
// tryPushAcquirePermit tries to acquire a permit to use the push circuit breaker and returns whether a permit was acquired. | ||
// If it was possible, tryPushAcquirePermit returns a function that should be called to release the acquired permit. | ||
// If it was not possible, the causing error is returned. | ||
func (cb *ingesterCircuitBreaker) tryPushAcquirePermit() (func(time.Duration, error), error) { | ||
return cb.push.tryAcquirePermit() | ||
} | ||
|
||
// tryReadAcquirePermit tries to acquire a permit to use the read circuit breaker and returns whether a permit was acquired. | ||
// If it was possible, tryReadAcquirePermit returns a function that should be called to release the acquired permit. | ||
// If it was not possible, the causing error is returned. | ||
func (cb *ingesterCircuitBreaker) tryReadAcquirePermit() (func(time.Duration, error), error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// tryPushAcquirePermit tries to acquire a permit to use the push circuit breaker and returns whether a permit was acquired. | |
// If it was possible, tryPushAcquirePermit returns a function that should be called to release the acquired permit. | |
// If it was not possible, the causing error is returned. | |
func (cb *ingesterCircuitBreaker) tryPushAcquirePermit() (func(time.Duration, error), error) { | |
return cb.push.tryAcquirePermit() | |
} | |
// tryReadAcquirePermit tries to acquire a permit to use the read circuit breaker and returns whether a permit was acquired. | |
// If it was possible, tryReadAcquirePermit returns a function that should be called to release the acquired permit. | |
// If it was not possible, the causing error is returned. | |
func (cb *ingesterCircuitBreaker) tryReadAcquirePermit() (func(time.Duration, error), error) { | |
// tryAcquirePushPermit tries to acquire a permit to use the push circuit breaker and returns whether a permit was acquired. | |
// If it was possible, tryAcquirePushPermit returns a function that should be called to release the acquired permit. | |
// If it was not possible, the causing error is returned. | |
func (cb *ingesterCircuitBreaker) tryAcquirePushPermit() (func(time.Duration, error), error) { | |
return cb.push.tryAcquirePermit() | |
} | |
// tryAcquireReadPermit tries to acquire a permit to use the read circuit breaker and returns whether a permit was acquired. | |
// If it was possible, tryAcquireReadPermit returns a function that should be called to release the acquired permit. | |
// If it was not possible, the causing error is returned. | |
func (cb *ingesterCircuitBreaker) tryAcquireReadPermit() (func(time.Duration, error), error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the method names. They actually sound better.
pkg/ingester/ingester.go
Outdated
@@ -356,7 +358,7 @@ type Ingester struct { | |||
ingestPartitionID int32 | |||
ingestPartitionLifecycler *ring.PartitionInstanceLifecycler | |||
|
|||
circuitBreaker *circuitBreaker | |||
circuitBreaker *ingesterCircuitBreaker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need for this to be a pointer, and no need to return a pointer from newIngesterCircuitBreaker
: if we just hold a ingesterCircuitBreaker
struct here, we'll also handle the zero-valued case gracefully (as nil cb's inside of ingesterCircuitBreaker
are already valid)
Signed-off-by: Yuri Nikolic <[email protected]>
* Ingester: splitting push and read circuit breakers Signed-off-by: Yuri Nikolic <[email protected]> * Improving TestIngester_StartReadRequest Signed-off-by: Yuri Nikolic <[email protected]> * Updating documentation Signed-off-by: Yuri Nikolic <[email protected]> * Fixing lint issues Signed-off-by: Yuri Nikolic <[email protected]> * Fixing documentation issues Signed-off-by: Yuri Nikolic <[email protected]> * Do not call tryAcquirePermit() on the push cb from tryReadAcquirePermit() Signed-off-by: Yuri Nikolic <[email protected]> * Rename prCircuitBreaker into ingesterCircuitBreaker Signed-off-by: Yuri Nikolic <[email protected]> * Rename label name path to request_type and label value write to push Signed-off-by: Yuri Nikolic <[email protected]> * Do not allow acquiring permit if cbs are not active Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> --------- Signed-off-by: Yuri Nikolic <[email protected]>
* Ingester: splitting push and read circuit breakers Signed-off-by: Yuri Nikolic <[email protected]> * Improving TestIngester_StartReadRequest Signed-off-by: Yuri Nikolic <[email protected]> * Updating documentation Signed-off-by: Yuri Nikolic <[email protected]> * Fixing lint issues Signed-off-by: Yuri Nikolic <[email protected]> * Fixing documentation issues Signed-off-by: Yuri Nikolic <[email protected]> * Do not call tryAcquirePermit() on the push cb from tryReadAcquirePermit() Signed-off-by: Yuri Nikolic <[email protected]> * Rename prCircuitBreaker into ingesterCircuitBreaker Signed-off-by: Yuri Nikolic <[email protected]> * Rename label name path to request_type and label value write to push Signed-off-by: Yuri Nikolic <[email protected]> * Do not allow acquiring permit if cbs are not active Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> --------- Signed-off-by: Yuri Nikolic <[email protected]> (cherry picked from commit ee069d2)
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-8315-to-r294 origin/r294
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x ee069d247c7c22848ee668027c432a97e120cd9f
# Push it to GitHub
git push --set-upstream origin backport-8315-to-r294
git switch main
# Remove the local backport branch
git branch -D backport-8315-to-r294 Then, create a pull request where the |
* Ingester: splitting push and read circuit breakers Signed-off-by: Yuri Nikolic <[email protected]> * Improving TestIngester_StartReadRequest Signed-off-by: Yuri Nikolic <[email protected]> * Updating documentation Signed-off-by: Yuri Nikolic <[email protected]> * Fixing lint issues Signed-off-by: Yuri Nikolic <[email protected]> * Fixing documentation issues Signed-off-by: Yuri Nikolic <[email protected]> * Do not call tryAcquirePermit() on the push cb from tryReadAcquirePermit() Signed-off-by: Yuri Nikolic <[email protected]> * Rename prCircuitBreaker into ingesterCircuitBreaker Signed-off-by: Yuri Nikolic <[email protected]> * Rename label name path to request_type and label value write to push Signed-off-by: Yuri Nikolic <[email protected]> * Do not allow acquiring permit if cbs are not active Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> --------- Signed-off-by: Yuri Nikolic <[email protected]> (cherry picked from commit ee069d2)
* Ingester: splitting push and read circuit breakers Signed-off-by: Yuri Nikolic <[email protected]> * Improving TestIngester_StartReadRequest Signed-off-by: Yuri Nikolic <[email protected]> * Updating documentation Signed-off-by: Yuri Nikolic <[email protected]> * Fixing lint issues Signed-off-by: Yuri Nikolic <[email protected]> * Fixing documentation issues Signed-off-by: Yuri Nikolic <[email protected]> * Do not call tryAcquirePermit() on the push cb from tryReadAcquirePermit() Signed-off-by: Yuri Nikolic <[email protected]> * Rename prCircuitBreaker into ingesterCircuitBreaker Signed-off-by: Yuri Nikolic <[email protected]> * Rename label name path to request_type and label value write to push Signed-off-by: Yuri Nikolic <[email protected]> * Do not allow acquiring permit if cbs are not active Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> * Fixing review findings Signed-off-by: Yuri Nikolic <[email protected]> --------- Signed-off-by: Yuri Nikolic <[email protected]> (cherry picked from commit ee069d2) Co-authored-by: Đurica Yuri Nikolić <[email protected]>
What this PR does
In #8180 and #8285 we introduced an ingester server-side circuit breaker for both push and read requests together. This PR splits the introduced circuit breaker into 2 different circuit breakers: one for push and the other for read requests only.
They can be enabled by
-ingester.push-circuit-breaker.enabled
and-ingester.read-circuit-breaker.enabled
configuration options. Other configuration options are:-
-ingester.push-circuit-breaker.failure-threshold-percentage
-
-ingester.push-circuit-breaker.failure-execution-threshold
-
-ingester.push-circuit-breaker.thresholding-period
-
-ingester.push-circuit-breaker.cooldown-period
-
-ingester.push-circuit-breaker.initial-delay
-
-ingester.push-circuit-breaker.request-timeout
-
-ingester.read-circuit-breaker.failure-threshold-percentage
-
-ingester.read-circuit-breaker.failure-execution-threshold
-
-ingester.read-circuit-breaker.thresholding-period
-
-ingester.read-circuit-breaker.cooldown-period
-
-ingester.read-circuit-breaker.initial-delay
-
-ingester.read-circuit-breaker.request-timeout
The following experimental configuration options introduced in #8180 and #8285 have been removed:
-
-ingester.push-circuit-breaker.enabled
-
-ingester.push-circuit-breaker.failure-threshold-percentage
-
-ingester.push-circuit-breaker.failure-execution-threshold
-
-ingester.push-circuit-breaker.thresholding-period
-
-ingester.push-circuit-breaker.cooldown-period
-
-ingester.push-circuit-breaker.initial-delay
-
-ingester.push-circuit-breaker.push-timeout
-
-ingester.push-circuit-breaker.read-timeout
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.