Thanos Ruler fails to evaluate all recording rules correctly #4924

sharathfeb12 · 2021-12-06T00:44:58Z

I am currently running Thanos v0.24.0-rc.0.

Some recording rules are evaluated fine, while others appear to have last been evaluated 2 days ago. This happens very frequently. Restarting the pod fixes the issue temporarily. The issue is reproducible on v0.23.0 as well.

Here are the args passed to the ruler:

- rule
- --log.level=info
- --log.format=logfmt
- --grpc-address=0.0.0.0:10901
- --http-address=0.0.0.0:10902
- --objstore.config=$(OBJSTORE_CONFIG)
- --data-dir=/thanos/data
- --eval-interval=1m
- --label=rule_replica="$(NAME)"
- --alert.label-drop=rule_replica
- --remote-write.config-file=/etc/thanos/conf/rw-config.yaml
- --query=dnssrv+_http._tcp.observatorium-thanos-query.monitoring.svc.cluster.local
- --rule-file=/etc/thanos/rules/*/*.yaml
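One way to confirm the symptom described above (some recording rules last evaluated days ago despite `--eval-interval=1m`) is to compare each rule's last evaluation time against a threshold. The sketch below is only an illustration: it assumes the ruler's HTTP port returns a Prometheus-style `/api/v1/rules` JSON document with `data.groups[].rules[].lastEvaluation` fields in UTC, and the helper names `parse_rfc3339_utc` and `stale_rules` are hypothetical, not part of Thanos.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339_utc(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as 2021-12-06T00:44:58.123456789Z.

    Assumption: the timestamp ends in 'Z'. Fractional seconds are dropped,
    which is precise enough for spotting rules that are hours or days stale.
    """
    if not ts.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending in 'Z': {ts!r}")
    whole_seconds = ts[:-1].split(".", 1)[0]
    parsed = datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def stale_rules(payload: dict, now: datetime, max_age: timedelta) -> list:
    """Return (group, rule, age) for recording rules not evaluated within max_age.

    `payload` is assumed to be the decoded JSON body of a Prometheus-style
    rules API response; alerting rules are skipped.
    """
    stale = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") != "recording":
                continue
            age = now - parse_rfc3339_utc(rule["lastEvaluation"])
            if age > max_age:
                stale.append((group["name"], rule["name"], age))
    return stale
```

With a 1m evaluation interval, a threshold of a few minutes (for example `timedelta(minutes=5)`, an arbitrary choice) would separate healthy rules from stuck ones; pass `datetime.now(timezone.utc)` as `now`.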


GiedriusS · 2021-12-06T09:50:04Z

Hello, could you please dump the goroutine stacks when this happens and upload them?

bwplotka · 2021-12-06T20:20:31Z

Thanks for reporting! A goroutine pprof profile (available at /debug/pprof/goroutine), taken at the moment things are stuck, would be super helpful!
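For anyone unsure how to capture the requested dump, here is a minimal sketch. Go's standard `net/http/pprof` handler serves the goroutine profile at `/debug/pprof/goroutine` (singular), and `?debug=2` returns full text stack traces. The sketch assumes the ruler's HTTP port (10902 in the args above) is reachable, for example through a port-forward to localhost; the function names and the output path are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode, urljoin


def goroutine_dump_url(base_url: str, debug: int = 2) -> str:
    """Build the URL of the Go pprof goroutine endpoint.

    debug=2 asks for human-readable stack traces of every goroutine.
    """
    endpoint = urljoin(base_url.rstrip("/") + "/", "debug/pprof/goroutine")
    return endpoint + "?" + urlencode({"debug": debug})


def save_goroutine_dump(base_url: str, out_path: str, timeout: float = 30.0) -> int:
    """Download the goroutine dump and write it to out_path.

    Returns the number of bytes written. Run this while rule evaluation is
    stuck, before restarting the pod, or the interesting stacks are lost.
    """
    with urllib.request.urlopen(goroutine_dump_url(base_url), timeout=timeout) as resp:
        body = resp.read()
    with open(out_path, "wb") as f:
        f.write(body)
    return len(body)
```

A call such as `save_goroutine_dump("http://localhost:10902", "ruler-goroutines.txt")` would then produce a file that can be attached to the issue.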

jleloup · 2021-12-07T00:21:16Z

Isn't this issue similar to #4772?

jleloup · 2021-12-07T00:30:42Z

I think the .pprof file I uploaded there is actually more relevant to this issue. The behaviour of my Thanos Ruler is closer to what is described here: restarting the pods helped only for a few dozen minutes before they again failed to process records.

Link to the .pprof: #4772 (comment)

ahurtaud · 2022-01-05T09:08:20Z

I have the same issue on v0.24.0.
Here is the goroutine pprof, taken after the ruler had been stuck for a long time:
ruler-goroutine.zip

stale · 2022-04-17T04:54:46Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale · 2022-05-01T16:12:22Z

Closing for now as promised, let us know if you need this to be reopened! 🤗

GiedriusS added bug component: rule needs-more-info labels Dec 6, 2021

ahurtaud mentioned this issue Feb 10, 2022

Ruler not evaluating any rules #4772

Open

stale bot added the stale label Apr 17, 2022

stale bot closed this as completed May 1, 2022
