rule: fix panic when calling API /api/v1/rules?type=alert #6189

Merged (1 commit) on Mar 7, 2023

Conversation

thib-ack
Contributor

@thib-ack thib-ack commented Mar 6, 2023

Hello,

Sometimes, when I open the /alerts page on the Ruler web UI, I get an error. Once it happens, the Ruler does not recover and I have to restart the component completely. As you can see in the stack trace below, the error looks like an old issue fixed in #2925. After digging, I think this is linked to the AlertInstance.ActiveAt date, which is not converted with UTC() before protobuf encoding, unlike the fields handled in the old PR.

I tried to write a test case to reproduce this, but unfortunately I could not find the exact combination of events.
I think this has something to do with the Ruler reload process (either SIGHUP or POST /-/reload), which might be copying the alerts and losing or changing the time.Time Locations (which is the real problem with protobuf encoding).

Here is the full stack trace from the server:

2023/03/03 14:48:44 http: panic serving 10.10.90.171:64321: merger not found for type:int
goroutine 18109 [running]:
net/http.(*conn).serve.func1()
#011/usr/lib/go-1.19/src/net/http/server.go:1850 +0xbf
panic({0x2184f60, 0xc001afbc50})
#011/usr/lib/go-1.19/src/runtime/panic.go:890 +0x262
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo(0xc001d70d80)
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:662 +0xe85
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70d80, {0xc0001029c0?}, {0x2145420?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:113 +0x58
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func27({0x40d95f?}, {0x3f5560?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:545 +0x165
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70d00, {0xc0012f01c0?}, {0x25bf1e0?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func30({0x40d95f?}, {0xc000670a20?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:587 +0x8b
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70940, {0xc000b8f278?}, {0x94cd06?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func30({0x40d95f?}, {0xc000596340?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:587 +0x8b
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70a40, {0xc001e084e0?}, {0x1?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func29({0x1609a0?}, {0x0?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:567 +0xf2
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70980, {0x2356140?}, {0x4?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*InternalMessageInfo).Merge(0x40b8bd?, {0x2bee3b0, 0xc0007ace40}, {0x2bee3b0, 0xc001cfc300})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:50 +0xb6
github.com/thanos-io/thanos/pkg/rules/rulespb.(*Alert).XXX_Merge(0x3e4fec0?, {0x2bee3b0?, 0xc001cfc300?})
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/rules/rulespb/rpc.pb.go:486 +0x3a
github.com/gogo/protobuf/proto.Merge({0x2bee3b0?, 0xc0007ace40}, {0x2bee3b0?, 0xc001cfc300})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/clone.go:95 +0x4a3
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func32({0x40d95f?}, {0x62c5a0?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:652 +0x686
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70900, {0xc001afbc40?}, {0x8?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func29({0x3fbce49454356161?}, {0x408c200000000000?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:567 +0xf2
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70840, {0x418a880?}, {0x2?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*InternalMessageInfo).Merge(0x40b8bd?, {0x2bee470, 0xc0012f0150}, {0x2bee470, 0xc0012f00e0})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/table_merge.go:50 +0xb6
github.com/thanos-io/thanos/pkg/rules/rulespb.(*RuleGroup).XXX_Merge(0x3e4fec0?, {0x2bee470?, 0xc0012f00e0?})
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/rules/rulespb/rpc.pb.go:310 +0x3a
github.com/gogo/protobuf/proto.Merge({0x2bee470?, 0xc0012f0150}, {0x2bee470?, 0xc0012f00e0})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/clone.go:95 +0x4a3
github.com/gogo/protobuf/proto.Clone({0x2bee470?, 0xc0012f00e0?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/[email protected]/proto/clone.go:52 +0x1a5
github.com/thanos-io/thanos/pkg/rules.(*Manager).Rules(0xc000bae300, 0xc000bd0080, {0x2c02670, 0xc001c3c060})
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/rules/manager.go:409 +0x219
github.com/thanos-io/thanos/pkg/rules.(*GRPCClient).Rules(0xc0001721c8, {0x2bf4408?, 0xc001d8c450?}, 0xc000bd0080)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/rules/rules.go:60 +0x174
github.com/thanos-io/thanos/pkg/api/query.NewRulesHandler.func1.3({0x2bf4408?, 0xc001d8c450?})
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/api/query/v1.go:990 +0x58
github.com/thanos-io/thanos/pkg/tracing.DoInSpan({0x2bf4408?, 0xc001d8c390?}, {0x26a1d4a?, 0x7?}, 0xc0019dec60, {0x0?, 0x0?, 0x7f78c0b90108?})
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/tracing/tracing.go:95 +0xa3
github.com/thanos-io/thanos/pkg/api/query.NewRulesHandler.func1(0xc001104400)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/api/query/v1.go:989 +0x485
github.com/thanos-io/thanos/pkg/api.GetInstr.func1.1({0x2be9b80, 0xc0012f0000}, 0x4?)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/api/api.go:211 +0x50
net/http.HandlerFunc.ServeHTTP(0xc00110c1e0?, {0x2be9b80?, 0xc0012f0000?}, 0x2bce5cc?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/thanos-io/thanos/pkg/logging.(*HTTPServerMiddleware).HTTPMiddleware.func1({0x2be9b80?, 0xc0012f0000}, 0xc001104400)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/logging/http.go:69 +0x3b8
net/http.HandlerFunc.ServeHTTP(0x2bf4408?, {0x2be9b80?, 0xc0012f0000?}, 0x2bcee48?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/thanos-io/thanos/pkg/server/http/middleware.RequestID.func1({0x2be9b80, 0xc0012f0000}, 0xc001104200)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/server/http/middleware/request_id.go:40 +0x542
net/http.HandlerFunc.ServeHTTP(0x2184f60?, {0x2be9b80?, 0xc0012f0000?}, 0x4?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1({0x2bedf60, 0xc000bd0020}, 0x490001?)
#011/home/jenkins/go/pkg/mod/github.com/!n!y!times/[email protected]/gzip.go:338 +0x26f
net/http.HandlerFunc.ServeHTTP(0x7f7899a5efff?, {0x2bedf60?, 0xc000bd0020?}, 0xc0019df170?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/thanos-io/thanos/pkg/extprom/http.httpInstrumentationHandler.func1({0x7f78993897e0?, 0xc0017f80a0}, 0xc001104200)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/extprom/http/instrument_server.go:75 +0x10b
net/http.HandlerFunc.ServeHTTP(0x7f78993897e0?, {0x7f78993897e0?, 0xc0017f80a0?}, 0xc001d8c270?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerResponseSize.func1({0x7f78993897e0?, 0xc0017f8050?}, 0xc001104200)
#011/home/jenkins/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:288 +0xc5
net/http.HandlerFunc.ServeHTTP(0x7f78993897e0?, {0x7f78993897e0?, 0xc0017f8050?}, 0x0?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x7f78993897e0?, 0xc0017f8000?}, 0xc001104200)
#011/home/jenkins/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:146 +0xb8
net/http.HandlerFunc.ServeHTTP(0x22c9b80?, {0x7f78993897e0?, 0xc0017f8000?}, 0x6?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/thanos-io/thanos/pkg/extprom/http.instrumentHandlerInFlight.func1({0x7f78993897e0, 0xc0017f8000}, 0xc001104200)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/extprom/http/instrument_server.go:162 +0x169
net/http.HandlerFunc.ServeHTTP(0x2bf1530?, {0x7f78993897e0?, 0xc0017f8000?}, 0xc00180b698?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerRequestSize.func1({0x2bf1530?, 0xc00160c0e0?}, 0xc001104200)
#011/home/jenkins/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:238 +0xc5
net/http.HandlerFunc.ServeHTTP(0x2bf4408?, {0x2bf1530?, 0xc00160c0e0?}, 0x418a220?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/thanos-io/thanos/pkg/tracing.HTTPMiddleware.func1({0x2bf1530, 0xc00160c0e0}, 0xc001104100)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/tracing/http.go:62 +0x9a2
github.com/prometheus/common/route.(*Router).handle.func1({0x2bf1530, 0xc00160c0e0}, 0xc001104000, {0x0, 0x0, 0x478d2e?})
#011/home/jenkins/go/pkg/mod/github.com/prometheus/[email protected]/route/route.go:83 +0x18d
github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc001a045a0, {0x2bf1530, 0xc00160c0e0}, 0xc001104000)
#011/home/jenkins/go/pkg/mod/github.com/julienschmidt/[email protected]/router.go:387 +0x81c
github.com/prometheus/common/route.(*Router).ServeHTTP(0xc00180baf0?, {0x2bf1530?, 0xc00160c0e0?}, 0x0?)
#011/home/jenkins/go/pkg/mod/github.com/prometheus/[email protected]/route/route.go:126 +0x26
net/http.(*ServeMux).ServeHTTP(0xc0006ea042?, {0x2bf1530, 0xc00160c0e0}, 0xc001104000)
#011/usr/lib/go-1.19/src/net/http/server.go:2487 +0x149
net/http.serverHandler.ServeHTTP({0xc001d8c090?}, {0x2bf1530, 0xc00160c0e0}, 0xc001104000)
#011/usr/lib/go-1.19/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0017fa000, {0x2bf4408, 0xc000b12690})
#011/usr/lib/go-1.19/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
#011/usr/lib/go-1.19/src/net/http/server.go:3102 +0x4db
  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

fpetkovski
fpetkovski previously approved these changes Mar 6, 2023
Contributor

@fpetkovski fpetkovski left a comment

Thanks, let's try this out.

saswatamcode
saswatamcode previously approved these changes Mar 7, 2023
Member

@saswatamcode saswatamcode left a comment

Thanks for fixing! 🙂

In the rules proto spec, we have four stdtime fields: last_evaluation in RuleGroup, RecordingRule, and Alert, plus this active_at in AlertInstance. The gogo/protobuf issue does specify that the field needs to be non-nullable, but there is no harm in trying this out.

@@ -97,12 +97,14 @@ func ActiveAlertsToProto(s storepb.PartialResponseStrategy, a *rules.AlertingRul
active := a.ActiveAlerts()
ret := make([]*rulespb.AlertInstance, len(active))
for i, ruleAlert := range active {
// https://github.com/gogo/protobuf/issues/519
Member

Let's keep comments consistent with the other workarounds as well.

Suggested change
- // https://github.com/gogo/protobuf/issues/519
+ // UTC needed due to https://github.com/gogo/protobuf/issues/519.

Contributor Author

Hello,
I updated the comment (and another similar one found in the codebase).

@thib-ack thib-ack dismissed stale reviews from saswatamcode and fpetkovski via c2b856a March 7, 2023 07:58
@saswatamcode saswatamcode enabled auto-merge (squash) March 7, 2023 08:43
@saswatamcode saswatamcode merged commit dabbeda into thanos-io:main Mar 7, 2023
junotx pushed a commit to junotx/thanos that referenced this pull request Aug 3, 2023
3 participants