Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query service: fix sending status to health check channel #1598

Merged
merged 2 commits into from
Jun 12, 2019

Conversation

jan25
Copy link
Contributor

@jan25 jan25 commented Jun 11, 2019

Signed-off-by: Abhilash Gnan [email protected]

Which problem is this PR solving?

Resolves #1542

Short description of the changes

This PR fixes how we send serviceUnavailable to healthcheck channel on SIGTERM so the query service is shutdown gracefully

@jan25 jan25 changed the title fix sending status to health check channel Query service: fix sending status to health check channel Jun 11, 2019
@codecov
Copy link

codecov bot commented Jun 11, 2019

Codecov Report

Merging #1598 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1598   +/-   ##
=======================================
  Coverage   98.81%   98.81%           
=======================================
  Files         191      191           
  Lines        9162     9162           
=======================================
  Hits         9053     9053           
  Misses         85       85           
  Partials       24       24

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update faa9178...c3f883b. Read the comment docs.

@yurishkuro yurishkuro merged commit 74ff96f into jaegertracing:master Jun 12, 2019
@yurishkuro
Copy link
Member

Good catch (this is what we get for not having tests in that pkg). However, it does not address the issue, I still got the same error messages

^C
{"level":"info","ts":1560304771.472248,"caller":"flags/service.go:143","msg":"Shutting down"}
{"level":"info","ts":1560304771.4723089,"caller":"healthcheck/handler.go:129","msg":"Health Check state change","status":"unavailable"}
{"level":"error","ts":1560304771.4724338,"caller":"app/server.go:127","msg":"Could not start multiplexed server","error":"accept tcp [::]:16686: use of closed network connection","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*Server).Start.func3\n\t/Users/yurishkuro/golang/src/github.com/jaegertracing/jaeger/cmd/query/app/server.go:127"}
{"level":"error","ts":1560304771.472466,"caller":"app/server.go:109","msg":"Could not start HTTP server","error":"http: Server closed","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*Server).Start.func1\n\t/Users/yurishkuro/golang/src/github.com/jaegertracing/jaeger/cmd/query/app/server.go:109"}
{"level":"info","ts":1560304771.472533,"caller":"flags/service.go:151","msg":"Shutdown complete"}
{"level":"error","ts":1560304771.472542,"caller":"flags/admin.go:117","msg":"failed to serve","error":"http: Server closed","stacktrace":"github.com/jaegertracing/jaeger/cmd/flags.(*AdminServer).serveWithListener.func1\n\t/Users/yurishkuro/golang/src/github.com/jaegertracing/jaeger/cmd/flags/admin.go:117"}
{"level":"info","ts":1560304771.4725702,"caller":"healthcheck/handler.go:129","msg":"Health Check state change","status":"broken"}

@jan25
Copy link
Contributor Author

jan25 commented Jun 12, 2019

Thanks for the review!
I tested this change using rebuilt docker images and by applying all-in-one manifest . This change seemed to remove the errors on deleting the pod. Now i think, that could've worked due to running query inside a container.

After bit of reading, i found that the way we do server.Close() here for httpServer and cmux is actually "not" gracefully shutting down the servers, instead they throw errors on Close() as mentioned here and here, Which is the expected behavior.

So, to remove errors in logs we could check for such expected shutdown error and not throw/log error in those cases. Thoughts?

@yurishkuro
Copy link
Member

So, to remove errors in logs we could check for such expected shutdown error and not throw/log error in those cases

Yes, and that's precisely what changed between 1.11 and 1.12 - we introduced a new cmux multiplexor, which is what seems to be logging the errors.

@jan25 jan25 deleted the fix-query-shutdown branch August 19, 2019 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query and all-in-one generates error message on SIGINT
2 participants