-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http_response does not measure anything on dns-error #2784
Comments
related #2624 |
I think we should use a new field Happy Path:
Timeout:
|
Are we sure we want to call it "timeout"? The request can fail for numerous reasons other than a timeout. Connection refused, dns lookup failure, connection closed without response, etc. |
I second phemmers comment. Basically what the error state says is "something below the http layer went wrong". It is hard to come up with a good metric name for that. If we call it "http_connection_failed" or something similar there will me confusion, whether it should be 1 or 0, when the server returns 500. The http connection was established but there was an http-error. Maybe one of you guys have a good idea for a name? If we create a new metric, we should at least come up with something less confusing than http_response returning -1 ;) |
That's a good point, I'm confusing this with that other timeout issue. If we did return a timeout field only for timeouts, would we still want another field for connection failed? The distinction between a missing point and sending an point with an error field seems to be that you know that telegraf is trying. If we want to know there was an attempt, does that mean we need at least one new field for every measurement in the case that there was an error? |
I am not quite sure if this is necessary for each type of measurement but why not? For http_response it is quite important to be able to measure these "error attempts" I think. We(and most users I think) use the plugin to check whether a Service under a specific URL is answering correctly. We are interested whether the service is responding and whether the reponse given is 200(or other). Currently however we only get data if the http service is reachable and responds. So our alarm bell rings when a 500 is responded but not when connections timeout or dns does not work correctly etc. However this is information I think most people expect from a check for a http endpoint. So it is quite important to fix http_response, I think. Would it be OK for you if I submit a patch that introduces a generic connection_failed metric? After that I would suggest we discuss whether and which more specific metrics we need for http_response. That way we would can a basic solution first and then we can concentrate on making it perfect (and think about, whether we need a generic solution for all measurements). |
Yes, lets do that. So we add |
See my pull request #2814 |
I think we should not use the |
I agree, pull request has been updated. Please have a look :) |
PR has been merged. |
Bug report
If a service is unavailable because of misconfiguration dns, the http_response plugin does not send measurement data.
Relevant telegraf.conf:
[[inputs.http_response]]
address = "http://does.not.exist/health"
response_timeout = "5s"
method = "GET"
System info:
Telegraf v1.2.1 (git: release-1.2 3b6ffb3)
Steps to reproduce:
Expected behavior:
response_time=0.16079805900000002,http_connection_failure=1
Actual behavior:
2017-05-11T11:36:20Z E! ERROR in input [inputs.http_response]: Get http://does.not.exist/health: dial tcp: lookup does.not.exist on 8.8.8.8:53: no such host
The text was updated successfully, but these errors were encountered: