[Metrics]: Add system_name on service metrics
This commit changes the behaviour of the metrics: they now
carry the service_id and the service_system_name labels.

```
bash-4.2$ curl http://localhost:9421/metrics  -s | grep service
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="00.200"} 1
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="00.300"} 1
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="00.400"} 1
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="00.500"} 1
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="00.750"} 2
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="01.000"} 2
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="01.500"} 3
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="02.000"} 3
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="03.000"} 3
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="04.000"} 3
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="05.000"} 3
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="10.000"} 3
total_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="+Inf"} 3
total_response_time_seconds_count{service_id="2555417794444",service_system_name="api"} 3
total_response_time_seconds_sum{service_id="2555417794444",service_system_name="api"} 1.802
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="00.200"} 1
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="00.300"} 1
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="00.400"} 1
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="00.500"} 2
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="00.750"} 3
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="01.000"} 3
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="01.500"} 3
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="02.000"} 3
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="03.000"} 3
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="04.000"} 3
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="05.000"} 3
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="10.000"} 3
upstream_response_time_seconds_bucket{service_id="2555417794444",service_system_name="api",le="+Inf"} 3
upstream_response_time_seconds_count{service_id="2555417794444",service_system_name="api"} 3
upstream_response_time_seconds_sum{service_id="2555417794444",service_system_name="api"} 1.102
upstream_status{status="200",service_id="2555417794444",service_system_name="api"} 3
```

Signed-off-by: Eloy Coto <[email protected]>
eloycoto committed May 13, 2019
1 parent 1c5ea9a commit 00a0baf
Showing 7 changed files with 54 additions and 37 deletions.
6 changes: 3 additions & 3 deletions doc/parameters.md
@@ -386,6 +386,6 @@ with specific information that will provide more in-depth details about APIcast.

The metrics that will have extended information are:

- total_response_time_seconds: label service
- upstream_response_time_seconds: label service
- upstream_status: label service
- total_response_time_seconds: labels service_id and service_system_name
- upstream_response_time_seconds: labels service_id and service_system_name
- upstream_status: labels service_id and service_system_name
6 changes: 3 additions & 3 deletions doc/prometheus-metrics.md
@@ -7,9 +7,9 @@
| openresty_shdict_capacity | Capacity of the dictionaries shared between workers | gauge | dict(one for every dictionary) | Default |
| openresty_shdict_free_space | Free space of the dictionaries shared between workers | gauge | dict(one for every dictionary) | Default |
| nginx_metric_errors_total | Number of errors of the Lua library that manages the metrics | counter | - | Default |
| total_response_time_seconds | Time needed to sent a response to the client (in seconds) | histogram | service | Default |
| upstream_response_time_seconds | Response times from upstream servers (in seconds) | histogram | service | Default |
| upstream_status | HTTP status from upstream servers | counter | status, service | Default |
| total_response_time_seconds | Time needed to sent a response to the client (in seconds) | histogram | service_id, service_system_name | Default |
| upstream_response_time_seconds | Response times from upstream servers (in seconds) | histogram | service_id, service_system_name | Default |
| upstream_status | HTTP status from upstream servers | counter | status, service_id, service_system_name | Default |
| threescale_backend_calls | Authorize and report requests to the 3scale backend (Apisonator) | counter | endpoint(authrep, auth, report), status(2xx, 4xx, 5xx) | APIcast |
| batching_policy_auths_cache_hits | Hits in the auths cache of the 3scale batching policy | counter | - | 3scale Batcher |
| batching_policy_auths_cache_misses | Misses in the auths cache of the 3scale batching policy | counter | - | 3scale Batcher |
1 change: 1 addition & 0 deletions gateway/src/apicast/configuration.lua
@@ -87,6 +87,7 @@ function _M.parse_service(service)

return Service.new({
id = tostring(service.id or 'default'),
system_name = tostring(service.system_name or ''),
backend_version = backend_version,
authentication_method = proxy.authentication_method or backend_version,
hosts = proxy.hosts or { 'localhost' }, -- TODO: verify localhost is good default
21 changes: 16 additions & 5 deletions gateway/src/apicast/metrics/upstream.lua
@@ -4,38 +4,49 @@ local prometheus = require('apicast.prometheus')

local _M = {}

local service_label = 'service'
local service_label = 'service_id'
local service_system_name_label = 'service_system_name'
local status_label = 'status'

local upstream_status_codes = prometheus(
'counter',
'upstream_status',
'HTTP status from upstream servers',
{ status_label, service_label }
{ status_label, service_label, service_system_name_label }
)

local upstream_resp_times = prometheus(
'histogram',
'upstream_response_time_seconds',
'Response times from upstream servers',
{ service_label }
{ service_label, service_system_name_label }
)

local function inc_status_codes_counter(status, service)
if tonumber(status) and upstream_status_codes then
upstream_status_codes:inc(1, { status, service })
upstream_status_codes:inc(1, {
status,
service.id or "",
service.system_name or ""
})
end
end

local function add_resp_time(response_time, service)
local time = tonumber(response_time)

if time and upstream_resp_times then
upstream_resp_times:observe(time, { service })
upstream_resp_times:observe(time, {
service.id or "",
service.system_name or ""
})
end
end

function _M.report(status, response_time, service)
if not service then
service = {}
end
inc_status_codes_counter(status, service)
add_resp_time(response_time, service)
end
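
The nil-service fallback in the hunk above can be sketched in isolation. This is a minimal illustration, not code from the commit: `label_values` is a hypothetical helper name, and it only mirrors the `service.id or ""` / `service.system_name or ""` pattern visible in the diff.

```lua
-- Hypothetical helper (not part of the commit): returns the label values
-- in the order { service_id, service_system_name }, falling back to empty
-- strings when the service or one of its fields is missing.
local function label_values(service)
  service = service or {}
  return { service.id or "", service.system_name or "" }
end

-- Usage: a fully populated service and a missing one.
local known = label_values({ id = "42", system_name = "foo" })
local unknown = label_values(nil)

print(known[1], known[2])
print(unknown[1] == "" and unknown[2] == "")
```

Returning both values in a fixed order matters here because the label values are passed positionally and have to line up with the label names declared for each metric.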
13 changes: 8 additions & 5 deletions gateway/src/apicast/policy/nginx_metrics/nginx_metrics.lua
@@ -85,7 +85,7 @@ local response_times = prometheus(
'histogram',
'total_response_time_seconds',
'Time needed to send a response to the client (in seconds).',
{ 'service' }
{ 'service_id', 'service_system_name' }
)

function _M.init()
@@ -132,14 +132,17 @@ local function report_req_response_time(service)
-- the time spent in the post_action phase is not taken into account.
local resp_time = tonumber(ngx.var.original_request_time)
if resp_time and response_times then
response_times:observe(resp_time, { service })
response_times:observe(resp_time, {
service.id or "",
service.system_name or ""
})
end
end

function _M.log(_, context)
local service = ""
if context.service and context.service.id and extended_metrics then
service = context.service.id
local service = { id = "", system_name = "" }
if context.service and extended_metrics then
service = context.service
end
upstream_metrics.report(ngx.var.upstream_status, ngx.var.upstream_response_time, service)
report_req_response_time(service)
16 changes: 8 additions & 8 deletions spec/metrics/upstream_spec.lua
@@ -32,12 +32,12 @@ describe('upstream metrics', function()

it('increases the counter of status codes', function()
upstream_metrics.report(200, 0.1)
assert.stub(test_counter.inc).was_called_with(test_counter, 1, { 200 })
assert.stub(test_counter.inc).was_called_with(test_counter, 1, { 200, "", "" })
end)

it('adds the latency to the histogram', function()
upstream_metrics.report(200, 0.1)
assert.stub(test_histogram.observe).was_called_with(test_histogram, 0.1, {})
assert.stub(test_histogram.observe).was_called_with(test_histogram, 0.1, {"", ""})
end)

describe('when the status is nil or empty', function()
@@ -58,17 +58,17 @@ describe('upstream metrics', function()


describe('With service id', function()
local service_metric_id = "42"
local service = {id = "123", system_name="foo"}
it("increases empty service on nil", function()
upstream_metrics.report(200, 0.1, nil)
assert.stub(test_histogram.observe).was_called_with(test_histogram, 0.1, {})
assert.stub(test_counter.inc).was_called_with(test_counter, 1, { 200 })
assert.stub(test_histogram.observe).was_called_with(test_histogram, 0.1, {"", ""})
assert.stub(test_counter.inc).was_called_with(test_counter, 1, { 200, "", "" })
end)

it("increase a valid service", function()
upstream_metrics.report(200, 0.1, service_metric_id)
assert.stub(test_histogram.observe).was_called_with(test_histogram, 0.1, { service_metric_id })
assert.stub(test_counter.inc).was_called_with(test_counter, 1, { 200, service_metric_id })
upstream_metrics.report(200, 0.1, service)
assert.stub(test_histogram.observe).was_called_with(test_histogram, 0.1, { service.id, service.system_name })
assert.stub(test_counter.inc).was_called_with(test_counter, 1, { 200, service.id, service.system_name })
end)
end)
end)
28 changes: 15 additions & 13 deletions t/prometheus-metrics.t
@@ -273,8 +273,8 @@ In particular, it shows the status codes and the response times
"",
[
qr/upstream_response_time_seconds(.|\n)/,
qr/upstream_response_time_seconds_bucket\{service="",le=".*"\} 1/,
qr/upstream_status\{status="200",service=""\} 1/
qr/upstream_response_time_seconds_bucket\{service_id="",service_system_name="",le=".*"\} 1/,
qr/upstream_status\{status="200",service_id="",service_system_name=""\} 1/
]]
--- no_error_log
[error]
@@ -318,7 +318,7 @@ In particular, it shows the status codes and the response times
--- response_body_like eval
[
"",
qr/total_response_time_seconds(.|\n)*total_response_time_seconds_bucket\{service="",le=".*"\} 1/
qr/total_response_time_seconds(.|\n)*total_response_time_seconds_bucket\{service_id="",service_system_name="",le=".*"\} 1/
]
--- no_error_log
[error]
@@ -333,6 +333,7 @@ qr/total_response_time_seconds(.|\n)*total_response_time_seconds_bucket\{service
"services": [
{
"id": 42,
"system_name": "foo",
"proxy": {
"policy_chain": [
{
@@ -369,10 +370,10 @@ qr/total_response_time_seconds(.|\n)*total_response_time_seconds_bucket\{service
"",
[
qr/total_response_time_seconds(.|\n)*/,
qr/total_response_time_seconds_bucket\{service="42",le=".*"\} 1/,
qr/total_response_time_seconds_bucket\{service_id="42",service_system_name="foo",le=".*"\} 1/,
qr/upstream_response_time_seconds(.|\n)*/,
qr/upstream_response_time_seconds_bucket\{service="42",le=".*"\} 1/,
qr/upstream_status\{status="200",service="42"\} 1/
qr/upstream_response_time_seconds_bucket\{service_id="42",service_system_name="foo",le=".*"\} 1/,
qr/upstream_status\{status="200",service_id="42",service_system_name="foo"\} 1/
]
]
--- no_error_log
@@ -386,6 +387,7 @@ qr/total_response_time_seconds(.|\n)*total_response_time_seconds_bucket\{service
"services": [
{
"id": 42,
"system_name": "foo",
"proxy": {
"hosts": [
"one"
@@ -407,6 +409,7 @@ qr/total_response_time_seconds(.|\n)*total_response_time_seconds_bucket\{service
},
{
"id": 21,
"system_name": "bar",
"proxy": {
"hosts": [
"two"
@@ -451,13 +454,12 @@ qr/total_response_time_seconds(.|\n)*total_response_time_seconds_bucket\{service
"",
"",
[
qr/total_response_time_seconds_bucket\{service="42",le=".*"\} 1/,
qr/upstream_response_time_seconds_bucket\{service="42",le=".*"\} 1/,
qr/upstream_status\{status="200",service="42"\} 1/,
qr/total_response_time_seconds_bucket\{service="21",le=".*"\} 1/,
qr/upstream_response_time_seconds_bucket\{service="21",le=".*"\} 1/,
qr/upstream_status\{status="200",service="21"\} 1/
qr/total_response_time_seconds_bucket\{service_id="42",service_system_name="foo",le=".*"\} 1/,
qr/upstream_response_time_seconds_bucket\{service_id="42",service_system_name="foo",le=".*"\} 1/,
qr/upstream_status\{status="200",service_id="42",service_system_name="foo"\} 1/,
qr/total_response_time_seconds_bucket\{service_id="21",service_system_name="bar",le=".*"\} 1/,
qr/upstream_response_time_seconds_bucket\{service_id="21",service_system_name="bar",le=".*"\} 1/,
qr/upstream_status\{status="200",service_id="21",service_system_name="bar"\} 1/
]]
--- no_error_log
[error]
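
The expectations in the test file above check the label pairs with regular expressions. As a rough Lua equivalent, assuming a single line of Prometheus text output as input, the two labels could be extracted with plain Lua patterns. `parse_service_labels` is a hypothetical name used for illustration only, and the sketch does not handle escaped quotes inside label values.

```lua
-- Hypothetical helper (not part of the commit): extracts the values of the
-- service_id and service_system_name labels from one metrics line.
-- Returns nil for a label that is absent; an empty label yields "".
local function parse_service_labels(line)
  local id = line:match('service_id="([^"]*)"')
  local system_name = line:match('service_system_name="([^"]*)"')
  return id, system_name
end

-- Usage with a line shaped like the ones the tests expect.
local line = 'upstream_status{status="200",service_id="42",service_system_name="foo"} 1'
local id, system_name = parse_service_labels(line)
print(id, system_name)
```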
