Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle multiple upstreams in ingress-controller #21215

Merged
merged 11 commits into from
Sep 29, 2020
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -264,6 +264,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Remove wrongly mapped `tls.client.server_name` from `fortinet/firewall` fileset. {pull}20983[20983]
- Fix an error updating file size being logged when EOF is reached. {pull}21048[21048]
- Fix error when processing AWS Cloudtrail Digest logs. {pull}21086[21086] {issue}20943[20943]
- Handle multiple upstreams in ingress-controller. {pull}21215[21215]

*Heartbeat*

Expand Down
64 changes: 52 additions & 12 deletions filebeat/docs/fields.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -105131,6 +105131,46 @@ Contains fields for the Ingress Nginx controller access logs.
An array of remote IP addresses. It is a list because it is common to include, besides the client IP address, IP addresses from headers like `X-Forwarded-For`. Real source IP is restored to `source.ip`.


type: array

--

*`nginx.ingress_controller.upstream_ip_list`*::
+
--
An array of the upstream IP addresses. It is a list because it is common that several upstream servers were contacted during request processing.


type: array

--

*`nginx.ingress_controller.upstream.response.length_list`*::
+
--
An array of upstream response lengths. It is a list because it is common that several upstream servers were contacted during request processing.


type: array

--

*`nginx.ingress_controller.upstream.response.time_list`*::
+
--
An array of upstream response durations. It is a list because it is common that several upstream servers were contacted during request processing.


type: array

--

*`nginx.ingress_controller.upstream.response.status_code_list`*::
+
--
An array of upstream response status codes. It is a list because it is common that several upstream servers were contacted during request processing.


type: array

--
Expand Down Expand Up @@ -105182,7 +105222,7 @@ type: keyword
*`nginx.ingress_controller.upstream.response.length`*::
+
--
The length of the response obtained from the upstream server
The length of the response obtained from the upstream server. If several servers were contacted during request process, the summary of the multiple response lengths is stored.


type: long
Expand All @@ -105194,7 +105234,7 @@ format: bytes
*`nginx.ingress_controller.upstream.response.time`*::
+
--
The time spent on receiving the response from the upstream server as seconds with millisecond resolution
The time spent on receiving the response from the upstream as seconds with millisecond resolution. If several servers were contacted during request process, the summary of the multiple response times is stored.


type: double
Expand All @@ -105206,40 +105246,40 @@ format: duration
*`nginx.ingress_controller.upstream.response.status_code`*::
+
--
The status code of the response obtained from the upstream server
The status code of the response obtained from the upstream server. If several servers were contacted during request process, only the status code of the response from the last one is stored in this field.


type: long

--

*`nginx.ingress_controller.http.request.id`*::
*`nginx.ingress_controller.upstream.ip`*::
+
--
The randomly generated ID of the request
The IP address of the upstream server. If several servers were contacted during request process, only the last one is stored in this field.


type: keyword
type: ip

--

*`nginx.ingress_controller.upstream.ip`*::
*`nginx.ingress_controller.upstream.port`*::
+
--
The IP address of the upstream server. If several servers were contacted during request processing, their addresses are separated by commas.
The port of the upstream server. If several servers were contacted during request process, only the last one is stored in this field.


type: ip
type: long

--

*`nginx.ingress_controller.upstream.port`*::
*`nginx.ingress_controller.http.request.id`*::
+
--
The port of the upstream server.
The randomly generated ID of the request


type: long
type: keyword

--

Expand Down
2 changes: 1 addition & 1 deletion filebeat/module/nginx/fields.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

43 changes: 34 additions & 9 deletions filebeat/module/nginx/ingress_controller/_meta/fields.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,26 @@
Real source IP is restored to `source.ip`.

# ingress-controller specific fields
- name: upstream_ip_list
type: array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that type: array only doesn't generate any mapping. This should be an array of keywords, or just keyword. Elasticsearch doesn't have a data type for arrays, but any field can have multiple values, see docs about arrays).

We could use just keyword:

Suggested change
type: array
type: keyword

Or if we still want to indicate that this is intended to store multiple values:

Suggested change
type: array
type: array
object_type: keyword

Same for the other lists.

description: >
An array of the upstream IP addresses. It is a list because it is common that several upstream servers
were contacted during request processing.
- name: upstream.response.length_list
type: array
description: >
An array of upstream response lengths. It is a list because it is common that several upstream servers
were contacted during request processing.
- name: upstream.response.time_list
type: array
description: >
An array of upstream response durations. It is a list because it is common that several upstream servers
were contacted during request processing.
- name: upstream.response.status_code_list
type: array
description: >
An array of upstream response status codes. It is a list because it is common that several upstream servers
were contacted during request processing.
- name: http.request.length
type: long
format: bytes
Expand All @@ -33,28 +53,33 @@
type: long
format: bytes
description: >
The length of the response obtained from the upstream server
The length of the response obtained from the upstream server. If several servers were contacted during request process,
the summary of the multiple response lengths is stored.
- name: upstream.response.time
type: double
format: duration
description: >
The time spent on receiving the response from the upstream server as seconds with millisecond resolution
The time spent on receiving the response from the upstream as seconds with millisecond resolution.
If several servers were contacted during request process, the summary of the multiple response times is stored.
- name: upstream.response.status_code
type: long
description: >
The status code of the response obtained from the upstream server
- name: http.request.id
type: keyword
description: >
The randomly generated ID of the request
The status code of the response obtained from the upstream server. If several servers were contacted during
request process, only the status code of the response from the last one is stored in this field.
- name: upstream.ip
type: ip
description: >
The IP address of the upstream server. If several servers were contacted during request processing, their addresses are separated by commas.
The IP address of the upstream server. If several servers were contacted during request process,
only the last one is stored in this field.
- name: upstream.port
type: long
description: >
The port of the upstream server.
The port of the upstream server. If several servers were contacted during request process,
only the last one is stored in this field.
- name: http.request.id
type: keyword
description: >
The randomly generated ID of the request

- name: body_sent.bytes
type: alias
Expand Down
101 changes: 98 additions & 3 deletions filebeat/module/nginx/ingress_controller/ingest/pipeline.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,14 +12,18 @@ processors:
%{NUMBER:http.response.status_code:long} %{NUMBER:http.response.body.bytes:long}
"(-|%{DATA:http.request.referrer})" "(-|%{DATA:user_agent.original})" %{NUMBER:nginx.ingress_controller.http.request.length:long}
%{NUMBER:nginx.ingress_controller.http.request.time:double} \[%{DATA:nginx.ingress_controller.upstream.name}\]
\[%{DATA:nginx.ingress_controller.upstream.alternative_name}\] (%{UPSTREAM_ADDRESS}|-)
(%{NUMBER:nginx.ingress_controller.upstream.response.length:long}|-) (%{NUMBER:nginx.ingress_controller.upstream.response.time:double}|-)
(%{NUMBER:nginx.ingress_controller.upstream.response.status_code:long}|-) %{GREEDYDATA:nginx.ingress_controller.http.request.id}
\[%{DATA:nginx.ingress_controller.upstream.alternative_name}\] (%{UPSTREAM_ADDRESS_LIST:nginx.ingress_controller.upstream_ip_list}|-)
(%{UPSTREAM_RESPONSE_LENGTH_LIST:nginx.ingress_controller.upstream.response.length_list}|-) (%{UPSTREAM_RESPONSE_TIME_LIST:nginx.ingress_controller.upstream.response.time_list}|-)
(%{UPSTREAM_RESPONSE_STATUS_CODE_LIST:nginx.ingress_controller.upstream.response.status_code_list}|-) %{GREEDYDATA:nginx.ingress_controller.http.request.id}
pattern_definitions:
NGINX_HOST: (?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port})?
NGINX_NOTSEPARATOR: "[^\t ,:]+"
NGINX_ADDRESS_LIST: (?:%{IP}|%{WORD})("?,?\s*(?:%{IP}|%{WORD}))*
UPSTREAM_ADDRESS: '%{IP:nginx.ingress_controller.upstream.ip}(:%{NUMBER:nginx.ingress_controller.upstream.port})?'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used now.

UPSTREAM_ADDRESS_LIST: (?:%{IP}(:%{NUMBER})?)("?,?\s*(?:%{IP}(:%{NUMBER})?))*
UPSTREAM_RESPONSE_LENGTH_LIST: (?:%{NUMBER})("?,?\s*(?:%{NUMBER}))*
UPSTREAM_RESPONSE_TIME_LIST: (?:%{NUMBER})("?,?\s*(?:%{NUMBER}))*
UPSTREAM_RESPONSE_STATUS_CODE_LIST: (?:%{NUMBER})("?,?\s*(?:%{NUMBER}))*
ignore_missing: true
- grok:
field: nginx.ingress_controller.info
Expand All @@ -33,6 +37,22 @@ processors:
field: nginx.ingress_controller.remote_ip_list
separator: '"?,?\s+'
ignore_missing: true
- split:
field: nginx.ingress_controller.upstream_ip_list
separator: '"?,?\s+'
ignore_missing: true
- split:
field: nginx.ingress_controller.upstream.response.length_list
separator: '"?,?\s+'
ignore_missing: true
- split:
field: nginx.ingress_controller.upstream.response.time_list
separator: '"?,?\s+'
ignore_missing: true
- split:
field: nginx.ingress_controller.upstream.response.status_code_list
separator: '"?,?\s+'
ignore_missing: true
- split:
field: nginx.ingress_controller.origin
separator: '"?,?\s+'
Expand All @@ -41,6 +61,81 @@ processors:
field: source.address
if: ctx.source?.address == null
value: ""
- script:
if: ctx.nginx?.ingress_controller?.upstream?.response?.length_list != null && ctx.nginx.ingress_controller.upstream.response.length_list.length > 0
lang: painless
source: >-
try {
if (ctx.nginx.ingress_controller.upstream.response.length_list.length == null) {
return;
}
int length = 0;
for (def item : ctx.nginx.ingress_controller.upstream.response.length_list) {
length = length + Integer.parseInt(item);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, does it make sense to sum up the response lengths? The client would only see the most recent response.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I will go with it! Thanks!

}
ctx.nginx.ingress_controller.upstream.response.length = length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change the name to indicate it's a total value?

Suggested change
ctx.nginx.ingress_controller.upstream.response.length = length;
ctx.nginx.ingress_controller.upstream.response.length_total = length;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take into account that this is a breaking change as it changes an existing field. I would prefer to keep the original field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
catch (Exception e) {
ctx.nginx.ingress_controller.upstream.response.length = null;
}
- script:
if: ctx.nginx?.ingress_controller?.upstream?.response?.time_list != null && ctx.nginx.ingress_controller.upstream.response.time_list.length > 0
lang: painless
source: >-
try {
if (ctx.nginx.ingress_controller.upstream.response.time_list.length == null) {
return;
}
float res_time = 0;
for (def item : ctx.nginx.ingress_controller.upstream.response.time_list) {
res_time = res_time + Float.parseFloat(item);
}
ctx.nginx.ingress_controller.upstream.response.time = res_time;
}
catch (Exception e) {
ctx.nginx.ingress_controller.upstream.response.time = null;
}
- script:
if: ctx.nginx?.ingress_controller?.upstream?.response?.status_code_list != null && ctx.nginx.ingress_controller.upstream.response.status_code_list.length > 0
lang: painless
source: >-
try {
if (ctx.nginx.ingress_controller.upstream.response.status_code_list.length == null) {
return;
}
int last_status_code;
for (def item : ctx.nginx.ingress_controller.upstream.response.status_code_list) {
last_status_code = Integer.parseInt(item);
}
ctx.nginx.ingress_controller.upstream.response.status_code = last_status_code;
}
catch (Exception e) {
ctx.nginx.ingress_controller.upstream.response.time = null;
}
- script:
if: ctx.nginx?.ingress_controller?.upstream_ip_list != null && ctx.nginx.ingress_controller.upstream_ip_list.length > 0
lang: painless
source: >-
try {
if (ctx.nginx.ingress_controller.upstream_ip_list.length == null) {
return;
}
def last_upstream = "";
for (def item : ctx.nginx.ingress_controller.upstream_ip_list) {
last_upstream = item;
}
StringTokenizer tok = new StringTokenizer(last_upstream, ":");
if (tok.countTokens()>1) {
ctx.nginx.ingress_controller.upstream.ip = tok.nextToken();
ctx.nginx.ingress_controller.upstream.port = Integer.parseInt(tok.nextToken());
} else {
ctx.nginx.ingress_controller.upstream.ip = last_upstream;
}
}
catch (Exception e) {
ctx.nginx.ingress_controller.upstream.ip = null;
ctx.nginx.ingress_controller.upstream.port = null;
}
- script:
if: ctx.nginx?.ingress_controller?.remote_ip_list != null && ctx.nginx.ingress_controller.remote_ip_list.length > 0
lang: painless
Expand Down
1 change: 1 addition & 0 deletions filebeat/module/nginx/ingress_controller/test/test.log
Original file line number Diff line number Diff line change
Expand Up @@ -20,3 +20,4 @@
192.168.64.1 - - [07/Feb/2020:12:02:38 +0000] "GET /v2 HTTP/1.1" 200 61 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:72.0) Gecko/20100101 Firefox/72.0" 343 0.000 [default-web2-8080] [] 172.17.0.6:8080 61 0.001 200 ba91c30454893c121879396b0a78be79
192.168.64.1 - - [07/Feb/2020:12:02:38 +0000] "GET /favicon.ico HTTP/1.1" 200 59 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:72.0) Gecko/20100101 Firefox/72.0" 262 0.001 [default-web-8080] [] 172.17.0.5:8080 59 0.000 200 98c81aa2d50c67f6fb1fa16d5ce62f8f
192.168.64.1 - - [07/Feb/2020:12:02:42 +0000] "GET /v2/some HTTP/1.1" 200 61 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:72.0) Gecko/20100101 Firefox/72.0" 348 0.001 [default-web2-8080] [] 172.17.0.6:8080 61 0.000 200 835136ae24486dbb4156dcbe21f5d402
192.168.64.14 - - [07/Feb/2020:12:02:42 +0000] "GET /v2/some HTTP/1.1" 200 61 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:72.0) Gecko/20100101 Firefox/72.0" 348 0.001 [default-web2-8080] [] 172.17.0.6:8080, 172.17.0.7:8080 61, 100 0.100, 0.004 200, 203 835136ae24486dbb4156dcbe21f5d402
Loading