Consider not adding host.ip metadata to k8s container metrics by default #6674

Closed
4 of 5 tasks
Tracked by #7364
felixbarny opened this issue Jun 22, 2023 · 23 comments
Assignees
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team [elastic/obs-cloudnative-monitoring]

Comments

@felixbarny
Member

felixbarny commented Jun 22, 2023

@martijnvg found out that in a test dataset for k8s container metrics, there were 100+ IP addresses attached as metadata.

We'll need to find out why that's the case and if these IPs make sense to add to container metrics as metadata. If these represent the IP addresses of the k8s node, it doesn't seem useful to add them as metadata to the container metrics anyway.

Tasks

  1. Team:Cloudnative-Monitoring Team:Elastic-Agent
    gizas
  2. backport-skip
    gizas
  3. Monitoring-Cloudnative Team:Observability
    gizas
  4. backport-skip
  5. Team:Fleet backport:skip release_note:feature v8.11.0
@martijnvg
Member

An example of the list of IPs that I have observed in documents:


"ip": [
              "10.100.6.1",
              "10.128.0.162",
              "169.254.123.1",
              "fe80::a2:36ff:fe8d:a721",
              "fe80::a9:6eff:fe66:57db",
              "fe80::4c3:cff:fe87:61af",
              "fe80::8af:82ff:fe10:a293",
              "fe80::c66:cbff:fed2:ff7e",
              "fe80::cbf:e1ff:fe7c:de84",
              "fe80::1490:e1ff:fe2d:c525",
              "fe80::14c1:9cff:fe36:3620",
              "fe80::18eb:57ff:febf:570d",
              "fe80::20bc:93ff:feb2:906e",
              "fe80::2480:31ff:fe41:4e64",
              "fe80::24a1:3fff:fe74:ae73",
              "fe80::2815:36ff:fe54:d2f",
              "fe80::288a:50ff:fe94:c471",
              "fe80::28cc:91ff:fef2:e4dc",
              "fe80::2c44:96ff:fead:1f17",
              "fe80::2cba:f7ff:fe8d:ba7d",
              "fe80::2cf1:deff:feea:b51d",
              "fe80::3087:4aff:fe98:35b0",
              "fe80::30ce:9aff:fe28:6329",
              "fe80::3880:8fff:fe39:bbb3",
              "fe80::3c49:d3ff:fe41:e6a5",
              "fe80::3c65:49ff:fe2a:c375",
              "fe80::4001:aff:fe80:a2",
              "fe80::40bf:fbff:feb3:88d2",
              "fe80::40de:26ff:fe7f:826",
              "fe80::4465:33ff:fe6f:2014",
              "fe80::44a1:d2ff:fe83:eecb",
              "fe80::484d:7dff:fe6c:f326",
              "fe80::48d5:cdff:fed3:207b",
              "fe80::4c6c:7bff:fefd:aa4e",
              "fe80::50b4:16ff:feaa:44ce",
              "fe80::5447:b1ff:fe53:a49f",
              "fe80::54d6:70ff:fe73:2ef6",
              "fe80::5889:feff:feca:2394",
              "fe80::6425:54ff:fee6:7942",
              "fe80::64e3:45ff:fe09:7830",
              "fe80::685b:6aff:fef3:60aa",
              "fe80::6c73:adff:fe93:6c4",
              "fe80::6c7c:6aff:fe1e:6e5b",
              "fe80::701a:25ff:fe63:7b47",
              "fe80::701c:bfff:fe92:96b5",
              "fe80::709e:c7ff:fea2:f322",
              "fe80::70b6:efff:fe31:da37",
              "fe80::749b:ffff:fead:1d26",
              "fe80::74cd:59ff:fee6:f893",
              "fe80::74cd:cbff:fea3:ef4c",
              "fe80::74d9:dcff:fe38:2278",
              "fe80::78f0:f3ff:fe7e:af53",
              "fe80::88f2:8fff:fe2f:efb8",
              "fe80::8cef:37ff:fe61:2a3e",
              "fe80::90ac:72ff:febd:ba1",
              "fe80::9820:29ff:feb3:6335",
              "fe80::988e:8cff:fe72:f5e",
              "fe80::98cd:2dff:fe17:d5cd",
              "fe80::9cf7:beff:fee5:983f",
              "fe80::a051:fbff:fe80:d76f",
              "fe80::a0b6:d0ff:fe42:e4fa",
              "fe80::a0da:1dff:fe6a:8129",
              "fe80::a42c:48ff:fe48:f80d",
              "fe80::a4bc:e8ff:fe2c:d407",
              "fe80::a88b:a3ff:feda:48b8",
              "fe80::a8a5:7bff:fe24:75d8",
              "fe80::ac33:42ff:feb2:9059",
              "fe80::b08e:7ff:fedd:9ecc",
              "fe80::b0a9:dbff:fe37:da70",
              "fe80::b0eb:ffff:feca:154f",
              "fe80::b410:b4ff:fe89:dd1c",
              "fe80::b4e6:d6ff:fe00:4334",
              "fe80::b836:37ff:fe9b:c8b8",
              "fe80::b8a2:7dff:fe1f:ab96",
              "fe80::b8f2:9aff:fe56:5623",
              "fe80::bc5d:67ff:fe09:8c7c",
              "fe80::bc76:dcff:fed9:1364",
              "fe80::bcb1:85ff:fe7b:8239",
              "fe80::c039:dff:fec6:5290",
              "fe80::c0bd:90ff:fe17:a780",
              "fe80::c44c:bbff:fef8:2d05",
              "fe80::c84f:caff:fe0b:5a44",
              "fe80::c8aa:c7ff:fee5:dda0",
              "fe80::cc3e:49ff:fe79:e547",
              "fe80::ccd9:c4ff:fea9:8dcc",
              "fe80::d01f:9dff:fe49:898f",
              "fe80::d0bb:c1ff:fe11:81f6",
              "fe80::d437:f7ff:fec7:ed52",
              "fe80::d472:63ff:fed0:ff99",
              "fe80::d4b8:11ff:fe44:cdd9",
              "fe80::d4d6:60ff:fe4a:c292",
              "fe80::d4f7:56ff:fe14:d8cb",
              "fe80::e003:83ff:fe37:51e6",
              "fe80::e086:faff:feec:ec8a",
              "fe80::e0f8:9dff:fe78:81dc",
              "fe80::e0ff:77ff:fe03:7f39",
              "fe80::e41f:93ff:fef3:fb8a",
              "fe80::e443:33ff:fe47:493a",
              "fe80::e490:6dff:fe06:1844",
              "fe80::e847:dbff:fe9f:6c5e",
              "fe80::e895:d4ff:fea0:4930",
              "fe80::e8aa:68ff:fe5b:4a",
              "fe80::ec85:79ff:fe51:d634",
              "fe80::f026:99ff:fe79:641e",
              "fe80::f40f:dbff:fe73:b43e",
              "fe80::f8a4:74ff:fe56:995",
              "fe80::fc00:54ff:fe64:b7a9",
              "fe80::fcc5:59ff:fef6:7e06",
              "fe80::fcfd:cfff:fe55:31e0"
            ],

@ruflin
Contributor

ruflin commented Jun 26, 2023

These IP addresses are added by Beats / Elastic Agent AFAIK. Initially the idea was that 1 or 2 host IP addresses would be shipped. But k8s + IPv6 wreak havoc on the data we ship. What are all these IPv6 addresses? One for each container? For IPv6, should we skip all the fe80:: addresses? Which of the addresses is relevant?

@tommyers-elastic @gizas any thoughts on the above?

@mlunadia

A quick search shows that these are IPv6 link-local addresses.

Link-local addresses are used for communication within a local network segment, such as a Kubernetes cluster. They are automatically assigned to network interfaces and are only valid within the local network segment.

In the case of Kubernetes containers, fe80 IP addresses are assigned to the containers' network interfaces for intra-cluster communication. Containers within the same network segment can use these link-local addresses to communicate with each other directly without the need for routing.

AFAIK these are not critical for most Observability use cases.
@tommyers-elastic @gizas is there an easy way to estimate the complexity of skipping these? Can we also determine what the non-fe80 ip addresses are for?
cc: @bturquet
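
As a rough illustration of what "skipping" these addresses could look like, here is a minimal Go sketch that drops link-local entries from a list like the one above. The helper name `dropLinkLocal` is hypothetical and not taken from the Beats code base; it relies only on the standard library's `net.IP.IsLinkLocalUnicast`, which covers both `fe80::/10` and the IPv4 range `169.254.0.0/16`.

```go
package main

import (
	"fmt"
	"net"
)

// dropLinkLocal (hypothetical helper) returns the addresses that are not
// link-local unicast. Strings that do not parse as IPs are kept untouched.
func dropLinkLocal(addrs []string) []string {
	kept := make([]string, 0, len(addrs))
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip != nil && ip.IsLinkLocalUnicast() {
			continue // fe80::/10 or 169.254.0.0/16
		}
		kept = append(kept, a)
	}
	return kept
}

func main() {
	// First four entries of the list observed in the documents.
	ips := []string{"10.100.6.1", "10.128.0.162", "169.254.123.1", "fe80::a2:36ff:fe8d:a721"}
	fmt.Println(dropLinkLocal(ips))
}
```

Note that under this filter `169.254.123.1` is dropped as well, since it is an IPv4 link-local address; only the two `10.x` addresses from the sample would remain.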

@gizas
Contributor

gizas commented Jun 27, 2023

I did some digging today to confirm where these IPs come from: they are the node's IPs, in other words the networking of the underlying host.

The metrics from an nginx pod
Screenshot 2023-06-27 at 10 51 40 AM

Node's networking:

4: veth038d14aa@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether aa:07:91:89:6d:0f brd ff:ff:ff:ff:ff:ff link-netns cni-fefa3105-06d0-56f5-302c-f5223545f4d3
    inet 10.244.0.1/32 scope global veth038d14aa
       valid_lft forever preferred_lft forever
5: vethebc54a3e@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether f2:12:90:39:f7:21 brd ff:ff:ff:ff:ff:ff link-netns cni-3b66ba8b-35bc-8292-9679-60fb0dea2237
    inet 10.244.0.1/32 scope global vethebc54a3e
       valid_lft forever preferred_lft forever
6: vethc8cbf2a8@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 6e:3d:06:e8:08:97 brd ff:ff:ff:ff:ff:ff link-netns cni-320ccfbc-5113-88ae-96ce-36be83b6692b
    inet 10.244.0.1/32 scope global vethc8cbf2a8
       valid_lft forever preferred_lft forever
7: vetha8e1da93@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether d2:be:13:f5:f2:9c brd ff:ff:ff:ff:ff:ff link-netns cni-d3fe6667-7f55-ae5f-978e-e2306aa7603c
    inet 10.244.0.1/32 scope global vetha8e1da93
       valid_lft forever preferred_lft forever
12: eth1@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:14:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.20.0.2/16 brd 172.20.255.255 scope global eth1
       valid_lft forever preferred_lft forever
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::3/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:3/64 scope link
       valid_lft forever preferred_lft forever

I can see the netinfo.enabled option (see docs) and the reference in code: https://github.com/elastic/beats/blob/main/libbeat/processors/add_host_metadata/add_host_metadata.go#L198 but I could not get it to remove the host.ips, either in Beats or by adding the processor in Agent. I will keep you posted on how to disable the host.ips.

@gizas
Contributor

gizas commented Jun 27, 2023

@cmacknz I see that the add_host_metadata processor is responsible for adding the host.* fields and is enabled by default (https://www.elastic.co/guide/en/fleet/current/add_host_metadata-processor.html).

I have not found a way to override it at the moment in the Agent.
Can we use something like this: https://github.com/elastic/beats/blob/main/libbeat/processors/add_observer_metadata/config.go#L36?
Asking as I am trying to connect the pieces and understand the flow

@cmacknz
Member

cmacknz commented Jun 27, 2023

The default set of processors each Beat runs when they are started by agent is defined in the code, the Beats don't read their own default configuration files when agent starts them. Here is the definition for Metricbeat:

https://github.com/elastic/beats/blob/e16de717459e5a62aa376427dd25d43441b5c582/x-pack/metricbeat/cmd/root.go#L68-L80

This is equal to the set of default global processors that are enabled in the default Metricbeat configuration file:

https://github.com/elastic/beats/blob/e16de717459e5a62aa376427dd25d43441b5c582/x-pack/metricbeat/metricbeat.yml#L123-L127

The problem right now is that an agent policy has no concept of a global processor, so there is no place in the agent policy to expose these. This is something we plan to do, but there's no date set for it yet. https://github.com/elastic/ingest-dev/issues/2442 is the tracking issue. Even if we did have this, we'd want the change to the configuration here to be conditional on whether the agent is running on Kubernetes. This might actually be easier to do in code.

If you can come up with an alternate configuration, we would only want to apply it when the agent is running on k8s. Since the default processors are defined in code, if you have a function that can accurately detect that the agent runs on Kubernetes when these processor configurations are generated, you can use it to conditionally change the add_host_metadata configuration for each Beat that agent can start.

You could add an option to add_host_metadata to omit host.ip entirely on Kubernetes, limit the total number of reported IPs or interfaces that it polls, etc.

@tommyers-elastic
Contributor

late to the party here but just chiming in that from where i'm standing these IPs for sure just look like noise. if there was some useful mapping from ip<->resource then maybe they would be more useful. hopefully we get something like that as part of the asset management work.

@gizas
Contributor

gizas commented Jun 28, 2023

To add more to the issue, I have repeated some more tests with a GKE cluster with 15 nodes, and there you can see more host.ips added (I count something like 64).

Screenshot 2023-06-28 at 11 15 26 AM

I agree this is noise, because you can find the same information in the kubernetes.state_node entries.

I have tried to add processors inside the integration (at the module level):

Screenshot 2023-06-28 at 11 53 28 AM

Also :

- add_host_metadata:
      netinfo.enabled: false

It seems that they don't apply. I guess that global processors are applied at the Beats level, so they run last, which means that they override our configuration. Long story short, the host.ip fields remain in the event.

@ChrsMark
Copy link
Member

ChrsMark commented Jun 28, 2023

Hey folks! Please have a look into elastic/elastic-agent#90. This seems to be the reason.
Long story short: Agent starts Beats with the default config files which enable the add_host_metadata processor, see https://github.com/elastic/beats/blob/718c9232cfa183f6a866ebcfa6401eae72346f0d/metricbeat/metricbeat.yml#L124. This processor runs at the Beats level and hence after any processor that runs at the module level.

We need a way to disable/tune the Beats' global level processors and that's what elastic/elastic-agent#90 is trying to address.

EDIT: see also #6674 (comment), which explains the same.
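
To make the ordering argument concrete, here is a toy Go model of why a module-level `netinfo.enabled: false` does not help when a global processor runs afterwards. This is only a sketch: the `event` and `processor` types and the `addHostMetadata` function are simplified stand-ins invented for illustration, not the libbeat API, and the host name and addresses are sample values.

```go
package main

import "fmt"

// event and processor are simplified stand-ins, not libbeat types.
type event map[string]any
type processor func(event)

// addHostMetadata is a toy model of the processor: it always sets
// host.name and only sets host.ip when netinfo is enabled.
func addHostMetadata(netinfo bool) processor {
	return func(e event) {
		e["host.name"] = "node-1"
		if netinfo {
			e["host.ip"] = []string{"10.244.0.1", "fe80::42:acff:fe12:3"}
		}
	}
}

// run applies the processors in order, as a pipeline would.
func run(e event, ps ...processor) event {
	for _, p := range ps {
		p(e)
	}
	return e
}

func main() {
	moduleLevel := addHostMetadata(false) // what the integration configures
	global := addHostMetadata(true)       // default global processor, runs last
	e := run(event{}, moduleLevel, global)
	_, hasIP := e["host.ip"]
	fmt.Println("host.ip present:", hasIP)
}
```

In this model the module-level instance never adds host.ip, but the global instance that runs after it does, so the field is still present in the final event.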

@ruflin
Contributor

ruflin commented Jun 28, 2023

Could we adjust the add_host_metadata processor to just not ship local ip addresses in the first place by default, no config needed?

@cmacknz
Member

cmacknz commented Jun 28, 2023

Could we adjust the add_host_metadata processor to just not ship local ip addresses in the first place by default, no config needed?

As I suggested in #6674 (comment), yes, but you need to make it conditional on detecting that the Beat is running on Kubernetes, where this information is not useful. I would think that just removing the IP fields unconditionally would be a breaking change. Technically it would be a breaking change on k8s as well, but it is highly unlikely anyone depends on these fields today.

@gizas
Contributor

gizas commented Jun 29, 2023

Please find a workaround here https://github.com/elastic/beats/blob/fixinghostips/x-pack/metricbeat/cmd/root.go#L76-L93

I am in the process of building the image and testing e2e, so I will report my findings. But let me know if the workaround is ok.
The idea is to check for Kubernetes (I check for a path, or whether a common k8s-specific environment variable is present, which identifies that I am installing in k8s), and we have introduced a new environment variable, NETINFO, that users can set specifically to bypass the add_host_metadata processor.

Not the most elegant solution but what do you think?
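
For readers who do not want to open the branch, a detection check along these lines could look roughly like the Go sketch below. The comment above does not say which path or variable is checked, so both signals here are assumptions: `KUBERNETES_SERVICE_HOST` and the service account mount path are standard Kubernetes conventions, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// Assumed signal: the default service account mount path inside a pod.
const serviceAccountDir = "/var/run/secrets/kubernetes.io/serviceaccount"

// runningOnKubernetes (hypothetical helper) reports whether either signal is
// present. The lookups are passed in so the logic can be tested in isolation.
func runningOnKubernetes(getenv func(string) string, exists func(string) bool) bool {
	// Assumed signal: Kubernetes injects this variable into every pod.
	if getenv("KUBERNETES_SERVICE_HOST") != "" {
		return true
	}
	return exists(serviceAccountDir)
}

// pathExists reports whether the path can be stat'ed.
func pathExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	fmt.Println("on kubernetes:", runningOnKubernetes(os.Getenv, pathExists))
}
```

Neither signal is fully reliable (a pod can disable the service account mount, for example), which is one reason such auto-detection would need testing across providers.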

@ChrsMark
Member

@gizas I think you don't need to check first for a k8s environment and then for the env var. Checking just for the env var directly should be enough.

The advantage of checking just the env var is that we don't break anything for existing users/configurations, and only users that set the NETINFO: false env var will disable the addition of this data.

I would be ok with adding this, but only as a temporary solution, which means that we will create a GH issue to keep track of this and find a better and more generic way to implement it.

To my mind we need a way to configure Beats through Agent and at the moment we are "locked" with the default configs which is really bad. @ruflin @cmacknz are there any plans to fix this? I thought that elastic/elastic-agent#90 and then https://github.com/elastic/ingest-dev/issues/2442 would address this but then #6674 (comment) mentions that https://github.com/elastic/ingest-dev/issues/2442 would not be enough. So in that case maybe we need to prioritize elastic/elastic-agent#90 directly?

@gizas
Contributor

gizas commented Jun 29, 2023

Just to clarify, it is an OR, not an AND.
So scenarios would be:

  1. User defines nothing and we manage to identify k8s -- remove netinfo
  2. User defines NETINFO=false -- remove netinfo
  3. In any other scenario -- keep netinfo.enabled: true, so host.ips are kept

But yes we can change the logic. This is more to prove that it is working.
And yes, maybe it is time to raise the prioritisation discussion for the "global processor" again, as BY also has it on its list.
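
The three scenarios above can be sketched as a single decision function. This is an illustrative Go sketch, not the code from the branch: `netinfoEnabled` is a hypothetical name, the `onK8s` flag stands for the result of whatever Kubernetes detection is used, and treating an unparsable value as "any other scenario" is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// netinfoEnabled (hypothetical) mirrors the three scenarios:
//  1. variable unset and Kubernetes detected -> netinfo removed
//  2. variable set to false                  -> netinfo removed
//  3. anything else                          -> netinfo kept
func netinfoEnabled(value string, isSet, onK8s bool) bool {
	if isSet {
		enabled, err := strconv.ParseBool(value)
		if err == nil {
			return enabled
		}
		// Assumption: an unparsable value counts as "any other scenario".
		return true
	}
	return !onK8s
}

func main() {
	value, isSet := os.LookupEnv("NETINFO")
	// onK8s is hard-coded here; a real check would replace it.
	for _, onK8s := range []bool{false, true} {
		fmt.Printf("onK8s=%v netinfo=%v\n", onK8s, netinfoEnabled(value, isSet, onK8s))
	}
}
```

Dropping the `onK8s` branch and returning `true` whenever the variable is unset would give the env-var-only behaviour suggested earlier in the thread.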

@ChrsMark
Member

  1. User defines nothing and we manage to identify k8s -- remove netinfo

This would be a breaking change for users that run on k8s today and are actually collecting the data we want to skip here.

@gizas
Contributor

gizas commented Jul 5, 2023

I managed to build the image and in my local cluster this works for now:
Before removal of IPs:
Before

After Removal of IPs:
After

Summary of the above:

  1. What default behaviour do we want to introduce in our code? I agree with @ChrsMark that if we remove the host.ips by default, then this breaking change needs to be clearly documented

  2. How about https://github.com/elastic/ingest-dev/issues/2442 ? Any info regarding prioritising this?

  3. If we agree with this fix, then it will need testing with all cloud providers I guess, especially if we introduce the automatic k8s recognition

cc @bturquet , @mlunadia for prioritisation

@gizas gizas self-assigned this Jul 5, 2023
@felixbarny
Member Author

Using the disk usage API on both the system cpu and the kubernetes pod data stream (on edge-lite) revealed that half of the disk usage is due to the host.ip and the host.mac fields. I bet this also has a significant impact on indexing.

I think it's important that we find a solution where these fields don't have such an impact when using default configurations.

To keep the risk of breaking users at a minimum, and to make the implementation simple, I suggest we investigate the approach proposed by @cmacknz and @ruflin and remove non-interesting (local?) ip and mac addresses directly in the metadata processor go code if it detects that it's running on k8s.

@gizas
Contributor

gizas commented Sep 5, 2023

Team, I updated the task list of the story with the latest updates. The fix is working, and for now we enable/disable netinfo only through the related environment variable NETINFO:false inside the agent pod.

I have not managed to find a way to pass from the kubernetes Integration a config option to add-host-metadata processor. The add-host-metadata is initialised even before kubernetes processor and this does not allow us to pass config options to it.

So for now I propose only elastic/kibana#165700 as a means to help users in managed mode. Any other ideas?

@felixbarny
Member Author

@gizas have you considered the proposal in my last message to change the host metadata processor to omit link-local IP addresses by default or by default when running inside a container? This would only require changes in the add_host_metadata processor.

@gizas
Contributor

gizas commented Sep 6, 2023

@felixbarny thanks again for the reminder, see my last update in the PR. Now all tests seem to work with changes in the add_host_metadata processor only.

gizas added a commit to elastic/kibana that referenced this issue Sep 18, 2023
## Summary

This PR adds the environment variable ELASTIC_NETINFO to the managed
and standalone manifests of Elastic Agent.

The variable has been introduced here
elastic/elastic-agent#3354

The reason for introducing the new variable
ELASTIC_NETINFO:false by default in the manifests is related to the
work done in elastic/integrations#6674
@gizas
Contributor

gizas commented Oct 27, 2023

The Kubernetes Manifests will set ELASTIC_NETINFO:false by default.
We have decided not to do elastic/kibana#165700 for now. If we think that there is an urgency we can plan accordingly

I am closing this issue for now as related work is done

@gizas gizas closed this as completed Oct 27, 2023
@felixbarny
Member Author

@gizas in which cases is host.ip still added by default now?

@gizas
Contributor

gizas commented Oct 27, 2023

@felixbarny in both managed and standalone manifests (https://github.com/elastic/kibana/pull/166156/files) the variable is false.

So host.ip should not be added anywhere by default.

@andrewkroh andrewkroh added Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team [elastic/obs-cloudnative-monitoring] and removed Team: Cloud Native Integrations labels Sep 18, 2024
9 participants