Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhanncement for host.ip and host.mac] Disabling netinfo.enabled option of add-host-metadata processor #36506

Merged
merged 15 commits into from
Sep 8, 2023

Conversation

gizas
Copy link
Contributor

@gizas gizas commented Sep 5, 2023

  • Enhancement

Proposed commit message

Relates to elastic/integrations#6674

Please explain:

  • WHAT: We are disabling the netinfo.enabled option of add-host-metadata processor by checking the existence of and environmental variable in the Elastic Agent Pod
  • WHY: Host.Ips and Host.Mac take a considerable size during indexing and affect both ingestion and visualisation process. As they dont provide meaningful infromation to the users we can disable them accordingly.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Checkout this PR fix inside ~/beats folder of your system.
  2. Clone elastic-agent repo in another folder ~/elastic-agent
  3. Build elastic agent image with details provided in https://github.com/elastic/elastic-agent#testing-elastic-agent-on-kubernetes. Command DEV=true SNAPSHOT=true PLATFORMS=linux/arm64 PACKAGES=docker mage package
  4. Build the docker image of your custom agent that you will load later in your setup:
cd ~/elastic-agent
cd build/package/elastic-agent/elastic-agent-linux-arm64.docker/docker-build
docker build --platform linux/amd64 -t custom-agent-image .
  1. Install kind
  2. Load your custom build image into kind : kind load docker-image custom-agent-image:latest --name kind
  3. Install your local ES stack : elastic-package stack up -d -vvv --version=8.11.0-SNAPSHOT
  4. Use the the following manifests as samples to disable host.ip and host.mac

elastic-agent-standalone-kubernetesHostIPs.yml.txt
elastic-agent-standalone-kubernetesHostIPs.yml.txt

This is particular the piece of code that our manifests will need to have from now on:

# The following NETINNFO:false variable will disable the netinfo.enabled option of add-host-metadata processor. This will remove fields host.ip and host.mac.  
            # For more info: https://www.elastic.co/guide/en/beats/metricbeat/current/add-host-metadata.html
            # - name: NETINFO
            #   value: "false"

Related issues

Screens

Default 8.11.SNAPSHOT with host.ips and host.mac
with_8 11 SNAPSHOT
hots

Standalone Agent with NETINNFO:false
Without_hostips

Managed Agent with NETINNFO:false
Managed_withouthostips

@gizas gizas requested a review from a team as a code owner September 5, 2023 13:22
@gizas gizas requested review from belimawr and rdner September 5, 2023 13:22
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 5, 2023
@gizas gizas self-assigned this Sep 5, 2023
@gizas gizas added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Sep 5, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 5, 2023
@mergify
Copy link
Contributor

mergify bot commented Sep 5, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @gizas? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@mergify
Copy link
Contributor

mergify bot commented Sep 5, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fixinghostips upstream/fixinghostips
git merge upstream/main
git push upstream fixinghostips

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This change seems to be only done in metricbeat. Doesn't the same happen in Filebeat and any other Beat?
  • Do we need additional documentation around this how a user could enable it (again) if they wanted to?

{"add_docker_metadata": nil},
{"add_kubernetes_metadata": nil},

// We check for the existance of environmental variable NETINFO. Related to https://github.com/elastic/integrations/issues/6674
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets have some more details here in the code docs. Jumping to the very long Github issue is not helpful. We can still leave the reference in, but lets add more details here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments in beb7e85

@elasticmachine
Copy link
Collaborator

elasticmachine commented Sep 5, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-09-07T08:49:47.976+0000

  • Duration: 70 min 50 sec

Test stats 🧪

Test Results
Failed 0
Passed 28237
Skipped 2013
Total 30250

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@gizas
Copy link
Contributor Author

gizas commented Sep 5, 2023

  • This change seems to be only done in metricbeat. Doesn't the same happen in Filebeat and any other Beat?

Indeed eg. filebeat has the same configuration see

func defaultProcessors() []mapstr.M {

My first thought here is that we need it only for metricbeat and filestream (if we decide to) for now as these are related to kubernetes.

By a first look, auditbeat, osquerybeat and packetbeat initialise add_host_metadata processor as well, but dont need we need it for k8s.

@martijnvg do you think we need it also for filebeat? Our tests were run only for metrics that is why we focused only in metricbeat I guess

  • Do we need additional documentation around this how a user could enable it (again) if they wanted to?

For now I added it only in the manifests https://github.com/elastic/elastic-agent/pull/3354/files#diff-b85244108b202e9acc2f4eca1f918bb448bbd10a816a83123ec17f7cd3ca92b6

The other place I see that might feet is here: https://www.elastic.co/guide/en/fleet/current/agent-environment-variables.html

@martijnvg
Copy link
Member

do you think we need it also for filebeat? Our tests were run only for metrics that is why we focused only in metricbeat I guess

I think this depends on whether local-link addresses add any value in the non metric use cases? I don't think so? But I think I'm not the person to answer this question.

@ruflin
Copy link
Contributor

ruflin commented Sep 6, 2023

By a first look, auditbeat, osquerybeat and packetbeat initialise add_host_metadata processor as well, but dont need we need it for k8s.

My understanding is that when any of these Beats / Elastic Agent are run under k8s, the problem would exist. Is there anything special here around metrics? Is the assumption that the other beats are not run in k8s?

if valueNETINFO == "false" && status == true {
return []mapstr.M{
{"add_host_metadata": mapstr.M{
"netinfo.enabled": "false",
Copy link
Member

@ChrsMark ChrsMark Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to enable/disable this at this level and not inside the processor's implementation?
Wouldn't that be more generic without the need to configure every single Beat (also covering Agent)? And that would cover @ruflin 's concern at #36506 (comment), right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to do that inside the processor implementation. I think in the implementation we could even automatically check if this is running inside k8s or inside a container and either disable netinfo completely or filter out link-local addresses. See also elastic/integrations#6674 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all team.
Reverted changes per beat module and made those inside the processor. Building a new image and retesting and let you know

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the fix only to add-host-metadata processor (https://github.com/elastic/beats/pull/36506/files#diff-03b979565735e6707425275586e3215f44192f46e1ac3b456516d93c653eba13R38) seems that works only for metricbeat. Somehow seems we override it for filestream? Will keep looking

@gizas
Copy link
Contributor Author

gizas commented Sep 6, 2023

Lessons learned, you need to remove both builds in order to force agent image to build from local code:
rm -rf x-pack/metricbeat/build/distributions/metricbeat-8.11.0-SNAPSHOT-linux-*
rm -rf x-pack/filebeat/build/distributions/filebeat-8.11.0-SNAPSHOT-linux-*

So all work by disabling netinfo in processor :
Filestream:
filestream_disabled

Metricbeat:
metricbeat_disabled

For documentation please review: elastic/ingest-docs#463

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change LGTM. I left one comment about the env variable's name.

I think that skipping by default if k8s or containarized environment is detected would possibly be a breaking change at this point. Maybe some users want these data even in k8s envs.
So I think it's safer to go with the current approach.

// Setting environmental variable NETINFO:false in Elastic Agent pod will disable the netinfo.enabled option of add_host_metadata processor
// This will result to events not being enhanced with host.ip and host.mac
// Related to https://github.com/elastic/integrations/issues/6674
valueNETINFO, _ := os.LookupEnv("NETINFO")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only question would be the naming we want to choose here. Maybe sth like ELASTIC_NETINFO would be more explicit?

Copy link
Contributor Author

@gizas gizas Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree lets not introduce a breaking change (I was thinking an old comment you had done).

For the naming you are killing me :) but I was expecting the debate , so seeing the page again https://www.elastic.co/guide/en/fleet/current/agent-environment-variables.html, I would go for ELASTIC_AGENT_NETINFO as strickly speaking this is a change on the agent side.

I would wait until tomorrow to see if others have any opinion and I would need to change all prs

@ruflin
Copy link
Contributor

ruflin commented Sep 7, 2023

I think that skipping by default if k8s or containarized environment is detected would possibly be a breaking change at this point. Maybe some users want these data even in k8s envs.

I agree that this can potentially be considered a breaking change, at the same I'm not sure the use case where all these ip addresses would be used. If we don't change the default, most users will not get the benefits and have to make active changes to get the performance and storage benefits.

Do we agree false is the better default for most users? If we don't change it now, when would we change it? Maybe we can also separate the discussion and first get the change in to make it possible to disable and then figure out how we change the default in a good way. @felixbarny Thoughts?

// Setting environmental variable ELASTIC_NETINFO:false in Elastic Agent pod will disable the netinfo.enabled option of add_host_metadata processor
// This will result to events not being enhanced with host.ip and host.mac
// Related to https://github.com/elastic/integrations/issues/6674
valueNETINFO, _ := os.LookupEnv("ELASTIC_NETINFO")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make sure we have this documented. Can you expand the changelog to contain a mention of this env variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this is ok, I added some info also there.

Also see elastic/ingest-docs#463

@ChrsMark
Copy link
Member

ChrsMark commented Sep 7, 2023

Do we agree false is the better default for most users?

I believe we could just set ELASTIC_NETINFO to false by default within our k8s manifests as a first step. This has 2 main advantages:

  1. The codebase itself does not introduce any possible breaking change.
  2. We only affect the k8s use-case but from the configuration management point of view. This means that we have a reasonable default (to false) only for k8s deployments which can be easily changed from users by removing the variable definition from the k8s manifest (if needed).

If we agree on this here we can even change the manifests in this PR, or change those in a follow-up if we want to discuss it more. Whatever that fits better.

@felixbarny
Copy link
Member

SGTM. The main thing that's important to me is that host.ip/host.mac is either truncated or removed from k8s data sets by default.

@gizas
Copy link
Contributor Author

gizas commented Sep 7, 2023

SGTM. The main thing that's important to me is that host.ip/host.mac is either truncated or removed from k8s data sets by default.

Thanks again all for the quick feedback. Default value in manifests is ELASTIC_NETINFO:false. So the host.ip and host.mac will be removed by default

@felixbarny
Copy link
Member

So the host.ip and host.mac will be removed by default

I that also true for logs or just for metrics?

@pierrehilbert pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Sep 7, 2023
@gizas
Copy link
Contributor Author

gizas commented Sep 7, 2023

So the host.ip and host.mac will be removed by default

I that also true for logs or just for metrics?

Both . See tests #36506 (comment)

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch LGTM! Thanks!

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Lets make sure we get this deployed directly in our test environments with the new configs and also get it into our cloud environments.

Comment on lines +43 to +56
if valueNETINFO == "false" {
return Config{
NetInfoEnabled: false,
CacheTTL: 5 * time.Minute,
ExpireUpdateTimeout: time.Second * 10,
ReplaceFields: true,
}
} else {
return Config{
NetInfoEnabled: true,
CacheTTL: 5 * time.Minute,
ExpireUpdateTimeout: time.Second * 10,
ReplaceFields: true,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional suggestion to increase readability

Suggested change
if valueNETINFO == "false" {
return Config{
NetInfoEnabled: false,
CacheTTL: 5 * time.Minute,
ExpireUpdateTimeout: time.Second * 10,
ReplaceFields: true,
}
} else {
return Config{
NetInfoEnabled: true,
CacheTTL: 5 * time.Minute,
ExpireUpdateTimeout: time.Second * 10,
ReplaceFields: true,
}
return Config{
NetInfoEnabled: valueNETINFO == "true",
CacheTTL: 5 * time.Minute,
ExpireUpdateTimeout: time.Second * 10,
ReplaceFields: true,
}

@gizas gizas merged commit 0f47b30 into main Sep 8, 2023
@gizas gizas deleted the fixinghostips branch September 8, 2023 08:00
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
…ion of add-host-metadata processor (elastic#36506)

* adding check for host processor in case of K8s

* Changing variable to ELASTIC_NETINFO

* Updating changelog for variable  ELASTIC_NETINFO
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants