
Potential memory leak issue with filebeat and metricbeat #35796

Closed
kbujold opened this issue Jun 16, 2023 · 61 comments
Assignees
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team

Comments

@kbujold

kbujold commented Jun 16, 2023

Since moving from ELK 7.17.1 to ELK 8.6.2 (and also on ELK 8.7 and ELK 8.8.0) we are experiencing OOMKilled restarts on the filebeat and metricbeat pods. We had no issues with ELK 7.17.1. Increasing the resource allocations does not resolve the issue and simply delays the pod restarting with OOM, which usually occurs after 9-12 hours. This appears to be a memory leak issue in Beats.

Below is the initial post which I raised in the forum with .bin files and configmaps
https://discuss.elastic.co/t/potential-memory-leak-issue-with-filebeat-and-metricbeat/334353

We have applied the configs recommended here, but it did not resolve the issue: #33307 (comment)

Thank you,
Kris

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 16, 2023
@botelastic

botelastic bot commented Jun 16, 2023

This issue doesn't have a Team:<team> label.

@cmacknz cmacknz added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Jun 16, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 16, 2023
@gizas
Contributor

gizas commented Jun 19, 2023

Initially, I would advise adding the add_resource_metadata block inside the autodiscovery configuration:

Eg.

filebeat.autodiscover:
  providers:
  - hints.default_config:
      close_renamed: true
      paths:
      - /var/log/containers/*-${data.kubernetes.container.id}.log
      type: container
    hints.enabled: true
    host: ${NODE_NAME}
    type: kubernetes
    add_resource_metadata:
      deployment: false
      cronjob: false

The metadata enrichment is enabled by default in autodiscovery.

Additionally, you will need to remove the following from the module's config:

processors:
        - add_kubernetes_metadata:
            in_cluster: true
            host: ${NODE_NAME}
...

The add_kubernetes_metadata processor is redundant for the Kubernetes module since it automatically adds the metadata by default. Note that this processor uses the same "watcher" library under the hood and hence it could hit the same memory leak which is solved by disabling the add_resource_metadata.

In other words, just make sure that add_resource_metadata.cronjob/deployment is disabled both in the Kubernetes module's config and in any add_kubernetes_metadata processor that is actually defined.
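For reference, a minimal sketch of what that could look like if the processor is kept (the add_resource_metadata sub-options under add_kubernetes_metadata are an assumption based on the Beats processor docs, not taken from the configs above):

processors:
  - add_kubernetes_metadata:
      in_cluster: true
      host: ${NODE_NAME}
      # same watcher-related switches as in the autodiscover provider
      add_resource_metadata:
        deployment: false
        cronjob: false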

Also, because you use the dedot configuration, keep in mind that (see https://www.elastic.co/guide/en/beats/filebeat/current/configuration-autodiscover.html) starting from the 8.6 release, kubernetes.labels.* fields used in config templating are not dedotted regardless of the labels.dedot value. This config parameter only affects the fields added to the final Elasticsearch document. So (although I'm not 100% sure what you need to achieve) I expect that removing the processor won't affect you.

@gsantoro
Contributor

As suggested by @gizas, you also need to remove the add_kubernetes_metadata processor. That seemed to solve the problem in https://discuss.elastic.co/t/filebeat-memory-leak-via-filebeat-autodiscover-and-200-000-goroutines/322082/5.

@kbujold
Author

kbujold commented Jun 20, 2023

@gizas We still have the same OOM issue with the changes you recommended. Below is the configmap for filebeat

Name:         mon-filebeat-daemonset-config
Namespace:    monitor
Labels:       app=mon-filebeat
              app.kubernetes.io/managed-by=Helm
              chart=filebeat-8.5.1
              helm.toolkit.fluxcd.io/name=filebeat
              helm.toolkit.fluxcd.io/namespace=monitor
              heritage=Helm
              release=mon-filebeat
Annotations:  meta.helm.sh/release-name: mon-filebeat
              meta.helm.sh/release-namespace: monitor
              remoteconfigchecksum: 12341234abcdabcd

Data
====
filebeat.yml:
----
fields:
  system:
    name: test
    uid: d1374af9-1234-aaaaa-bbb-974f1b033347
fields_under_root: true
filebeat.autodiscover:
  providers:
  - add_resource_metadata:
      cronjob: false
      deployment: false
    hints.default_config:
      close_renamed: true
      paths:
      - /var/log/containers/*-${data.kubernetes.container.id}.log
      type: container
    hints.enabled: true
    host: ${NODE_NAME}
    type: kubernetes
filebeat.inputs:
- close_timeout: 5m
  enabled: true
  exclude_files:
  - ^/var/log/containers/
  - ^/var/log/pods/
  paths:
  - /var/log/*.log
  - /var/log/messages
  - /var/log/syslog
  - /var/log/**/*.log
  type: log
http.port: 5066
monitoring:
  cluster_uuid: ${CLUSTER_UUID}
  elasticsearch:
    hosts:
    - https://mon-elasticsearch-client:9200
    password: ${beats_system_monitoring_password}
    ssl.certificate_authorities:
    - /usr/share/filebeat/ext-ca.crt
    username: ${beats_system_monitoring_user}
  enabled: ${BEAT_MONITORING_ENABLED}
name: ${NODE_NAME}
output.elasticsearch:
  enabled: false
  host: ${NODE_NAME}
  hosts:
  - https://mon-elasticsearch-client:9200
  ilm.pattern: "000001"
  index: ${INDEX_NAME}-%{+yyyy.MM.dd}
  password: ${ELASTICSEARCH_PASSWORD}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  username: ${ELASTICSEARCH_USERNAME}
output.file:
  enabled: false
output.logstash:
  enabled: true
  hosts:
  - mon-logstash:5044
  ssl.certificate: /usr/share/filebeat/config/instance/filebeat.crt
  ssl.certificate_authorities:
  - /usr/share/filebeat/ca.crt
  - /usr/share/filebeat/previous/ca.crt
  - /usr/share/filebeat/next/ca.crt
  ssl.key: /usr/share/filebeat/config/instance/filebeat.key
  timeout: 9
setup.dashboards:
  enabled: false
setup.kibana:
  host: mon-kibana:5601
  password: ${filebeat_password}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  ssl.verification_mode: none
  username: ${filebeat_user}
setup.template:
  name: ${INDEX_NAME}
  pattern: ${INDEX_PATTERN}


BinaryData
====

Events:  <none>

@gizas
Contributor

gizas commented Jun 20, 2023

@kbujold thank you for the update. One thing I noticed is that you have both inputs and autodiscovery
(even the manifest clearly suggests: # To enable hints based autodiscover, remove filebeat.inputs configuration and uncomment ...).
I would suggest removing the following:

filebeat.inputs:
- close_timeout: 5m
  enabled: true
  exclude_files:
  - ^/var/log/containers/
  - ^/var/log/pods/
  paths:
  - /var/log/*.log
  - /var/log/messages
  - /var/log/syslog
  - /var/log/**/*.log
  type: log
http.port: 5066

Can you please test the above and send us the manifest you use (and not the rendered output) to avoid any misalignment?
If this also does not help, we could consider replicating your environment and test.

@kbujold
Author

kbujold commented Jun 20, 2023

@gizas If we remove the filebeat.inputs section, we would lose our host log monitoring. We are running filebeat to capture both container and host logs, and have been doing so for years. We started having issues with ELK 8.6.2.

What is the recommended way to monitor both host and container logs if we are to remove filebeat.inputs? This is a hard requirement for us.

root@mon-filebeat-bpxqs:/usr/share/filebeat# cat filebeat.yml
fields:
  system:
    name: test
    uid: d1374af9-1111-aaaa-bbbb-974f1b033347
fields_under_root: true
filebeat.autodiscover:
  providers:
  - add_resource_metadata:
      cronjob: false
      deployment: false
    hints.default_config:
      close_renamed: true
      paths:
      - /var/log/containers/*-${data.kubernetes.container.id}.log
      type: container
    hints.enabled: true
    host: ${NODE_NAME}
    type: kubernetes
filebeat.inputs:
- close_timeout: 5m
  enabled: true
  exclude_files:
  - ^/var/log/containers/
  - ^/var/log/pods/
  paths:
  - /var/log/*.log
  - /var/log/messages
  - /var/log/syslog
  - /var/log/**/*.log
  type: log
http.port: 5066
monitoring:
  cluster_uuid: ${CLUSTER_UUID}
  elasticsearch:
    hosts:
    - https://mon-elasticsearch-client:9200
    password: ${beats_system_monitoring_password}
    ssl.certificate_authorities:
    - /usr/share/filebeat/ext-ca.crt
    username: ${beats_system_monitoring_user}
  enabled: ${BEAT_MONITORING_ENABLED}
name: ${NODE_NAME}
output.elasticsearch:
  enabled: false
  host: ${NODE_NAME}
  hosts:
  - https://mon-elasticsearch-client:9200
  ilm.pattern: '000001'
  index: ${INDEX_NAME}-%{+yyyy.MM.dd}
  password: ${ELASTICSEARCH_PASSWORD}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  username: ${ELASTICSEARCH_USERNAME}
output.file:
  enabled: false
output.logstash:
  enabled: true
  hosts:
  - mon-logstash:5044
  ssl.certificate: /usr/share/filebeat/config/instance/filebeat.crt
  ssl.certificate_authorities:
  - /usr/share/filebeat/ca.crt
  - /usr/share/filebeat/previous/ca.crt
  - /usr/share/filebeat/next/ca.crt
  ssl.key: /usr/share/filebeat/config/instance/filebeat.key
  timeout: 9
setup.dashboards:
  enabled: false
setup.kibana:
  host: mon-kibana:5601
  password: ${filebeat_password}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  ssl.verification_mode: none
  username: ${filebeat_user}
setup.template:
  name: ${INDEX_NAME}
  pattern: ${INDEX_PATTERN}

@gizas
Contributor

gizas commented Jun 21, 2023

Thanks for the details, @kbujold! (They helped me understand your needs.)

I am doing some tests this morning with 8.8.1 and the filestream input, instead of the log input. In general we advise using filestream, and I would also advise going with 8.8.1 (for fixes like this).

I am testing with this configuration and my memory seems stable (of course I have a limited cluster and probably with less traffic than you)

filebeat.inputs:
    - type: filestream
      id: my-filebeat-input
      paths:
        - /var/log/*.log
        - /var/log/messages
        - /var/log/syslog
        - /var/log/**/*.log
      prospector.scanner.exclude_files: ['^/var/log/containers/', '^/var/log/pods/']
      fields:
        system:
          name: test
          uid: d1374af9-1234-aaaaa-bbb-974f1b033347
      fields_under_root: true          

  # To enable hints based autodiscover, remove `filebeat.inputs` configuration and uncomment this:
filebeat.autodiscover:
   providers:
     - type: kubernetes
       node: ${NODE_NAME}
       hints.enabled: true
       hints.default_config:
         type: container
         paths:
           - /var/log/containers/*${data.kubernetes.container.id}.log
       add_resource_metadata:
        deployment: false
        cronjob: false 

So let's make another test with the updated configuration and also upgrade to 8.8.1?
Also, close_timeout: 5m and close_renamed: true can be removed if there is no hard requirement on your side (although they are probably not related to the possible memory leak).

@gsantoro
Contributor

gsantoro commented Jun 21, 2023

hello @kbujold ,
given how complex it is to replicate this issue in our dev environment, I think we first need to identify where the leak is generated.

Is it generated by autodiscover or by filebeat.inputs?

Could you use the suggested config at #35796 (comment), but alternately disable the autodiscover and the filebeat.inputs sections, and tell us whether there is a possible memory leak in either of those cases?

@kbujold
Author

kbujold commented Jun 21, 2023

@gsantoro

Removing filebeat.autodiscover and metricbeat.autodiscover did not result in OOM pod restarts for metricbeat and filebeat. But as mentioned before we need this config.

@gsantoro
Contributor

gsantoro commented Jun 21, 2023

OK, great, so the problem is definitely in the autodiscover.

Do you mind trying this config instead?

filebeat.inputs:
    - type: filestream
      id: my-filebeat-input
      paths:
        - /var/log/*.log
        - /var/log/messages
        - /var/log/syslog
        - /var/log/**/*.log
      prospector.scanner.exclude_files: ['^/var/log/containers/', '^/var/log/pods/']
      fields:
        system:
          name: test
          uid: d1374af9-1234-aaaaa-bbb-974f1b033347
      fields_under_root: true          

# To enable hints based autodiscover, remove `filebeat.inputs` configuration and uncomment this:
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: filestream
        prospector.scanner.symlinks: true
        id: filestream-kubernetes-pod-${data.kubernetes.container.id}
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
        parsers:
          - container: ~
      add_resource_metadata:
        deployment: false
        cronjob: false

Just make sure that both filebeat and the Elastic stack are at version 8.8.1; otherwise you might encounter other issues, since some of those changes have only been fixed recently.

@gizas has tested that the previous configs work with that version and he wasn't able to replicate any OOM issues.

@kbujold
Author

kbujold commented Jun 21, 2023

@gsantoro @gizas

We still see the issue with filebeat 8.8.1 and the recommended settings. Below are our new settings.

filebeat.autodiscover:
  providers:
  - add_resource_metadata:
      cronjob: false
      deployment: false
    hints.default_config:
      id: filestream-kubernetes-pod-${data.kubernetes.container.id}
      parsers:
      - container: null
      paths:
      - /var/log/containers/*${data.kubernetes.container.id}.log
      prospector.scanner.symlinks: true
      type: filestream
    hints.enabled: true
    node: ${NODE_NAME}
    type: kubernetes
filebeat.inputs:
- fields:
    system:
      name: test
      uid: d1374af9-1111-2222-3333-974f1b033347
  fields_under_root: true
  id: wra-filestream-id
  paths:
  - /var/log/*.log
  - /var/log/messages
  - /var/log/syslog
  - /var/log/**/*.log
  prospector.scanner.exclude_files:
  - ^/var/log/containers/
  - ^/var/log/pods/
  type: filestream

Note that we are no longer seeing the metricbeat issue after removing add_kubernetes_metadata under processors:
image

and after adding the configuration shown below in green:
image

These metricbeat changes were tested in ELK 8.8.0

@gizas
Contributor

gizas commented Jun 22, 2023

Thank you @kbujold , at least we have some progress.

For filebeat, can you also try the following:

add_resource_metadata:
        namespace:
          enabled: false
        node:
          enabled: false
        deployment: false
        cronjob: false

With the above you are also disabling namespace and node metadata enrichment. I would suggest trying to disable those one by one, and then both together. Keep in mind that you might lose some metadata in some cases, but I hope that won't be important in your case, as you are mainly after the actual logs.

If this also does not work, I think we will need some information about your cluster (size, whether you have restarts, etc.) in order to simulate the situation. The output of kubectl get events -A can also be helpful.

I am doing some tests this morning with 8.8.1 and the filestream input, instead of the log input. In general we advise using filestream, and I would also advise going with 8.8.1 (for fixes like #34388 (comment)).

Another important note: please test with 8.8.1, as this is the version where the fix for the filestream input has been merged.

@gsantoro
Contributor

hello @kbujold ,
I'm happy to see the problem with metricbeat is solved.

It's not clear whether you replicated the same changes in filebeat.

Can you please post the entire filebeat config here, not just the provided sections?

@gsantoro
Contributor

One more thing to be sure about the testing:

  • make sure to use 8.8.1 for your testing since there are some related fixes in that version
  • when you apply the changes, make sure that you restart filebeat so that they take effect correctly (see the restart sketch after this list).
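For a DaemonSet deployment like the one in this thread, one way to do that restart is a rollout restart (a sketch; the daemonset name mon-filebeat and the monitor namespace are assumed from the pod and ConfigMap names shown above):

# restart every filebeat pod managed by the daemonset so the new config is picked up
kubectl -n monitor rollout restart daemonset/mon-filebeat
# wait for the pods to come back
kubectl -n monitor rollout status daemonset/mon-filebeat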

@kbujold
Author

kbujold commented Jun 22, 2023

@gsantoro
Here is the complete config used with filebeat 8.8.1, and yes we had restarted the pod.

kubectl -n monitor exec -it  mon-filebeat-lhbjj -- /bin/cat /usr/share/filebeat/filebeat.yml
Defaulted container "filebeat" out of: filebeat, a-beat-setenv (init), b-security-setenv (init), c-beat-setup (init)
filebeat.autodiscover:
  providers:
  - add_resource_metadata:
      cronjob: false
      deployment: false
    hints.default_config:
      id: filestream-kubernetes-pod-${data.kubernetes.container.id}
      parsers:
      - container: null
      paths:
      - /var/log/containers/*${data.kubernetes.container.id}.log
      prospector.scanner.symlinks: true
      type: filestream
    hints.enabled: true
    node: ${NODE_NAME}
    type: kubernetes
filebeat.inputs:
- fields:
    system:
      name: test
      uid: d1374af9-aaaa-bbbb-cccc-974f1b033347
  fields_under_root: true
  id: wra-filestream-id
  paths:
  - /var/log/*.log
  - /var/log/messages
  - /var/log/syslog
  - /var/log/**/*.log
  prospector.scanner.exclude_files:
  - ^/var/log/containers/
  - ^/var/log/pods/
  type: filestream
http.port: 5066
monitoring:
  cluster_uuid: ${CLUSTER_UUID}
  elasticsearch:
    hosts:
    - https://mon-elasticsearch-client:9200
    password: ${beats_system_monitoring_password}
    ssl.certificate_authorities:
    - /usr/share/filebeat/ext-ca.crt
    username: ${beats_system_monitoring_user}
  enabled: ${BEAT_MONITORING_ENABLED}
name: ${NODE_NAME}
output.elasticsearch:
  enabled: false
  host: ${NODE_NAME}
  hosts:
  - https://mon-elasticsearch-client:9200
  ilm.pattern: '000001'
  index: ${INDEX_NAME}-%{+yyyy.MM.dd}
  password: ${ELASTICSEARCH_PASSWORD}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  username: ${ELASTICSEARCH_USERNAME}
output.file:
  enabled: false
output.logstash:
  enabled: true
  hosts:
  - mon-logstash:5044
  ssl.certificate: /usr/share/filebeat/config/instance/filebeat.crt
  ssl.certificate_authorities:
  - /usr/share/filebeat/ca.crt
  - /usr/share/filebeat/previous/ca.crt
  - /usr/share/filebeat/next/ca.crt
  ssl.key: /usr/share/filebeat/config/instance/filebeat.key
  timeout: 9
processors:
- add_kubernetes_metadata:
    annotations.dedot: true
    default_matchers.enabled: false
    labels.dedot: true
setup.dashboards:
  enabled: false
setup.kibana:
  host: mon-kibana:5601
  password: ${filebeat_password}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  ssl.verification_mode: none
  username: ${filebeat_user}
setup.template:
  name: ${INDEX_NAME}
  pattern: ${INDEX_PATTERN}

@gizas
Contributor

gizas commented Jun 23, 2023

@kbujold I can still see the processor block inside:

- add_kubernetes_metadata:
    annotations.dedot: true
    default_matchers.enabled: false
    labels.dedot: true

Can you also remove this?

@kbujold
Author

kbujold commented Jun 23, 2023

@gizas

It OOMed with this config as well, with add_kubernetes_metadata removed:

kubectl -n monitor exec -it  mon-filebeat-6pkqg -- /bin/cat /usr/share/filebeat/filebeat.yml
Defaulted container "filebeat" out of: filebeat, a-beat-setenv (init), b-security-setenv (init), c-beat-setup (init)
filebeat.autodiscover:
  providers:
  - add_resource_metadata:
      cronjob: false
      deployment: false
      namespace:
        enabled: false
      node:
        enabled: false
    hints.default_config:
      id: filestream-kubernetes-pod-${data.kubernetes.container.id}
      parsers:
      - container: null
      paths:
      - /var/log/containers/*${data.kubernetes.container.id}.log
      prospector.scanner.symlinks: true
      type: filestream
    hints.enabled: true
    node: ${NODE_NAME}
    type: kubernetes
filebeat.inputs:
- fields:
    system:
      name: yow-cgcs-supermicro-2
      uid: d1374af9-1e53-4339-92be-974f1b033347
  fields_under_root: true
  id: wra-filestream-id
  paths:
  - /var/log/*.log
  - /var/log/messages
  - /var/log/syslog
  - /var/log/**/*.log
  prospector.scanner.exclude_files:
  - ^/var/log/containers/
  - ^/var/log/pods/
  type: filestream
http.port: 5066
monitoring:
  cluster_uuid: ${CLUSTER_UUID}
  elasticsearch:
    hosts:
    - https://mon-elasticsearch-client:9200
    password: ${beats_system_monitoring_password}
    ssl.certificate_authorities:
    - /usr/share/filebeat/ext-ca.crt
    username: ${beats_system_monitoring_user}
  enabled: ${BEAT_MONITORING_ENABLED}
name: ${NODE_NAME}
output.elasticsearch:
  enabled: false
  host: ${NODE_NAME}
  hosts:
  - https://mon-elasticsearch-client:9200
  ilm.pattern: '000001'
  index: ${INDEX_NAME}-%{+yyyy.MM.dd}
  password: ${ELASTICSEARCH_PASSWORD}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  username: ${ELASTICSEARCH_USERNAME}
output.file:
  enabled: false
output.logstash:
  enabled: true
  hosts:
  - mon-logstash:5044
  ssl.certificate: /usr/share/filebeat/config/instance/filebeat.crt
  ssl.certificate_authorities:
  - /usr/share/filebeat/ca.crt
  - /usr/share/filebeat/previous/ca.crt
  - /usr/share/filebeat/next/ca.crt
  ssl.key: /usr/share/filebeat/config/instance/filebeat.key
  timeout: 9
setup.dashboards:
  enabled: false
setup.kibana:
  host: mon-kibana:5601
  password: ${filebeat_password}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  ssl.verification_mode: none
  username: ${filebeat_user}
setup.template:
  name: ${INDEX_NAME}
  pattern: ${INDEX_PATTERN}

@gsantoro
Contributor

are you using 8.8.1 elastic stack?

@kbujold
Author

kbujold commented Jun 23, 2023

@gsantoro
Yes

    Image:         docker.elastic.co/beats/filebeat:8.8.1
    Image ID:      docker.elastic.co/beats/filebeat@sha256:0fe1426fd48b7e25681478e40ad2087c603046a16c54b8b529bafd0c8106a7d0

@gsantoro
Contributor

@kbujold ,
at this point I think it is very strange that the problem still persists in filebeat while it has been solved in metricbeat. We will need to investigate what differs between the two beats regarding metadata collection.

@gizas
Contributor

gizas commented Jun 27, 2023

Let me suggest some more things, @kbujold, so that we are also able to simulate your setup:

  1. Let's try to get some more traces.

I have been running experiments trying to reproduce the issue. Here is what I'm using for reference:

Script to collect heap profiles

#!/bin/bash
sleepTime=$((60*5))
for i in $(seq 1 1 100)
do
   echo "Getting heap for $i time"
   go tool pprof -png http://localhost:5066/debug/pprof/heap > heap${i}.png
   sleep $sleepTime
done

Filebeat's config

http.enabled: true
http.port: 5066
http.host: 0.0.0.0
http.pprof.enabled: true

It would be great to see which part of the memory is growing, and a heap captured close to the restart would be really helpful (see the raw-profile sketch after this list).

  2. You can try disabling the whole metadata enrichment by adding add_metadata: false inside the autodiscovery config. Please try this only in Filebeat, to see if the problem still exists.
  3. Can you please send us some information about your cluster, so we can see how close we are in our tests? The number of pods to observe, any restarts, the output of kubectl get events -A, or some more logs from filebeat can also help.
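As a complement to the PNG snapshots from the script above, the raw heap profiles can also be saved so that two points in time can be diffed later (a sketch, assuming the same HTTP endpoint on localhost:5066):

#!/bin/bash
# keep raw heap profiles every 5 minutes; two of them can later be compared with
#   go tool pprof -base heap_1.pb.gz heap_100.pb.gz
for i in $(seq 1 100)
do
   curl -s http://localhost:5066/debug/pprof/heap > heap_${i}.pb.gz
   sleep 300
done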

@kbujold
Author

kbujold commented Jun 27, 2023

@gizas We have already collected the .bin files; please check the forum here.

kubectl get events provides no pertinent information.

The only pods that restart are filebeat and metricbeat.

kubectl version --short
Client Version: v1.24.4
Kustomize Version: v4.5.4
Server Version: v1.24.4

@kbujold
Author

kbujold commented Jun 27, 2023

@gizas
Still have pod restarts with add_metadata: false; see below for the full config.
We know there was no issue in version 7.17.1 and we started seeing this issue in 8.6.2. Can you find out what has changed in the potentially problematic code between these two releases?

kubectl -n monitor exec -it  mon-filebeat-zvbd6  -- /bin/cat /usr/share/filebeat/filebeat.yml
Defaulted container "filebeat" out of: filebeat, a-beat-setenv (init), b-security-setenv (init), c-beat-setup (init)
filebeat.autodiscover:
  add_metadata: false
  providers:
  - add_resource_metadata:
      cronjob: false
      deployment: false
      namespace:
        enabled: false
      node:
        enabled: false
    hints.default_config:
      id: filestream-kubernetes-pod-${data.kubernetes.container.id}
      parsers:
      - container: null
      paths:
      - /var/log/containers/*${data.kubernetes.container.id}.log
      prospector.scanner.symlinks: true
      type: filestream
    hints.enabled: true
    node: ${NODE_NAME}
    type: kubernetes
filebeat.inputs:
- fields:
    system:
      name: yow-cgcs-wildcat
      uid: 330dd772-2efa-4c9f-b03e-fbbd6b0bca54
  fields_under_root: true
  id: wra-filestream-id
  paths:
  - /var/log/*.log
  - /var/log/messages
  - /var/log/syslog
  - /var/log/**/*.log
  prospector.scanner.exclude_files:
  - ^/var/log/containers/
  - ^/var/log/pods/
  type: filestream
http.port: 5066
logging.level: warning
monitoring:
  cluster_uuid: ${CLUSTER_UUID}
  elasticsearch:
    hosts:
    - https://mon-elasticsearch-client:9200
    password: ${beats_system_monitoring_password}
    ssl.certificate_authorities:
    - /usr/share/filebeat/ext-ca.crt
    username: ${beats_system_monitoring_user}
  enabled: ${BEAT_MONITORING_ENABLED}
name: ${NODE_NAME}
output.elasticsearch:
  enabled: false
  host: ${NODE_NAME}
  hosts:
  - https://mon-elasticsearch-client:9200
  ilm.pattern: '000001'
  index: ${INDEX_NAME}-%{+yyyy.MM.dd}
  password: ${ELASTICSEARCH_PASSWORD}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  username: ${ELASTICSEARCH_USERNAME}
output.file:
  enabled: false
output.logstash:
  enabled: true
  hosts:
  - mon-logstash:5044
  ssl.certificate: /usr/share/filebeat/config/instance/filebeat.crt
  ssl.certificate_authorities:
  - /usr/share/filebeat/ca.crt
  - /usr/share/filebeat/previous/ca.crt
  - /usr/share/filebeat/next/ca.crt
  ssl.key: /usr/share/filebeat/config/instance/filebeat.key
  timeout: 9
processors:
- add_kubernetes_metadata:
    annotations.dedot: true
    default_matchers.enabled: false
    labels.dedot: true
setup.dashboards:
  enabled: false
setup.kibana:
  host: mon-kibana:5601
  password: ${filebeat_password}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  ssl.verification_mode: none
  username: ${filebeat_user}
setup.template:
  name: ${INDEX_NAME}
  pattern: ${INDEX_PATTERN}

@kbujold
Author

kbujold commented Sep 27, 2023

We are still having issues with filebeat memory climbing in our labs. It keeps climbing over time with either of the two following configs:

filebeat.autodiscover:
  providers:
  - add_resource_metadata:
      cronjob: false
      deployment: false
      namespace:
        enabled: false
      node:
        enabled: false
filebeat.autodiscover:
  providers:
  - add_resource_metadata:
      cronjob: false
      deployment: false

This is over 18 hours with the first config above:
image

If we remove add_resource_metadata completely, the memory is stable. This is over 24 hours:
image

The problem is that we need the metadata enabled for namespace and node. We are using ELK 8.9.0.

Thank you
Kris

@gizas
Contributor

gizas commented Sep 28, 2023

Hello @kbujold ,

Trying to understand your comment:

If we remove add_resource_metadata completely

If you remove add_resource_metadata, all metadata enrichment will happen, because by default all the flags (cronjob, namespace, node, deployment) are true. So please double-confirm that without add_resource_metadata you still get the logs, the metadata is there, and you have no problem.
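In other words, omitting the block amounts to spelling out the defaults explicitly (a sketch only, to make the defaults visible; the flag values follow what is stated above):

add_resource_metadata:
  node:
    enabled: true
  namespace:
    enabled: true
  deployment: true
  cronjob: true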

In any case I would want:

  • Another copy of your manifest
  • A pprof to analyse again what is happening with 8.9.0

@kbujold
Author

kbujold commented Sep 28, 2023

We will collect more data. Can the pod's CPU running at 100% cause issues with the pod's memory management? We did not increase the CPU pod limit from ELK 7.17 to ELK 8.9.0, and with ELK 8.9.0 it is running at close to 100%. Here's a 4-day collection.

image

We have found some promising results by increasing the CPU limit so the filebeat pods are not running at 100%.

@gizas
Contributor

gizas commented Sep 29, 2023

Thank you @kbujold , indeed you need to keep monitoring both CPU and memory.

Can the CPU running 100% for the pod cause issues with the pod's memory management?

CPU throttling will cause service degradation, latency, etc., as requests are silently dropped.
In such cases the application's processes don't guarantee correct functionality, so I am not quite sure how the memory will behave, but I have seen cases with low memory and high CPU usage.

The 100% usage is a percentage of the limits you have defined in your manifest.
See e.g. https://github.com/elastic/beats/blob/main/deploy/kubernetes/filebeat/filebeat-daemonset.yaml#L49-L54,
where we don't define CPU limits for filebeat, which means Kubernetes lets the pod use whatever CPU is available beyond its request. If you are sure that your node has enough CPU and the rest of the pods won't be affected, you can leave filebeat with no CPU limit.
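A sketch of what such a resources block could look like (the values are placeholders, mirroring the reference daemonset linked above):

resources:
  requests:
    cpu: 100m
    memory: 200Mi
  limits:
    memory: 200Mi   # no cpu limit here, so filebeat is not throttled at 100%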

@kbujold
Author

kbujold commented Oct 3, 2023

Case 1) Remove add_resource_metadata from the filebeat config. In this case we see the memory being stable. I have collected profiles at 3 hours and also at 14 hours. See the .bin files here.

kubectl -n  monitor  exec -it  mon-filebeat-45rtr /bin/cat /usr/share/filebeat/filebeat.yml
fields:
  system:
    name: SystemControllerDC4
    uid: bc9450c8-8c28-4be4-a911-6ae5cd8b1297
fields_under_root: true
filebeat.autodiscover:
  providers:
  - hints.default_config:
      close_renamed: true
      paths:
      - /var/log/containers/*-${data.kubernetes.container.id}.log
      type: container
    hints.enabled: true
    host: ${NODE_NAME}
    type: kubernetes
filebeat.inputs:
- close_timeout: 5m
  enabled: true
  exclude_files:
  - ^/var/log/containers/
  - ^/var/log/pods/
  paths:
  - /var/log/*.log
  - /var/log/messages
  - /var/log/syslog
  - /var/log/**/*.log
  type: log
http:
  host: localhost
  pprof.enabled: true
http.port: 5066
logging.level: warning
monitoring:
  cluster_uuid: ${CLUSTER_UUID}
  elasticsearch:
    hosts:
    - https://mon-elasticsearch-client:9200
    password: ${beats_system_monitoring_password}
    ssl.certificate_authorities:
    - /usr/share/filebeat/ext-ca.crt
    username: ${beats_system_monitoring_user}
  enabled: ${BEAT_MONITORING_ENABLED}
name: ${NODE_NAME}
output.elasticsearch:
  enabled: false
  host: ${NODE_NAME}
  hosts:
  - https://mon-elasticsearch-client:9200
  ilm.pattern: '000001'
  index: ${INDEX_NAME}-%{+yyyy.MM.dd}
  password: ${ELASTICSEARCH_PASSWORD}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  username: ${ELASTICSEARCH_USERNAME}
output.file:
  enabled: false
output.logstash:
  enabled: true
  hosts:
  - mon-logstash:5044
  ssl.certificate: /usr/share/filebeat/config/instance/filebeat.crt
  ssl.certificate_authorities:
  - /usr/share/filebeat/ca.crt
  - /usr/share/filebeat/previous/ca.crt
  - /usr/share/filebeat/next/ca.crt
  ssl.key: /usr/share/filebeat/config/instance/filebeat.key
  timeout: 9
setup.dashboards:
  enabled: false
setup.kibana:
  host: mon-kibana:5601
  password: ${filebeat_password}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  ssl.verification_mode: none
  username: ${filebeat_user}
setup.template:
  name: ${INDEX_NAME}
  pattern: ${INDEX_PATTERN}

image

image

Case 2) We have add_resource_metadata with all options disabled in the filebeat config. I have the 3-hour and 12-hour profile .bin files here.

root@mon-filebeat-l865t:/usr/share/filebeat# cat filebeat.yml
fields:
  system:
    name: SystemControllerDC4
    uid: bc9450c8-8c28-4be4-a911-6ae5cd8b1297
fields_under_root: true
filebeat.autodiscover:
  providers:
  - add_resource_metadata:
      cronjob: false
      deployment: false
      namespace:
        enabled: false
      node:
        enabled: false
    hints.default_config:
      close_renamed: true
      paths:
      - /var/log/containers/*-${data.kubernetes.container.id}.log
      type: container
    hints.enabled: true
    host: ${NODE_NAME}
    type: kubernetes
filebeat.inputs:
- close_timeout: 5m
  enabled: true
  exclude_files:
  - ^/var/log/containers/
  - ^/var/log/pods/
  paths:
  - /var/log/*.log
  - /var/log/messages
  - /var/log/syslog
  - /var/log/**/*.log
  type: log
http:
  host: localhost
  pprof.enabled: true
http.port: 5066
logging.level: warning
monitoring:
  cluster_uuid: ${CLUSTER_UUID}
  elasticsearch:
    hosts:
    - https://mon-elasticsearch-client:9200
    password: ${beats_system_monitoring_password}
    ssl.certificate_authorities:
    - /usr/share/filebeat/ext-ca.crt
    username: ${beats_system_monitoring_user}
  enabled: ${BEAT_MONITORING_ENABLED}
name: ${NODE_NAME}
output.elasticsearch:
  enabled: false
  host: ${NODE_NAME}
  hosts:
  - https://mon-elasticsearch-client:9200
  ilm.pattern: '000001'
  index: ${INDEX_NAME}-%{+yyyy.MM.dd}
  password: ${ELASTICSEARCH_PASSWORD}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  username: ${ELASTICSEARCH_USERNAME}
output.file:
  enabled: false
output.logstash:
  enabled: true
  hosts:
  - mon-logstash:5044
  ssl.certificate: /usr/share/filebeat/config/instance/filebeat.crt
  ssl.certificate_authorities:
  - /usr/share/filebeat/ca.crt
  - /usr/share/filebeat/previous/ca.crt
  - /usr/share/filebeat/next/ca.crt
  ssl.key: /usr/share/filebeat/config/instance/filebeat.key
  timeout: 9
setup.dashboards:
  enabled: false
setup.kibana:
  host: mon-kibana:5601
  password: ${filebeat_password}
  protocol: https
  ssl.certificate_authorities:
  - /usr/share/filebeat/ext-ca.crt
  ssl.verification_mode: none
  username: ${filebeat_user}
setup.template:
  name: ${INDEX_NAME}
  pattern: ${INDEX_PATTERN}

image

Controller-0
image

I would not have expected Case 2) to cause the creeping memory. Maybe it can be a clue to where this memory leak issue is coming from.

Thanks,
Kristine

@gizas
Contributor

gizas commented Oct 3, 2023

Thank you Kristine, that is good news. So you have everything enabled (full metadata enrichment) and no memory increase in case 1.

I see in your case 2 heaps that autodiscovery for pods is still there (I would not expect this, as all the options are disabled; can you please double-confirm that the configuration is aligned with the correct agent?)

Screenshot 2023-10-03 at 3 27 46 PM

Also from the case 2 heaps:

3h:
Total memory: 42340 KB
NewInformer (the function that is responsible for metadata enrichment): 1038 KB

12h:
Total: 43328 KB
NewInformer: 1038 KB

So even the heap indicates that there is no memory leak in the autodiscovery memory.

Question: what are the blue and the green agents in your second photo above? What is the difference between them?

Also mind that:

http.pprof.enabled: true
logging.level: warning

The above two options should be used only in lab environments. Especially pprof should be disabled in production, as it increases memory consumption.

@kbujold
Author

kbujold commented Oct 3, 2023

There is no good news ;-)

Case 1 does not enable the full metadata enrichment. With add_resource_metadata removed from the filebeat config, the filebeat collections are missing kubernetes.namespace and kubernetes.nodes, which are both needed in our product.

Case 2) The configuration is 100% correct, which is why this is puzzling. All the options are disabled and the memory still creeps up over time, eventually resulting in a pod restart with OOM.

mon-filebeat-7jw2v                                 1/1     Running     0              23h     dead:beef::a4ce:fec1:5423:e333   controller-1   <none>           <none>
mon-filebeat-88kdr                                 1/1     Running     1 (7h2m ago)   23h     dead:beef::8e22:765f:6121:eb57   controller-0   <none>           <none>
Containers:
  filebeat:
    Container ID:  containerd://500808f3a29b6a5272145e6d30e5ba4e8b92b003249bd6275084e6fd4d5be9a4
    Image:         registry.local:9001/docker.elastic.co/beats/filebeat:8.9.0
    Image ID:      registry.local:9001/docker.elastic.co/beats/filebeat@sha256:615b1d701dddaff7010b77c1d219ebc9cd286e845286da1594659d62086727b3
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      source /usr/share/filebeat/scriptfiles/filebeat_startup.sh; source ./startup
    Args:
      -e
      -e
      -E
      http.enabled=true
    State:          Running
      Started:      Tue, 03 Oct 2023 06:17:28 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 02 Oct 2023 14:02:11 +0000
      Finished:     Tue, 03 Oct 2023 06:17:27 +0000
    Ready:          True
    Restart Count:  1

Our system runs the filebeat pod on two nodes. The green one is controller-0, which is the one with the memory creep.

We set the logging to warning because we were getting large amounts of info logs. Do you not recommend setting logging to warning in production?

logging.level: warning

Kristine

@gizas
Contributor

gizas commented Oct 4, 2023

For case 1: you can always enable only add_resource_metadata.node and add_resource_metadata.namespace, as I don't expect much overhead from those.
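That would look roughly like this (a sketch only, combining the flags already used earlier in this thread):

add_resource_metadata:
  node:
    enabled: true
  namespace:
    enabled: true
  deployment: false
  cronjob: false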

There is also the fix in #36736, in 8.10.3 (to be released in the coming days), that should improve filebeat memory usage overall.

It is also advisable to set logging.level: error.

Can I suggest another thing (as long as you are testing in the lab)?
Do you want to remove the filebeat.inputs block and just add multiple paths under filebeat.autodiscover.providers, like:

 paths:
  - /var/log/containers/*-${data.kubernetes.container.id}.log
  - /var/log/*.log
  - /var/log/messages
  - /var/log/syslog

@gizas
Copy link
Contributor

gizas commented Oct 5, 2023

Case 1 does not enable the full metadata enrichment.

@kbujold I tested the above and I can confirm that is not accurate:

With below config:

filebeat.autodiscover:
     providers:
       - type: kubernetes
         node: ${NODE_NAME}
         hints.enabled: true
         hints.default_config:
           type: container
           paths:
             - /var/log/containers/*${data.kubernetes.container.id}.log

Screenshot 2023-10-05 at 1 19 36 PM

See above: the node and namespace labels are present.

@bturquet bturquet assigned constanca-m and unassigned gsantoro Oct 5, 2023
@kbujold
Author

kbujold commented Oct 5, 2023

When we have the case 1 config set, a kubernetes.namespace : * search returns no data in the filebeat index.

filebeat.autodiscover:
  providers:
  - hints.default_config:
      close_renamed: true
      paths:
      - /var/log/containers/*-${data.kubernetes.container.id}.log
      type: container
    hints.enabled: true
    host: ${NODE_NAME}
    type: kubernetes

image

@gizas
Contributor

gizas commented Oct 6, 2023

This seems like you don't receive logs at all.

  1. Make sure that the folder /var/log/containers/ inside the filebeat pod receives new log files. Exec into the filebeat pod and check.
  2. Please check filebeat for any errors from the autodiscovery input.

@rodrigc

rodrigc commented Oct 11, 2023

I am deploying Elastic 8.9.0 via ECK,
and I am also seeing memory issues in filebeat, where the pods running filebeat are periodically
killed via the OOM Killer.

I reported this in case 01497623

the ECK yaml that I used to configure filebeat is:

---
# filebeat resources
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: my-filebeat
  namespace: elastic
spec:
  type: filebeat
  version: 8.9.0
  elasticsearchRef:
    name: my-elastic
  kibanaRef:
    name: my-kibana
  config:
    filebeat:
      autodiscover:
        providers:
          - type: kubernetes
            node: ${NODE_NAME}
            hints:
              enabled: true
              default_config:
                type: container
                paths: ['/var/log/containers/*${data.kubernetes.container.id}.log']
    processors:
      - add_cloud_metadata: {}
      - add_host_metadata: {}
    logging.json: true
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true  # Allows to provide richer host metadata
        securityContext:
          runAsUser: 0
        containers:
          - name: filebeat
            env:
              - name: NODE_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: spec.nodeName
            resources:
              requests:
                cpu: 100m
                memory: 600Mi
              limits:
                cpu: 100m
                memory: 600Mi
            volumeMounts:
              - name: varlogcontainers
                mountPath: /var/log/containers
              - name: varlogpods
                mountPath: /var/log/pods
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
        volumes:
          - name: varlogcontainers
            hostPath:
              path: /var/log/containers
          - name: varlogpods
            hostPath:
              path: /var/log/pods
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
rules:
  - apiGroups: ['']
    resources: [nodes, namespaces, events, pods]
    verbs: [get, list, watch]
  - apiGroups: [batch]
    resources: [jobs]
    verbs: [get, list, watch]
  - apiGroups: [extensions]
    resources: [replicasets]
    verbs: [get, list, watch]
  - apiGroups: [apps]
    resources: [statefulsets, deployments, replicasets]
    verbs: [get, list, watch]
  - apiGroups: ['']
    resources: [nodes/stats]
    verbs: [get]
  - nonResourceURLs: [/metrics]
    verbs: [get]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: elastic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: elastic
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io

Is the issue I am seeing similar to what you are seeing, @kbujold?

Did you converge on a working solution to get around this problem with filebeat?
Does PR 36736 address this problem?

I also see discussion about filebeat/metricbeat memory leaks in other issues.

Are there any solutions proposed in those issues which may solve my problem?

@gizas
Contributor

gizas commented Oct 11, 2023

@rodrigc

For the current discussion, the workaround that was proposed is to disable specific metadata using the add_resource_metadata config.
See the example below:

filebeat:
      autodiscover:
        providers:
          - type: kubernetes
            node: ${NODE_NAME}
            hints:
              enabled: true
              default_config:
                type: container
                paths: 
                   - /var/log/containers/*${data.kubernetes.container.id}.log
            add_resource_metadata:
              cronjob: false
              deployment: false

For your configuration I see:

requests:
    memory: 600Mi

Keep in mind that, depending on your cluster and its size, this might not be enough, so you should consider increasing it and see where the memory stabilises for your setup. If it does not stabilise, then we should check whether we have a memory leak.

PR 36736 addresses a different problem, but it can help reduce the overall memory consumption.

@rodrigc

rodrigc commented Oct 11, 2023

Some questions:

  1. Do you feel that if I add to my setup:
            add_resource_metadata:
              cronjob: false
              deployment: false

that I will achieve the benefit of what is being discussed in this issue? Or is that a wrong path, due to the fact that I don't have add_resource_metadata set right now?

  2. For
requests:
  memory: 600Mi

Is there a better way to figure out an optimal setting for this other than trial and error? For my setup, I need to control memory/CPU resources via a combination of Terraform + Kubernetes YAML, so I can't just arbitrarily change this value.

  3. In Re-use buffers to optimise memory allocation in fingerprint #36736 (comment), @rdner mentioned that the scope of the optimization in that PR is limited to the filestream input and only when the new fingerprint file identity is used. Would I see any benefit to that in my setup?

@gizas
Contributor

gizas commented Oct 12, 2023

  1. Yes (as by default those fields are true).
  2. Usually customers monitor the usage of their k8s cluster under load (either with a simple dashboard showing the CPU/memory utilisation of the cluster, or with the kubectl top pod command), and I would suggest setting the limits to roughly 20% more than your observed usage, so that your utilisation stays around 80% (see the sketch after this list).
  3. You are correct, you are using the container input. Unless you change to the filestream input, you cannot anticipate any improvement.
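A sketch of the sizing loop described in point 2 (the namespace and label selector are placeholders taken from the daemonset earlier in this thread; adjust them for your own deployment):

# observe the actual usage of the filebeat pods under load
kubectl -n monitor top pod -l app=mon-filebeat
# then set the memory/CPU limits roughly 20% above the peak values reported here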

@rodrigc

rodrigc commented Oct 12, 2023

I upgraded my Elastic cluster to 8.10.3, and did not see any improvement from point 3.

I will investigate 1. and 2. as per your recommendations.

Thanks for the responses.

@kbujold
Author

kbujold commented Oct 16, 2023

This seems like you don't receive logs at all.

  1. Make sure that the folder /var/log/containers/ inside the filebeat pod receives new log files. Exec into the filebeat pod and check.
  2. Please check filebeat for any errors from the autodiscovery input.

We do; we're just not getting the kubernetes fields. We see no errors from the filebeat pods.

image

@kbujold
Author

kbujold commented Oct 16, 2023

@rodrigc

We fixed our issue by significantly increasing filebeat's CPU limit so it would not run at 100%. With this change we are no longer seeing OOM pod restarts and the filebeat memory is stable.

@gizas
Contributor

gizas commented Oct 16, 2023

We do; we're just not getting the kubernetes fields. We see no errors from the filebeat pods.

@kbujold, keep in mind that the example you posted comes from filebeat.inputs, so it is expected that it does not belong to any kubernetes autodiscovery entry and has no container or namespace fields:

filebeat.inputs:
- close_timeout: 5m
  enabled: true
  ...
  paths:
  - /var/log/*.log    # <---- matched by this path

@rodrigc

rodrigc commented Oct 19, 2023

@gizas I see that you submitted this PR to elastic-agent 8.10.4: elastic/elastic-agent#3591
which disables metadata enrichment for deployments and cronjobs.
Is that aligned with the issue which you mentioned earlier in this GitHub issue?

If I upgrade to filebeat 8.10.4, will this default to disabling metadata enrichment for deployments and cronjobs,
so that I do not have to explicitly set it in my filebeat config?

@gizas
Copy link
Contributor

gizas commented Oct 19, 2023

@rodrigc indeed I backported this for elastic-agent, but the backport for beats was done in #36880 (after the 8.10.4 release).

(And this is the backport for 8.11: #36879)

So it will be available in the next beats version, 8.10.5 (when/if it is released), or in 8.11.1.

@rodrigc

rodrigc commented Oct 19, 2023

Ah OK, thanks for that. I will try out 8.10.5 or 8.11.x when they come out.

As an aside, do you have any information on how to enable memory profiling in filebeat to do analysis similar to what was done here, with the graphs: https://discuss.elastic.co/t/filebeat-pods-keeps-increasing-memory-usage/325124

I currently have support ticket case 01497623 covering the filebeat memory issues I am having, and am going back and forth with the support agent. I'd like to make faster progress to root-cause and solve this problem.

Thanks.

@gizas
Contributor

gizas commented Oct 23, 2023

As an aside, do you have any information on how to enable memory profiling in filebeat

Add in filebeat:

http:
      enabled: true
      pprof.enabled: true
      port: 5066

To get the heap dump, I'm redirecting the HTTP endpoint to a local port and then running a curl:

# in one terminal session
kubectl port-forward pod/filebeat-hfpnw 8080:5066
# in another
curl -s -v http://localhost:8080/debug/pprof/heap > FILENAME

Then, to inspect the data:
Install the pprof tool on your PC.
E.g. pprof -http=localhost:8081 ./HEAP/elastic-agent_.pprof will open an interactive web page for exploring the profile.
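If the Go toolchain is installed, the same file can also be inspected without the standalone pprof binary (a sketch; FILENAME is the profile saved by the curl above):

# text summary of the heaviest allocators
go tool pprof -top FILENAME
# same interactive web UI as the standalone tool
go tool pprof -http=localhost:8081 FILENAME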

@K4S1

K4S1 commented Nov 17, 2023

I updated from 8.10.4 -> 8.11.0.

I am seeing multiple Windows hosts with RAM issues.
image

Metricbeat uses an insane amount of RAM; on some servers I found it consuming almost 10 GB of RAM.
I have had services crash on me:
Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: metricbeat.exe (6544) consumed 9795993600 bytes, and sqlservr.exe (3880) consumed 462864384 bytes.

So far I have only seen this affecting Windows.

Edit:
Adding information:
I found multiple runners, unsure if this is relevant:
image

There is a discrepancy between actual RAM use on the server and what is reported to Elastic:
image

The more I dig, the more hosts I find with this issue.

@gizas
Contributor

gizas commented Dec 18, 2023

Closing this, as the initial request has been addressed for now. Relevant issues for Windows hosts are being addressed in other cases.

@gizas gizas closed this as completed Dec 18, 2023

8 participants