(Docker based) Elastic Agent's Metricbeat / Filebeat cannot connect to ES #20759
Comments
Pinging @elastic/ingest-management (Team:Ingest Management) |
@mdelapenya If you log into the ingest manager can you add the raw yaml of the agent to this issue?
@jfsiii and @michalpristas Can you take a look? |
It's using the default out-of-the-box config. We can check the (global to Ingest) ‘settings’ during the test to confirm what is set, but I’m unsure what exactly is right in this Docker containerized setup… GET the Kibana setting: we may need to explicitly set that in the test. To set new values, do a PUT to we can run it to get the values. @mdelapenya |
here are some screenshots of the Ingest UI: and a cat of the Fleet.yml file from the agent host in the test:
and here is the /etc/elastic-agent/elastic-agent.yml
and here is the reference.yml:
|
From PH's assessment, it looks like it is not option 3. There is no output present in the .yml so the problem is either in the test still (running too fast, maybe Fleet isn't finished setting up?), or on the Fleet side in product code. |
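If the test really is running too fast, one option is to poll for readiness instead of relying on a fixed pause. The following is only a minimal sketch: `wait_until` is a hypothetical helper, and the check passed to it (for example, "Fleet setup has finished") is an assumption that the e2e-testing framework would have to supply.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout_s: float = 60.0,
               interval_s: float = 1.0) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical readiness check: succeeds on the third attempt.
    attempts = {"n": 0}

    def fleet_is_ready() -> bool:
        attempts["n"] += 1
        return attempts["n"] >= 3

    print(wait_until(fleet_is_ready, timeout_s=5.0, interval_s=0.1))
```

Compared with a fixed sleep, this waits only as long as needed and reports a timeout explicitly, which makes a slow Fleet setup easier to tell apart from a product bug.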
So, I added a 'sleep' after the test posts the enroll command, ran the test, and while it was paused I force-unenrolled the agent that was attempted. I then just for reference: Is this somehow related to the certs problem we've seen in 7.9 usage? Because I was curious about the commit state: the Agent is 7.9 GA and Kibana is at this commit, which I guess looks right (for the 7.9.x branch), as seen from the /status API call:
|
FYI - don't forget to check out the '7.9.x' branch when researching, and then set: and then: then I tried to run the 7.9.x branch from cloud and it's a different Kibana commit, 11 days later, so I don't know if we should try testing / validating on master too. To that end, I tried to pull master and ended up getting 7.9 code pulled, @mdelapenya, so I'm not sure if the 'master' line of the e2e-testing repo is set right or if it's me. I have the export variable set and it still pulled some 7.9 versions, so... we can probably focus there until we get the framework figured out. |
Hey @EricDavisX, here are the values:
{"success":true,"item":{"id":"f10dbc30-e69b-11ea-84b5-835bc3e12d87","agent_auto_upgrade":true,"package_auto_upgrade":true,"kibana_url":"http://kibana:5601"}}
{"items":[{"id":"e3816ffb-aa96-4571-a9c3-89f5149bb599","name":"default","is_default":true,"type":"elasticsearch","hosts":["http://elasticsearch:9200"]}],"page":1,"perPage":1000,"total":1,"success":true}
It seems the ES outputs are properly set in the configuration. |
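To make this check repeatable, the two responses above can be parsed rather than read by eye. This is a small sketch that works on the response bodies pasted in this comment; it does not call Kibana, and the helper names `kibana_url` and `default_output_hosts` are made up for illustration.

```python
import json
from typing import List

# Response bodies copied from the comment above.
SETTINGS_RESPONSE = '{"success":true,"item":{"id":"f10dbc30-e69b-11ea-84b5-835bc3e12d87","agent_auto_upgrade":true,"package_auto_upgrade":true,"kibana_url":"http://kibana:5601"}}'
OUTPUTS_RESPONSE = '{"items":[{"id":"e3816ffb-aa96-4571-a9c3-89f5149bb599","name":"default","is_default":true,"type":"elasticsearch","hosts":["http://elasticsearch:9200"]}],"page":1,"perPage":1000,"total":1,"success":true}'


def kibana_url(settings_json: str) -> str:
    """Return the Kibana URL stored in the Ingest settings response."""
    return json.loads(settings_json)["item"]["kibana_url"]


def default_output_hosts(outputs_json: str) -> List[str]:
    """Return the hosts of the output flagged as default, or [] if none."""
    for output in json.loads(outputs_json)["items"]:
        if output.get("is_default"):
            return list(output.get("hosts", []))
    return []


if __name__ == "__main__":
    print(kibana_url(SETTINGS_RESPONSE))
    print(default_output_hosts(OUTPUTS_RESPONSE))
```

An empty list from `default_output_hosts` would point at the Kibana side; a populated list, as here, suggests the output is defined in Kibana but is not reaching the agent.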
I can confirm that, for 8.0.0-SNAPSHOT, the agent receives this configuration (without outputs):
# ================================ General =====================================
# Beats is configured under Fleet, you can define most settings
# from the Kibana UI. You can update this file to configure the settings that
# are not supported by Fleet.
fleet:
  enabled: true
# agent.download:
# # source of the artifacts, requires elastic like structure and naming of the binaries
# # e.g /windows-x86.zip
# sourceURI: "https://artifacts.elastic.co/downloads/beats/"
# # path to the directory containing downloaded packages
# target_directory: "${path.data}/downloads"
# # timeout for downloading package
# timeout: 30s
# # file path to a public key used for verifying downloaded artifacts
# # if not file is present Elastic Agent will try to load public key from elastic.co website.
# pgpfile: "${path.data}/elastic.pgp"
# # install_path describes the location of installed packages/programs. It is also used
# # for reading program specifications.
# install_path: "${path.data}/install"
# agent.process:
# # minimal port number for spawned processes
# min_port: 10000
# # maximum port number for spawned processes
# max_port: 30000
# # timeout for creating new processes. when process is not successfully created by this timeout
# # start operation is considered a failure
# spawn_timeout: 30s
# agent.retry:
# # enabled determines whether retry is possible. Default is false.
# enabled: true
# # retries_count specifies number of retries. Default is 3.
# # Retry count of 1 means it will be retried one time after one failure.
# retries_count: 3
# # delay specifies delay in ms between retries. Default is 30s
# delay: 30s
# # max_delay specifies maximum delay in ms between retries. Default is 300s
# max_delay: 5m
# # Exponential determines whether delay is treated as exponential.
# # With 30s delay and 3 retries: 30, 60, 120s
# # Default is false
# exponential: false
I'm going to perform these checks:
and
|
Use case 1: 8.0.0-snapshot stack + 8.0.0-snapshot agent
Run tests for the enroll scenario
$ git checkout master
$ OP_LOG_LEVEL=DEBUG godog -t "fleet_mode && enroll"
Use case 2: 7.9.0 stack + 7.9.0 agent
Run tests for the enroll scenario
$ git checkout 7.9.x
$ OP_LOG_LEVEL=DEBUG godog -t "fleet_mode && enroll"
Use case 3: 8.0.0-snapshot stack + 7.9.0 agent
Run tests for the enroll scenario
$ git checkout master
$ OP_LOG_LEVEL=DEBUG ELASTIC_AGENT_VERSION=7.9.0 godog -t "fleet_mode && enroll"
elastic-agent.yml
# ================================ General =====================================
# Beats is configured under Fleet, you can define most settings
# from the Kibana UI. You can update this file to configure the settings that
# are not supported by Fleet.
fleet:
  enabled: true
# agent.download:
# # source of the artifacts, requires elastic like structure and naming of the binaries
# # e.g /windows-x86.zip
# sourceURI: "https://artifacts.elastic.co/downloads/beats/"
# # path to the directory containing downloaded packages
# target_directory: "${path.data}/downloads"
# # timeout for downloading package
# timeout: 30s
# # file path to a public key used for verifying downloaded artifacts
# # if not file is present Elastic Agent will try to load public key from elastic.co website.
# pgpfile: "${path.data}/elastic.pgp"
# # install_path describes the location of installed packages/programs. It is also used
# # for reading program specifications.
# install_path: "${path.data}/install"
# agent.process:
# # minimal port number for spawned processes
# min_port: 10000
# # maximum port number for spawned processes
# max_port: 30000
# # timeout for creating new processes. when process is not successfully created by this timeout
# # start operation is considered a failure
# spawn_timeout: 30s
# agent.retry:
# # enabled determines whether retry is possible. Default is false.
# enabled: true
# # retries_count specifies number of retries. Default is 3.
# # Retry count of 1 means it will be retried one time after one failure.
# retries_count: 3
# # delay specifies delay in ms between retries. Default is 30s
# delay: 30s
# # max_delay specifies maximum delay in ms between retries. Default is 300s
# max_delay: 5m
# # Exponential determines whether delay is treated as exponential.
# # With 30s delay and 3 retries: 30, 60, 120s
# # Default is false
# exponential: false
fleet.yml
agent:
  id: 953f1443-8490-43d1-ac1d-0f714d919071
fleet:
  enabled: true
  access_api_key: ZjF0ZEpIUUIxcTdrQWZmQS00U0U6MFVQWDliX0RTdHE4czcyYTJBSjhxdw==
  kibana:
    protocol: http
    host: kibana:5601
    timeout: 1m30s
    ssl:
      verification_mode: none
      renegotiation: never
  reporting:
    threshold: 10000
    check_frequency_sec: 30
  agent:
    id: ""
Use case 4: 7.9.0 stack + 8.0.0-snapshot agent
Run tests for the enroll scenario
$ git checkout 7.9.x
$ OP_LOG_LEVEL=DEBUG ELASTIC_AGENT_VERSION=8.0.0-SNAPSHOT godog -t "fleet_mode && enroll"
The enrollment process fails because the version is not compatible:
elastic-agent.yml
Not configured by Fleet
###################### Agent Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The elastic-agent.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
######################################
# Fleet configuration
######################################
outputs:
  default:
    type: elasticsearch
    hosts: [127.0.0.1:9200]
    username: elastic
    password: changeme
inputs:
  - type: system/metrics
    # The only two requirement are that it has only characters allowed in an Elasticsearch index name
    # Index names must meet the following criteria:
    # Lowercase only
    # Cannot include \, /, *, ?, ", <, >, |, ` ` (space character), ,, #
    # Cannot start with -, _, +
    # Cannot be . or ..
    data_stream.namespace: default
    use_output: default
    streams:
      - metricset: cpu
        # The only two requirement are that it has only characters allowed in an Elasticsearch index name
        # Index names must meet the following criteria:
        # Lowercase only
        # Cannot include \, /, *, ?, ", <, >, |, ` ` (space character), ,, #
        # Cannot start with -, _, +
        # Cannot be . or ..
        data_stream.dataset: system.cpu
      - metricset: memory
        data_stream.dataset: system.memory
      - metricset: network
        data_stream.dataset: system.network
      - metricset: filesystem
        data_stream.dataset: system.filesystem
# agent.monitoring:
# # enabled turns on monitoring of running processes
# enabled: true
# # enables log monitoring
# logs: true
# # enables metrics monitoring
# metrics: true
# # Allow fleet to reload his configuration locally on disk.
# # Notes: Only specific process configuration will be reloaded.
# agent.reload:
# # enabled configure the Elastic Agent to reload or not the local configuration.
# #
# # Default is true
# enabled: true
# # period define how frequent we should look for changes in the configuration.
# period: 10s
# management:
# # Mode of management, the Elastic Agent support two modes of operation:
# #
# # local: The Elastic Agent will expect to find the inputs configuration in the local file.
# #
# # Default is local.
# mode: "local"
# fleet:
# access_api_key: ""
# kibana:
# # kibana minimal configuration
# hosts: ["localhost:5601"]
# ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# # optional values
# #protocol: "https"
# #username: "elastic"
# #password: "changeme"
# #path: ""
# #ssl.verification_mode: full
# #ssl.supported_protocols: [TLSv1.0, TLSv1.1, TLSv1.2]
# #ssl.cipher_suites: []
# #ssl.curve_types: []
# reporting:
# # Reporting threshold indicates how many events should be kept in-memory before reporting them to fleet.
# #reporting_threshold: 10000
# # Frequency used to check the queue of events to be sent out to fleet.
# #reporting_check_frequency_sec: 30
# agent.download:
# # source of the artifacts, requires elastic like structure and naming of the binaries
# # e.g /windows-x86.zip
# sourceURI: "https://artifacts.elastic.co/downloads/beats/"
# # path to the directory containing downloaded packages
# target_directory: "${path.data}/downloads"
# # timeout for downloading package
# timeout: 30s
# # file path to a public key used for verifying downloaded artifacts
# # if not file is present agent will try to load public key from elastic.co website.
# pgpfile: "${path.data}/elastic.pgp"
# # install_path describes the location of installed packages/programs. It is also used
# # for reading program specifications.
# install_path: "${path.data}/install"
# agent.process:
# # timeout for creating new processes. when process is not successfully created by this timeout
# # start operation is considered a failure
# spawn_timeout: 30s
# # timeout for stopping processes. when process is not stopped by this timeout then the process.
# # is force killed
# stop_timeout: 30s
# agent.grpc:
# # listen address for the GRPC server that spawned processes connect back to.
# address: localhost
# # port for the GRPC server that spawned processes connect back to.
# port: 6789
# agent.retry:
# # Enabled determines whether retry is possible. Default is false.
# enabled: true
# # RetriesCount specifies number of retries. Default is 3.
# # Retry count of 1 means it will be retried one time after one failure.
# retriesCount: 3
# # Delay specifies delay in ms between retries. Default is 30s
# delay: 30s
# # MaxDelay specifies maximum delay in ms between retries. Default is 300s
# maxDelay: 5m
# # Exponential determines whether delay is treated as exponential.
# # With 30s delay and 3 retries: 30, 60, 120s
# # Default is false
# exponential: false
# Logging
# There are four options for the log output: file, stderr, syslog, eventlog
# The file output is the default.
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#agent.logging.level: info
# Enable debug output for selected components. To enable all selectors use ["*"]
# Other available selectors are "beat", "publish", "service"
# Multiple selectors can be chained.
#agent.logging.selectors: [ ]
# Send all logging output to stderr. The default is false.
agent.logging.to_stderr: true
# Send all logging output to syslog. The default is false.
#agent.logging.to_syslog: false
# Send all logging output to Windows Event Logs. The default is false.
#agent.logging.to_eventlog: false
# If enabled, Elastic-Agent periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. Also, the total values for
# all non-zero internal metrics are logged on shutdown. The default is true.
#agent.logging.metrics.enabled: true
# The period after which to log the internal metrics. The default is 30s.
#agent.logging.metrics.period: 30s
# Logging to rotating files. Set logging.to_files to false to disable logging to
# files.
#agent.logging.to_files: true
#agent.logging.files:
# Configure the path where the logs are written. The default is the logs directory
# under the home path (the binary location).
#path: /var/log/elastic-agent
# The name of the files where the logs are written to.
#name: elastic-agent
# Configure log file size limit. If limit is reached, log file will be
# automatically rotated
#rotateeverybytes: 10485760 # = 10MB
# Number of rotated log files to keep. Oldest files will be deleted first.
#keepfiles: 7
# The permissions mask to apply when rotating log files. The default value is 0600.
# Must be a valid Unix-style file permissions mask expressed in octal notation.
#permissions: 0600
# Enable log file rotation on time intervals in addition to size-based rotation.
# Intervals must be at least 1s. Values of 1m, 1h, 24h, 7*24h, 30*24h, and 365*24h
# are boundary-aligned with minutes, hours, days, weeks, months, and years as
# reported by the local system clock. All other intervals are calculated from the
# Unix epoch. Defaults to disabled.
#interval: 0
# Rotate existing logs on startup rather than appending to the existing
# file. Defaults to true.
# rotateonstartup: true
# Set to true to log messages in JSON format.
#agent.logging.json: false
# Set to true, to log messages with minimal required Elastic Common Schema (ECS)
# information. Recommended to use in combination with `logging.json=true`
# Defaults to false.
#agent.logging.ecs: false
fleet.yml
agent:
  id: e941eade-2002-4b2f-a437-1e1bdb316d79
|
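For reference, the four use cases above differ only in the e2e-testing branch that is checked out and in whether `ELASTIC_AGENT_VERSION` is overridden. A small sketch that generates the same command lines from a table (the `USE_CASES` table and `commands_for` helper are illustrative, not part of the e2e-testing framework):

```python
from typing import List, Optional, Tuple

# (description, e2e-testing branch, agent version override or None)
USE_CASES: List[Tuple[str, str, Optional[str]]] = [
    ("8.0.0-snapshot stack + 8.0.0-snapshot agent", "master", None),
    ("7.9.0 stack + 7.9.0 agent", "7.9.x", None),
    ("8.0.0-snapshot stack + 7.9.0 agent", "master", "7.9.0"),
    ("7.9.0 stack + 8.0.0-snapshot agent", "7.9.x", "8.0.0-SNAPSHOT"),
]


def commands_for(branch: str, agent_version: Optional[str]) -> List[str]:
    """Build the shell commands used above for one use case."""
    env = "OP_LOG_LEVEL=DEBUG"
    if agent_version:
        env += f" ELASTIC_AGENT_VERSION={agent_version}"
    return [
        f"git checkout {branch}",
        f'{env} godog -t "fleet_mode && enroll"',
    ]


if __name__ == "__main__":
    for number, (description, branch, version) in enumerate(USE_CASES, start=1):
        print(f"Use case {number}: {description}")
        for command in commands_for(branch, version):
            print(f"$ {command}")
```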
Able to repro. First impression is that there is some race in setup, then the agent starts metricbeat/filebeat... edit: |
As explained in elastic/e2e-testing#236, we found that the order in which the agent is enabled, enrolled, and started is important. Closing, as this issue, although reproducible, represents a use case that does not follow the official instructions (https://www.elastic.co/guide/en/ingest-management/current/run-elastic-agent.html) on how to enroll the agent. |
7.9 Beta 1 release
Operating System: Linux
Steps to Reproduce:
Run the e2e-testing framework Ingest Manager 7.x (branch) tests.
The tests work on the 8.0 / master branch, which is confusing us currently.
It's possible this is a failure in the test to set up the configuration correctly; we can look at those steps with Manu (I'm reporting on behalf of the e2e-testing framework that he, I, and the Ingest team are working on). I want to get it in to track it before losing it.
Complaint:
I try to enroll an agent running 7.9.0, but it never reaches the online status; it always stays in the Enrolling one.
debugging so far:
Agent log snippet:
{"log.level":"debug","@timestamp":"2020-08-21T12:05:59.079Z","log.origin":{"file.name":"application/periodic.go","file.line":40},"message":"Failed to read configuration, error: could not emit configuration: fail to extract program configuration: invalid configuration missing outputs configuration: /go/src/github.com/elastic/beats/x-pack/elastic-agent/pkg/agent/program/program.go[123]: unknown error","ecs.version":"1.5.0"}
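The agent writes its log as one JSON object per line, so the failure above can be detected mechanically. A minimal sketch, using the log line from this issue as sample input; the `reports_missing_outputs` helper is hypothetical and only matches on the message text seen here.

```python
import json

# Log line copied from the agent log snippet above.
LOG_LINE = '{"log.level":"debug","@timestamp":"2020-08-21T12:05:59.079Z","log.origin":{"file.name":"application/periodic.go","file.line":40},"message":"Failed to read configuration, error: could not emit configuration: fail to extract program configuration: invalid configuration missing outputs configuration: /go/src/github.com/elastic/beats/x-pack/elastic-agent/pkg/agent/program/program.go[123]: unknown error","ecs.version":"1.5.0"}'


def reports_missing_outputs(line: str) -> bool:
    """Return True if a JSON log line carries the 'missing outputs' error."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False  # not a JSON log line
    if not isinstance(record, dict):
        return False
    return "missing outputs configuration" in record.get("message", "")


if __name__ == "__main__":
    print(reports_missing_outputs(LOG_LINE))
```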
@mdelapenya about this comment:
we use the 'hostname' command in the container to retrieve the host name, and that value is reflected in the UI
From Manu: it seems that both Metricbeat and Filebeat are not able to connect to ES because, by default, the agent's /etc/elastic-agent/elastic-agent.yml config file comes with 127.0.0.1:9200 hardcoded.
I’m curious how this process tells the Beats how to connect to ES, because in 7.9 it uses 127.0.0.1.
From what I can observe:
In 8.0.0-SNAPSHOT, Fleet overrides the elastic-agent.yml config file, AND FB/MB are able to discover Elasticsearch, which runs at http://elasticsearch:9200.
In 7.9.0, Fleet overrides the elastic-agent.yml config file, BUT FB/MB are not able to discover Elasticsearch, as http://127.0.0.1 is used.
Using lsof, I can check that there are no out-of-the-box connections to the Elasticsearch instance in 7.9.x, but they do exist in 8.0.0-snapshot.
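Inside a container, 127.0.0.1 refers to the container itself, so an output host of 127.0.0.1:9200 cannot reach an Elasticsearch that runs in a separate container. As a rough sketch, a test could flag such hosts in whatever output configuration the agent ends up with; `is_loopback_host` is a hypothetical helper, not part of the agent or the test framework.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_host(host: str) -> bool:
    """Return True if an output host entry points at the local machine.

    Accepts both bare "host:port" entries and full URLs.
    """
    if "://" not in host:
        host = "//" + host  # let urlsplit treat the value as a network location
    name = urlsplit(host).hostname or ""
    if name == "localhost":
        return True
    try:
        return ipaddress.ip_address(name).is_loopback
    except ValueError:
        return False  # a DNS name such as "elasticsearch"


if __name__ == "__main__":
    for entry in ["127.0.0.1:9200", "http://127.0.0.1", "http://elasticsearch:9200"]:
        print(entry, is_loopback_host(entry))
```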