Add Redis and Kafka outputs to the full configs #1690
@@ -395,22 +395,206 @@ output.elasticsearch:
  # Configure curve types for ECDHE based cipher suites
  #tls.curve_types: []

#------------------------------- Kafka output ---------------------------------
#output.kafka:
  # The list of Kafka broker addresses from where to fetch the cluster metadata.
  # The cluster metadata contain the actual Kafka brokers events are published
  # to.
  #hosts: ["localhost:9092"]

  # The Kafka topic used for produced events. If use_type is set to true, the
  # topic will not be used.
  #topic: beats

  # Set Kafka topic by event type. If use_type is false, the topic option must
  # be configured. The default is false.
  #use_type: false

  # The number of concurrent load-balanced Kafka output workers.
  #worker: 1

  # The number of times to retry publishing an event after a publishing failure.
  # After the specified number of retries, the events are typically dropped.
  # Some Beats, such as Filebeat, ignore the max_retries setting and retry until
[Review comment] ^^ Is this config file only for Filebeat? If so, can this section then be deleted? This might remove some confusion when working between different Beats.

[Reply] @JarenGlover The problem here is that this general part comes from the libbeat config file, meaning it is generated. This way we make sure we don't have to keep multiple files up to date with the config options.

[Reply] @ruflin aha .. makes sense. thx

[Reply] @JarenGlover There are also other options which could need Beat-specific optimizations. Perhaps in the future we add some logic to the generation to accommodate this. Thanks for the input.
  # all events are published. Set max_retries to a value less than 0 to retry
  # until all events are published. The default is 3.
  #max_retries: 3

  # The maximum number of events to bulk in a single Kafka request. The default
  # is 2048.
  #bulk_max_size: 2048

  # The number of seconds to wait for responses from the Kafka brokers before
  # timing out. The default is 30s.
  #timeout: 30s

  # The maximum duration a broker will wait for the number of required ACKs.
  # The default is 10s.
  #broker_timeout: 10s

  # The number of messages buffered for each Kafka broker. The default is 256.
  #channel_buffer_size: 256

[Review comment] Perhaps: "The number of messages buffered on each Kafka broker. The default is 256."

[Reply] I updated it to "The number of messages buffered for each Kafka broker. The default is 256." to make clear that it's the Beat that buffers, not the Kafka brokers. @dedemorton, let me know if I made it worse.

[Reply] ^^ This is better wording ... when I first read it, I was thinking the buffering happened on Kafka's side, not the Beat's. 💯
  # The keep-alive period for an active network connection. If 0s, keep-alives
  # are disabled. The default is 0 seconds.
  #keep_alive: 0

  # Sets the output compression codec. Must be one of none, snappy and gzip.
  # The default is gzip.
  #compression: gzip

[Review comment] The default is gzip. You have to jump through some (small) hoops to get snappy support in your Kafka brokers.

[Reply] OK, it was wrong in the docs then, I'll fix it there as well.
  # The maximum permitted size of JSON-encoded messages. Bigger messages will be
  # dropped. The default value is 1000000 (bytes). This value should be equal to
  # or less than the broker's message.max.bytes.
  #max_message_bytes: 1000000

  # The ACK reliability level required from broker. 0=no response, 1=wait for
  # local commit, -1=wait for all replicas to commit. The default is 1. Note:
  # If set to 0, no ACKs are returned by Kafka. Messages might be lost silently
  # on error.
  #required_acks: 1

[Review comment] The default should be 1.
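As an aside on these reliability knobs: required_acks and max_retries interact. A hedged sketch (not part of this diff) of a delivery-focused Kafka output, using only options documented in this file:

output.kafka:
  hosts: ["localhost:9092"]
  topic: beats
  # Wait for all in-sync replicas to commit before an event counts as
  # published; the strongest guarantee, at the cost of publish latency.
  required_acks: -1
  # A value less than 0 retries until all events are published.
  max_retries: -1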
  # The duration to wait for new events between two producer API calls.
  #flush_interval: 1s

[Review comment] Not related to this PR, but for durations, in my opinion we should always include the unit, meaning this would be 1s (@urso).

[Reply] Be super careful about just changing durations in config files without testing. Quite some code has not been adjusted to the time.Duration support in ucfg.

[Reply] flush_interval uses time.Duration and can be changed to 1s.

[Reply] I was hoping so, as this is a new implementation :-)
  # The configurable ClientID used for logging, debugging, and auditing
  # purposes. The default is "beats".
  #client_id: beats

  # Optional TLS configuration options. TLS is off by default.
  # List of root certificates for HTTPS server verifications
  #tls.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for TLS client authentication
  #tls.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #tls.certificate_key: "/etc/pki/client/cert.key"

  # Controls whether the client verifies server certificates and host name.
  # If insecure is set to true, all server host names and certificates will be
  # accepted. In this mode, TLS-based connections are susceptible to
  # man-in-the-middle attacks. Use only for testing.
  #tls.insecure: true

  # Configure cipher suites to be used for TLS connections
  #tls.cipher_suites: []

  # Configure curve types for ECDHE based cipher suites
  #tls.curve_types: []
#------------------------------- Redis output ---------------------------------
#output.redis:
  # The list of Redis servers to connect to. If load balancing is enabled, the
  # events are distributed to the servers in the list. If one server becomes
  # unreachable, the events are distributed to the reachable servers only.
  #hosts: ["localhost:6379"]

  # The Redis port to use if hosts does not contain a port number. The default
  # is 6379.
  #port: 6379

  # The name of the Redis list or channel the events are published to. The
  # default is filebeat.
  #index: filebeat

  # The password to authenticate with. The default is no authentication.
  #password:

  # The Redis database number where the events are published. The default is 0.
  #db: 0

  # The Redis data type to use for publishing events. If the data type is list,
  # the Redis RPUSH command is used. If the data type is channel, the Redis
  # PUBLISH command is used. The default value is list.
  #datatype: list

  # The Redis host to connect to when using topology map support. Topology map
  # support is disabled if this option is not set.
  #host_topology:

[Review comment] Do we still need this config option?

[Reply] Seems like topology handling is still a feature of the Redis output. We could consider deprecating it or marking it experimental or something like that.

[Reply] Did we ever test if this still works? :-)

[Reply] I'm ok leaving it in for the moment; we can check this in a later PR.
  # The password to use for authenticating with the Redis topology server. The
  # default is no authentication.
  #password_topology:

  # The Redis database number where the topology information is stored. The
  # default is 1.
  #db_topology: 1

  # The number of workers to use for each host configured to publish events to
  # Redis. Use this setting along with the loadbalance option. For example, if
  # you have 2 hosts and 3 workers, in total 6 workers are started (3 for each
  # host).
  #worker: 1

  # If set to true and multiple hosts or workers are configured, the output
  # plugin load balances published events onto all Redis hosts. If set to false,
  # the output plugin sends all events to only one host (determined at random)
  # and will switch to another host if the currently selected one becomes
  # unreachable. The default value is true.
  #loadbalance: true

  # The Redis connection timeout in seconds. The default is 5 seconds.
  #timeout: 5s

  # The number of times to retry publishing an event after a publishing failure.
  # After the specified number of retries, the events are typically dropped.
  # Some Beats, such as Filebeat, ignore the max_retries setting and retry until
  # all events are published. Set max_retries to a value less than 0 to retry
  # until all events are published. The default is 3.
  #max_retries: 3

  # The maximum number of events to bulk in a single Redis request or pipeline.
  # The default is 2048.
  #bulk_max_size: 2048

  # The URL of the SOCKS5 proxy to use when connecting to the Redis servers. The
  # value must be a URL with a scheme of socks5://.
  #proxy_url:

  # This option determines whether Redis hostnames are resolved locally when
  # using a proxy. The default value is false, which means that name resolution
  # occurs on the proxy server.
  #proxy_use_local_resolver: false
  # Optional TLS configuration options. TLS is off by default.

[Review comment] Perhaps better: "Optional TLS configuration options. TLS is off by default."

[Reply] Maybe add: Redis itself does not support TLS. If deployed behind a TLS endpoint (e.g. stunnel), TLS can be configured directly from within Beats.
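Following that suggestion, a minimal sketch of pointing the Redis output at a TLS-terminating stunnel endpoint; the host name, port, and certificate path are assumptions for illustration:

output.redis:
  # stunnel terminates TLS here and forwards plaintext to the actual Redis
  # server on its backend (host and port are illustrative assumptions).
  hosts: ["stunnel-redis.example.com:6380"]
  index: filebeat
  # CA certificate used to verify the stunnel endpoint's certificate.
  tls.certificate_authorities: ["/etc/pki/root/ca.pem"]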
  # List of root certificates for HTTPS server verifications
  #tls.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for TLS client authentication
  #tls.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #tls.certificate_key: "/etc/pki/client/cert.key"

  # Controls whether the client verifies server certificates and host name.
  # If insecure is set to true, all server host names and certificates will be
  # accepted. In this mode, TLS-based connections are susceptible to
  # man-in-the-middle attacks. Use only for testing.
  #tls.insecure: true

  # Configure cipher suites to be used for TLS connections
  #tls.cipher_suites: []

  # Configure curve types for ECDHE based cipher suites
  #tls.curve_types: []
#------------------------------- File output ----------------------------------
#output.file:
  # Path to the directory where to save the generated files. The option is
  # mandatory.
  #path: "/tmp/filebeat"

  # Name of the generated files. The default is `filebeat` and it generates
  # files: `filebeat`, `filebeat.1`, `filebeat.2`, etc.
  #filename: filebeat

  # Maximum size in kilobytes of each file. When this size is reached, the files
  # are rotated. The default value is 10240 kB.
  #rotate_every_kb: 10000

[Review comment] Should we add a note that the file is also rotated on restart?

  # Maximum number of files under path. When this number of files is reached,
  # the oldest file is deleted and the rest are shifted from last to first. The
  # default is 7 files.
  #number_of_files: 7
[Review comment] tl;dr: This list is only about fetching metadata, not about the actual endpoints events are published to.

To be more exact, it's the initial list of configured Kafka brokers to fetch cluster metadata from, in order to identify all Kafka brokers and how topics/partitions are distributed between brokers. The actual connections to Kafka brokers are made using the advertised names included in the metadata.

Just giving an example of this failing badly: a Kafka broker runs on the host mykafka but advertises itself as localhost, and hosts is configured to mykafka:9092. Having this setup, the Beat will connect to mykafka:9092 in order to fetch the metadata. The metadata states there is one (and only one) Kafka broker, available on localhost:9092 -> the Beat will fail to publish events because no Kafka instance is running on localhost:9092.

If possible, it's recommended to configure multiple brokers (3 or 5, for example), in case a broker is down.

[Reply] @urso Hmm, can you suggest an alternate explanation for the setting, please? I took this from the docs, but if this can cause issues we should clarify in both places.

[Reply] Changing it to "The list of Kafka broker addresses from where to fetch the cluster metadata." What do you say, @urso?

[Reply] +1.

How about "The list of Kafka broker addresses from where to fetch the cluster metadata. The cluster metadata contain the actual Kafka brokers events are published to."

Should be enough (even for docs). People who know Kafka will know what's meant, and people having no clue about Kafka will run into issues anyway.
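To make that recommendation concrete, a sketch of the resulting hosts setting; the broker names are illustrative assumptions:

output.kafka:
  # Any one of these brokers is enough to bootstrap the cluster metadata;
  # listing several keeps metadata fetching working when one broker is down.
  # The advertised broker names returned in the metadata must also be
  # reachable from the Beat.
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: beats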