Retry support for fluent-bit plugin #2035
Labels: component/agent, keepalive, not-as-easy-as-it-looks, type/enhancement
Is your feature request related to a problem? Please describe.
The fluent-bit plugin for Loki seems to always return the fluent-bit error status when something goes wrong. Instead, it should distinguish between recoverable and unrecoverable situations and return the retry or the error status accordingly.
See also https://docs.fluentbit.io/manual/v/master/administration/scheduling-and-retries.
With an unavailable Loki URL configured, the resulting logs indicate that the log chunks are dropped. I would expect a temporary Loki downtime to be survivable without dropping logs.
By contrast, when the fluent-bit forward plugin is configured with an unknown URL, the log chunks are retried according to the plugin configuration.
Describe the solution you'd like
Identify recoverable errors and return the retry status for them instead; see also loki/cmd/fluent-bit/out_loki.go, line 129 at commit 4fd670d.
An unavailable URL is recoverable; at least that seems to be how the official fluent-bit plugins have implemented it.
Describe alternatives you've considered
Additional context
I would like to decide for myself how often a retry should happen before giving up, using the related configuration option Retry_Limit.