Retry support for fluent-bit plugin #2035

Closed
a-thaler opened this issue May 4, 2020 · 2 comments
Labels: component/agent, keepalive (an issue or PR that will be kept alive and never marked as stale), not-as-easy-as-it-looks, type/enhancement (something existing could be improved)

Comments

a-thaler commented May 4, 2020

Is your feature request related to a problem? Please describe.

The Loki output plugin for Fluent Bit seems to always return the Fluent Bit error status (FLB_ERROR) when a flush fails. Instead, it should distinguish between recoverable and unrecoverable failures and return the retry status (FLB_RETRY) or the error status accordingly.
See also https://docs.fluentbit.io/manual/v/master/administration/scheduling-and-retries.

With an unreachable Loki URL configured, the resulting logs show that the log chunks are dropped. I would instead expect the plugin to survive a temporary Loki outage without losing logs:

    level=warn caller=client.go:241 id=0 component=client host=ogging-loki:3100 msg="error sending batch, will retry" status=-1 error="Post http://ogging-loki:3100/api/prom/push: dial tcp: lookup ogging-loki on 100.64.0.10:53: no such host"
    level=error caller=client.go:246 id=0 component=client host=ogging-loki:3100 msg="final error sending batch" status=-1 error="Post http://ogging-loki:3100/api/prom/push: dial tcp: lookup ogging-loki on 100.64.0.10:53: no such host"

By contrast, when the Fluent Bit forward output is configured with an unreachable host, the log chunks are retried according to that output's retry configuration.

Describe the solution you'd like
Distinguish recoverable errors and return the retry status for them instead; see also:

// output.FLB_RETRY = retry to flush later.

An unreachable URL is a recoverable error; at least, that appears to be how the official Fluent Bit output plugins handle it.

Describe alternatives you've considered

Additional context
I would like to decide for myself how often a flush is retried before giving up, using the related configuration option Retry_Limit:

    [Output]
        Name         loki
        Match        loki.*
        Url          http://ogging-loki:3100/api/prom/push
        Retry_Limit  False
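Per the Fluent Bit scheduling documentation linked above, Retry_Limit takes either a positive number of allowed retries or False for no limit. The sketch below is a hypothetical, simplified model of why the setting has no effect while the plugin returns the error status: deliver, retryLimit and the result constants are illustrative names, not Fluent Bit internals, and the real scheduler's asynchronous backoff timing is deliberately left out.

```go
package main

import "fmt"

type flushResult int

// Illustrative stand-ins for FLB_OK, FLB_ERROR and FLB_RETRY.
const (
	resultOK flushResult = iota
	resultError
	resultRetry
)

// retryLimit models Retry_Limit: a maximum number of retries,
// or unlimited (Retry_Limit False).
type retryLimit struct {
	unlimited bool
	max       int
}

// deliver calls flush until it succeeds, fails hard, or runs out of
// retries. It reports whether the chunk was delivered and how many
// retries were used.
func deliver(flush func(attempt int) flushResult, limit retryLimit) (bool, int) {
	retries := 0
	for attempt := 0; ; attempt++ {
		switch flush(attempt) {
		case resultOK:
			return true, retries
		case resultError:
			// Error status: the chunk is dropped at once, whatever the limit.
			return false, retries
		}
		// Retry status: try again only while the limit allows it.
		if !limit.unlimited && retries >= limit.max {
			return false, retries
		}
		retries++
	}
}

func main() {
	// Loki is unreachable for the first five attempts, then recovers.
	lokiDownFor5 := func(attempt int) flushResult {
		if attempt < 5 {
			return resultRetry
		}
		return resultOK
	}
	alwaysError := func(int) flushResult { return resultError }

	fmt.Println(deliver(lokiDownFor5, retryLimit{max: 1}))          // false 1
	fmt.Println(deliver(lokiDownFor5, retryLimit{unlimited: true})) // true 5
	fmt.Println(deliver(alwaysError, retryLimit{unlimited: true}))  // false 0
}
```

The last call mirrors the behavior reported in this issue: even with Retry_Limit False, a plugin that answers every failure with the error status loses the chunk on the first attempt.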
        
stale bot commented Jun 4, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jun 4, 2020
cyriltovena added the keepalive label Jun 8, 2020
stale bot removed the stale label Jun 8, 2020
cyriltovena (Contributor) commented:

A native plugin is being built in fluent/fluent-bit#994; we'll defer to it and stop supporting the Loki plugin.
