Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick #23722 to 7.10: Fix leak caused by input runners created when checking their configuration #23805

Merged
merged 1 commit into from
Feb 5, 2021

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Feb 2, 2021

Cherry-pick of PR #23722 to 7.10 branch. Original message:

What does this PR do?

Stop input v1 runners created to check config.

CheckConfig for v1 inputs actually calls the constructors of the inputs. In some cases, as in the log input, the constructor creates resources that are never released unless the runner is stopped. This causes goroutines leaks with autodiscover or other dynamic configurations.

As discovered in #23658, this started to cause leaks since 7.8.0 (specifically since #16715), but I am not sure why this was not an issue before as config checkers were moved there but not really changed. Perhaps before this change input configs were not actually checked.

Considered alternatives

  • Avoid creating goroutines in v1 input builders. I may give a try to this option, but the problem I see is that Start() Run() doesn't return errors, and it may be too late to do some checks expected now in the builder.
  • Add a runner builder that creates a runner using fake contexts and connectors that can be controlled from CheckConfig. Not sure if this would be very different at the end, and we would need to rely on inputs releasing their resources if the context is done.
  • Migrate all inputs to v2. Long term is the best solution, but it can be a big effort to be done now, and we would still need a fix to backport to released branches.
  • Do nothing on CheckConfig for inputs. A feature would be lost.

Why is it important?

Avoid goroutines and other possible leaks with autodiscover in Filebeat.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works.
    • Not easy to add a test for this. It needs at least the module loader, the autodiscover process, and a v1 input. Doing it as system test would be flaky because there are several other goroutines.
    • Added tests to check that v1 inputs don't leak goroutines if their context is stopped after being created.
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Confirm that all v1 inputs can be stopped without being started before.

How to test this PR locally

  • Start filebeat with docker (or kubernetes) autodiscover and -httpprof :6060.
  • Collect a base profile (curl http://localhost:6060/debug/pprof/goroutine -o /tmp/goroutines-base.prof).
  • Start a container, and check that filebeat starts collecting its logs.
  • Check with pprof that new goroutines have been started
    • go tool pprof -top -base /tmp/goroutines-base.prof http://localhost:6060/debug/pprof/goroutine
    • Look for goroutines like CloseOnSignal or SubOutlets.
  • Stop the container.
  • Confirm that eventually (~after cleanup_timeout) the started goroutines are not in pprof anymore.

Related issues

@jsoriano jsoriano added [zube]: In Review backport Team:Integrations Label for the Integrations team labels Feb 2, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Feb 2, 2021
…ation (elastic#23722)

Stop input v1 runners created to check config.

`CheckConfig` for v1 inputs actually calls the constructors of the inputs.
In some cases, as in the log input, the constructor creates resources that
are never released unless the runner is stopped. This causes goroutines
leaks with autodiscover or other dynamic configurations.

(cherry picked from commit e6bb5c9)
@jsoriano jsoriano force-pushed the backport_23722_7.10 branch from 5ad1c5b to 297e455 Compare February 2, 2021 12:10
@elasticmachine
Copy link
Collaborator

elasticmachine commented Feb 2, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: jsoriano commented: jenkins run the tests please

    • Start Time: 2021-02-03T12:39:50.249+0000
  • Duration: 48 min 32 sec

  • Commit: 297e455

Test stats 🧪

Test Results
Failed 0
Passed 11128
Skipped 2004
Total 13132

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 11128
Skipped 2004
Total 13132

@jsoriano
Copy link
Member Author

jsoriano commented Feb 3, 2021

jenkins run the tests please

@jsoriano jsoriano merged commit 85fe534 into elastic:7.10 Feb 5, 2021
@jsoriano jsoriano deleted the backport_23722_7.10 branch February 5, 2021 11:42
@zube zube bot removed the [zube]: Done label May 6, 2021
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…ation (elastic#23722) (elastic#23805)

Stop input v1 runners created to check config.

`CheckConfig` for v1 inputs actually calls the constructors of the inputs.
In some cases, as in the log input, the constructor creates resources that
are never released unless the runner is stopped. This causes goroutines
leaks with autodiscover or other dynamic configurations.

(cherry picked from commit 488e020)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants