Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a persistent cache for cloudfoundry metadata based on badger #20775

Merged
merged 18 commits into from
Oct 6, 2020

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Aug 25, 2020

What does this PR do?

Add a persistent cache on disk for Cloudfoundry metadata, based on badger.

Why is it important?

Improve restart of beats when monitoring very big Cloudfoundry deployments.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Look for some way to avoid so much json unmarshalling.
  • Cache should be closed at some moment (to be continued in Fix leaks with metadata processors #16349).
  • Add settings to make it optional?
  • Use different store names for different Cloudfoundry deployments, to support monitoring multiple clusters from the same Beat (maybe not needed because it is difficult to have uuid collisions).

How to test this PR locally

  • Use add_cloudfoundry_metadata with Filebeat or Metricbeat.
  • Restart the Beat.
  • The Beat should reuse information on disk to enrich events without querying cloudfoundry API.
    A way to confirm it is to check the http connections done with a reverse proxy like mitmproxy.

Related issues

@jsoriano jsoriano added Team:Services (Deprecated) Label for the former Integrations-Services team Team:Platforms Label for the Integrations - Platforms team labels Aug 25, 2020
@jsoriano jsoriano self-assigned this Aug 25, 2020
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Aug 25, 2020
@jsoriano jsoriano force-pushed the cf-local-cache-badger branch from fd5d6a1 to 32ef2e5 Compare August 25, 2020 16:08
@elasticmachine
Copy link
Collaborator

elasticmachine commented Aug 25, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [jsoriano commented: jenkins run the tests please]

  • Start Time: 2020-10-05T22:50:54.940+0000

  • Duration: 78 min 48 sec

Test stats 🧪

Test Results
Failed 0
Passed 16271
Skipped 1343
Total 17614

@jsoriano jsoriano force-pushed the cf-local-cache-badger branch 2 times, most recently from 099abcf to 1d361e9 Compare August 26, 2020 13:47
@jsoriano jsoriano force-pushed the cf-local-cache-badger branch 2 times, most recently from a7ab5d3 to 70b9040 Compare August 31, 2020 17:32
@jsoriano jsoriano force-pushed the cf-local-cache-badger branch 4 times, most recently from 2d98602 to 18339c2 Compare September 18, 2020 19:14
@jsoriano
Copy link
Member Author

jenkins run the tests please

@jsoriano jsoriano force-pushed the cf-local-cache-badger branch 2 times, most recently from 1e7b0f4 to 124a931 Compare September 21, 2020 15:26
@jsoriano
Copy link
Member Author

jenkins run the tests please

@jsoriano jsoriano force-pushed the cf-local-cache-badger branch from 124a931 to 701fca2 Compare September 24, 2020 08:38
@jsoriano
Copy link
Member Author

@urso @exekias I am going to open this for review as I would like to reach 7.10, it'd be nice if you could take a look.

There are still some "controversial" points, but I think they can be considered non-blockers:

  • Database is not being properly closed, because we don't have a way to finalize processors. This is a more general problem (see Fix leaks with metadata processors #16349). After many tests I haven't seen it causing problems that require some manual operation. Main problem here is that one may lose some entries written during beat shutdown, but I don't think this is such a big problem for this kind of cache. So I think we can go on as is by now, and continue with Fix leaks with metadata processors #16349 as a later fix.
  • Caches are requested using package-level methods, and references are kept in a global registry, this is done to allow sharing the same cache, what is needed if multiple features use it. It is done at the package level to avoid adding some additional object in the main beater that needs to be passed to processors. There are methods to instantiate additional registries, this could be used for other uses or future refactors.
  • CBOR encoding is done at the end with github.com/ugorji/go/codec, that is also a dependency of badger. I found problems with the codec in github.com/elastic/go-structform. In any case the code for this is quite isolated, so we can refactor it later to use any other library if needed.

Let me know if you prefer this split in two PRs, one for the cache and another one for the changes for Cloud Foundry.

@jsoriano jsoriano marked this pull request as ready for review September 24, 2020 09:22
@jsoriano jsoriano added needs_backport PR is waiting to be backported to other branches. v7.10.0 labels Oct 5, 2020
@exekias
Copy link
Contributor

exekias commented Oct 5, 2020

LGTM, besides lint issues which need some attention

@jsoriano
Copy link
Member Author

jsoriano commented Oct 5, 2020

jenkins run the tests please

@jsoriano jsoriano merged commit 76905a2 into elastic:master Oct 6, 2020
@jsoriano jsoriano deleted the cf-local-cache-badger branch October 6, 2020 08:41
jsoriano added a commit to jsoriano/beats that referenced this pull request Oct 6, 2020
…stic#20775)

Cache on disk is used by add_cloudfoundry_metadata.
Cache is written into the beats data directory. Objects in cache
are serialized using CBOR encoding.
Badger DB is added as dependency.

(cherry picked from commit 76905a2)
@jsoriano jsoriano added test-plan Add this PR to be manual test plan and removed needs_backport PR is waiting to be backported to other branches. labels Oct 6, 2020
jsoriano added a commit that referenced this pull request Oct 6, 2020
…tadata based on badger (#21551)

Cache on disk is used by add_cloudfoundry_metadata.
Cache is written into the beats data directory. Objects in cache
are serialized using CBOR encoding.
Badger DB is added as dependency.

(cherry picked from commit 76905a2)
v1v added a commit to v1v/beats that referenced this pull request Oct 6, 2020
* upstream/master:
  [CI] Setup git config globally (elastic#21562)
  docs: update generate_fields_docs.py (elastic#21359)
  Add support for additional fields from V2 ALB logs (elastic#21540)
  Move Prometheus query & remote_write to GA (elastic#21507)
  feat: add a new step to run the e2e tests for certain parts of Beats (elastic#21100)
  [Elastic Agent] Add elastic agent ID and version to events from filebeat and metricbeat. (elastic#21543)
  Release cloudfoundry input and processor as GA (elastic#21525)
  [Packetbeat] New SIP protocol (elastic#21221)
  [Filebeat][New Module] Add support for Microsoft MTP / 365 Defender (elastic#21446)
  [Beats][pytest] Asserting if filebeat logs include errors (elastic#20999)
  junipersrx-module initial release (elastic#20017)
  Add a persistent cache for cloudfoundry metadata based on badger (elastic#20775)
  Add missing changelog entry for cisco umbrella (elastic#21550)
  [Elastic Agent] Add upgrade CLI to initiate upgrade of Agent locally (elastic#21425)
  Enable filestream input (elastic#21533)
  Add filestream input reader (elastic#21481)
  [CI] fix 'no matches found within 10000' (elastic#21466)
  Fix billing.go aws.GetStartTimeEndTime (elastic#21531)
@andresrc andresrc added the test-plan-added This PR has been added to the test plan label Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Platforms Label for the Integrations - Platforms team Team:Services (Deprecated) Label for the former Integrations-Services team test-plan Add this PR to be manual test plan test-plan-added This PR has been added to the test plan v7.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Cache metadata to local disk/db
4 participants