
support elastic-agent-shipper in diskqueue #32258

Merged
merged 12 commits into elastic:main from shipper_diskqueue on Jul 20, 2022

Conversation

leehinman
Contributor

@leehinman leehinman commented Jul 7, 2022

What does this PR do?

Adds support for elastic-agent-shipper as a client of diskqueue. Specifically

  • support messages.Event as well as publisher.Event
  • support serialization of Protobuf as well as CBOR
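A minimal sketch of the dual-format idea (all names here are hypothetical, not the diskqueue's actual schema): each frame written to a segment can carry a small tag identifying its serialization format, so CBOR and protobuf payloads can coexist in one queue and the reader knows which decoder to apply.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// serializationFormat is a hypothetical tag distinguishing how an event
// was encoded before being written to a queue segment.
type serializationFormat uint32

const (
	formatCBOR serializationFormat = iota
	formatProtobuf
)

// encodeFrame prepends the format tag to an already-serialized payload,
// so the reader knows which decoder to apply later.
func encodeFrame(f serializationFormat, payload []byte) []byte {
	var buf bytes.Buffer
	binary.Write(&buf, binary.LittleEndian, uint32(f))
	buf.Write(payload)
	return buf.Bytes()
}

// decodeFrame splits the tag back off the front of the frame.
func decodeFrame(frame []byte) (serializationFormat, []byte) {
	tag := binary.LittleEndian.Uint32(frame[:4])
	return serializationFormat(tag), frame[4:]
}

func main() {
	frame := encodeFrame(formatProtobuf, []byte("event-bytes"))
	f, payload := decodeFrame(frame)
	fmt.Println(f == formatProtobuf, string(payload))
}
```

The tag costs four bytes per frame but lets a single queue directory hold events written by both old (CBOR-only) and new (protobuf-capable) writers.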

Why is it important?

Necessary for elastic-agent-shipper to use diskqueue

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

go test
go test -bench=100k -benchtime 1x -count 10 -timeout 600m -benchmem | tee  results.txt && benchstat results.txt
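Note that `-bench=100k` is a regular expression matched against benchmark function names, so any benchmark whose name contains "100k" is selected. A toy example (hypothetical benchmark, not one of the PR's actual benchmarks) showing a matching function, driven programmatically via `testing.Benchmark`:

```go
package main

import (
	"fmt"
	"testing"
)

// Benchmark names containing "100k" are what `go test -bench=100k` selects;
// the -bench flag value is treated as a regexp over function names.
func benchmarkProduce100k(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// stand-in for producing 100k events into the queue
		_ = make([]byte, 100_000)
	}
}

func main() {
	r := testing.Benchmark(benchmarkProduce100k)
	fmt.Println(r.N >= 1) // the benchmark ran at least once
}
```

`-count 10` repeats each selected benchmark ten times, which is what gives benchstat enough samples to report a variance.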

Related issues

@leehinman leehinman requested a review from a team as a code owner July 7, 2022 17:41
@leehinman leehinman requested review from cmacknz and fearful-symmetry and removed request for a team July 7, 2022 17:41
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jul 7, 2022
@leehinman leehinman requested a review from faec July 7, 2022 17:41
@elasticmachine
Collaborator

elasticmachine commented Jul 7, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-07-20T13:58:30.802+0000

  • Duration: 93 min 6 sec

Test stats 🧪

Test Results
Failed 0
Passed 22480
Skipped 1937
Total 24417

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@leehinman leehinman force-pushed the shipper_diskqueue branch from f4910f9 to d84e9d3 Compare July 8, 2022 15:44
@leehinman leehinman force-pushed the shipper_diskqueue branch from d84e9d3 to 0c384c8 Compare July 8, 2022 16:59
@leehinman leehinman added enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Jul 8, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jul 8, 2022
@jlind23 jlind23 requested review from belimawr and rdner July 11, 2022 06:55
Review threads (outdated, resolved):
libbeat/publisher/queue/diskqueue/benchmark_test.go
libbeat/publisher/queue/diskqueue/benchmark_test.go
libbeat/publisher/queue/diskqueue/serialize.go
libbeat/publisher/queue/diskqueue/serialize_test.go
libbeat/publisher/queue/diskqueue/serialize_test.go
libbeat/publisher/queue/diskqueue/serialize_test.go
@cmacknz
Member

cmacknz commented Jul 11, 2022

I tried running go test -bench=1M -benchtime 1x -count 10 -timeout 600m -benchmem > results.txt && benchstat results.txt locally and it never seems to complete, or at least it takes longer than I'm willing to wait before I start to believe it won't finish.

Is this expected? How long does generating the benchmarks usually take?

@leehinman
Contributor Author

I tried running go test -bench=1M -benchtime 1x -count 10 -timeout 600m -benchmem > results.txt && benchstat results.txt locally and it never seems to complete, or at least it takes longer than I'm willing to wait before I start to believe it won't finish.

Is this expected? How long does generating the benchmarks usually take?

On my MBP it takes ~40 seconds before I see any output. If I add -v it shows that go test (no benchmarks) gets run 10 times before it runs the benchmark tests and that accounts for the ~40 seconds.

On my machine it takes ~40 minutes for the command line you used to run.

I can look at reducing the number of events. One thousand events was too small; it gives variances of 20%. One million has low variance (0-2%) but it does take a long time.

@faec
Contributor

faec commented Jul 19, 2022

I had initially thought of implementing this by supporting configurable serialization as in elastic/elastic-agent-shipper#41, so the shipper could just add a callback for whatever we end up needing... but I think your way is good for now, since full generality would raise more complicated questions and we might not end up needing it.

One question / concern about this approach: I still have the hope that (once all the glue is in place) our Publish RPC will be able to hand off still-encoded protobuf events to the queue, so that we only ever need to decode events once before they reach the output. Could this approach be extended to support that? (Not necessarily in this PR, I just want to make sure we could adopt this approach in the future, hopefully without needing to change the schema etc.) I could imagine the queue accepting already-encoded data with a configuration flag (that wouldn't need to touch the schema or segment files), so I think the answer is yes, I just wanted to raise the issue to make sure incidental choices don't accidentally prevent it.

@cmacknz cmacknz requested a review from belimawr July 19, 2022 17:15
@leehinman
Contributor Author

One question / concern about this approach: I still have the hope that (once all the glue is in place) our Publish RPC will be able to hand off still-encoded protobuf events to the queue, so that we only ever need to decode events once before they reach the output. Could this approach be extended to support that? (Not necessarily in this PR, I just want to make sure we could adopt this approach in the future, hopefully without needing to change the schema etc.) I could imagine the queue accepting already-encoded data with a configuration flag (that wouldn't need to touch the schema or segment files), so I think the answer is yes, I just wanted to raise the issue to make sure incidental choices don't accidentally prevent it.

Haven't done it yet, but I think extending to support already encoded events would be straightforward. I think we would need:

  • A new serialization format, SerializationNone
  • A new flag for segment options, ENABLE_SERIALIZATION_NONE
  • A case statement for type []byte in the encode function; it would be a no-op
  • A case statement for SerializationNone in the Decode function; it would be a no-op (assuming the output takes []byte)
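The steps above could be sketched roughly like this (all names hypothetical, not the diskqueue's actual API): with a pass-through format, already-encoded protobuf bytes would travel through the queue without ever being re-serialized or re-decoded.

```go
package main

import "fmt"

// format is a hypothetical serialization selector; formatNone is the
// proposed pass-through mode for payloads the caller has already encoded.
type format int

const (
	formatCBOR format = iota
	formatProtobuf
	formatNone
)

// encode handles only the pass-through case in this sketch: for
// formatNone the []byte payload goes straight through, untouched.
func encode(f format, event any) ([]byte, error) {
	switch f {
	case formatNone:
		raw, ok := event.([]byte)
		if !ok {
			return nil, fmt.Errorf("SerializationNone requires []byte, got %T", event)
		}
		return raw, nil // no-op: bytes pass straight through
	default:
		return nil, fmt.Errorf("format %d not shown in this sketch", f)
	}
}

// decode mirrors encode: for formatNone the output receives the
// original bytes with no deserialization step.
func decode(f format, raw []byte) (any, error) {
	if f == formatNone {
		return raw, nil // no-op
	}
	return nil, fmt.Errorf("format %d not shown in this sketch", f)
}

func main() {
	in := []byte("already-encoded-protobuf")
	enc, _ := encode(formatNone, in)
	out, _ := decode(formatNone, enc)
	fmt.Println(string(out.([]byte)))
}
```

Because the no-op only changes the encode/decode dispatch, the on-disk schema and segment files would be untouched, which is the property faec's comment asks about.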

@belimawr
Contributor

I tried running go test -bench=1M -benchtime 1x -count 10 -timeout 600m -benchmem > results.txt && benchstat results.txt locally and it never seems to complete, or at least it takes longer than I'm willing to wait before I start to believe it won't finish.
Is this expected? How long does generating the benchmarks usually take?

On my MBP it takes ~40 seconds before I see any output. If I add -v it shows that go test (no benchmarks) gets run 10 times before it runs the benchmark tests and that accounts for the ~40 seconds.

On my machine it takes ~40 minutes for the command line you used to run.

I can look at reducing the number of events. One thousand events was too small; it gives variances of 20%. One million has low variance (0-2%) but it does take a long time.

100k events run quite fast on my X1 Extreme.

ok      github.com/elastic/beats/v7/libbeat/publisher/queue/diskqueue   73.525s

One little thing, the command to run the benchmark in the PR description needs to be updated to match the 100k functions. We can also use tee so we can see the output in the console as well as generate the file.

 go test -bench=100k -benchtime 1x -count 10 -timeout 600m -benchmem | tee results.txt && benchstat results.txt


@belimawr belimawr left a comment


Some minor details about the tests.

Review threads (outdated, resolved):
libbeat/publisher/queue/diskqueue/serialize_test.go
libbeat/publisher/queue/diskqueue/serialize_test.go
@leehinman
Contributor Author

100k events run quite fast on my X1 Extreme.

ok      github.com/elastic/beats/v7/libbeat/publisher/queue/diskqueue   73.525s

:-) A newer CPU, faster SSD, and a more efficient I/O scheduler will do that. It was 471.825s on my MBP.

One little thing, the command to run the benchmark in the PR description needs to be updated to match the 100k functions. We can also use tee so we can see the output in the console as well as generate the file.

Added tee to the comment in the benchmark and updated the PR description too.


@faec faec left a comment


looks good!

@leehinman leehinman merged commit 781d2e2 into elastic:main Jul 20, 2022
@leehinman leehinman deleted the shipper_diskqueue branch July 20, 2022 19:23
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
* support elastic-agent-shipper in diskqueue

Co-authored-by: Tiago Queiroz <[email protected]>
Co-authored-by: Craig MacKenzie <[email protected]>
Labels
enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
5 participants