Skip to content
This repository has been archived by the owner on Sep 21, 2023. It is now read-only.

Update the shipper to work with the elastic-agent #185

Merged
merged 31 commits into from
Feb 9, 2023

Conversation

fearful-symmetry
Copy link
Contributor

@fearful-symmetry fearful-symmetry commented Nov 17, 2022

What does this PR do?

This closes #161 and updates the shipper's V2 client code to work with the elastic-agent. There's a number of missing pieces:

Because of the above issues, this code is somewhat untested. I would like to expand the available integration tests, but for now, the code is testable, with the note that some beat events will not be sent to the shipper due to elastic/beats#34319.

How to Test

In order to test, you can run elastic-agent in standalone mode with an output config like this:

outputs:
  default:
    log_level: debug
    type: elasticsearch
    hosts: [https://127.0.0.1:9200]
    #api-key: "example-key"
    username: elastic
    password: changeme
    shipper:
      enabled: true

You may also need to place the shipper binary and shipper.spec.yml file under the components/ subdirectory.

Other open Questions:

  • How should we report errors from the shipper? Should an error in a given shipper component also mark the input as unhealthy?

@fearful-symmetry fearful-symmetry added enhancement New feature or request Team:Elastic-Agent Label for the Agent team v8.6.0 labels Nov 17, 2022
@fearful-symmetry fearful-symmetry self-assigned this Nov 17, 2022
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner November 17, 2022 22:27
@fearful-symmetry fearful-symmetry requested review from faec and leehinman and removed request for a team November 17, 2022 22:27
@fearful-symmetry fearful-symmetry marked this pull request as draft November 17, 2022 22:27
@mergify
Copy link
Contributor

mergify bot commented Nov 17, 2022

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @fearful-symmetry? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 17, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-02-09T20:11:34.542+0000

  • Duration: 16 min 59 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@mergify
Copy link
Contributor

mergify bot commented Nov 28, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b setup-agent-config upstream/setup-agent-config
git merge upstream/main
git push upstream setup-agent-config

@cmacknz
Copy link
Member

cmacknz commented Nov 29, 2022

Here's the draft PR for the same changes in Fleet: elastic/kibana#145755

@fearful-symmetry fearful-symmetry marked this pull request as ready for review January 24, 2023 19:32
Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, still need to test this myself.

go.mod Outdated Show resolved Hide resolved
config/config.go Outdated Show resolved Hide resolved
config/config.go Outdated Show resolved Hide resolved
config/config.go Outdated Show resolved Hide resolved
config/config.go Show resolved Hide resolved
"github.com/elastic/elastic-agent-shipper/config"
)

// I am not net sure how this should work or what it should do, but we need to read from that error channel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should log the error, it's a way for the agent to report failures making gRPC calls to somewhere that can log them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also reading this I realized that Beats doesn't do this elastic/beats#34381

controller/run.go Outdated Show resolved Hide resolved
controller/run.go Outdated Show resolved Hide resolved
controller/server.go Outdated Show resolved Hide resolved
controller/server.go Outdated Show resolved Hide resolved
@fearful-symmetry
Copy link
Contributor Author

Okay, so, there's two ongoing issues related to this PR:

  • I'm currently unsure of how SetStrictMode will work. It should come from the input unit, but I have no way of actually verifying that, and this code currently has the assumption that we get it from the output unit, since that's where most of the config is coming from. I'll file an issue about this after the PR is merged, but for now I'm assuming that it will be need to be set later by a different subsystem.
  • There's tons code that will need to be changed once we address the issue with how we want split configs across different units. For now, there's definitely some awkwardness in how the GRPC endpoint is being started/stopped that I'm assuming will go away once we have a dedicated input unit.

@cmacknz
Copy link
Member

cmacknz commented Jan 30, 2023

I'm currently unsure of how SetStrictMode will work. It should come from the input unit, but I have no way of actually verifying that, and this code currently has the assumption that we get it from the output unit, since that's where most of the config is coming from.

SetStrictMode is a development only option. We can consider removing it if it is adding complexity, or perhaps only support it when the config is loaded from a file if that is possible.

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next round of comments, many of them about testing.

config/config.go Outdated Show resolved Hide resolved
config/config.go Show resolved Hide resolved
config/config.go Outdated Show resolved Hide resolved
controller/units.go Outdated Show resolved Hide resolved
controller/units.go Outdated Show resolved Hide resolved
@@ -31,15 +43,27 @@ func LoadAndRun() error {
}

// RunUnmanaged runs the shipper out of a local config file without using the control protocol.
func RunUnmanaged(cfg config.ShipperConfig) error {
func RunUnmanaged(cfg config.ShipperRootConfig) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any tests for running in unmanaged mode? What is enforcing that it keeps working?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, I've been deprioritizing it a bit, since it took me so long to just get the shipper talking to other components. Will see if I can add a test or two, at least...

controller/run.go Outdated Show resolved Hide resolved
grpcserver/inputhandler.go Show resolved Hide resolved
grpcserver/server.go Outdated Show resolved Hide resolved
@@ -408,3 +405,7 @@ func (p *publisherMock) TryPublish(event *messages.Event) (queue.EntryID, error)
func (p *publisherMock) PersistedIndex() (queue.EntryID, error) {
return p.persistedIndex, nil
}

func (p *publisherMock) IsInitialized() bool {
return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should probably be a test of the case where this is false.

@fearful-symmetry
Copy link
Contributor Author

fearful-symmetry commented Feb 2, 2023

Okay, updating this as I go, in case more people want to look at it between now and tomorrow. Biggest thing that still needs to be done is expanding the tests.

@fearful-symmetry
Copy link
Contributor Author

Okay, added a bunch of tests. @cmacknz going through and verifying agent behavior, then writing tests, is proving to be a bit of a laborious process, if we're worried about blocking the performance testing, we might want to decide what cases we want to be handled here, and what we want to add in a followup PR.

config/config.go Outdated Show resolved Hide resolved
config/config.go Outdated Show resolved Hide resolved
// figure out if we need to initialize the gRPC endpoint
// TODO: this is another thing that will change with https://github.com/elastic/elastic-agent-shipper/issues/225,
// As we'll have a dedicated unit for the input, and we won't be using random input unit updates to see if we need to update the gRPC endpoint.
c.startgRPC(unit, conn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could change over the lifetime of the shipper. If for example a CA or server certificate expires and needs to be updated. But I think in general, both the server side and output side should expect to be restarted independently.

@cmacknz
Copy link
Member

cmacknz commented Feb 6, 2023

@cmacknz going through and verifying agent behavior, then writing tests, is proving to be a bit of a laborious process, if we're worried about blocking the performance testing, we might want to decide what cases we want to be handled here, and what we want to add in a followup PR.

I would rather only do the performance tests once we are convinced this implementation is correct.

@fearful-symmetry
Copy link
Contributor Author

Alright, we should be covering all the scenarios we want in testing now. Going to do a few more tests against actual elastic-agent tomorrow, just to make sure I didn't miss anything.

@fearful-symmetry
Copy link
Contributor Author

Alright, tested with elastic-agent and flipping various units on and off. With all the bugs in the shipper output/ES components, actual end-to-end testing is sort of a non-starter, but most of the fixes with that are in-progress.

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of mostly minor comments, this is looking good.

config/config.go Show resolved Hide resolved
controller/controller_client.go Outdated Show resolved Hide resolved
controller/controller_client.go Outdated Show resolved Hide resolved
controller/controller_client.go Show resolved Hide resolved
controller/helpers.go Show resolved Hide resolved
controller/run.go Outdated Show resolved Hide resolved
controller/run.go Outdated Show resolved Hide resolved
controller/run.go Outdated Show resolved Hide resolved
controller/units.go Outdated Show resolved Hide resolved
controller/units.go Outdated Show resolved Hide resolved
@fearful-symmetry fearful-symmetry merged commit aa99fdd into elastic:main Feb 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request Team:Elastic-Agent Label for the Agent team v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Synchronize the shipper configuration model with what is implemented in the agent
5 participants