Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: consume APM configuration over the elastic-agent control protocol #12508

Closed
wants to merge 3 commits into from

Conversation

kruskall
Copy link
Member

Motivation/summary

Consume APM configuration over the elastic-agent control protocol. Pin elastic-agent-client version to 7.6.0 until beats is updated.

Create a new instrumentation package to retrieve the tracer dynamically. It might need some changes in go-docappender and other packages to allow updating the tracer.

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

Testing is a nightmare and can't be easily performed. We can't run elastic-agent locally and the ea-managed apm-server system test is broken.
Current approach is to start the ea-managed smoke test, manually install elastic-agent, upload apm-server binary to replace the apm-server component and restart elastic-agent.

Related issues

@kruskall kruskall changed the title Feat/control protocol feat: consume APM configuration over the elastic-agent control protocol Jan 29, 2024
Copy link
Contributor

mergify bot commented Jan 29, 2024

This pull request does not have a backport label. Could you fix it @kruskall? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Jan 29, 2024
@cmacknz
Copy link
Member

cmacknz commented Jan 29, 2024

Testing is a nightmare and can't be easily performed. We can't run elastic-agent locally

Best solution to this is a VM, I like https://multipass.run/ especially if you are a Mac. Makes dealing with Ubuntu VMs as easy as containers.

Comment on lines +69 to +72
c, _, err := client.NewV2FromReader(os.Stdin, ver)
if err != nil {
return nil, fmt.Errorf("failed to create new v2 client: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In libbeat we do this in https://github.com/elastic/beats/blob/main/x-pack/libbeat/management/managerV2.go#L176-L183

		// Normal Elastic-Agent-Client initialisation
		agentClient, _, err = client.NewV2FromReader(os.Stdin, client.VersionInfo{
			Name:    "beat-v2-client",
			Version: version.GetDefaultVersion(),
			Meta: map[string]string{
				"commit":     version.Commit(),
				"build_time": version.BuildTime().String(),
			},

This is called during initialization from https://github.com/elastic/beats/blob/c5e79a25d05d5bdfa9da4d187fe89523faa42afc/libbeat/cmd/instance/beat.go#L837-L841

	// initialize config manager
	b.Manager, err = management.NewManager(b.Config.Management, reload.RegisterV2)
	if err != nil {
		return err
	}

In APM Server I see this is already called in

// Initialize central config manager.
manager, err := management.NewManager(b.Config.Management, reload.RegisterV2)
if err != nil {
return err
}
b.Manager = manager

So what I suspect is happening is that you are reading an empty proto message from stdin because the one the agent wrote is already consumed. This is causing the Services field to be nil failing the check in https://github.com/elastic/elastic-agent-client/blob/17d2b257ff148cd86de88252359bef0f28df9ab8/pkg/client/reader.go#L86

The io.ReadAll must just be reading EOF if you aren't getting an error back, which seems weird but possible.

Regardless of the actual mechanism causing this to fail you are trying to initialize the client a second time which is likely the source of your problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! 🙇

@v1v
Copy link
Member

v1v commented Feb 9, 2024

run docs-build

@kruskall
Copy link
Member Author

This requires changes in beats, I'll close this for now

@kruskall kruskall closed this Jun 12, 2024
@kruskall kruskall deleted the feat/control-protocol branch June 12, 2024 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip Skip notification from the automated backport with mergify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants