[AWS] Fix ec2 host.cpu.usage + include instance metadata and rate metrics for linked monitoring accounts #35717

kaiyan-sheng · 2023-06-08T01:48:43Z

What does this PR do?

This PR fixes two bugs:

Calculating the host.cpu.usage metric for EC2: it was looking up the wrong metric name. It should be aws.ec2.metrics.CPUUtilization.avg.
addHostFields and calculateRate were being skipped when collecting metrics from linked accounts. When adding metadata, we make an additional API call (DescribeInstances) to list all EC2 instances in the account and use the returned list to add metadata. Because DescribeInstances does not have access to linked source accounts, only instances from the monitoring account went through this step, and addHostFields and calculateRate ran inside it. The fix moves addHostFields and calculateRate to the beginning of the metadata step, so instances from both the monitoring account and the linked source accounts get host fields and rate. The actual instance metadata is still added afterwards, only for instances in the monitoring account.
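A minimal sketch of the reordering described above, assuming a simplified flat event map. The names Event, addHostFieldsSketch, calculateRateSketch and enrichSketch, the NetworkIn-based rate fields, and the instances map (standing in for the DescribeInstances result, keyed by instance ID) are hypothetical illustrations, not the Beats implementation. Host fields, including host.cpu.usage read from aws.ec2.metrics.CPUUtilization.avg, and the rate are applied to every event first; instance metadata is then added only where the event's InstanceId dimension appears in the monitoring account's DescribeInstances result.

```go
package main

// Event is a hypothetical, simplified stand-in for a Metricbeat event:
// a flat map of dotted field names to values.
type Event map[string]interface{}

// addHostFieldsSketch copies the EC2 CPU metric into host.cpu.usage.
// The source key is the one this PR identifies as correct.
func addHostFieldsSketch(e Event) {
	if v, ok := e["aws.ec2.metrics.CPUUtilization.avg"]; ok {
		e["host.cpu.usage"] = v
	}
}

// calculateRateSketch derives a per-second rate from a metric summed over
// the period (both field names are hypothetical).
func calculateRateSketch(e Event, periodSeconds float64) {
	if sum, ok := e["aws.ec2.metrics.NetworkIn.sum"].(float64); ok && periodSeconds > 0 {
		e["aws.ec2.network.in.bytes_per_sec"] = sum / periodSeconds
	}
}

// enrichSketch runs in two phases:
//  1. host fields and rate for every event (monitoring and linked accounts);
//  2. instance metadata only for events whose InstanceId dimension is in
//     the DescribeInstances result (monitoring account only).
func enrichSketch(events map[string]Event, instances map[string]string, periodSeconds float64) {
	for _, e := range events {
		addHostFieldsSketch(e)
		calculateRateSketch(e, periodSeconds)
	}
	for _, e := range events {
		id, ok := e["aws.dimensions.InstanceId"].(string)
		if !ok {
			continue // no InstanceId dimension on this event
		}
		if instanceType, found := instances[id]; found {
			e["cloud.instance.id"] = id
			e["cloud.machine.type"] = instanceType
		}
	}
}
```

In this sketch a linked-account event still ends up with host.cpu.usage and the rate field, and only misses the DescribeInstances-derived metadata.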

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Relates to [AWS] Add support for collecting metrics from linked cloudwatch accounts using cross-account monitoring (integrations#6253)

mergify · 2023-06-08T01:49:18Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @kaiyan-sheng? 🙏.
For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed branches, such as:

backport-v8.\d.0 is the label to automatically backport to the 8.\d branch. \d is the digit

elasticmachine · 2023-06-08T01:54:34Z

💚 Build Succeeded



Build stats

Start Time: 2023-06-15T19:17:46.561+0000
Duration: 74 min 10 sec

Test stats 🧪

Test	Results
Failed	0
Passed	1490
Skipped	110
Total	1600

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

tommyers-elastic · 2023-06-08T08:27:06Z

thanks kaiyan! why does moving addHostFields and calculateRate to the top of the loop have an effect on linked account instances? the instances still won't appear in the list returned from getInstancesPerRegion?

x-pack/metricbeat/module/aws/cloudwatch/metadata/ec2/ec2.go

kaiyan-sheng · 2023-06-13T17:22:38Z

@tommyers-elastic WDYT about adding the value from the dynamic label ${PROP('Period')} into the aws.cloudwatch.period field? Is it worth keeping the value? I added it to help calculate the rate metrics for EC2.
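A minimal sketch of how a resolved dynamic label could carry the period for rate calculation. It assumes a query label built from ${PROP('AccountId')} and ${PROP('Period')} joined by a hypothetical "|" separator, and assumes the rate is the metric summed over the period divided by the period in seconds. buildLabel, parsePeriod, ratePerSecond and labelSeparator are illustrative names, not the Beats code, and the real label format may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// labelSeparator is a hypothetical delimiter between dynamic label values.
const labelSeparator = "|"

// buildLabel returns a query label asking CloudWatch to substitute the
// account ID and the period via dynamic labels.
func buildLabel() string {
	return strings.Join([]string{"${PROP('AccountId')}", "${PROP('Period')}"}, labelSeparator)
}

// parsePeriod extracts the period in seconds from a label after CloudWatch
// has resolved it, e.g. "123456789012|300".
func parsePeriod(resolved string) (int, error) {
	parts := strings.Split(resolved, labelSeparator)
	if len(parts) != 2 {
		return 0, fmt.Errorf("unexpected resolved label %q", resolved)
	}
	return strconv.Atoi(parts[1])
}

// ratePerSecond turns a value summed over the period into a per-second rate.
// A non-positive period yields 0 instead of dividing by zero.
func ratePerSecond(sum float64, periodSeconds int) float64 {
	if periodSeconds <= 0 {
		return 0
	}
	return sum / float64(periodSeconds)
}
```

A malformed label is reported as an error rather than silently producing a zero period, which keeps a bad label from turning into a bogus rate.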

mergify · 2023-06-14T10:13:40Z

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix_ec2_cpu upstream/fix_ec2_cpu
git merge upstream/main
git push upstream fix_ec2_cpu

tommyers-elastic · 2023-06-15T09:32:39Z

x-pack/metricbeat/module/aws/cloudwatch/metadata/ec2/ec2.go

+		}
+
+		// add instance ID from dimension value
+		dimInstanceID, _ := events[eventIdentifier].RootFields.GetValue("aws.dimensions.InstanceId")


does this always exist in the dimensions? should we check the error here just in case?

Yeah good point! Will add it!

I did some testing on this: when aws.dimensions.InstanceId doesn't exist, dimInstanceID is nil. Then events[eventIdentifier].RootFields.Put("cloud.instance.id", dimInstanceID) just puts nil as the value for cloud.instance.id, which won't show up in the event.

😬 that wouldn't be ideal. i wonder if we could fall back to the instance ID we get from the loop var?

I added the error check here so that a value is only put into cloud.instance.id when aws.dimensions.InstanceId exists. We can't use the instance ID from the inner for-loop variable because that loop iterates over all the instance metadata returned by the DescribeInstances API. The two for loops are there to match each event's InstanceId dimension with the same ID from DescribeInstances.
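A minimal sketch of the error check, assuming a flat-keyed Fields type as a hypothetical stand-in for mapstr.M whose GetValue returns an error for a missing key. addInstanceIDs writes cloud.instance.id only when aws.dimensions.InstanceId is present, so events without that dimension are left untouched instead of receiving a nil value. Where this step sits relative to the DescribeInstances loop in ec2.go is not modeled here.

```go
package main

import "errors"

// ErrKeyNotFound is returned by GetValue when the key is absent.
var ErrKeyNotFound = errors.New("key not found")

// Fields is a hypothetical flat-keyed stand-in for mapstr.M: keys are full
// dotted field names such as "aws.dimensions.InstanceId".
type Fields map[string]interface{}

// GetValue returns the value for key, or ErrKeyNotFound if it is absent.
func (f Fields) GetValue(key string) (interface{}, error) {
	v, ok := f[key]
	if !ok {
		return nil, ErrKeyNotFound
	}
	return v, nil
}

// addInstanceIDs copies the InstanceId dimension into cloud.instance.id for
// every event that has it. Events without the dimension are skipped, so no
// nil value is ever written.
func addInstanceIDs(events map[string]Fields) {
	for eventIdentifier := range events {
		dimInstanceID, err := events[eventIdentifier].GetValue("aws.dimensions.InstanceId")
		if err != nil {
			continue // dimension missing: leave cloud.instance.id unset
		}
		events[eventIdentifier]["cloud.instance.id"] = dimInstanceID
	}
}
```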

x-pack/metricbeat/module/aws/cloudwatch/metadata/ec2/ec2.go

tommyers-elastic · 2023-06-15T09:55:22Z

@kaiyan-sheng if the period isn't used for anything in kibana, i don't think we should add it. we can always add it later if we find a need.

since we now have period and account name as dynamic labels, do we have any idea if this adds any additional overhead to the GetMetricData calls?

kaiyan-sheng · 2023-06-15T20:55:48Z

since we now have period and account name as dynamic labels, do we have any idea if this adds any additional overhead to the GetMetricData calls?

@tommyers-elastic Good point! I wrote a bash script and ran the AWS GetMetricData CLI call several times, both with and without the dynamic labels. The average time spent on GetMetricData is about the same.

For example, some data points look like this:

Average Time without dynamic labels: 1040 ms
Average Time with dynamic label accountID and period: 1033 ms

and

Average Time without dynamic labels: 1036 ms
Average Time with dynamic label accountID and period: 1043 ms

Looking at several sets of output, I think it's fair to say adding the dynamic labels doesn't add noticeable overhead to the API call.
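The comparison above comes down to the relative difference between averages. A small sketch using only the averages reported above (the individual run timings are not available); meanMs and relativeDiffPct are hypothetical helpers. For the two pairs the difference is under 1% in either direction (about -0.7% and +0.7%), which is consistent with run-to-run noise rather than a systematic cost.

```go
package main

// meanMs returns the arithmetic mean of latency samples in milliseconds,
// or 0 for an empty slice.
func meanMs(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	var total float64
	for _, s := range samples {
		total += s
	}
	return total / float64(len(samples))
}

// relativeDiffPct returns how much slower (positive) or faster (negative)
// the run with dynamic labels was, as a percentage of the run without them.
func relativeDiffPct(withoutMs, withMs float64) float64 {
	return (withMs - withoutMs) / withoutMs * 100
}
```

For example, relativeDiffPct(1040, 1033) is about -0.67 and relativeDiffPct(1036, 1043) is about +0.68.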


Fix ec2 host.cpu.usage

d7cce34

kaiyan-sheng requested a review from a team as a code owner June 8, 2023 01:48

botelastic bot added the needs_team label Jun 8, 2023

mergify bot assigned kaiyan-sheng Jun 8, 2023

kaiyan-sheng added 2 commits June 7, 2023 19:49

add changelog

f4ccc45

Merge branch 'main' into fix_ec2_cpu

5ec74bd

add host fields and calculate rate before adding metadata

be216af

kaiyan-sheng added the Team:Cloud-Monitoring and backport-v8.8.0 labels Jun 8, 2023

botelastic bot removed the needs_team label Jun 8, 2023

kaiyan-sheng added the needs_team and backport-v8.7.0 labels Jun 8, 2023

botelastic bot removed the needs_team label Jun 8, 2023

add comment

a680294

aspacca approved these changes Jun 8, 2023


tommyers-elastic reviewed Jun 8, 2023


x-pack/metricbeat/module/aws/cloudwatch/metadata/ec2/ec2.go Outdated

kaiyan-sheng added 3 commits June 8, 2023 15:54

Make sure host/rate are added for all instances

ca08e73

remove debug print

f3d2ed9

add aws.cloudwatch.period

d8f99db

tommyers-elastic reviewed Jun 15, 2023


x-pack/metricbeat/module/aws/cloudwatch/metadata/ec2/ec2.go

tommyers-elastic reviewed Jun 15, 2023


x-pack/metricbeat/module/aws/cloudwatch/metadata/ec2/ec2.go Outdated

tommyers-elastic self-requested a review June 15, 2023 09:52

tommyers-elastic changed the title ~~[AWS] Fix ec2 host.cpu.usage~~ [AWS] Fix ec2 host.cpu.usage + include instance metadata and rate metrics for linked monitoring accounts Jun 15, 2023

kaiyan-sheng added 4 commits June 15, 2023 11:59

remove aws.cloudwatch.period

1ae2015

Merge branch 'main' into fix_ec2_cpu

87b6024

Merge branch 'main' into fix_ec2_cpu

0437b2f

run mage update

3887d11

kaiyan-sheng merged commit 36aa884 into elastic:main Jun 19, 2023

kaiyan-sheng deleted the fix_ec2_cpu branch June 19, 2023 14:16

mergify bot pushed a commit that referenced this pull request Jun 19, 2023

[AWS] Fix ec2 host.cpu.usage + include instance metadata and rate met…

1dc5cc2

…rics for linked monitoring accounts (#35717) (cherry picked from commit 36aa884)

mergify bot mentioned this pull request Jun 19, 2023

[8.7](backport #35717) [AWS] Fix ec2 host.cpu.usage + include instance metadata and rate metrics for linked monitoring accounts #35816

Merged

mergify bot pushed a commit that referenced this pull request Jun 19, 2023

[AWS] Fix ec2 host.cpu.usage + include instance metadata and rate met…

1cb21b6

…rics for linked monitoring accounts (#35717) (cherry picked from commit 36aa884)

mergify bot mentioned this pull request Jun 19, 2023

[8.8](backport #35717) [AWS] Fix ec2 host.cpu.usage + include instance metadata and rate metrics for linked monitoring accounts #35817

Merged

reakaleek mentioned this pull request Jul 19, 2023

Fix ironbank validation in 8.8 #36115

Closed

6 tasks
