
Add support for Cloudwatch metrics streams #956

Closed
exekias opened this issue Apr 28, 2021 · 9 comments · Fixed by elastic/apm-server#6380
Labels: enhancement, Integration:aws, Team:Integrations

Comments

@exekias

exekias commented Apr 28, 2021

Amazon recently announced a new feature that allows CloudWatch metrics to be streamed live to several destinations, including Kinesis Data Firehose:
https://aws.amazon.com/blogs/aws/cloudwatch-metric-streams-send-aws-metrics-to-partners-and-to-your-apps-in-real-time/

We want to check the feasibility of this approach and potentially add support for it.

exekias added the enhancement, Team:Integrations, and Integration:aws labels on Apr 28, 2021
@elasticmachine

Pinging @elastic/integrations (Team:Integrations)

@kaiyan-sheng
Contributor

kaiyan-sheng commented Oct 11, 2021

I just set up CloudWatch metric streams to send CloudWatch monitoring metrics to an Amazon Kinesis Data Firehose delivery stream with an HTTP endpoint as the destination. The output format I chose for the CloudWatch metric stream is JSON; it can also be set to the OpenTelemetry 0.7 format.

The metrics look like this:

{"metric_stream_name":"cloudwatch-metric-stream-us-east-1","account_id":"123456789","region":"us-east-1","namespace":"AWS/Logs","metric_name":"IncomingBytes","dimensions":{"LogGroupName":"RDSOSMetrics"},"timestamp":1633982880000,"value":{"count":2.0,"sum":16755.0,"max":9403.0,"min":7352.0},"unit":"Bytes"}
{"metric_stream_name":"cloudwatch-metric-stream-us-east-1","account_id":"123456789","region":"us-east-1","namespace":"AWS/RDS","metric_name":"EngineUptime","dimensions":{"DBInstanceIdentifier":"database-1-instance-1"},"timestamp":1633982880000,"value":{"count":1.0,"sum":3.1746941E7,"max":3.1746941E7,"min":3.1746941E7},"unit":"Seconds"}
{"metric_stream_name":"cloudwatch-metric-stream-us-east-1","account_id":"123456789","region":"us-east-1","namespace":"AWS/RDS","metric_name":"RollbackSegmentHistoryListLength","dimensions":{"DBClusterIdentifier":"database-1","Role":"READER"},"timestamp":1633982880000,"value":{"count":1.0,"sum":0.0,"max":0.0,"min":0.0},"unit":"Count"}

I tested with the endpoint we are adding for Firehose logs in apm-server and it can ingest these metrics as messages similar to logs just fine.

{
  "_index": ".ds-metrics-firehose-default-2021.10.11-000001",
  "_type": "_doc",
  "_id": "sQdWcXwBuxVZ-WLZ2TTB",
  "_score": 1,
  "_source": {
    "@timestamp": "2021-10-11T21:49:00.000Z",
    "data_stream.type": "metrics",
    "data_stream.dataset": "firehose",
    "ecs": {
      "version": "1.12.0"
    },
    "message": "{\"metric_stream_name\":\"cloudwatch-metric-stream-us-east-1\",\"account_id\":\"123456789\",\"region\":\"us-east-1\",\"namespace\":\"AWS/Logs\",\"metric_name\":\"IncomingBytes\",\"dimensions\":{\"LogGroupName\":\"RDSOSMetrics\"},\"timestamp\":1633988940000,\"value\":{\"count\":2.0,\"sum\":16752.0,\"max\":9400.0,\"min\":7352.0},\"unit\":\"Bytes\"}",
    "cloud": {
      "origin": {
        "account.id": "123456789",
        "region": "us-east-1"
      }
    },
    "processor": {
      "name": "metric",
      "event": "metric"
    },
    "metricset.name": "IncomingBytes",
    "data_stream.namespace": "default",
    "service": {
      "origin": {
        "name": "deliverystream/test-cloudwatch-metric-streams",
        "id": "arn:aws:firehose:us-east-1:123456789:deliverystream/test-cloudwatch-metric-streams"
      }
    },
    "observer": {
      "version_major": 8,
      "ephemeral_id": "9b601d97-4cac-4482-8081-c76c17f9a748",
      "hostname": "ip-172-31-84-43.ec2.internal",
      "id": "1df86227-5ac9-4127-84a2-655ab45b3d76",
      "type": "apm-server",
      "version": "8.0.0"
    }
  }
}

But I think it would be good to parse these metrics and store them under different fields. I think we can leverage the APM metricset to some extent. For example, metricset.name could be aws.logs.IncomingBytes.avg, based on the namespace and `metric_name` from the message field.

@axw Do you think it's OK to add a processMetric function into the firehose handler code to map CloudWatch metrics to use metricset fields? Also since we can set cloudwatch metric streams format to OpenTelemetry 0.7, can we leverage any Otel work that's already done in apm server? TIA!

@axw
Member

axw commented Oct 12, 2021

@axw Do you think it's OK to add a processMetric function into the firehose handler code to map CloudWatch metrics to use metricset fields? Also since we can set cloudwatch metric streams format to OpenTelemetry 0.7, can we leverage any Otel work that's already done in apm server? TIA!

Yes I think we should add a method to process metrics. I don't think we should use the metric name in the metricset name, as that would force all metrics into their own metricset which I don't think would be desirable. I would expect the following mapping:

{"metric_stream_name":"cloudwatch-metric-stream-us-east-1","account_id":"123456789","region":"us-east-1","namespace":"AWS/Logs","metric_name":"IncomingBytes","dimensions":{"LogGroupName":"RDSOSMetrics"},"timestamp":1633982880000,"value":{"count":2.0,"sum":16755.0,"max":9403.0,"min":7352.0},"unit":"Bytes"}

  • cloud.account.id: 123456789
  • cloud.region: us-east-1
  • metricset.name: AWS/Logs:RDSOSMetrics (maybe? not sure about this)
  • IncomingBytes.value_count: 2
  • IncomingBytes.sum: 16755
  • IncomingBytes.max: 9403
  • IncomingBytes.min: 7352

In this case, IncomingBytes is a summary metric with sub-fields, and should be mapped using aggregate-metric-double. Unfortunately, we don't currently have support for mapping these metric types.
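For reference, an Elasticsearch mapping for such a summary field using the aggregate_metric_double field type might look like the fragment below. This is a sketch for one known metric (CloudWatch's count maps to value_count, per the bullets above), not a mapping apm-server ships:

```json
{
  "properties": {
    "IncomingBytes": {
      "type": "aggregate_metric_double",
      "metrics": ["min", "max", "sum", "value_count"],
      "default_metric": "max"
    }
  }
}
```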

In theory we should be able to use the OpenTelemetry format, decoding it and exporting/calling code in https://github.com/elastic/apm-server/blob/master/processor/otel/metrics.go. I don't think OpenTelemetry 0.7 will work though, as there have been breaking changes to OTLP metrics recently.

@kaiyan-sheng
Contributor

@axw Actually, all CloudWatch metrics are of the summary type; they all have different statistic methods as sub-fields. Should I spend some time adding summary support to metricset.go? Or should I just flatten the fields as you suggested (IncomingBytes.max) and use MetricsetSample to store the units and values?

Using the same sample CloudWatch metric, I think metricset.name should include both the dimension key and value, since CloudWatch metrics can have multiple dimension keys for the same namespace. Maybe metricset.name can be AWS/Logs: LogGroupName:RDSOSMetrics?

@axw
Member

axw commented Oct 13, 2021

Actually, all CloudWatch metrics are of the summary type; they all have different statistic methods as sub-fields. Should I spend some time adding summary support to metricset.go? Or should I just flatten the fields as you suggested (IncomingBytes.max) and use MetricsetSample to store the units and values?

I would prefer that we add general support for summaries, but it's not straightforward currently. We really need elastic/elasticsearch#74145 to be able to dynamically map summaries where the statistic sub-fields vary. Also, we'll need elastic/elasticsearch#72536 for recording units.

Do we need support for custom metrics at the moment, or is it enough to support only well-defined AWS service metrics?

Using the same sample CloudWatch metric, I think metricset.name should include both the dimension key and value, since CloudWatch metrics can have multiple dimension keys for the same namespace. Maybe metricset.name can be AWS/Logs: LogGroupName:RDSOSMetrics?

Generally we map metric dimensions to labels. Eventually these should be indicated as TSDB dimensions when we start using the upcoming time series indexing mode (elastic/elasticsearch#74450). If we were to do that, then we would end up with:

  • metricset.name: AWS/Logs
  • labels.LogGroupName: RDSOSMetrics

But "AWS/Logs" as a metricset.name is probably too broad, and not useful by itself. I think what we probably should be doing here is identifying metrics that come from AWS services, assigning them an appropriate metricset.name, and data_stream.dataset based on the AWS service. For this example I think that would be:

  • metricset.name: RDSOSMetrics
  • data_stream.dataset: aws.rds.osmetrics (rather than "firehose")

Then you can define a mapping for the known metrics.
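A minimal Go sketch of the dataset-naming idea, assuming a purely namespace-driven rule ("AWS/RDS" becomes "aws.rds"); the function name is hypothetical, and as the example above shows, a real implementation needs dimension-aware special cases (RDS OS metrics arrive via the AWS/Logs namespace):

```go
package main

import (
	"fmt"
	"strings"
)

// datasetForNamespace derives a candidate data_stream.dataset from a
// CloudWatch namespace by lowercasing it and replacing "/" with ".".
// This is a first cut based only on the namespace; it does not handle
// cases like RDS OS metrics, which stream through AWS/Logs.
func datasetForNamespace(ns string) string {
	return strings.ToLower(strings.ReplaceAll(ns, "/", "."))
}

func main() {
	fmt.Println(datasetForNamespace("AWS/RDS")) // aws.rds
	fmt.Println(datasetForNamespace("AWS/EC2")) // aws.ec2
}
```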

@kaiyan-sheng
Contributor

Do we need support for custom metrics at the moment, or is it enough to support only well-defined AWS service metrics?

We will only support metrics reported by AWS services for now. No custom metrics at this point :)

Sorry, I'm still having trouble deciding what we should use for metricset.name and data_stream.dataset. I don't think the sample metric from AWS/Logs is a helpful example. Let me use the more common AWS EC2 CPU utilization metric as an example in the draft PR instead:
elastic/apm-server#6380

@kaiyan-sheng
Contributor

The initial PR elastic/apm-server#6380, adding CloudWatch metric streams support through the firehose endpoint, is merged. But I will keep this issue open until testing on Cloud is done.

@kaiyan-sheng
Contributor

Testing the /firehose endpoint can be done with an 8.0.0-SNAPSHOT deployment in GCP us-west2 now. Here is the CloudWatch metric streams data ingested into Elasticsearch:
[Screenshot: CloudWatch metric stream documents in Elasticsearch, 2021-11-10]

@RichiCoder1

It looks like support for this was removed from the APM Service. Is it planned to land elsewhere?
