Configurable metrics-apm datastream pattern. #8182

evilezh · 2022-05-21T15:02:42Z

The problem in short:
I couldn't find a way to turn off splitting metrics per service. Right now the data stream pattern is metrics-apm.app.<service-name>-<ns>, which is awkward.
Now imagine I have 100 services in 5 namespaces. That would create 500 data streams, each with its own ILM-managed lifecycle.

None of those indices will fill up properly, and the lifecycle won't work well either. In my case I would prefer to use a common index for all metrics-apm data and recycle it with a single policy.
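To make the count concrete, here is a minimal sketch of the naming arithmetic. The service and namespace names are placeholders, and the shared dataset name `all` is an assumption for illustration, not an APM default.

```python
from itertools import product


def per_service_stream(service: str, namespace: str) -> str:
    # Pattern quoted in this issue: metrics-apm.app.<service-name>-<ns>
    return f"metrics-apm.app.{service}-{namespace}"


def shared_stream(namespace: str) -> str:
    # Hypothetical shared dataset name ("all" is an assumption)
    return f"metrics-apm.app.all-{namespace}"


# Placeholder names standing in for 100 services and 5 namespaces
services = [f"svc{i}" for i in range(100)]
namespaces = [f"ns{i}" for i in range(5)]

per_service = {per_service_stream(s, n) for s, n in product(services, namespaces)}
shared = {shared_stream(n) for n in namespaces}

print(len(per_service))  # 500 data streams with per-service splitting
print(len(shared))       # 5 data streams with one shared dataset per namespace
```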

axw · 2022-05-23T01:22:43Z

@evilezh thanks for opening the issue. We have been thinking about adding something like this, but no concrete plans yet. We might come back with some questions for you in the future, when we've prioritised this.

axw · 2022-05-25T06:18:02Z

One thing I should have noted earlier: for builtin metrics (i.e. those measured by Elastic APM agents) specifically, we will start sending these to a common data stream per namespace as of 8.3. See #7520.

glucaci · 2022-05-25T08:33:57Z

We also have a problem with this, and we are starting to see JVM memory pressure across our whole cluster because of it.

The problem is that with around 200 services, which we will reach soon, and an index rollover every 10 days, the metrics indices alone add up to 600 shards per month. If you want to keep the data for 3-4 months, you need a lot of RAM, which is hard to justify relative to the disk space actually used.
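The estimate above can be reproduced with a small calculation. This is only a sketch: it assumes one primary shard per backing index and no replicas (which is what makes 200 services come out at 600 shards per month), and the function name `backing_shards` is made up for illustration.

```python
def backing_shards(streams: int, rollover_days: int, retention_days: int,
                   primaries: int = 1, replicas: int = 0) -> int:
    """Estimate the shards held by `streams` data streams over a retention window."""
    # Each rollover creates one new backing index per data stream (ceiling division).
    indices_per_stream = -(-retention_days // rollover_days)
    shards_per_index = primaries * (1 + replicas)
    return streams * indices_per_stream * shards_per_index


# One data stream per service, as with the per-service pattern
print(backing_shards(200, rollover_days=10, retention_days=30))   # 600
print(backing_shards(200, rollover_days=10, retention_days=90))   # 1800
print(backing_shards(200, rollover_days=10, retention_days=120))  # 2400

# A single shared data stream with the same rollover and retention
print(backing_shards(1, rollover_days=10, retention_days=120))    # 12
```

A shared stream would roll over on size or age like any other, so the real count depends on the ILM policy; the point is only that the multiplier of 200 disappears.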

I see it as a high-priority issue to be able to merge all the metrics indices into one, as is done for logs and traces.

Thanks!

glucaci · 2022-11-11T13:45:44Z

@axw any news about this feature?

axw · 2022-11-14T03:08:52Z

@glucaci since 8.3 we've been sending Elastic APM agent built-in metrics to a common data stream. Custom metrics still go to service-specific data streams, and we don't yet have a solution for splitting them out.

Are you on a recent version of the stack? Are you still observing issues?

glucaci · 2022-11-22T10:11:52Z

Currently we don't have any memory issues, but the cluster has reached its maximum shard allocation, which means we cannot create any additional shards (e.g. by adding a watcher).
95% of our indices are application metrics: .ds-metrics-apm.app.the-application-name

Are there any plans to do this in a standard way for the Elastic APM agent and OpenTelemetry?

We have a temporary solution from a support ticket, which I haven't tried yet. It involves changing the metrics ingest pipeline and adding the following script:

{
  "script": {
    "source": """
      ctx["data_stream.dataset"] = "apm.app.all";
      ctx["_index"] = "metrics-apm.app.all-" + ctx["data_stream.namespace"];
    """
  }
}
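For readers less familiar with ingest scripts, the effect of the script above on a single document can be mirrored in plain Python. This is only an illustration of the two assignments, not something that runs inside Elasticsearch; the function name `route_to_shared` and the sample document are made up, with field values borrowed from this thread.

```python
def route_to_shared(doc: dict) -> dict:
    """Mirror the two assignments of the ingest script on a copy of the document."""
    out = dict(doc)
    # Overwrite the per-service dataset with the shared one
    out["data_stream.dataset"] = "apm.app.all"
    # Point the document at the shared data stream for its namespace
    out["_index"] = "metrics-apm.app.all-" + out["data_stream.namespace"]
    return out


# Sample document using the flat, dotted keys seen in APM metric documents
doc = {
    "_index": "metrics-apm.app.api_1-default",
    "data_stream.type": "metrics",
    "data_stream.dataset": "apm.app.api_1",
    "data_stream.namespace": "default",
}

routed = route_to_shared(doc)
print(routed["_index"])               # metrics-apm.app.all-default
print(routed["data_stream.dataset"])  # apm.app.all
```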

If this is a good solution, why isn't it included in the Elastic release?

Thanks!

axw · 2022-11-22T10:18:54Z

@glucaci which version of the stack are you on? Would you be able to share a document for a few different .ds-metrics-apm.app.the-application-name indices (for different values of "the-application-name")? It may help us identify whether there's a bug with the current metrics-combining code, or whether it's just the custom metrics that we don't yet have a solution for.

> We have a temporary solution from a support ticket, which I haven't tried yet. It involves changing the metrics ingest pipeline and adding the following script
> ...
> If this is a good solution, why isn't it included in the Elastic release?

This is a workaround that won't work in all situations. It will work if there is no overlap between the metrics across the different services, or if they overlap but the metric definitions do not conflict. If there are conflicts, then it would prevent ingestion.
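One way to check for such conflicts before merging is to compare the field types each service's data stream uses. The sketch below works on hand-written input; in practice the types would come from the existing mappings. The field `queue.depth`, the service names, and the function `find_conflicts` are hypothetical.

```python
from collections import defaultdict


def find_conflicts(mappings_by_service: dict) -> dict:
    """Return {field: {type: [services]}} for fields mapped with more than one type."""
    seen = defaultdict(lambda: defaultdict(list))
    for service, fields in mappings_by_service.items():
        for field, field_type in fields.items():
            seen[field][field_type].append(service)
    # A field is a conflict only if services disagree on its type
    return {field: dict(types) for field, types in seen.items() if len(types) > 1}


# Hypothetical field types per service
mappings = {
    "api_1": {"process.runtime.dotnet.gc.committed": "long"},
    "api_2": {"queue.depth": "long"},
    "api_3": {"queue.depth": "histogram"},
}

conflicts = find_conflicts(mappings)
print(conflicts)
# {'queue.depth': {'long': ['api_2'], 'histogram': ['api_3']}}
```

An empty result suggests the services could share one data stream; any entry names a metric that would need renaming or separate routing first.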

glucaci · 2022-11-22T11:39:25Z

Sure, below you can see a document from one of the apps.

{
  "_index": ".ds-metrics-apm.app.api_1",
  "_id": "i4vUnoQBHKZw1ZUhanQN",
  "_version": 1,
  "_score": 0,
  "_source": {
    "agent": {
      "name": "dotnet",
      "version": "1.4.0.599"
    },
    "process.runtime.dotnet.gc.committed": 84,
    "data_stream.namespace": "default",
    "data_stream.type": "metrics",
    "processor": {
      "name": "metric",
      "event": "metric"
    },
    "labels": {
      "service_namespace": "Api_1"
    },
    "metricset.name": "app",
    "observer": {
      "hostname": "55923b643526",
      "id": "06d79c18-d317-400f-b8e2-9a74b8974db4",
      "type": "apm-server",
      "ephemeral_id": "351565c4-2459-49ae-a13b-40a07173d9f7",
      "version": "8.5.1"
    },
    "@timestamp": "2022-11-22T10:13:50.595Z",
    "ecs": {
      "version": "1.12.0"
    },
    "service": {
      "node": {
        "name": "8e2ba94d-f741-4d24-ab4b-d9e58ca1bfbe"
      },
      "environment": "DEV",
      "name": "Api_1 Api",
      "language": {
        "name": "unknown"
      },
      "version": "1.52.0.0"
    },
    "data_stream.dataset": "apm.app.api_1",
    "event": {
      "agent_id_status": "missing",
      "ingested": "2022-11-22T10:13:51Z"
    }
  },
  "fields": {
    "service.environment": ["DEV"],
    "process.runtime.dotnet.gc.committed": [84],
    "service.name": ["Api_1 Api"],
    "data_stream.namespace": ["default"],
    "processor.name": ["metric"],
    "service.node.name": ["8e2ba94d-f741-4d24-ab4b-d9e58ca1bfbe"],
    "service.language.name": ["unknown"],
    "observer.hostname": ["55923b643526"],
    "data_stream.type": ["metrics"],
    "metricset.name": ["app"],
    "event.ingested": ["2022-11-22T10:13:51.000Z"],
    "observer.id": ["06d79c18-d317-400f-b8e2-9a74b8974db4"],
    "@timestamp": ["2022-11-22T10:13:50.595Z"],
    "service.version": ["1.52.0.0"],
    "observer.ephemeral_id": ["351565c4-2459-49ae-a13b-40a07173d9f7"],
    "observer.version": ["8.5.1"],
    "observer.type": ["apm-server"],
    "ecs.version": ["1.12.0"],
    "data_stream.dataset": ["apm.app.api_1"],
    "processor.event": ["metric"],
    "agent.name": ["dotnet"],
    "agent.version": ["1.4.0.599"],
    "event.agent_id_status": ["missing"],
    "labels.service_namespace": ["Api_1"]
  }
}

The document is the same for all the apps, but with different fields, which are exported by the OpenTelemetry instrumentation for the .NET runtime.

Will the workaround work in this case?

The same metrics format is also used by the Java OpenTelemetry instrumentation. Are there any plans to align the Elastic APM agent metrics with the OpenTelemetry ones and create a "standard" ingestion?

Thanks!

axw · 2022-11-24T05:09:54Z

Thanks @glucaci!

> Will the workaround work in this case?

The metrics look like they shouldn't collide with any others - I think the workaround is safe in this case.

> The same metrics format is also used by the Java OpenTelemetry instrumentation. Are there any plans to align the Elastic APM agent metrics with the OpenTelemetry ones and create a "standard" ingestion?

We do have some plans to map OpenTelemetry metrics to the ones our agents produce. We already do this for JVM runtime metrics, but we haven't yet done it for .NET/CLR metrics. Although we map the JVM metrics, we also record the original OTel metrics; so this means we're still creating application-specific data streams for OTel-instrumented Java programs. I think we'll need to revisit that decision.

nyp-cgranata · 2023-10-17T18:34:30Z

We are experiencing the same issue as the folks above.

@axw Have there been any updates?

axw · 2023-10-18T00:27:01Z

@nyp-cgranata we recently added support for configurable routing via ingest pipelines: #10991

In the not too distant future we intend to make at least the following changes:

- migrate metrics (initially custom/application metrics, then the rest) to Elasticsearch's time series data streams
- stop producing service-specific metrics data streams by default; users would still be able to opt into this using the reroute processor
- flatten metric field names (e.g. a.b.c would no longer be considered a JSON object hierarchy, but instead a flat name with dots in it) to avoid mapping conflicts; see "Support metrics with dots in their names", apm#347
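As a rough sketch of what routing through an ingest pipeline could look like, the snippet below builds a pipeline body that uses Elasticsearch's reroute processor to send documents to one shared dataset. The dataset name `apm.app.all`, the `tag` value, and the pipeline name `metrics-apm.app@custom` are assumptions for illustration; check the APM documentation for the custom pipeline names your version actually calls. `namespace` is left unset so the processor keeps the document's current namespace.

```python
import json


def shared_metrics_pipeline(dataset: str = "apm.app.all") -> dict:
    """Build an ingest pipeline body that reroutes documents to one shared dataset."""
    return {
        "description": "Route per-service APM application metrics to a shared dataset",
        "processors": [
            {
                "reroute": {
                    "tag": "apm-app-metrics-shared",  # hypothetical tag
                    "dataset": dataset,
                }
            }
        ],
    }


# Assumed pipeline name; not confirmed by this thread
pipeline_name = "metrics-apm.app@custom"
body = shared_metrics_pipeline()

# Request that would be sent to Elasticsearch (printed only, not executed here)
print(f"PUT _ingest/pipeline/{pipeline_name}")
print(json.dumps(body, indent=2))
```

The caveat from earlier in the thread still applies: if metric definitions conflict across services, merging them into one data stream can prevent ingestion.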

paulevanmr · 2024-10-31T11:52:22Z

Hi - any updates on this?

Have just run into this issue myself. We only have 17 services, but ideally want to collate everything to a pattern along the lines of:

apm-metrics-{ns} as our requirements are quite simple and we can split things out by queries.

Using otel SDK > otel collector > apm > elasticsearch

evilezh added the enhancement label May 21, 2022

simitt added 8.4-candidate 8.5-candidate and removed 8.4-candidate labels May 25, 2022

simitt removed the 8.5-candidate label Jul 26, 2022
