Configurable metrics-apm datastream pattern. #8182

Open
evilezh opened this issue May 21, 2022 · 12 comments


Comments

@evilezh

evilezh commented May 21, 2022

Problem in short:
I couldn't find a way to tell APM Server not to split metrics per service. Currently metrics go to data streams with an odd pattern: metrics-apm.app.<service-name>-<ns>.
Now imagine I have 100 services in 5 namespaces. That would create 500 data streams, each with its own ILM lifecycle.

None of those indices will fill up properly, and the lifecycle won't work well either. In my case I would prefer to use a common index for all metrics-apm data and manage it with a single policy.
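To make the scale concrete, here is a small Python sketch that enumerates the per-service data stream names following the metrics-apm.app.<service-name>-<ns> pattern described above. The service and namespace names are made up for illustration, and the sketch assumes service names are already valid dataset components (APM Server's own normalization of service names is not modelled).

```python
from itertools import product


def app_metrics_data_streams(services, namespaces):
    """Return one data stream name per (service, namespace) pair.

    Follows the metrics-apm.app.<service-name>-<ns> pattern. Service names are
    assumed to be lowercase and valid already; no normalization is applied.
    """
    return [
        f"metrics-apm.app.{service}-{ns}"
        for service, ns in product(services, namespaces)
    ]


# Hypothetical example: 100 services deployed to 5 namespaces.
services = [f"service_{i}" for i in range(100)]
namespaces = ["default", "dev", "test", "staging", "prod"]
streams = app_metrics_data_streams(services, namespaces)
print(len(streams))  # 500
print(streams[0])    # metrics-apm.app.service_0-default
```

Each of these data streams has its own backing indices and rolls over independently, which is why a single shared stream per namespace is being asked for.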

@axw
Member

axw commented May 23, 2022

@evilezh thanks for opening the issue. We have been thinking about adding something like this, but have no concrete plans yet. We might come back with some questions for you in the future, once we've prioritised this.

@axw
Member

axw commented May 25, 2022

One thing I should have noted earlier: for builtin metrics (i.e. those measured by Elastic APM agents) specifically, we will start sending these to a common data stream per namespace as of 8.3. See #7520.

@glucaci

glucaci commented May 25, 2022

We also have a problem with this and have started seeing JVM memory pressure across our whole cluster because of it.

The problem is that with around 200 services (which we will reach soon) and an index rollover every 10 days, the metrics indices alone add 600 shards per month. If you want to keep the data for 3-4 months, you need a lot of RAM, which is not justifiable relative to the disk space actually used.

I see it as a high-priority issue to be able to merge all the metrics indices into one, as is already the case for logs and traces.
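The 600 shards per month above follows from simple arithmetic. The sketch below reproduces it under stated assumptions: one data stream per service, one primary shard and no replicas per backing index, a 30-day month, and a new backing index every 10 days. Clusters with replicas would hold proportionally more shards.

```python
def metrics_shards(services, rollover_days, retention_months,
                   shards_per_index=1, days_per_month=30):
    """Estimate shards created by per-service metrics data streams.

    Assumes each service has its own data stream that rolls over every
    `rollover_days`, and each backing index has `shards_per_index` shards.
    Returns (shards added per month, shards held over the retention period).
    """
    indices_per_month = days_per_month // rollover_days
    per_month = services * indices_per_month * shards_per_index
    return per_month, per_month * retention_months


per_month, retained = metrics_shards(services=200, rollover_days=10,
                                     retention_months=3)
print(per_month)  # 600
print(retained)   # 1800
```

With four months of retention the estimate rises to 2400 shards, whereas a single shared data stream with the same rollover, `metrics_shards(services=1, rollover_days=10, retention_months=4)`, would need only 12.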

Thanks!

@glucaci

glucaci commented Nov 11, 2022

@axw any news about this feature?

@axw
Member

axw commented Nov 14, 2022

@glucaci since 8.3 we've been sending Elastic APM agent built-in metrics to a common data stream. Custom metrics still go to service-specific data streams, and we don't yet have a solution for splitting them out.
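A rough way to picture the routing described here: documents containing only built-in agent metrics go to a shared per-namespace data stream, and anything else goes to a service-specific one. This Python sketch is a simplified model, not APM Server's code. It assumes the shared stream is named metrics-apm.internal-<ns> (the APM integration's name for it, to my knowledge), and the `BUILTIN_METRICS` set is a hypothetical placeholder rather than the real list.

```python
# Hypothetical subset of built-in agent metric names, for illustration only.
BUILTIN_METRICS = {
    "system.cpu.total.norm.pct",
    "system.memory.total",
    "system.memory.actual.free",
}


def target_data_stream(service, namespace, metric_names):
    """Pick a data stream for a metrics document (simplified model).

    Documents whose metrics are all built-in go to a shared per-namespace
    stream; any custom metric sends the document to a per-service stream.
    """
    if metric_names and set(metric_names) <= BUILTIN_METRICS:
        return f"metrics-apm.internal-{namespace}"
    return f"metrics-apm.app.{service}-{namespace}"


print(target_data_stream("api_1", "default", ["system.memory.total"]))
# metrics-apm.internal-default
print(target_data_stream("api_1", "default",
                         ["process.runtime.dotnet.gc.committed"]))
# metrics-apm.app.api_1-default
```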

Are you on a recent version of the stack? Are you still observing issues?

@glucaci

glucaci commented Nov 22, 2022

Currently we don't have any memory issues, but the cluster has reached its maximum shard allocation, which means we cannot create any additional shards (e.g. when adding a watcher).
95% of our indices are application metrics indices (.ds-metrics-apm.app.the-application-name).

Are there any plans to do this in a standard way for the Elastic APM agents and OpenTelemetry?

We have a temporary solution from a support ticket, which I haven't tried yet. It involves changing the metrics ingest pipeline to add the following script processor:

{
  "script": {
    "source": """
      ctx["data_stream.dataset"] = "apm.app.all";
      ctx["_index"] = "metrics-apm.app.all-" + ctx["data_stream.namespace"];
    """
  }
}
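To see what the script processor above does to each document, here is a rough Python equivalent operating on a dict with the same flat dotted keys as the sample document shared later in this thread. It only mirrors the two assignments in the Painless script; it is not how Elasticsearch executes pipelines.

```python
def route_to_shared_stream(doc):
    """Mimic the workaround script: send every app metric to one data stream.

    `doc` uses flat dotted keys (e.g. "data_stream.namespace"), like the
    _source of the sample document shared in this issue.
    """
    routed = dict(doc)  # copy so the input document is left untouched
    routed["data_stream.dataset"] = "apm.app.all"
    routed["_index"] = "metrics-apm.app.all-" + routed["data_stream.namespace"]
    return routed


doc = {
    "data_stream.type": "metrics",
    "data_stream.dataset": "apm.app.api_1",
    "data_stream.namespace": "default",
    "process.runtime.dotnet.gc.committed": 84,
}
routed = route_to_shared_stream(doc)
print(routed["_index"])               # metrics-apm.app.all-default
print(routed["data_stream.dataset"])  # apm.app.all
```

The new target keeps the metrics-<dataset>-<namespace> shape, so it stays consistent with the document's data_stream.dataset and data_stream.namespace fields.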

If this is a good solution, why isn't it included in the Elastic release?

Thanks!

@axw
Copy link
Member

axw commented Nov 22, 2022

@glucaci which version of the stack are you on? Would you be able to share a document for a few different .ds-metrics-apm.app.the-application-name indices (for different values of "the-application-name")? It may help us identify whether there's a bug with the current metrics-combining code, or whether it's just the custom metrics that we don't yet have a solution for.

> We have a temporary solution from a support ticket, which I haven't tried yet. It involves changing the metrics ingest pipeline to add the following script processor:
> ...
> If this is a good solution, why isn't it included in the Elastic release?

This is a workaround that won't work in all situations. It will work if there is no overlap between the metrics across the different services, or if they overlap but the metric definitions do not conflict. If there are conflicts, then it would prevent ingestion.
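One way to check up front whether the workaround is safe is to compare the field types each service would contribute to the shared data stream and look for names that map to different types. The sketch below does this over hypothetical per-service field-to-type maps (for example, collected from each service's existing mapping); the service names and the `orders.count` field are made up for illustration.

```python
from collections import defaultdict


def find_conflicts(mappings):
    """Return {field: {type: [services]}} for fields mapped to >1 type.

    `mappings` is {service: {field: type}}, e.g. flattened from each
    service's existing metrics data stream mapping.
    """
    types_by_field = defaultdict(lambda: defaultdict(list))
    for service, fields in mappings.items():
        for field, field_type in fields.items():
            types_by_field[field][field_type].append(service)
    # Overlapping fields are fine as long as every service agrees on the type.
    return {
        field: dict(types)
        for field, types in types_by_field.items()
        if len(types) > 1
    }


mappings = {
    "api_1": {"process.runtime.dotnet.gc.committed": "long"},
    "api_2": {"process.runtime.dotnet.gc.committed": "long",
              "orders.count": "long"},
    "worker": {"orders.count": "keyword"},
}
print(find_conflicts(mappings))
# {'orders.count': {'long': ['api_2'], 'keyword': ['worker']}}
```

An empty result corresponds to the "no overlap, or overlap without conflicting definitions" case where the workaround should be safe.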

@glucaci

glucaci commented Nov 22, 2022

Sure, below is a document from one of the apps.

{
  "_index": ".ds-metrics-apm.app.api_1",
  "_id": "i4vUnoQBHKZw1ZUhanQN",
  "_version": 1,
  "_score": 0,
  "_source": {
    "agent": {
      "name": "dotnet",
      "version": "1.4.0.599"
    },
    "process.runtime.dotnet.gc.committed": 84,
    "data_stream.namespace": "default",
    "data_stream.type": "metrics",
    "processor": {
      "name": "metric",
      "event": "metric"
    },
    "labels": {
      "service_namespace": "Api_1"
    },
    "metricset.name": "app",
    "observer": {
      "hostname": "55923b643526",
      "id": "06d79c18-d317-400f-b8e2-9a74b8974db4",
      "type": "apm-server",
      "ephemeral_id": "351565c4-2459-49ae-a13b-40a07173d9f7",
      "version": "8.5.1"
    },
    "@timestamp": "2022-11-22T10:13:50.595Z",
    "ecs": {
      "version": "1.12.0"
    },
    "service": {
      "node": {
        "name": "8e2ba94d-f741-4d24-ab4b-d9e58ca1bfbe"
      },
      "environment": "DEV",
      "name": "Api_1 Api",
      "language": {
        "name": "unknown"
      },
      "version": "1.52.0.0"
    },
    "data_stream.dataset": "apm.app.api_1",
    "event": {
      "agent_id_status": "missing",
      "ingested": "2022-11-22T10:13:51Z"
    }
  },
  "fields": {
    "service.environment": ["DEV"],
    "process.runtime.dotnet.gc.committed": [84],
    "service.name": ["Api_1 Api"],
    "data_stream.namespace": ["default"],
    "processor.name": ["metric"],
    "service.node.name": ["8e2ba94d-f741-4d24-ab4b-d9e58ca1bfbe"],
    "service.language.name": ["unknown"],
    "observer.hostname": ["55923b643526"],
    "data_stream.type": ["metrics"],
    "metricset.name": ["app"],
    "event.ingested": ["2022-11-22T10:13:51.000Z"],
    "observer.id": ["06d79c18-d317-400f-b8e2-9a74b8974db4"],
    "@timestamp": ["2022-11-22T10:13:50.595Z"],
    "service.version": ["1.52.0.0"],
    "observer.ephemeral_id": ["351565c4-2459-49ae-a13b-40a07173d9f7"],
    "observer.version": ["8.5.1"],
    "observer.type": ["apm-server"],
    "ecs.version": ["1.12.0"],
    "data_stream.dataset": ["apm.app.api_1"],
    "processor.event": ["metric"],
    "agent.name": ["dotnet"],
    "agent.version": ["1.4.0.599"],
    "event.agent_id_status": ["missing"],
    "labels.service_namespace": ["Api_1"]
  }
}

The documents look the same for all the apps, apart from the metric fields, which are exported by the OpenTelemetry instrumentation for the .NET runtime.

Will the workaround work in this case?

The same metrics format is also used by the Java OpenTelemetry instrumentation. Are there any plans to align the Elastic APM agents with OpenTelemetry as well, and create a "standard" ingestion?

Thanks!

@axw
Copy link
Member

axw commented Nov 24, 2022

Thanks @glucaci!

> Will the workaround work in this case?

The metrics look like they shouldn't collide with any others - I think the workaround is safe in this case.

> The same metrics format is also used by the Java OpenTelemetry instrumentation. Are there any plans to align the Elastic APM agents with OpenTelemetry as well, and create a "standard" ingestion?

We do have some plans to map OpenTelemetry metrics to the ones our agents produce. We already do this for JVM runtime metrics, but we haven't yet done it for .NET/CLR metrics. Although we map the JVM metrics, we also record the original OTel metrics; so this means we're still creating application-specific data streams for OTel-instrumented Java programs. I think we'll need to revisit that decision.
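A simplified model of the trade-off described above: known OpenTelemetry metric names are copied to agent metric names, but because the original OTel fields are kept too, the document still carries fields that are not built-in agent metrics, so it still ends up in a service-specific data stream. Both metric names and the mapping table below are hypothetical placeholders, not the actual JVM mapping APM Server uses.

```python
# Hypothetical OTel -> agent metric name mapping, for illustration only.
OTEL_TO_AGENT = {
    "otel.jvm.heap.used": "agent.jvm.heap.used",
}
# Hypothetical set of built-in agent metric names.
BUILTIN_METRICS = set(OTEL_TO_AGENT.values())


def translate(metrics, keep_original=True):
    """Add agent-named copies of mapped OTel metrics.

    With keep_original=True the OTel-named fields stay in the document;
    unmapped metrics are always kept under their original names.
    """
    out = dict(metrics) if keep_original else {}
    for name, value in metrics.items():
        if name in OTEL_TO_AGENT:
            out[OTEL_TO_AGENT[name]] = value
        elif not keep_original:
            out[name] = value
    return out


def is_builtin_only(metrics):
    """True if every field is a built-in agent metric (shared-stream case)."""
    return set(metrics) <= BUILTIN_METRICS


kept = translate({"otel.jvm.heap.used": 10})
print(is_builtin_only(kept))      # False: routed to a per-service stream
dropped = translate({"otel.jvm.heap.used": 10}, keep_original=False)
print(is_builtin_only(dropped))   # True: could go to the shared stream
```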

@nyp-cgranata
Copy link

We are experiencing the same issue as the folks above.

@axw Have there been any updates?

@axw
Copy link
Member

axw commented Oct 18, 2023

@nyp-cgranata we recently added support for configurable routing via ingest pipelines: #10991

In the not too distant future we intend to make at least the following changes:

  • migrate metrics (initially custom/application metrics, then the rest) to Elasticsearch's time series data streams
  • stop producing service-specific metrics data streams by default; users would still be able to opt into this using the reroute processor
  • flatten metric field names (e.g. a.b.c would no longer be treated as a JSON object hierarchy, but as a flat name containing dots) to avoid mapping conflicts; see "Support metrics with dots in their names" (apm#347)
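The mapping conflict that flattening avoids can be shown with a small sketch. If dotted names are expanded into an object hierarchy, a metric named a.b and another named a.b.c cannot coexist, because a.b would have to be both a number and an object. Treating each name as a flat key removes the clash. This is a toy Python model of the idea, not Elasticsearch's mapping logic, and the metric names are hypothetical.

```python
def expand(metrics):
    """Expand dotted metric names into nested dicts.

    Raises ValueError when a name must be both a leaf value and an object,
    which is the kind of mapping conflict flat field names avoid.
    """
    root = {}
    for name, value in metrics.items():
        *parents, leaf = name.split(".")
        node, path = root, []
        for part in parents:
            path.append(part)
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{'.'.join(path)} is both a value and an object")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{name} is both a value and an object")
        node[leaf] = value
    return root


metrics = {"a.b": 1, "a.b.c": 2}
try:
    expand(metrics)
except ValueError as err:
    print(err)  # a.b is both a value and an object

flat = dict(metrics)  # flat names: "a.b" and "a.b.c" are unrelated keys
print(flat)           # {'a.b': 1, 'a.b.c': 2}
```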

@paulevanmr
Copy link

Hi - any updates on this?

I have just run into this issue myself. We only have 17 services, but ideally want to collate everything into a pattern along the lines of:

apm-metrics-{ns}, as our requirements are quite simple and we can split things out with queries.

Our pipeline: OTel SDK > OTel Collector > APM Server > Elasticsearch


6 participants