Add Logs Intake API (POC) #8757

Closed
simitt opened this issue Jul 29, 2022 · 1 comment · Fixed by #9068
simitt commented Jul 29, 2022

The AWS lambda extension is already forwarding metrics and trace signals to the APM Server. Now the extension is adding support for collecting logs (elastic/apm-aws-lambda#256). The APM server should be extended to also support proxying these logs to Elasticsearch.

This issue can be used as a meta issue and be further split up if necessary.

The POC should include:

  • Extend the HTTP server to support receiving log events (either provide a dedicated logs endpoint or add log event support to the existing event intake API).
  • Add a JSON spec for the logs format.
  • Parse & process log events.
  • Add a dedicated data stream, including mappings, ILM policy and ingest pipeline, to the apmpackage.
  • Ship logs into one common logs-apm.* ES data stream for this POC.

Expected outcome

  • Have a working prototype.
  • Identify further work items and potential bottlenecks or issues.

Out of Scope

  • Define final data stream strategy.
  • Full end-to-end test coverage
felixbarny commented Sep 7, 2022

JSON spec for the log events

I think we can keep the JSON spec very simple, or maybe not have a JSON spec at all. That's because logs shouldn't require any particular structure, and arbitrary top-level fields may be added.

The only field we may want to require is @timestamp, but in the context of LX, we're also thinking of ways to automatically set it to the ingest time if it's absent. For ECS JSON logs, which APM agents will send, we can, however, assume that there's always an @timestamp field.

We could also treat log events as an opaque string in the APM Server and put the raw JSON that the agents send in the message field. Then we can parse the message with an ingest pipeline in ES. I’ve created a POC of such a pipeline: https://github.com/elastic/integrations/pull/2972/files#diff-5ac510f65f030ff03cdc0585fe10b87569bcc6ffe9cb352be0b2097c3a3d77f9
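A much-simplified sketch of such a pipeline is shown below; the pipeline name is hypothetical and the linked POC contains the real processor chain. It expands the raw JSON carried in `message` into the document root and falls back to the ingest timestamp when @timestamp is missing:

```json
PUT _ingest/pipeline/logs-apm-json
{
  "description": "Sketch: parse ECS-JSON log lines shipped as an opaque message string",
  "processors": [
    {
      "json": {
        "field": "message",
        "add_to_root": true,
        "if": "ctx.message != null && ctx.message.startsWith('{')"
      }
    },
    {
      "set": {
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}",
        "override": false
      }
    }
  ]
}
```

The `if` condition lets plain-text log lines pass through untouched, so the same data stream can accept both structured and unstructured logs.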

Dynamic mappings (stretch goal)

When it comes to mappings, we need to take into account that ECS logs may add arbitrary key/value pairs to the root of the document. We could re-use some of the mappings I did for the Elasticsearch log ingestion API POC: https://github.com/elastic/elasticsearch/pull/88181/files#diff-0fba05a9236d9aa3866db886f0b6c24759be01a775023e1663b8f44f91951e62

The general idea is that only a handful of fields are indexed. Dynamic fields are added as runtime fields (dynamic: runtime).
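A minimal index template illustrating this idea might look as follows; the template name, index pattern, and the exact set of indexed fields are assumptions for the sketch, not the mappings from the linked POC:

```json
PUT _index_template/logs-apm-poc
{
  "index_patterns": ["logs-apm.poc-*"],
  "data_stream": {},
  "template": {
    "mappings": {
      "dynamic": "runtime",
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "match_only_text" },
        "log.level": { "type": "keyword" },
        "service.name": { "type": "keyword" }
      }
    }
  }
}
```

With `dynamic: runtime`, any additional top-level fields from ECS JSON logs remain queryable as runtime fields without growing the index mappings or risking a mapping explosion.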

I realize that this is quite a big shift in the model. Historically, we have had very strict models for the data that goes into the APM indices. I think it makes a lot of sense for traces to be highly structured, but logs are more unstructured or semi-structured by nature.

We consider runtime dynamic mappings a stretch goal for 8.5, but only if it's very clear that they won't cause any instability for the existing OTLP logs support.
