Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Libbeat] ECS Logging #15544

Closed
ph opened this issue Jan 14, 2020 · 25 comments · Fixed by #28573
Closed

[Libbeat] ECS Logging #15544

ph opened this issue Jan 14, 2020 · 25 comments · Fixed by #28573
Assignees
Labels
breaking change Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:Integrations Label for the Integrations team v8.0.0

Comments

@ph
Copy link
Contributor

ph commented Jan 14, 2020

Motivation

We want to move to ECS log format and get benefits of a uniform logging schema on all the products. This allows us to develop ECS aware visualization of behavior like correlating metrics or events between products. The change in format is a breaking change and we should target it for 8.0.

History of logging in beats

Initially, beats were using a custom logging library and after we partially moved to uber's zap library. I believe the initial reason to move to this library was to reduce the scope of things we have to maintain, JSON output, better namespace support and also get a speed increase benefit when we are actively writing a lot of log events.

Note, I did use the word partially because we never completed the initial intended complete migration. We wanted to remove all global loggers instantiation and correctly create namespace per concerns, sadly we still have a lot of global loggers and not all logger instances have the right context attached to it which complicate debugging and support.

How we will migrate to ECS?

I think we have a few options to migrate to the ECS log format.

Option 1

Keep using the uber Zap library and add a custom formatter on top of it to add 10-15 ECS fields to each logs statement.

Pros Cons
Low impact on current code Hard to extend when more typed fields are added (IP Address, Source)
Rely on external library reduced changes Do not improve error reporting, the logs are still flat and we don't have an easy way to add contextual data.
When using the Structured export format it doesn't necessarily mean that we produce valid JSON.

Option 2

A few month ago @urso, @ruflin and I talked about how we could better target ECS log format. With that @urso created a POC called ECSlog.

Pros Cons
Can use any fields From ECS It's a complete library to maintain (exported, file rotation)
Typesafety (generated from ECS definition) Not stable yet.
Allow both machine readable error message and human readable error message
Strong emphasis on context embedded/traceability
Implement an ECS friendly error type
We can make the old logging statement works as is and gracefully upgrade
Enforce valid content with types

Option 3

Could it be possible to mix Option 1 and Option 2, meaning that we delegate file rotation and exporter to Zap but we enforce everything on top of it. This actually mean that Zab mostly deal with raw bytes or string? I don't know if its possibe, WDYT @urso?

Next steps

In all the option above we need to write new codes, either we write a new formatter or bring ECSLog to stable, or a mix. We have to keep in mind, moving to a structured format mean that we will have to deal with backward compatibility.

I surely miss pros and cons and I would like your opinion @urso @graphaelli @ruflin @andresrc @webmat

related to #15129, elastic/apm-server#3155

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ingest (Project:fleet)

@ph
Copy link
Contributor Author

ph commented Jan 14, 2020

For me, its the matter of effort we want to put upfront and the benefits we can get:

Option 1: is indeed a simple formatter but offer less control to the author, with the growing addition of modules and the move to the agent this doesn't give us a lot of finer control on what and on what form the data is send to Fleet. Is that OK? Maybe yes maybe no.

Option 2: This indeed gives us more work in the code we have to maintain, but this gives us freedom while enforcing type-safety. In bonus we get better contextual error reporting when implemented. And the machine vs human-readable format is a plus, look at the following example.

log.Errorf("Failed to access '%{file}'", ecs.File.Path("test.txt"))

Produces

{
  ...
  "file": {
    "path": "test.txt"
  },
  "log": {
    ...
  },
  "message": "Failed to access 'test.txt'"
}

I strongly suggest you look at the examples in the readme.

Both options 1 and 2 can be gracefully added, keeping old unstructured

@ph
Copy link
Contributor Author

ph commented Jan 14, 2020

Note, I might be a bit biased, @michalpristas and myself did play with parts of the ECSlogs framework, we did sadly not use it to the full potential because we wanted to go back to what libbeat was using. We did find a few issues which are now fixed.

@urso
Copy link

urso commented Jan 15, 2020

Well, I'm biased towards option 2, but it definitely needs some more testing. It also adds serilog like format strings.

About file rotation: We might consider to use stdout/stderr more by default. On linux it is kind of the standard (systemd and containers prefer this). Well, not sure windows event log would be good target or if we should continue to prefer file rotation.

The rotation in ecslog always adds a timestamp to rotated files. This is important if you want to collect the logs using filebeat. Without adding timestamps (default in Beats) filebeat can block rotation to happen, because Windows doesn't allow log files to be rotated/deleted as long as filebeat is still reading them. Currently in Beats one needs to configure an rotation interval in order to have timestamps added to the rotated files. The zap project does not provide file rotation support. The file rotation we have is a custom implementation within libbeat. I think the file rotation implementation in ecslog also better separates triggers, rotation, layout and also support gzipping old files.

Ecslog also has better support for logging errors + supports wrapping/returning errors with an optional ECS based context. Finally ECS fields from error messages can be added to the log event.

Option 3 was what I originally experimented with. Have a meta-logger that uses other loggers (zap/zerolog/logrus/...) as backend. But I finally opted against this solution. Besides being able to format a log message, the core functionality that differentiates structured logging libraries is the context and how fields can be added. This can even impact encoding.

For example zap just appends fields (more like streaming new fields as events into the context). Nesting support is limited to: have a field with a custom object or create a nesting level at the end of the stream (one can not remove the nesting level anymore). Fields with same names do appear multiple times in a document, which is not supported by all JSON parsers. At worst events might not be indexible in ES, which was one of the main goals of ecslog. Elasticsearch fails to parse the JSON document if we try this: PUT /testdoc {"field": 1, "field": 2}. Although it is an artificial example, events like this are easily producible with zap.

Logrus on the otherhand uses a map for it's context that is copied and updated whenever fields are added (rather expensive in comparison).

ecslog uses a hybrid of zap and logrus. One adds fields like zap, but when logging we sort the fields and remove duplicates. The sorting can be cached for future log operations. ecslog contexts can be combined (they are a tree internall), without copying and iterating contexts over and over again.

ecslog also differentiates between standardized fields and custom fields, allowing the layout to handle adhoc custom fields by developers to be handled differently. For example always encode custom fields to an embedded string to not break mappings in Elasticsearch.

Finally having a custom context type requires a custom logger. Still using other loggers as backend would require context to be streamed to the backend logger, that needs to build a custom context to be thrown away right after each time we want to log.

@webmat
Copy link
Contributor

webmat commented Jan 17, 2020

The goal of the ECS logging libraries is twofold. As mentioned in Option 1, the MVP is to at least support 10-15 fields that can be populated transparently by a logging formatter. This MVP level lowers the bar to adoption of the ECS logging library in question. You just enable it, and the basics are done for you with very little effort. But there is also a desire to let developers to log any arbitrary ECS field as well, with these libraries.

So my understanding is that the mix of option 1 and 2 could actually yield a reuseable Golang logging library that hits both targets, and can be added to the list of ECS Logging libraries we offer customers here https://github.com/elastic/ecs-logging 😀🤞🏻?

cc @philkra

@urso
Copy link

urso commented Jan 21, 2020

This issue is not about the ecs-logging project, multiple library support or multiple language support. It is about ECS based logging in Beats in particular. With this in mind let's talk about what we want to expect from the logging support in Beats in the future. Some features/requirements coming to mind:

  1. Structured logging is a must
  2. Logs must follow ECS and must be indexible in ECS
    a. Logger should never create events that can not be indexed or not parsed by Elasticsearch
    b. Logger must fill in default ECS fields seamlessly
    c. Logger or Library must guarantee type safety when logging ECS fields (ensure events are always indexible)
    d. Besides standardized fields, logger must still support custom fields
    • custom fields could be standardized as well (ECS + Beats logging schema)
    • fully privte custom fields must be handled differently (e.g. embedd json object as string) to guanratee 'always indexible'
  3. "Reasonable" performance
  4. Multiple outputs support:
  • standard out/error (depending on level) for containers, systemd, journald
  • windows event logs
  • macOS OSLog
  • file rotation: (configure triggers, rotation with gzip, ...)
  • log event routing:
    • e.g. send Error/Critical to stderr, other logs to stdout
    • e.g. send all logs to rotated log file, but Error/Critical to windows events logs
  1. Selector based log-level
  • all selectors start with default log level
  • log level can be changed in Beats config file
  • log level can be changed via HTTP API (Problem: how to define and discover existing selectors?)
  1. Good support for testing:
  • create logger instance writing it's logs to testing.T
  • create logger instance writing logs to a byte buffer or an array of map[string]interface{}
  • => no global logging
  1. Good error support:
  • errors can have context
  • errors can wrap other errors.

@ph
Copy link
Contributor Author

ph commented Jan 21, 2020

Log level and selectors changed dynamically for Agent. That would be useful for remote debugging. Imagine an inspect button in the UI that displays live log.

@urso
Copy link

urso commented Mar 2, 2020

Option 3

Could it be possible to mix Option 1 and Option 2, meaning that we delegate file rotation and exporter to Zap but we enforce everything on top of it. This actually mean that Zab mostly deal with raw bytes or string? I don't know if its possibe, WDYT @urso?

This functionality is not provided by zap. Only Fields + JSON encoder are provided by Zap. Almost everything else is custom in Beats.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@botelastic
Copy link

botelastic bot commented Sep 14, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Sep 14, 2021
@Mpdreamz
Copy link
Member

@ruflin Is this something we can still look into before 8.0?

Switching default logging formats post 8.0 becomes a trickier process.

@botelastic botelastic bot removed the Stalled label Oct 11, 2021
@ruflin ruflin added the Team:Elastic-Agent Label for the Agent team label Oct 12, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/agent (Team:Elastic-Agent)

@ruflin
Copy link
Contributor

ruflin commented Oct 12, 2021

I added the Elastic Agent team label and ping @jlind23 on timing questions ;-)

@jlind23
Copy link
Collaborator

jlind23 commented Oct 12, 2021

Which option was the chosen one finally?

@jlind23 jlind23 added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Oct 12, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@Mpdreamz
Copy link
Member

For more context. Both Elasticsearch and Kibana will emit ECS logs OOTB with 8.0, APM server already does for a while. It would be a missed opportunity if we don't make the same move observability wide.

@jlind23
Copy link
Collaborator

jlind23 commented Oct 12, 2021

@Mpdreamz can we include this change in 8.0? Doing it for 7.16 is impossible as we are almost running out of time.

@ruflin
Copy link
Contributor

ruflin commented Oct 13, 2021

@Mpdreamz What is apm-server using for the ecs logs? If it works well we should follow the same path in Beats.

@Mpdreamz
Copy link
Member

I believe APM-server uses zap but is not yet on https://github.com/elastic/ecs-logging-go-zap and uses a custom formatter. We ultimately want to hop on ecs-logging-go-zap as well.

cc @simitt let me know if the above is not true :)

@simitt
Copy link
Contributor

simitt commented Oct 14, 2021

I added basic ECS logging support to the custom logp handling in libbeat some time ago #17974, and it is using ecs-logging-go-zap under the hood.

This can be enabled by setting logging.ecs:true, which apm-server already defaults to since a couple minor versions. It is up to every beat to do the same.
When enabled, It adds the ECS log information according to the MVP spec, so only minimal ECS information for now. But every beat can add additional information when adding a log line. For example, apm-server sets additional log information in the middleware. Every beat is still responsible to not add non-ecs compatible fields in its logs.

So, basically it should be as easy as turning on logging.ecs:true by default in every beat to get minimal ECS information added to the logs.

@Mpdreamz
Copy link
Member

Thanks for the correction @simitt 👍

@jlind23 if this can go into 8.0 that would be a great outcome 👍

@andresrc andresrc added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Oct 14, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@Mpdreamz
Copy link
Member

Also just to confirm Elastic Agent uses whatever the default is in libbeat correct? So this would apply to Elastic Agent too?

@ruflin
Copy link
Contributor

ruflin commented Oct 18, 2021

Elastic Agent itself is not a beat and not directly based on libbeat. @michalpristas could you comment the logging behaviour?

@michel-laterman
Copy link
Contributor

Associated PRs:

#28573 - remove code for 8.0
elastic/observability-docs#1208 - remove setting from elastic-agent docs for 8.0
#28772 - deprecate settings for 7.16
elastic/observability-docs#1206 - mark settings as deprecated in elastic-agent docs for 7.16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:Integrations Label for the Integrations team v8.0.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants