Add 'expand_keys' option to JSON input/processor #22849

Merged
merged 11 commits into from
Dec 14, 2020

Conversation

axw
Member

@axw axw commented Dec 2, 2020

What does this PR do?

Add an 'expand_keys' option to Filebeat's JSON input, and
to the decode_json_fields processor. If true, the decoded
JSON objects' keys will be recursively expanded, changing
dotted keys into a hierarchical object structure.

Objects will be recursively merged. In case of duplicate keys
at any level, the values must both be objects or an error will
result; decoding will fail, and the existing error handling
mechanisms will apply.

This is an alternative to #20489. The main differences are:

  • errors are easily observed using the existing add_error_key options
  • objects are expanded in-place, minimising overhead in the case of non-dotted fields
  • expansion conflicts are considered a JSON decoding error, and the decoded JSON
    fields are not added to the event. This prevents conflicts when indexing in Elasticsearch,
    which will again try to expand the dotted fields and lead to a mapping conflict.
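
To illustrate the expansion and merge rules described above, here is a minimal sketch. It is not the code in this PR: it assumes plain map[string]interface{} values in place of common.MapStr, sorts keys only so that the reported key is deterministic, and leaves arrays untouched. The names expandFields and mergeObjects are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeObjects recursively merges src into dst in place. Duplicate keys are
// only allowed when both values are objects; anything else is a conflict.
func mergeObjects(dst, src map[string]interface{}) error {
	for k, srcVal := range src {
		dstVal, exists := dst[k]
		if !exists {
			dst[k] = srcVal
			continue
		}
		dstMap, dstIsMap := dstVal.(map[string]interface{})
		srcMap, srcIsMap := srcVal.(map[string]interface{})
		if !dstIsMap || !srcIsMap {
			return fmt.Errorf("found conflicting key %q", k)
		}
		if err := mergeObjects(dstMap, srcMap); err != nil {
			return err
		}
	}
	return nil
}

// expandFields (hypothetical) expands dotted keys into nested objects in
// place. Like the function in this PR, it is destructive: on error the map
// may be left partially expanded.
func expandFields(m map[string]interface{}) error {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // assumption: sorted only for a deterministic error
	for _, k := range keys {
		v := m[k]
		// Expand nested objects first so merged values are already expanded.
		if sub, ok := v.(map[string]interface{}); ok {
			if err := expandFields(sub); err != nil {
				return err
			}
		}
		if !strings.Contains(k, ".") {
			continue
		}
		delete(m, k)
		// Wrap the value in one object per key segment, innermost first.
		parts := strings.Split(k, ".")
		for i := len(parts) - 1; i >= 1; i-- {
			v = map[string]interface{}{parts[i]: v}
		}
		if err := mergeObjects(m, map[string]interface{}{parts[0]: v}); err != nil {
			return fmt.Errorf("cannot expand %q: found conflicting key", k)
		}
	}
	return nil
}

func main() {
	valid := map[string]interface{}{"log.level": "info", "log.logger": "blah"}
	if err := expandFields(valid); err != nil {
		fmt.Println("unexpected error:", err)
	}
	fmt.Println(valid)

	conflict := map[string]interface{}{"log.level": "info", "log.level.name": "blah"}
	fmt.Println(expandFields(conflict))
}
```

In this sketch the two inputs mirror the ones used in the local test instructions: the first ends up with a single nested "log" object, and the second returns an error because "log.level" is a scalar while "log.level.name" needs it to be an object.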

Why is it important?

See #17021

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Build Filebeat, then run the following commands (the first is valid; the second is an example of a conflict):

echo '{"log.level": "info", "log.logger": "blah"}' | ./filebeat -c /dev/null --strict.perms=false -E output.console.enabled=true -E filebeat.inputs='[{type: stdin, enabled:true, json.keys_under_root:true, json.add_error_key:true, json.expand_keys: true}]'

{"@timestamp":"2020-12-02T09:46:06.525Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.0.0"},"log":{"offset":0,"file":{"path":""},"level":"info","logger":"blah"},"input":{"type":"stdin"},"ecs":{"version":"1.6.0"},"host":{"name":"goat"},"agent":{"ephemeral_id":"6ffa00bd-6833-4f31-a10b-7c45f023b021","id":"3b32c173-8abf-47ea-a202-1b5d05cd8ff4","name":"goat","type":"filebeat","version":"8.0.0"}}
echo '{"log.level": "info", "log.level.name": "blah"}' | ./filebeat -c /dev/null --strict.perms=false -E output.console.enabled=true -E filebeat.inputs='[{type: stdin, enabled:true, json.keys_under_root:true, json.add_error_key:true, json.expand_keys: true}]'
{"@timestamp":"2020-12-02T09:46:09.498Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.0.0"},"error":{"message":"cannot expand \"log.level.name\": found conflicting key","type":"json"},"log":{"offset":0,"file":{"path":""}},"input":{"type":"stdin"},"ecs":{"version":"1.6.0"},"host":{"name":"goat"},"agent":{"ephemeral_id":"cee72247-8697-4404-b6c5-b0992742fb7c","id":"3b32c173-8abf-47ea-a202-1b5d05cd8ff4","name":"goat","type":"filebeat","version":"8.0.0"}}

Related issues

Closes #17021
Replaces #20489

Add an 'expand_keys' option to Filebeat's JSON input, and
to the decode_json_fields processor. If true, the decoded
JSON objects' keys will be recursively expanded, changing
dotted keys into a hierarchical object structure.

If there are two keys which expand to the same, then they
must both be objects or an error will result, decoding will
fail, and the existing error handling mechanisms will apply.
@axw axw added enhancement v8.0.0 Team:Services (Deprecated) Label for the former Integrations-Services team v7.11.0 labels Dec 2, 2020
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Dec 2, 2020
Member

@felixbarny felixbarny left a comment

LGTM

@elasticmachine
Collaborator

elasticmachine commented Dec 2, 2020

💚 Build Succeeded

Build stats

  • Build Cause: Pull request #22849 updated

  • Start Time: 2020-12-11T01:38:45.238+0000

  • Duration: 53 min 19 sec

Test stats 🧪

Test Results
Failed 0
Passed 17417
Skipped 1379
Total 18796

Steps errors 2

Terraform Apply on x-pack/metricbeat/module/aws
  • Took 0 min 15 sec . View more details on here
Terraform Apply on x-pack/metricbeat/module/aws
  • Took 0 min 15 sec . View more details on here

💚 Flaky test report

Tests succeeded.

Member

@graphaelli graphaelli left a comment

looks great, I like this approach much better than as a processor

//
// Note that ExpandFields is destructive, and in the case of an error the
// map may be left in a semi-expanded state.
func ExpandFields(m common.MapStr) error {
Member

Would it be possible / make sense to make this a method on common.MapStr instead?

Member Author

It certainly is. I gathered from #20489 (comment) that @urso would prefer not to add to MapStr, but I can rearrange if preferred. Unless there's an expectation of reuse I typically avoid adding to common types/packages to avoid creating huge interfaces.

Copy link

Unless there's an expectation of reuse I typically avoid adding to common types/packages to avoid creating huge interfaces.

Agreed. The MapStr interface is too big with redundant functionality at times. I'd rather have a small interface for Events in the future with a set of functions that operate on the public interface.

If ExpandFields is not used somewhere else, I would not export it (to keep the package interface smaller).
For consistency, if it is supposed to be used in other places, move it (as a function) to libbeat/common/mapstr.go. The libbeat/common package is where MapStr and its helpers currently live.

Member Author

Good point, it's not needed elsewhere (I originally thought it would be) - so I'll unexport it.

@axw axw marked this pull request as ready for review December 3, 2020 02:13
@elasticmachine
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@axw
Member Author

axw commented Dec 3, 2020

I have made corresponding change to the default configuration files

Is it premature to add this? Is it a common enough use case that we should include it in every config file?

@simitt
Contributor

simitt commented Dec 3, 2020

I have made corresponding change to the default configuration files

Is it premature to add this? Is it a common enough use case that we should include it in every config file?

IMO it would be nice to have this added, and I don't see any backwards-compatibility issues: in case of a conflict, with the current behavior the event cannot be ingested; with this change the JSON object itself still gets dropped, but error information gets ingested. This could be helpful for finding logging issues.

@felixbarny
Member

Is it premature to add this? Is it a common enough use case that we should include it in every config file?

++ on having that in by default. I'd even suggest extending the docs a bit to mention that when using ECS loggers it's preferred to set this to true.

@axw
Member Author

axw commented Dec 4, 2020

IMO it would be nice to have this added, and I don't see any backwards-compatibility issues: in case of a conflict, with the current behavior the event cannot be ingested; with this change the JSON object itself still gets dropped, but error information gets ingested. This could be helpful for finding logging issues.

To be clear I was just asking if we should mention the config in the config file, not turn it on by default. If we were to turn this feature on by default there would be a subtle backwards-compatibility issue: the document _source would have a different structure. I'm not sure what guarantees we make there, I'll wait for guidance from @urso.

++ on having that in by default. I'd even suggest extending the docs a bit to mention that when using ECS loggers it's preferred to set this to true.

I've added the config to filebeat.reference.yml. I agree that it would be nice to mention it should be enabled when using ECS logging, but there's not currently a very good anchor point for a link. There's https://www.elastic.co/guide/en/ecs-logging/java/current/index.html, but no encapsulating "ECS Logging" topic. I'll defer until we have a good place to link to (CC @bmorelli25)

@felixbarny
Member

There's an open issue for an overview documentation for ECS logging: elastic/ecs-logging#31.

For the time being, we might just link to https://github.com/elastic/ecs-logging?

@axw
Member Author

axw commented Dec 7, 2020

There's an open issue for an overview documentation for ECS logging: elastic/ecs-logging#31.

Nice!

For the time being, we might just link to https://github.com/elastic/ecs-logging?

Good idea, I've added a sentence to the docs: "This setting should be enabled when the input is produced by an ECS logger."

Member

@bmorelli25 bmorelli25 left a comment

Markdown link --> Asciidoc link

} else {
	oldMap, oldIsMap := getMap(old)
	if !oldIsMap {
		return fmt.Errorf("cannot expand %q: found conflicting key", k)
Copy link

This seems to happen on type conflict only. I think we have similar cases in metricbeat. In that case we modify the key for old to be <k>.value. If the new object has a field named value we can drop old (because it is overwritten).

Member Author

I think this comment is effectively the same as #22849 (comment) - or is this something else?

The intended behaviour is to recursively merge objects, returning an error if there are two matching keys which either both have scalar values, or with one having a scalar value and one having an object value. This is intentionally strict for the first implementation; we could later either relax by default, or add options to relax.

Copy link

This is intentionally strict for the first implementation; we could later either relax by default, or add options to relax.

I'm ok if we follow up with this one later on.

Yeah, the two comments belong together.

Member Author

I've opened #23135 to track this.

logger := logp.NewLogger("jsonhelper")
if expandKeys {
	if err := ExpandFields(keys); err != nil {
Copy link

if we error here keys is in an unknown state. Do we need to clone keys before this call in order to keep it intact on error? Logging the original document would be needed for users to understand what went wrong.

Member Author

I was expecting the original document to show up under message, like when a JSON decoding error occurs. That doesn't happen though. Is there any reason why we shouldn't do that, for a consistent debugging experience, instead of logging?

Member Author

I forgot to respond to the first part:

if we error here keys is in an unknown state. Do we need to clone keys before this call in order to keep it intact on error?

As long as we include the original input (in message), I don't see a need. I'm not intimately familiar with Filebeat though.

Copy link

The JSON decoder in the log input does not store the original raw line in the message field. One can configure a custom message field (which is extracted from the JSON document, not the original line), but by default message will not be set.

Anyway, the new fields are merged into the event after expansion via WriteJSONFields. Neither the processor nor the JSON decoder in the log input references any fields in keys (or keys itself). This should make the operation safe. No need to clone.
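
For reference, the clone-before-expand alternative asked about earlier in this thread could look roughly like the sketch below. It is only an illustration: it assumes plain map[string]interface{} values in place of common.MapStr, and deepClone, applyOnCopy and halfExpand are hypothetical names. halfExpand merely imitates a destructive expansion that fails midway; it is not the real ExpandFields.

```go
package main

import (
	"errors"
	"fmt"
)

// deepClone copies nested maps recursively. Other values, including slices,
// are shared with the original in this sketch.
func deepClone(m map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(m))
	for k, v := range m {
		if sub, ok := v.(map[string]interface{}); ok {
			out[k] = deepClone(sub)
		} else {
			out[k] = v
		}
	}
	return out
}

// applyOnCopy runs a destructive, in-place transform on a deep copy so that
// the caller's map stays intact when the transform fails.
func applyOnCopy(
	m map[string]interface{},
	transform func(map[string]interface{}) error,
) (map[string]interface{}, error) {
	c := deepClone(m)
	if err := transform(c); err != nil {
		return nil, err
	}
	return c, nil
}

// halfExpand is a hypothetical stand-in for an expansion that mutates the
// map and then hits a conflict, leaving it semi-expanded.
func halfExpand(m map[string]interface{}) error {
	delete(m, "log.level")
	m["log"] = map[string]interface{}{"level": "info"}
	return errors.New(`cannot expand "log.level.name": found conflicting key`)
}

func main() {
	keys := map[string]interface{}{"log.level": "info", "log.level.name": "blah"}
	if _, err := applyOnCopy(keys, halfExpand); err != nil {
		fmt.Println("error:", err)
	}
	_, intact := keys["log.level"]
	fmt.Println("original intact:", intact) // original intact: true
}
```

The cost is a full copy of every decoded document, which is why it only makes sense if something still reads keys after a failed expansion.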

@axw axw requested a review from urso December 14, 2020 08:19
@urso

urso commented Dec 14, 2020

To be clear I was just asking if we should mention the config in the config file, not turn it on by default.

Yes, we should mention it and keep it turned off by default.

@urso urso added the needs_backport PR is waiting to be backported to other branches. label Dec 14, 2020
@urso urso merged commit 4f4a553 into elastic:master Dec 14, 2020
@urso urso self-assigned this Dec 14, 2020
urso pushed a commit to urso/beats that referenced this pull request Dec 14, 2020
Co-authored-by: Brandon Morelli <[email protected]>
(cherry picked from commit 4f4a553)
@urso urso removed the needs_backport PR is waiting to be backported to other branches. label Dec 14, 2020
@graphaelli graphaelli mentioned this pull request Dec 14, 2020
9 tasks
urso pushed a commit that referenced this pull request Dec 14, 2020
…cessor (#23104)

Co-authored-by: Brandon Morelli <[email protected]>
(cherry picked from commit 4f4a553)

Co-authored-by: Andrew Wilkins <[email protected]>
@axw axw deleted the json-expand-fields branch December 15, 2020 03:33
Labels
enhancement Team:Services (Deprecated) Label for the former Integrations-Services team v7.11.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add de-dot processor that converts dotted field names to nested objects
7 participants