Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[elastic-agent] Update package with additional data streams #1298

Merged
merged 3 commits into from
Aug 5, 2021

Conversation

ruflin
Copy link
Contributor

@ruflin ruflin commented Jul 8, 2021

List of data streams was missing created by Elastic Agent for monitoring.

@elasticmachine
Copy link

elasticmachine commented Jul 8, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-08-05T06:52:04.796+0000

  • Duration: 9 min 27 sec

  • Commit: 7de1829

Test stats 🧪

Test Results
Failed 0
Passed 22
Skipped 0
Total 22

Trends 🧪

Image of Build Times

Image of Tests

@michalpristas
Copy link
Contributor

list of processes agent is able to run

apm-server
endpoint
filebeat
fleet-server
heartbeat
metricbeat
osquerybeat
packetbeat

i think endpoint, heartbeat and packetbeat are missing,

@ruflin
Copy link
Contributor Author

ruflin commented Jul 8, 2021

I assume the dataset names for apm-server and fleet-server both use _ instead of -?

@ruflin
Copy link
Contributor Author

ruflin commented Jul 8, 2021

Why is the dataset for logs called logs-elastic_agent and the one for metrics metrics-elastic_agent.elastic_agent?

@andrewkroh
Copy link
Member

What about the type: log data streams for these processes? Could this cover those as well.

@ruflin
Copy link
Contributor Author

ruflin commented Jul 8, 2021

Yes, I just hadn't pushed the most recent change yet as I'm still trying to find some issues around why not all are recognised properly. I pushed the most recent changes but it is not fully working yet. @andrewkroh can you have a quick look if this is what you meant?

@andrewkroh
Copy link
Member

Yes, that is what I was expected assuming there is coverage for all the processes Michal listed above. 👍 Not sure why that one just has dataset: elastic_agent. That seemed wrong.

One other concern I have is around indexing the structured logs from Beats. There could be mapping conflicts since there is no guarantee as to what a structured log field may contain (e.g. calls to log.Debugw("something happened", "metric", 32.6) vs log.Infow("booting up", "metric", MapStr{"time": 18.3})). One log line might include a key called metric that is an number and another log line metric that is an object and 💥 .

My personal approach to this when collecting Beat logs is to decode the message into json. Then copy out the known fields like @timestamp, log.level, message, and et cetera. Then leave all the other fields under json and index the json field as flattened. The prevents both conflicts and mapping explosions.

@ruflin
Copy link
Contributor Author

ruflin commented Jul 9, 2021

It seems we ingest logs into logs-elastic_agent-default and metrics into metrics-elastic_agent.elastic_agent-default. I agree, this is odd and had a quick conversation with @michalpristas . @michalpristas I wonder if we should consider this a bug and change either metrics or log. After going GA we can't really change it anymore. Thoughts? @andrewkroh Which one do you prefer?

An alternative instead of using the json flattened trick could be to just not index all the other fields and use runtime fields if query on it is needed?

@andrewkroh
Copy link
Member

I wonder if we should consider this a bug and change either metrics or log... Which one do you prefer?

I prefer convention used in naming metrics-elastic_agent.elastic_agent-default. Despite being a little redundant, that seems more consistent with all other data stream names.

@andrewkroh
Copy link
Member

Relates: #953

@ruflin
Copy link
Contributor Author

ruflin commented Jul 15, 2021

Reminder for myself: Currently still missing are endpoint and auditbeat logs and metrics.

@ruflin
Copy link
Contributor Author

ruflin commented Jul 15, 2021

I'm making progress here. By now all processes/data_streams with metrics and logs should be part of the package.

  • Use elastic_agent.elastic_agent instead of elastic_agent: It would have been nice to make this change but it is too late. It requires changes in Elastic Agent and Kibana. It is something we will try to clean up later. The package contains mapping for both options for now so we can potentially migrate later.
  • Metric Fields: For all metrics, I added the default metrics fields. I'm not sure if all processes expose the same metrics (for example endpoint) so this might need further cleanup later on
  • Logs: The message field and elastic-agent fields are added
  • Dynamic false: Dynamic fields is disabled. This means only the fields specified in the package are index. I expect that for all the other fields runtime fields have to be used on query time. Over time we can add more fields if needed. @andrewkroh Thoughts?

@ruflin
Copy link
Contributor Author

ruflin commented Jul 15, 2021

Currently getting the following error on CI, investigating:

[2021-07-15T12:10:32.419Z] 2021/07/15 12:10:32 DEBUG installing package...

[2021-07-15T12:10:32.419Z] 2021/07/15 12:10:32 DEBUG POST http://127.0.0.1:5601/api/fleet/epm/packages/elastic_agent-0.9.0

[2021-07-15T12:10:32.419Z] Error: error running package asset tests: could not complete test run: can't install the package: can't install the package: could not install package; API status code = 409; response body = {"statusCode":409,"error":"Conflict","message":"Concurrent installation or upgrade of elastic_agent-0.9.0 detected, aborting."}

I'll mark the PR as ready for review to get more eyes / opinions on it.

@ruflin ruflin marked this pull request as ready for review July 15, 2021 14:12
@andrewkroh
Copy link
Member

Dynamic false: Dynamic fields is disabled. This means only the fields specified in the package are index. I expect that for all the other fields runtime fields have to be used on query time. Over time we can add more fields if needed. @andrewkroh Thoughts?

That sounds like a safe approach to begin with for these log data streams. 👍

@ruflin
Copy link
Contributor Author

ruflin commented Jul 16, 2021

@jen-huang @afgomez Do you have any idea why we get the following error on CI? "Conflict","message":"Concurrent installation or upgrade of elastic_agent-0.9.0 detected, aborting."}. If I setup the stack manually / elastic-package it works. Could it be in some way related to default packages or similar? When exactly is this error shown?

List of data streams was missing created by Elastic Agent for monitoring.

push logs one too

add endpoint and auditbeat

cleanup container metrics

update changelog

update changelog and formatting

rename endpoint to endpoint_security
@ruflin ruflin force-pushed the update-elastic_agent-package branch from 0a7467a to 0948aad Compare August 3, 2021 11:57
@ruflin
Copy link
Contributor Author

ruflin commented Aug 3, 2021

I incremented the version to 1.1.0 on this as 1.0 has been released and changed the compatibility to 7.15 so we can get this out quickly for testing.

@ruflin
Copy link
Contributor Author

ruflin commented Aug 5, 2021

@nchaulet During development I hit the following error:

[2021-07-15T12:10:32.419Z] Error: error running package asset tests: could not complete test run: can't install the package: can't install the package: could not install package; API status code = 409; response body = {"statusCode":409,"error":"Conflict","message":"Concurrent installation or upgrade of elastic_agent-0.9.0 detected, aborting."}

I found now the reason and it was that I had specified the same data stream twice. Of course this is a bug in the package and I'm glad Kibana / CI complained but I'm a bit surprised by the error. Is this expected?

There is also an other error I saw (unfortunately did not copy) that told me foo@custom template already exists which I think was for the duplicated data stream. As @Custom for an upgrade always exists I was wondering if this could be a potential issue?

@ruflin ruflin requested review from mtojek and michalpristas August 5, 2021 06:56
@ruflin
Copy link
Contributor Author

ruflin commented Aug 5, 2021

This one should be ready for a round of reviews. It puts the minimal mappings in place and nothing more. Over time we should add more mappings for the ones we need.

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I didn't analyze all fields.

elasticsearch:
index_template:
mappings:
dynamic: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious: is it enabled by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is enabled by default in all data streams. There were some discussions in the past around turning this off especially now with runtime fields around.

@ruflin ruflin merged commit e91271a into elastic:master Aug 5, 2021
@ruflin ruflin deleted the update-elastic_agent-package branch August 5, 2021 11:34
@@ -0,0 +1,12 @@
- name: data_stream.type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtojek Sorry, only stumbled over this after merging. Where did we land on who is responsible to set the type? Will the tooling do this or Fleet?

What about module and dataset? There was never an Elastic Agent module does it mean it can be skipped? @andrewkroh ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about module and dataset?

There's no requirement to set them. I think the cost is really low since they are constant_keyword so I personally would set them to have consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I add module here, what would you set the value here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants