Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Allow configuration of dataset value for Input packages #145903

Closed
7 of 9 tasks
kpollich opened this issue Nov 21, 2022 · 10 comments · Fixed by #147015
Closed
7 of 9 tasks

[Fleet] Allow configuration of dataset value for Input packages #145903

kpollich opened this issue Nov 21, 2022 · 10 comments · Fixed by #147015
Assignees
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@kpollich
Copy link
Member

kpollich commented Nov 21, 2022

Child of #133296

For packages with type: input, users should be able to provide a new or existing dataset for their logs/metrics. This is a common use case when ingesting custom logs that conform to another integration's field structure. For instance, ingesting custom logs from S3 and routing them through the Nginx integration's ingest process is cumbersome today, but this feature will make this more seamless.

Implementation

  • API
    • Create an API endpoint (or re-use one for the Fleet /data_streams API if possible) to list all existing dataset values
    • Ensure the create/edit integration policy APIs can handle the presence of a data_stream.dataset field
  • UI
    • For integrations with type: input, display the dataset form field for the data_stream.dataset variable as a "top-level" field
    • dataset input is pre-populated with all existing datasets. Each dropdown option is of the format <type>-<dataset>, default dropdown value is generic
      • Any dropdown values that start with the same type as the selected integration appear at the top of the list
        • e.g. If I'm adding a policy for the log integration, options that begin with log-* should appear first
    • The "index templates" and "ingest pipelines" customization UI is not displayed for input packages
    • Ensure that the output_permissions list in the agent policy sent to Fleet Server includes the configured dataset
    • For integrations with type: integration, set the data_stream.dataset field value via a hidden input

Designs

Policy editor

image

dataset dropdown

image

@kpollich kpollich added the Team:Fleet Team label for Observability Data Collection Fleet team label Nov 21, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@juliaElastic
Copy link
Contributor

The "index templates" and "ingest pipelines" customization UI is not displayed for input packages

@kpollich Could you clarify what do you mean by these?

For integrations with type: integration, set the data_stream.dataset field value via a hidden input

Do you mean to set the dataset to generic and not allow editing by the user?

@kpollich
Copy link
Member Author

kpollich commented Dec 5, 2022

Could you clarify what do you mean by these?

When editing integration package policies, we show the existing @package and @custom index templates and ingest pipelines in a table UI that links out to the stack management UI. This shouldn't be displayed for input packages. This might already be the case.

Do you mean to set the dataset to generic and not allow editing by the user?

The dataset value should come from each input on these policies. e.g. nginx.access or similar. We should set dataset to this value explicitly. e.g. the custom log integration currently defines a datastream.dataset variable to accomplish this here: https://github.com/elastic/integrations/blob/main/packages/log/data_stream/log/manifest.yml#L14-L20

@juliaElastic
Copy link
Contributor

juliaElastic commented Dec 6, 2022

When editing integration package policies, we show the existing @Package and @Custom index templates and ingest pipelines in a table UI that links out to the stack management UI. This shouldn't be displayed for input packages. This might already be the case.

  1. Thanks, got it, these show up only on the Edit integration policy page. This is already implemented:
    {/* Only show datastream pipelines and mappings on edit and not for input packages*/}
    {isPackagePolicyEdit && !isInputOnlyPackage && (

The dataset value should come from each input on these policies. e.g. nginx.access or similar. We should set dataset to this value explicitly. e.g. the custom log integration currently defines a datastream.dataset variable to accomplish this here: https://github.com/elastic/integrations/blob/main/packages/log/data_stream/log/manifest.yml#L14-L20

  1. @kpollich So for integration packages we want to hide the dataset field that is showing up currently? e.g. Custom Logs is integration type in EPR.
    For some packages like nginx, the dataset var is already not visible on the UI.
    Is there any disadvantages of leaving the dataset field visible where it is by default?

image

  1. Another question: I noticed that the /data_streams API only returns the created data streams, that are already ingesting data. E.g. I only see prefixes with system and elastic_agent in my local machine.
    Aren't we supposed to show the datasets of all installed packages?

Any dropdown values that start with the same type as the selected integration appear at the top of the list
e.g. If I'm adding a policy for the log integration, options that begin with log-* should appear first

  1. Do we mean the data stream name here? The datasets never start with logs- prefix. e.g. index: "logs-elastic_agent-default", dataset: "elastic_agent".

Ensure that the output_permissions list in the agent policy sent to Fleet Server includes the configured dataset

  1. This seems to be working already, the output_permissions show the index template of the selected dataset.

image

image

@kpollich
Copy link
Member Author

kpollich commented Dec 6, 2022

So for integration packages we want to hide the dataset field that is showing up currently? e.g. Custom Logs is integration type in EPR.

No, we don't need to hide this field. I was more trying to make sure that data_stream.dataset was handled the same way on input packages and integration packages to prevent future breakages or divergence in their behavior. This seems like we have everything set up correctly around this field for now.

Another question: I noticed that the /data_streams API only returns the created data streams, that are already ingesting data. E.g. I only see prefixes with system and elastic_agent in my local machine.
Aren't we supposed to show the datasets of all installed packages?

The intent was to show every data stream, regardless of whether it's received data or not. I'm not sure if this possible, though, thinking about it more. Data streams may be lazily initialized by Elasticsearch. e.g. logs-nginx.* may not exist until the first document is written to that data stream. I'm fine if this just uses the existing data stream API for the first implementation, even if it's limited in this way. If it's an Elasticsearch limitation, there's not much we can do about it anyway outside of building a list of data streams dynamically from the list of all installed packages and their policies? Seems like that'd be a massively expensive operation though.

Looking at the stack management UI, this does seem to be an Elasticsearch limitation. Data streams don't actually "exist" until a document is written for the first time. Let's not put too much effort into working around this limitation.

Do we mean the data stream name here? The datasets never start with logs- prefix. e.g. index: "logs-elastic_agent-default", dataset: "elastic_agent".

You're right - I was under the impression the custom logs package used its package name (log) as part of the dataset, but it simply provides the generic dataset by default as we've mentioned here. The requirement still stands though, but consider that datasets like nginx.access should appear at the top of the list when configuring the nginx integration. Does this sound reasonable?

@juliaElastic
Copy link
Contributor

juliaElastic commented Dec 6, 2022

The requirement still stands though, but consider that datasets like nginx.access should appear at the top of the list when configuring the nginx integration. Does this sound reasonable?

@kpollich I see, so we can match on the package name, not the data stream name.

@kpollich
Copy link
Member Author

kpollich commented Dec 6, 2022

Yep - thanks for all the well-formed questions to clarify ❤️

@juliaElastic
Copy link
Contributor

juliaElastic commented Dec 6, 2022

I noticed something that might be a bug:
Some datasets have multiple data stream definitions e.g. elastic_agent.fleet_server has logs- and metrics-, but the generated output_permissions only include the logs- one. I think in this case we want to include both, right?

image

image

image

@kpollich
Copy link
Member Author

kpollich commented Dec 6, 2022

Hmm that does seem like an issue, but I'm not 100% familiar with the permissions model here. @nchaulet could you maybe take a look?

@juliaElastic
Copy link
Contributor

juliaElastic commented Dec 6, 2022

After checking the code, I found that the logic of building output_permissions it taking the data streams with a matching types from the package policy input (e.g. logs in case of Custom logs), which makes sense, because then we don't need permissions to metrics. So it doesn't look like a bug after all.

https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/agent_policies/package_policies_to_agent_permissions.ts#L74-L113

image

juliaElastic added a commit that referenced this issue Dec 6, 2022
## Summary

Closes #145903

Added datasets combo box to input type packages `Dataset name` variable
with the option of creating a new one.

Using the existing `/data_streams` API to show the list of all datasets.

Package policy create/edit API already supports setting the value of
`data_stream.dataset` (input packages should have this variable as
described in #133296)

To verify:
- Start local EPR with input type packages as described
[here](#140035 (comment))
- Add `Custom Logs` integration
- Verify that Dataset name displays a dropdown and the selected value is
persisted.
- Verify that a new value can be entered in Dataset name
- Verify that Edit integration policy displays the existing value and
allows selecting/creating another one.

<img width="924" alt="image"
src="https://user-images.githubusercontent.com/90178898/205680787-3ef7da08-f5f0-4f05-b8d7-3a1c0a6a3d56.png">
<img width="1008" alt="image"
src="https://user-images.githubusercontent.com/90178898/205679497-935fe450-ce78-4f0b-943e-58e7f851f44b.png">
<img width="1006" alt="image"
src="https://user-images.githubusercontent.com/90178898/205679589-fedbbe0e-2c4d-4c00-986f-34ec5c2eb2f6.png">

Added ordering of datasets to move up those that start with the package
name e.g. `system*` datasets come first if adding a `system`
integration. Other than that ordering datasets alphabetically.

<img width="482" alt="image"
src="https://user-images.githubusercontent.com/90178898/205924837-a9807c92-2fe4-431a-88c6-f161d00812fb.png">

The rest of the requirements seem to be already implemented, see
[comments](#145903 (comment))

### Checklist

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants