[Fleet] Allow configuration of `dataset` value for Input packages #145903

kpollich · 2022-11-21T17:51:47Z

For packages with type: input, users should be able to provide a new or existing dataset for their logs/metrics. This is a common use case when ingesting custom logs that conform to another integration's field structure. For instance, ingesting custom logs from S3 and routing them through the Nginx integration's ingest process is cumbersome today, but this feature will make this more seamless.

Implementation

API
- Create an API endpoint (or re-use one for the Fleet /data_streams API if possible) to list all existing dataset values
- Ensure the create/edit integration policy APIs can handle the presence of a data_stream.dataset field
UI
- For integrations with type: input, display the dataset form field for the data_stream.dataset variable as a "top-level" field
- dataset input is pre-populated with all existing datasets. Each dropdown option is of the format <type>-<dataset>, default dropdown value is generic
  - Any dropdown values that start with the same type as the selected integration appear at the top of the list
    - e.g. If I'm adding a policy for the log integration, options that begin with log-* should appear first
- The "index templates" and "ingest pipelines" customization UI is not displayed for input packages
- Ensure that the output_permissions list in the agent policy sent to Fleet Server includes the configured dataset
- For integrations with type: integration, set the data_stream.dataset field value via a hidden input

Designs

Policy editor

`dataset` dropdown

The text was updated successfully, but these errors were encountered:

elasticmachine · 2022-11-21T17:51:49Z

Pinging @elastic/fleet (Team:Fleet)

juliaElastic · 2022-12-05T16:16:05Z

The "index templates" and "ingest pipelines" customization UI is not displayed for input packages

@kpollich Could you clarify what do you mean by these?

For integrations with type: integration, set the data_stream.dataset field value via a hidden input

Do you mean to set the dataset to generic and not allow editing by the user?

kpollich · 2022-12-05T16:30:05Z

Could you clarify what do you mean by these?

When editing integration package policies, we show the existing @package and @custom index templates and ingest pipelines in a table UI that links out to the stack management UI. This shouldn't be displayed for input packages. This might already be the case.

Do you mean to set the dataset to generic and not allow editing by the user?

The dataset value should come from each input on these policies. e.g. nginx.access or similar. We should set dataset to this value explicitly. e.g. the custom log integration currently defines a datastream.dataset variable to accomplish this here: https://github.com/elastic/integrations/blob/main/packages/log/data_stream/log/manifest.yml#L14-L20

juliaElastic · 2022-12-06T10:29:51Z

When editing integration package policies, we show the existing @Package and @Custom index templates and ingest pipelines in a table UI that links out to the stack management UI. This shouldn't be displayed for input packages. This might already be the case.

Thanks, got it, these show up only on the Edit integration policy page. This is already implemented:

kibana/x-pack/plugins/fleet/public/applications/fleet/sections/agent_policy/create_package_policy_page/components/steps/components/package_policy_input_stream.tsx

Lines 327 to 328 in 983f473

    
           {/* Only show datastream pipelines and mappings on edit and not for input packages*/} 
        
           {isPackagePolicyEdit && !isInputOnlyPackage && (

The dataset value should come from each input on these policies. e.g. nginx.access or similar. We should set dataset to this value explicitly. e.g. the custom log integration currently defines a datastream.dataset variable to accomplish this here: https://github.com/elastic/integrations/blob/main/packages/log/data_stream/log/manifest.yml#L14-L20

@kpollich So for integration packages we want to hide the dataset field that is showing up currently? e.g. Custom Logs is integration type in EPR.
For some packages like nginx, the dataset var is already not visible on the UI.
Is there any disadvantages of leaving the dataset field visible where it is by default?

Another question: I noticed that the /data_streams API only returns the created data streams, that are already ingesting data. E.g. I only see prefixes with system and elastic_agent in my local machine.
Aren't we supposed to show the datasets of all installed packages?

Any dropdown values that start with the same type as the selected integration appear at the top of the list
e.g. If I'm adding a policy for the log integration, options that begin with log-* should appear first

Do we mean the data stream name here? The datasets never start with logs- prefix. e.g. index: "logs-elastic_agent-default", dataset: "elastic_agent".

Ensure that the output_permissions list in the agent policy sent to Fleet Server includes the configured dataset

This seems to be working already, the output_permissions show the index template of the selected dataset.

kpollich · 2022-12-06T13:01:15Z

So for integration packages we want to hide the dataset field that is showing up currently? e.g. Custom Logs is integration type in EPR.

No, we don't need to hide this field. I was more trying to make sure that data_stream.dataset was handled the same way on input packages and integration packages to prevent future breakages or divergence in their behavior. This seems like we have everything set up correctly around this field for now.

Another question: I noticed that the /data_streams API only returns the created data streams, that are already ingesting data. E.g. I only see prefixes with system and elastic_agent in my local machine.
Aren't we supposed to show the datasets of all installed packages?

The intent was to show every data stream, regardless of whether it's received data or not. I'm not sure if this possible, though, thinking about it more. Data streams may be lazily initialized by Elasticsearch. e.g. logs-nginx.* may not exist until the first document is written to that data stream. I'm fine if this just uses the existing data stream API for the first implementation, even if it's limited in this way. If it's an Elasticsearch limitation, there's not much we can do about it anyway outside of building a list of data streams dynamically from the list of all installed packages and their policies? Seems like that'd be a massively expensive operation though.

Looking at the stack management UI, this does seem to be an Elasticsearch limitation. Data streams don't actually "exist" until a document is written for the first time. Let's not put too much effort into working around this limitation.

Do we mean the data stream name here? The datasets never start with logs- prefix. e.g. index: "logs-elastic_agent-default", dataset: "elastic_agent".

You're right - I was under the impression the custom logs package used its package name (log) as part of the dataset, but it simply provides the generic dataset by default as we've mentioned here. The requirement still stands though, but consider that datasets like nginx.access should appear at the top of the list when configuring the nginx integration. Does this sound reasonable?

juliaElastic · 2022-12-06T13:14:34Z

The requirement still stands though, but consider that datasets like nginx.access should appear at the top of the list when configuring the nginx integration. Does this sound reasonable?

@kpollich I see, so we can match on the package name, not the data stream name.

kpollich · 2022-12-06T13:20:23Z

Yep - thanks for all the well-formed questions to clarify ❤️

juliaElastic · 2022-12-06T13:35:46Z

I noticed something that might be a bug:
Some datasets have multiple data stream definitions e.g. elastic_agent.fleet_server has logs- and metrics-, but the generated output_permissions only include the logs- one. I think in this case we want to include both, right?

kpollich · 2022-12-06T13:48:47Z

Hmm that does seem like an issue, but I'm not 100% familiar with the permissions model here. @nchaulet could you maybe take a look?

juliaElastic · 2022-12-06T13:57:10Z

After checking the code, I found that the logic of building output_permissions it taking the data streams with a matching types from the package policy input (e.g. logs in case of Custom logs), which makes sense, because then we don't need permissions to metrics. So it doesn't look like a bug after all.

https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/agent_policies/package_policies_to_agent_permissions.ts#L74-L113

## Summary Closes #145903 Added datasets combo box to input type packages `Dataset name` variable with the option of creating a new one. Using the existing `/data_streams` API to show the list of all datasets. Package policy create/edit API already supports setting the value of `data_stream.dataset` (input packages should have this variable as described in #133296) To verify: - Start local EPR with input type packages as described [here](#140035 (comment)) - Add `Custom Logs` integration - Verify that Dataset name displays a dropdown and the selected value is persisted. - Verify that a new value can be entered in Dataset name - Verify that Edit integration policy displays the existing value and allows selecting/creating another one. <img width="924" alt="image" src="https://user-images.githubusercontent.com/90178898/205680787-3ef7da08-f5f0-4f05-b8d7-3a1c0a6a3d56.png"> <img width="1008" alt="image" src="https://user-images.githubusercontent.com/90178898/205679497-935fe450-ce78-4f0b-943e-58e7f851f44b.png"> <img width="1006" alt="image" src="https://user-images.githubusercontent.com/90178898/205679589-fedbbe0e-2c4d-4c00-986f-34ec5c2eb2f6.png"> Added ordering of datasets to move up those that start with the package name e.g. `system*` datasets come first if adding a `system` integration. Other than that ordering datasets alphabetically. <img width="482" alt="image" src="https://user-images.githubusercontent.com/90178898/205924837-a9807c92-2fe4-431a-88c6-f161d00812fb.png"> The rest of the requirements seem to be already implemented, see [comments](#145903 (comment)) ### Checklist - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md) - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

kpollich added the Team:Fleet Team label for Observability Data Collection Fleet team label Nov 21, 2022

kpollich mentioned this issue Nov 21, 2022

[Fleet] Add support for input type packages #133296

Closed

12 tasks

kpollich assigned juliaElastic Nov 29, 2022

juliaElastic mentioned this issue Dec 5, 2022

[Fleet] show dataset combo box for input packages #147015

Merged

2 tasks

juliaElastic closed this as completed in #147015 Dec 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Fleet] Allow configuration of `dataset` value for Input packages #145903

[Fleet] Allow configuration of `dataset` value for Input packages #145903

kpollich commented Nov 21, 2022 •

edited by juliaElastic

Loading

elasticmachine commented Nov 21, 2022

juliaElastic commented Dec 5, 2022

kpollich commented Dec 5, 2022

juliaElastic commented Dec 6, 2022 •

edited

Loading

kpollich commented Dec 6, 2022

juliaElastic commented Dec 6, 2022 •

edited

Loading

kpollich commented Dec 6, 2022

juliaElastic commented Dec 6, 2022 •

edited

Loading

kpollich commented Dec 6, 2022

juliaElastic commented Dec 6, 2022 •

edited

Loading

[Fleet] Allow configuration of dataset value for Input packages #145903

[Fleet] Allow configuration of dataset value for Input packages #145903

Comments

kpollich commented Nov 21, 2022 • edited by juliaElastic Loading

Implementation

Designs

Policy editor

dataset dropdown

elasticmachine commented Nov 21, 2022

juliaElastic commented Dec 5, 2022

kpollich commented Dec 5, 2022

juliaElastic commented Dec 6, 2022 • edited Loading

kpollich commented Dec 6, 2022

juliaElastic commented Dec 6, 2022 • edited Loading

kpollich commented Dec 6, 2022

juliaElastic commented Dec 6, 2022 • edited Loading

kpollich commented Dec 6, 2022

juliaElastic commented Dec 6, 2022 • edited Loading

[Fleet] Allow configuration of `dataset` value for Input packages #145903

[Fleet] Allow configuration of `dataset` value for Input packages #145903

kpollich commented Nov 21, 2022 •

edited by juliaElastic

Loading

`dataset` dropdown

juliaElastic commented Dec 6, 2022 •

edited

Loading

juliaElastic commented Dec 6, 2022 •

edited

Loading

juliaElastic commented Dec 6, 2022 •

edited

Loading

juliaElastic commented Dec 6, 2022 •

edited

Loading