Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Honor index_mode: time_series setting during package installation #146804

Closed
Tracked by #132818
kpollich opened this issue Dec 1, 2022 · 19 comments · Fixed by #148292
Closed
Tracked by #132818

[Fleet] Honor index_mode: time_series setting during package installation #146804

kpollich opened this issue Dec 1, 2022 · 19 comments · Fixed by #148292
Assignees
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@kpollich
Copy link
Member

kpollich commented Dec 1, 2022

Ref elastic/integrations#4749
Ref #144974

If a data stream includes index_mode: time_series in a package manifest, it should have its mappings/settings updated appropriately to support TSDB. This is the same process called out for synthetic source in #141211.

EDIT (Julia): To confirm the scope of this issue based on the discussion in the comments:

  • enable TSBD settings on package installation, if index_mode: time_series is set in the package manifest
  • enable synthetic_source with index_mode: time_series by default on package installation
@kpollich kpollich added the Team:Fleet Team label for Observability Data Collection Fleet team label Dec 1, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@ruflin
Copy link
Contributor

ruflin commented Dec 12, 2022

As this has not been release yet, it gives us some time to discuss the defaults. In Elasticsearch, index_mode and synthetic_source are two separate configs. But my expectation is, that whenever index_mode: time_seriees is used, synthetic_source should be enabled by default. A package developer could still opt out of this by disabling synthetic source in the package.

What this means that in a package to use time_series and synthetic source, only time series needs to be enabled. This has an effect on #147034 as what the default behaviours are would change slightly as soon as this is implemented. @nchaulet

@martijnvg @jpountz Curious to hear your thoughts on the above proposal to have for data streams we configure through packages, have synthetic source enabled by default for time_series data streams.

@martijnvg
Copy link
Member

@ruflin I think it makes sense that for time series data streams / indices that synthetic source is enabled by default. I think most of the time synthetic source will be enabled in this context. In cases where the downsides of synthetic source (losing accuracy in _source and higher costs in synthesising _source during fetch phase or get api) start to matter then for these cases users can opt out. I will open an issue in the ES repository.

@martijnvg
Copy link
Member

I've opened: elastic/elasticsearch#92319

@ruflin
Copy link
Contributor

ruflin commented Dec 13, 2022

Thanks for the input @martijnvg It seems no matter of the outcome of elastic/elasticsearch#92319 we can enable it for integrations by default. If ES has it eventually by default too, even better.

@amitkanfer
Copy link

@jlind23 let's make sure this is not pushed to sprint 6, please... it's a blocker for o11y

@joshdover
Copy link
Contributor

We need to make sure that the issue mentioned in elastic/integrations#4749 (comment) is resolved as part of closing this issue. index_mode should be applied on package installation, even if there is no associated integration policy yet created.

In general, I still find the location of this setting in the integration policy settings confusing for both our developers and users. The setting applies to all instances of the integration today and is not a policy-level setting, but that's not how it's exposed in our UI and API (someone should double check me on the API part).

I would still prefer there at least be a separate API for setting these package-level settings, if not also a separate UI for it as well. I imagine it should be similar to the proposed API in the "Customize API" section from this issue description: #121118

@ruflin
Copy link
Contributor

ruflin commented Dec 15, 2022

I would still prefer there at least be a separate API for setting these package-level settings, if not also a separate UI for it as well. I imagine it should be similar to the proposed API in the "Customize API" section from this issue description: #121118

++. I think API is a must to make it clear this is not a policy feature. UI we can figure out in a second step.

@juliaElastic
Copy link
Contributor

So as part of this issue, we want to:

  • enable TSBD settings on package installation, if index_mode: time_series is set in the package manifest
  • enable synthetic_source with index_mode: time_series by default on package installation

Can we move the API changes to a new issue? As it would increase the scope substantially, I think it is more important to solve the original issue first to unblock Observability.

@ruflin
Copy link
Contributor

ruflin commented Dec 16, 2022

@juliaElastic I think this is fine assuming these aren't APIs that are exposed to users and changing these could be a breaking change.

At the same time, I think it is important we address it quickly afterwards. I started to dig a bit into the code #147684 and I'm a bit concerned that not having the "data management" fully separate could lead to potential bugs.

@juliaElastic
Copy link
Contributor

These settings are currently part of our generic /package_policies API, though it is missing from openapi spec. This API will soon be GA, so it would be best to move out soon.

POST kbn:/api/fleet/package_policies
{
  "policy_id": "<agent_policy_id>",
  "package": {
    "name": "system",
    "version": "1.20.4",
    "experimental_data_stream_features": [
      {
        "data_stream": "logs-system.application",
        "features": {
          "synthetic_source": true,
          "tsdb": true
        }
      }
    ]
  },

@nchaulet
Copy link
Member

nchaulet commented Dec 16, 2022

Thinking loud here, do we really need a new API to support these features or should we leverage more the @custom component template for these? It's a way to customize datastreams that user already know and well documented.

@ruflin
Copy link
Contributor

ruflin commented Dec 16, 2022

should we leverage more the @custom

I assume in this scenario, @custom would only be used if the user made changes in the policy. Anything specified in the package itself would go into the "managed" templates still. Example: If index_mode: time_series is set in the package, @custom would be empty.

@nchaulet
Copy link
Member

nchaulet commented Dec 16, 2022

I assume in this scenario, @Custom would only be used if the user made changes in the policy. Anything specified in the package itself would go into the "managed" templates still. Example: If index_mode: time_series is set in the package, @Custom would be empty.

Yes the default behavior defined in the package will go in the managed component template, what I was think is more to replace the experimental_data_stream_features by using the @custom component template

@joshdover
Copy link
Contributor

joshdover commented Dec 16, 2022

Using @custom (and ES component template APIs) could work as long as we always use that as the source of truth in UIs. I wouldn't want to have some package-level setting in the Fleet data model that could get out of sync with @custom templates on each data stream and then the UI doesn't know what to show.

Long-term we need a way to optionally define a component template with overrides to meet the various granularity needs as discussed in #146792 (comment).

If we go with the ES API approach, we would need to work out a way to define these templates later as well. Ideally the index template API's composedOf field would allow referencing components that don't yet exist and only use them when they do. This would be similar to the ES pipeline ignore_missing_pipeline feature we added for custom ingest pipelines. EDIT: I opened

I do like how flexible and clean that would be over modifying the package-defined fields. However, the Fleet API still needs some way to expose this functionality to the user who may not have Elasticsearch access. I don't think the package_policy API is the right one since it applies to the whole package. I'd prefer to have something at the dataset level since the templates are dataset-level today.

@nchaulet
Copy link
Member

I don't think the package_policy API is the right one since it applies to the whole package. I'd prefer to have something at the dataset level since the templates are dataset-level today.

It's clearly not the right place. We could have a customize API in fleet where you can pass the level that you want to customize /integration/dataset/namespace and this would update the correct @custom component template (if we have composable @custom template at every level

@joshdover
Copy link
Contributor

#148057 should be fixed as well, else this will be quite a useless feature. We need to be able to leverage TSDB on all of the supported field types.

@ruflin
Copy link
Contributor

ruflin commented Jan 25, 2023

@kpollich @nchaulet What will happen if a package has index_mode: time_series and is deployed in 8.5 as an example? Will it be accepted and just us the old index mode and eventually when the user upgrades to 8.7, TSDB will be used (or at least on the next rollover)? I hope this is the case because it would mean, we can update existing packages to use TSDB without having them to make 8.7 compatible only.

@nchaulet
Copy link
Member

@kpollich @nchaulet What will happen if a package has index_mode: time_series and is deployed in 8.5 as an example? Will it be accepted and just us the old index mode and eventually when the user upgrades to 8.7, TSDB will be used (or at least on the next rollover)? I hope this is the case because it would mean, we can update existing packages to use TSDB without having them to make 8.7 compatible only.

Yes it will be ignored in 8.5, it will probably be applied somehow in 8.6 if the package policy is edited in the UI as there was some code in the UI forcing an experimental datastream feature if index_mode: time_series (Link to the code), and it will works well in 8.7.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants