Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Add experimental toggle for enabling doc-value-only indexing to data streams #144357

Closed
7 tasks
joshdover opened this issue Nov 1, 2022 · 12 comments · Fixed by #149131
Closed
7 tasks
Assignees
Labels
loe:medium Medium Level of Effort Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@joshdover
Copy link
Contributor

joshdover commented Nov 1, 2022

Related to #132818

Leveraging the prework done in #140095, we should add a toggle for "doc-only-value" indexing on a data stream. This should be done by updating the data stream's index template mappings so that every double and long field has index: false set.

Read more on doc-value-only fields: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/doc-values.html#doc-value-only-fields

Fleet should provide two experimental toggles for opting data streams into this functionality:

  1. One toggle that adds index: false to all "numeric" (e.g. double and long) mappings
  2. One toggle that adds index: false to all other compatible fields (keyword, date, ip, boolean, geo_point)

Each toggle controls the settings for corresponding mappings in the data stream's index template.

At least for testing purposes, having both options would likely be helpful to gauge the storage, indexing, and query performance tradeoffs.

Interaction with index: false in package manifests

Currently, integration developers can provide index: false for various fields included in an integrations mappings. If any value is provided for the index setting (either true or false), Fleet must honor it and avoid overwriting the value. The package manifest is the source of truth for all indexing settings.

In summary:

  • If index: undefined for a given field, it may be overridden by the toggle state
  • If index: true | false for a given field, it may not be overridden by the toggle state

Implementation

  • Create a numeric doc-values-only toggle under the "experimental indexing settings" section of the Fleet policy editor
    • When enabled, the corresponding data stream's index template mappings will include index: false for all double and long properties
    • When disabled, the index setting is unset from the index template's mappings
  • Create a other doc-values-only toggle under the "experimental indexing settings" section of the Fleet policy editor
    • When enabled, mappings for keyword, date, ip, boolean, and geo_point fields are updated to provide index: false in their settings
    • When disabled, the index value for the above mapping types is unset
  • Both toggles do not override existing index settings as provided by the given package's manifest
@joshdover joshdover added the Team:Fleet Team label for Observability Data Collection Fleet team label Nov 1, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@ruflin
Copy link
Contributor

ruflin commented Dec 5, 2022

If I set index: false in my fields.yml today in the integration, will Fleet respect it? @jsoriano This is different from the toggle proposed here which we should have also in addition but I can see the two play together / compete with each other. Even if there is some generic logic or translation of the mappings to use index: false, what is defined in the package itself per field must be respected.

@kpollich
Copy link
Member

kpollich commented Dec 5, 2022

@ruflin - I think index: false is supported as of #145410, but my understanding might be a little shallow. Does this look like it does what you need?

@ruflin
Copy link
Contributor

ruflin commented Dec 6, 2022

I did run a quick test run and index: false currently works as expected.

fields.yml

    - name: foo.bar
      type: long
      index: false
      description: |
        Test field.  

Resulting template:

"foo": {
  "properties": {
    "bar": {
      "index": false,
      "type": "long"
    }
  }
},

#145410 implies that the feature is supported, I assume it has been for a while. This is good.

During implementation of this issue we must ensure to have logic in place to not overwrite these values if they are set. Something like:

  • index:* not set: Fleet UI controls the outcome of the field
  • index:false or index:true: Values are kept even if toggle in the UI for this field

Same applies to doc_values field.

@joshdover
Copy link
Contributor Author

@kpollich can we update this issue to reflect the remaining work and how the toggle and package spec fields should work together?

@kpollich
Copy link
Member

kpollich commented Jan 3, 2023

@joshdover - Made some updates to the description above. Let me know if this accurate based on your's and @ruflin's comments above and elsewhere.

@ruflin
Copy link
Contributor

ruflin commented Jan 4, 2023

Most important for me is, whoever takes on this task is not just checking the exact spec described in this issue for implementation but checks against Elasticsearch docs what fields are supported, which ones not and why. I want to make sure we don't miss some fields that we forgot to put into the issue or have changed since the issue was written.

@juliaElastic
Copy link
Contributor

@joshdover @ruflin @kpollich These switches are reversible, right? So if enabled, can they be changed back?

@ruflin
Copy link
Contributor

ruflin commented Jan 18, 2023

AFAIK yes. But I assume it requires a rollover to take effect.

@juliaElastic
Copy link
Contributor

We could add a tooltip or link to the docs to remind the users that rollover is needed.

@joshdover
Copy link
Contributor Author

There's a dedicated issue for this open: #143448, prioritized for a 2 sprints from now.

@juliaElastic
Copy link
Contributor

juliaElastic commented Jan 18, 2023

@ruflin @joshdover The docs are a little confusing whether this list is an exhaustive list that supports doc-value-only:
"Numeric types, date types, the boolean type, ip type, geo_point type and the keyword type can also be queried".

I am wondering if by date types, do we mean only date or date_nanos as well?

Later the doc says that wildcard doesn't support this feature. I wonder if this means that all other types are supported.
"You cannot disable doc values for wildcard fields."

I guess I could go through all existing types to check, alternatively check if the mapping type has index field in the code.

juliaElastic added a commit that referenced this issue Jan 25, 2023
## Summary

Closes #144357

WIP. Review can be started, but still requires a lot of testing and
fixing the issue below.

How to test locally:
- Turn on `experimentalDataStreamSettings` feature flag
- Go to Add integration, System integration
- On the first data stream, turn on the Doc value only switches, Save
- The mapping changes are visible under Stack Management / Index
Management / Component Templates e.g. `logs-system.auth@package`
- The numeric switch sets `index:false` on all numeric field mappings
(long, double, etc.)
- The other switch sets `index:false` on all other field type mappings
that support it (keyword, ip, date, etc.)
- The new mappings will take effect after rollover

<img width="475" alt="image"
src="https://user-images.githubusercontent.com/90178898/213206641-13ead2fc-f079-407c-9c0e-c58f99dd4903.png">
<img width="1037" alt="image"
src="https://user-images.githubusercontent.com/90178898/213495546-9962c458-590b-4787-bf2d-9f19abea3f67.png">

What works:
- When turning the new doc-value-only numeric and other checkboxes on or
off, the corresponding mapping changes are done in the component
template
- The logic also reads the package spec's template and preserves the
`index:false` values regardless of the switch (tested manually by
setting `@timestamp` field to `index:false` in the template, there is
also the `original` field in `logs-system.auth@package` stream that is
set to `index:false` in the package by default.

```
"original": {
    "index": false,
    "doc_values": false,
    "type": "keyword"
},
```

Pending:
- Add/update unit and integration tests to verify the mapping change
logic - DONE
- Manual testing (turning the switches on/off, create/update package
policy, upgrade package) - DONE
- Clarify TODOs in the code about the supported types - DONE
- Hitting an issue when turning on `doc-value-only` for "other" types
(keyword, date, etc.). Could be that one of the fields doesn't support
`index:false` setting. Didn't experience this when turning on only the
numeric types. - FIXED
```
illegal_argument_exception: [illegal_argument_exception] Reason: updating component template [logs-system.auth@package] results in invalid composable template [logs-system.auth] after templates are merged
```

EDIT: found the root cause of this: `Caused by:
java.lang.IllegalArgumentException: data stream timestamp field
[@timestamp] is not indexed`


### Checklist

Delete any items that are not applicable to this PR.

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

Co-authored-by: Kibana Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
loe:medium Medium Level of Effort Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants