[Fleet]: Does not allow package upgrade to disable TSDB #157345

Closed
lalit-satapathy opened this issue May 10, 2023 · 17 comments · Fixed by #157395 or #157869
Labels
blocker bug Fixes for quality problems that affect the customer experience Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@lalit-satapathy

Version: 8.8.0-snapshot and mode of run: elastic-package.

As part of the 8.8.0 TSDB rollout testing, we need a mechanism to disable TSDB on a package.

Here is the usage model:

  • Certain packages in 8.8.0 will have index_mode "time_series" enabled.
  • If any issues are found during the run, we should be able to disable TSDB on the package.

Two options to disable TSDB:

  • Manual command to be run in dev tools (Fleet to confirm the command usage for this)
    • We need confirmation that the manual commands provide a way to go back from TSDB mode to non-TSDB mode, including rollover.
  • Upgrade option to install a new package with TSDB disabled.

We find that the second option (upgrading to a new package with TSDB disabled) fails with errors, which completely blocks this path for users and testing.

Screenshot 2023-05-11 at 4 28 18 AM

We need this to be fixed to proceed with testing.

@lalit-satapathy lalit-satapathy added bug Fixes for quality problems that affect the customer experience blocker Team:Fleet Team label for Observability Data Collection Fleet team labels May 10, 2023
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@lalit-satapathy
Author

CC: @ruflin @andresrc @mlunadia

@hop-dev
Contributor

hop-dev commented May 11, 2023

This is the error we receive on the Fleet side; digging into it now.

```
[2023-05-11T11:12:08.087+01:00][ERROR][plugins.fleet] ResponseError: illegal_argument_exception
	Caused by:
		invalid_index_template_exception: index_template [metrics-nginx.stubstatus] invalid, cause [Validation Failed: 1: [index.mode=time_series] requires a non-empty [index.routing_path];]
	Root causes:
		illegal_argument_exception: updating component template [metrics-nginx.stubstatus@package] results in invalid composable template [metrics-nginx.stubstatus] after templates are merged
    at KibanaTransport.request (/Users/markhopkin/dev/kibana/node_modules/@elastic/transport/src/Transport.ts:535:17)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
```

@hop-dev
Contributor

hop-dev commented May 11, 2023

An update:

The problem is that index_mode is set on the index template itself, while routing_path is set on the @package component template.
When we update the package, we update the component templates first and the index template afterwards. For this particular update, that means we try to update the component template to remove routing_path while index_mode is still set to time_series on the index template. The validation correctly detects this and rejects the update.

I am looking at the implications of either updating the index template first or moving index_mode to the component template.
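
To make the ordering problem concrete, here is a minimal toy model in Python. It is not Fleet or Elasticsearch code: the merge and validation functions are simplified stand-ins for the rule quoted in the error above, and the `routing_path` value is a hypothetical placeholder.

```python
class InvalidTemplateError(Exception):
    """Raised when the merged template fails the toy validation rule."""


def merge_settings(component_settings, index_template_settings):
    """Layer settings so the index template's own settings win over
    the component template's settings."""
    merged = dict(component_settings)
    merged.update(index_template_settings)
    return merged


def validate(merged):
    """Toy version of the rule quoted in the error message."""
    if merged.get("index.mode") == "time_series" and not merged.get("index.routing_path"):
        raise InvalidTemplateError(
            "[index.mode=time_series] requires a non-empty [index.routing_path]"
        )


# State after installing the TSDB-enabled package version.
# The routing_path value is a hypothetical placeholder.
component = {"index.routing_path": ["nginx.stubstatus.hostname"]}
index_template = {"index.mode": "time_series"}
validate(merge_settings(component, index_template))  # valid starting point

# Upgrade target: TSDB disabled, so neither setting is present.
new_component = {}
new_index_template = {}

# Order 1: component template first (the order that fails).
try:
    validate(merge_settings(new_component, index_template))
    component_first_ok = True
except InvalidTemplateError:
    component_first_ok = False

# Order 2: index template first, then the component template.
validate(merge_settings(component, new_index_template))
validate(merge_settings(new_component, new_index_template))
index_first_ok = True

print(component_first_ok, index_first_ok)
```

In this model, every intermediate state must validate, so removing `index.mode` first is the only order that works when TSDB is being disabled.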

@kpollich
Member

> move index mode to the component template

I think this should be our preference, but I recall having issues setting index_mode at the component template level during the initial implementation here. Perhaps that's not relevant anymore with stability improvements around TSDB features in Elasticsearch.

@ruflin
Contributor

ruflin commented May 11, 2023

A more general note: I thought we don't need to set routing_path anymore, so this might be a leftover?

@juliaElastic
Contributor

Fleet doesn't set routing_path; Elasticsearch generates it when index.mode:time_series is set.

@juliaElastic
Contributor

> I am looking at the implications of changing it so that we update the index template first, or move index mode to the component template

I think both of these are quite risky to change for a patch in 8.8.
We could add a fix to update the index template before the component template only if the time_series setting is being removed from an existing index template.
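
A minimal sketch of that conditional ordering, assuming templates are represented as plain dicts with flat setting keys. The function names and the dict shape are hypothetical and are not taken from the Fleet code base.

```python
def is_time_series(index_template):
    """Return True if the (possibly missing) template sets index.mode to time_series."""
    if not index_template:
        return False
    settings = index_template.get("template", {}).get("settings", {})
    return settings.get("index.mode") == "time_series"


def plan_update_order(existing_index_template, new_index_template):
    """Update the index template first only when time_series is being removed
    from an existing index template; otherwise keep the usual order."""
    removing_tsdb = is_time_series(existing_index_template) and not is_time_series(
        new_index_template
    )
    if removing_tsdb:
        return ["index_template", "component_templates"]
    return ["component_templates", "index_template"]


tsdb = {"template": {"settings": {"index.mode": "time_series"}}}
plain = {"template": {"settings": {}}}

print(plan_update_order(tsdb, plain))   # TSDB is being removed
print(plan_update_order(plain, tsdb))   # TSDB is being added
print(plan_update_order(None, tsdb))    # fresh install
```

Keeping the default order for every other case limits the behavior change to the one upgrade path that currently fails, which matches the intent of a low-risk patch.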

@juliaElastic
Contributor

The fix is ready and approved; it should be auto-merged when the build finishes.

juliaElastic added a commit to juliaElastic/kibana that referenced this issue May 11, 2023
## Summary

Fixes elastic#157345

To test:

Install `nginx-1.12.0-beta` which has `index.mode:time_series`.

```
POST http://elastic:changeme@localhost:5601/api/fleet/epm/packages/nginx-1.12.0-beta
kbn-xsrf: kibana

{
   "force": true
 }
```

Upgrade to `nginx-1.12.1-beta` which has `index.mode:time_series`
removed.

Upload this package built manually:

[nginx-1.12.1-beta.zip](https://github.com/elastic/kibana/files/11452945/nginx-1.12.1-beta.zip)

```
curl -XPOST -H 'content-type: application/zip' -H 'kbn-xsrf: true' http://localhost:5601/api/fleet/epm/packages -u elastic:changeme --data-binary @nginx-1.12.1-beta.zip
```

The package should install successfully and time_series should be
removed from the index template `metrics-nginx.stubstatus`
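
One way to check the last step is to fetch the template with `GET _index_template/metrics-nginx.stubstatus` and inspect the settings. The helper below is a rough sketch in Python; the response shape it expects and the sample payloads are assumptions for illustration, not output captured from a real cluster.

```python
def template_index_mode(response, name):
    """Return the index.mode of the named index template, or None if unset.

    Assumes a response shaped like
    {"index_templates": [{"name": ..., "index_template": {"template": {"settings": ...}}}]}.
    """
    for entry in response.get("index_templates", []):
        if entry.get("name") != name:
            continue
        settings = entry.get("index_template", {}).get("template", {}).get("settings", {})
        # Accept both nested ({"index": {"mode": ...}}) and flat ("index.mode") keys.
        nested = settings.get("index", {})
        return nested.get("mode") or settings.get("index.mode")
    raise KeyError(name)


# Hypothetical sample payloads, before and after the upgrade.
before = {
    "index_templates": [
        {
            "name": "metrics-nginx.stubstatus",
            "index_template": {"template": {"settings": {"index": {"mode": "time_series"}}}},
        }
    ]
}
after = {
    "index_templates": [
        {
            "name": "metrics-nginx.stubstatus",
            "index_template": {"template": {"settings": {"index": {}}}},
        }
    ]
}

print(template_index_mode(before, "metrics-nginx.stubstatus"))
print(template_index_mode(after, "metrics-nginx.stubstatus"))
```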

WIP: update tests


### Checklist


- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue May 11, 2023
(cherry picked from commit a0ee7b6)
@lalit-satapathy
Author

There is a new snapshot build. In the latest snapshot release, I don't get this error.

However, index rollover did not happen after the package upgrade. @kpollich, index rollover should happen automatically, right?

We are continuing our testing and need more time.

@juliaElastic
Contributor

It seems that the rollover is only happening when the TSDB is toggled on the integration policy: #149967

We can work on a fix. The question is whether this should be flagged as a blocker for 8.8, as the last BC is being built today.

@lalit-satapathy
Author

We don't want to ask users to do a manual rollover when a package upgrade is done. It is understandable as part of the manual steps. Shouldn't rollover work as part of the package upgrade?

@jlind23
Contributor

jlind23 commented May 16, 2023

@juliaElastic Could you please work on a fix for this and raise it as a blocker in the 8.8 dev issue?

@juliaElastic
Contributor

juliaElastic commented May 16, 2023

As discussed with Julien, I don't think we would consider this a blocker, since there is a workaround (manual rollover) and, AFAIK, this issue only affects the few beta packages that have TSDB on and turn it off in a new version. Any concerns, @lalit-satapathy?

@ruflin
Contributor

ruflin commented May 16, 2023

This is a blocker for us to widely roll out TSDB. The basic assumption is that if packages encounter issues with TSDB, we can quickly roll out a new version of the package with TSDB disabled, so users only need to upgrade the package. If this is not working, we need to ask all users that have this integration installed to manually roll over the affected data streams.

@andresrc

@juliaElastic We are using beta packages for testing. As soon as 8.8.0 is out we will start shipping GA packages, and some of them contain multiple data streams; we cannot tell users to manually roll over everything.
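
For reference, the manual workaround amounts to one rollover request per affected data stream. The Python sketch below only builds the request lines that would be run in dev tools; the data stream names assume the `default` namespace and are examples, not a list taken from any package.

```python
from urllib.parse import quote


def rollover_requests(data_streams):
    """Build one rollover request line per data stream (nothing is sent)."""
    return [f"POST /{quote(name, safe='')}/_rollover" for name in data_streams]


# Example data stream names, assuming the "default" namespace.
streams = [
    "metrics-nginx.stubstatus-default",
    "metrics-system.cpu-default",
]

for line in rollover_requests(streams):
    print(line)
```

A package with many data streams needs one such request for each of them, which is the manual burden described above.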

@juliaElastic juliaElastic reopened this May 16, 2023
@juliaElastic
Contributor

Fix WIP: #157869

kpollich added a commit that referenced this issue May 16, 2023
… installed (#157869)

## Summary

Fixes #157345

When a package with a changed `index.mode` or `source.mode` setting is
installed, Fleet will now automatically perform a rollover to ensure the
correct setting is present on the resulting backing index.

There is an issue with Elasticsearch wherein toggling these settings
back and forth will incur a backing index range overlap error. See
elastic/elasticsearch#96163.
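
The decision described above can be sketched roughly as follows. This is an illustration in Python, not the Fleet implementation: settings are modeled as flat dicts keyed by the two setting names from this summary, and the function name is hypothetical.

```python
# The two settings named in this summary; a change in either triggers a rollover.
WATCHED_SETTINGS = ("index.mode", "source.mode")


def needs_rollover(old_settings, new_settings):
    """Return True if any watched setting differs between package versions."""
    return any(
        old_settings.get(key) != new_settings.get(key) for key in WATCHED_SETTINGS
    )


# Disabling TSDB in a new package version changes index.mode.
print(needs_rollover({"index.mode": "time_series"}, {}))
# An upgrade that leaves both settings untouched does not need a rollover.
print(needs_rollover({"index.mode": "time_series"}, {"index.mode": "time_series"}))
```

The rollover matters because template changes only apply to newly created backing indices, so the current write index keeps its old setting until the data stream rolls over.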

To test
1. Install the `system` integration at version `1.28.0`
2. Create an integration policy for the `system` integration (a standard
default agent policy will do)
3. Enroll an agent in this policy, and allow it to ingest some data
4. Confirm that there are documents present in the
`metrics-system.cpu-default` data stream, and note its backing index via
Stack Management
5. Create a new `1.28.1` version of the `system` integration where
`elasticsearch.index_mode: time_series` is set and install it via
`elastic-package install --zip`
6. Confirm that a rollover occurs and the backing index for the
`metrics-system.cpu-default` data stream has been updated

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Kibana Machine <[email protected]>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue May 16, 2023
… installed (elastic#157869)

(cherry picked from commit 22e3847)
kibanamachine referenced this issue May 16, 2023
…ged is installed (#157869) (#157916)

# Backport

This will backport the following commits from `main` to `8.8`:
- [[Fleet] Rollover data streams when package w/ TSDB setting changed is
installed (#157869)](#157869)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Kyle Pollich <[email protected]>
jasonrhodes pushed a commit that referenced this issue May 17, 2023
… installed (#157869)

8 participants