
[Fleet] Several errors on 8.0 upgrade #126113

Closed · mostlyjason opened this issue Feb 21, 2022 · 9 comments

Labels: bug (Fixes for quality problems that affect the customer experience), Team:Fleet (Team label for Observability Data Collection Fleet team)

@mostlyjason (Contributor) commented Feb 21, 2022

Kibana version:
8.0

Elasticsearch version:
8.0

Server OS version:
ubuntu

Original install method (e.g. download page, yum, from source, etc.):
docker

Describe the bug:
I found several errors in the logs after upgrading my on-prem cluster to 8.0:

  1. It looks like it tried to install several packages and failed due to conflicts. It then tried to uninstall packages, which also failed because I have integration policies using these packages.
  2. There is a connection reset error but I'm not sure why
  3. There is an error about an integration policy with the same name. I assume that is because I have two system-1 integration policies?

Steps to reproduce:

  1. Run a 7.17 cluster with several packages and integration policies like linux metrics, docker metrics, apache, system
  2. Upgrade to 8.0

Expected behavior:
I'm unclear why it's installing packages on upgrade. Isn't that normally done when the user clicks the upgrade integration button in Kibana? Also, how do I identify which conflicts occurred and how to fix them?

Can the error message for the integration policy suggest resolution steps, such as renaming the policy to something unique?

Provide logs and/or server output (if relevant):

[2022-02-21T20:17:58.530+00:00][WARN ][plugins.fleet] Failure to install package [docker]: [Error: Encountered 2 errors creating saved objects: [{"type":"index-pattern","id":"logs-*","error":{"type":"conflict"}},{"type":"index-pattern","id":"metrics-*","error":{"type":"conflict"}}]]
[2022-02-21T20:17:58.535+00:00][ERROR][plugins.fleet] uninstalling docker-1.0.0 after error installing: [Error: Encountered 2 errors creating saved objects: [{"type":"index-pattern","id":"logs-*","error":{"type":"conflict"}},{"type":"index-pattern","id":"metrics-*","error":{"type":"conflict"}}]]
[2022-02-21T20:17:58.537+00:00][WARN ][plugins.fleet] Failure to install package [linux]: [Error: Encountered 2 errors creating saved objects: [{"type":"index-pattern","id":"logs-*","error":{"type":"conflict"}},{"type":"index-pattern","id":"metrics-*","error":{"type":"conflict"}}]]
[2022-02-21T20:17:58.537+00:00][ERROR][plugins.fleet] uninstalling linux-0.4.1 after error installing: [Error: Encountered 2 errors creating saved objects: [{"type":"index-pattern","id":"logs-*","error":{"type":"conflict"}},{"type":"index-pattern","id":"metrics-*","error":{"type":"conflict"}}]]
[2022-02-21T20:17:58.554+00:00][WARN ][plugins.fleet] Failure to install package [apache]: [Error: Encountered 2 errors creating saved objects: [{"type":"index-pattern","id":"logs-*","error":{"type":"conflict"}},{"type":"index-pattern","id":"metrics-*","error":{"type":"conflict"}}]]
[2022-02-21T20:17:58.555+00:00][ERROR][plugins.fleet] uninstalling apache-1.3.2 after error installing: [Error: Encountered 2 errors creating saved objects: [{"type":"index-pattern","id":"logs-*","error":{"type":"conflict"}},{"type":"index-pattern","id":"metrics-*","error":{"type":"conflict"}}]]
[2022-02-21T20:17:58.627+00:00][WARN ][plugins.fleet] large amount of default fields detected for index template logs-osquery_manager.result in package osquery_manager, applying the first 1024 fields
[2022-02-21T20:17:58.925+00:00][ERROR][plugins.fleet] ConnectionError: read ECONNRESET - Local: unknown:unknown, Remote: unknown:unknown
    at KibanaTransport.request (/usr/share/kibana/node_modules/@elastic/transport/lib/Transport.js:504:31)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at KibanaTransport.request (/usr/share/kibana/src/core/server/elasticsearch/client/create_transport.js:63:16)
    at Client.DeleteApi [as delete] (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/api/api/delete.js:36:12)
[2022-02-21T20:17:59.057+00:00][ERROR][plugins.fleet] failed to uninstall or rollback package after installation error Error: unable to remove package with existing package policy(s) in use by agent(s)
[2022-02-21T20:17:59.059+00:00][ERROR][plugins.fleet] failed to uninstall or rollback package after installation error Error: unable to remove package with existing package policy(s) in use by agent(s)
[2022-02-21T20:17:59.166+00:00][ERROR][plugins.fleet] failed to uninstall or rollback package after installation error Error: unable to remove package with existing package policy(s) in use by agent(s)
[2022-02-21T20:18:20.004+00:00][INFO ][plugins.fleet] Found previous transform references:
 [{"id":"endpoint.metadata_current-default-1.3.0","type":"transform"},{"id":"endpoint.metadata_united-default-1.3.0","type":"transform"}]
[2022-02-21T20:18:20.004+00:00][INFO ][plugins.fleet] Deleting currently installed transform ids endpoint.metadata_current-default-1.3.0,endpoint.metadata_united-default-1.3.0
[2022-02-21T20:18:20.179+00:00][INFO ][plugins.fleet] Deleted: endpoint.metadata_current-default-1.3.0
[2022-02-21T20:18:20.323+00:00][INFO ][plugins.fleet] Deleted: endpoint.metadata_united-default-1.3.0
[2022-02-21T20:18:35.903+00:00][INFO ][plugins.fleet] Found previous transform references:
 [{"id":"endpoint.metadata_current-default-1.3.0","type":"transform"},{"id":"endpoint.metadata_united-default-1.3.0","type":"transform"}]
[2022-02-21T20:18:35.903+00:00][INFO ][plugins.fleet] Deleting currently installed transform ids endpoint.metadata_current-default-1.3.0,endpoint.metadata_united-default-1.3.0
[2022-02-21T20:18:36.031+00:00][INFO ][plugins.fleet] Deleted: endpoint.metadata_current-default-1.3.0
[2022-02-21T20:18:36.127+00:00][INFO ][plugins.fleet] Deleted: endpoint.metadata_united-default-1.3.0
[2022-02-21T20:18:46.435+00:00][INFO ][plugins.fleet] Package policy upgrade dry run ran successfully
[2022-02-21T20:18:50.497+00:00][INFO ][plugins.fleet] Package policy upgraded successfully
[2022-02-21T20:18:52.392+00:00][INFO ][plugins.fleet] Package policy upgrade dry run ran successfully
[2022-02-21T20:18:52.831+00:00][ERROR][plugins.fleet] There is already an integration policy with the same name
[2022-02-21T20:18:53.036+00:00][INFO ][plugins.fleet] Encountered non fatal errors during Fleet setup
[2022-02-21T20:18:53.036+00:00][INFO ][plugins.fleet] Fleet setup completed

CC @kpollich

@elasticmachine (Contributor) commented:

Pinging @elastic/fleet (Team:Fleet)

@kpollich (Member) commented:

> It looks like it tried to install several packages and failed due to conflicts. It then tried to uninstall packages, which also failed because I have integration policies using these packages.

This is interesting because these are saved object conflicts, not the "conflicts" we detect when upgrading policies in Fleet. It looks like we're erroring while trying to create the index patterns logs-* and metrics-*, which already exist. I'm not sure why this is surfacing as an error.
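
To make that concrete, here is a minimal sketch of what treating saved object "conflict" errors as non-fatal during asset installation could look like. The types and helper names are hypothetical, not Fleet's actual code:

```ts
// Sketch only: illustrates tolerating "already exists" conflicts while
// installing package assets. Error shapes are assumptions for illustration.
interface BulkCreateResult {
  id: string;
  type: string;
  error?: { type: string; message?: string };
}

function partitionConflicts(results: BulkCreateResult[]) {
  const conflicts = results.filter((r) => r.error?.type === 'conflict');
  const fatal = results.filter((r) => r.error && r.error.type !== 'conflict');
  return { conflicts, fatal };
}

function assertInstallSucceeded(results: BulkCreateResult[]): void {
  const { conflicts, fatal } = partitionConflicts(results);
  // Index patterns like logs-* / metrics-* are shared across packages, so a
  // conflict likely just means a previous install already created them.
  if (conflicts.length > 0) {
    console.warn(`Ignoring ${conflicts.length} pre-existing saved object(s)`);
  }
  if (fatal.length > 0) {
    throw new Error(`Encountered ${fatal.length} errors creating saved objects`);
  }
}
```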

> There is a connection reset error but I'm not sure why

Not sure on this one, either.

> There is an error about an integration policy with the same name. I assume that is because I have two system-1 integration policies?

This is the case, yes. If you're able to access the Fleet UI and rename these, then restart Kibana, these errors should improve.

@mostlyjason (Contributor, Author) commented:

@joshdover if the conflict is due to the fact that index patterns are already installed, we should be able to ignore it. So it shouldn't trigger uninstalling a package then?

@kpollich is there any way to improve the UX for the same name error? I suppose it's impossible for me to upgrade the package until I rename it. Could we tell users to rename it more directly in the error message itself?

@kpollich (Member) commented:

> @kpollich is there any way to improve the UX for the same name error? I suppose it's impossible for me to upgrade the package until I rename it. Could we tell users to rename it more directly in the error message itself?

This error message can definitely be improved fairly easily. I filed an issue here: #126164 and will take a quick pass at it shortly. It should only take a few minutes to update, and I have some bandwidth this morning.
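
As a rough illustration of the kind of actionable message in question (the helper and wording here are hypothetical, not the actual change in #126164):

```ts
// Hypothetical helper: builds a duplicate-name error that tells the user how
// to resolve the problem, instead of only stating that a conflict exists.
function duplicatePolicyNameError(name: string): Error {
  return new Error(
    `An integration policy with the name "${name}" already exists. ` +
      `Rename one of the policies to a unique name in Fleet, then retry the upgrade.`
  );
}
```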

@joshdover (Contributor) commented Feb 22, 2022

> @joshdover if the conflict is due to the fact that index patterns are already installed, we should be able to ignore it. So it shouldn't trigger uninstalling a package then?

I believe these conflicts come from issues related to the Saved Object re-key migration that happens in 8.0. We should have handled this as part of #108959.

Do you know if any packages were installed in other Kibana Spaces? Is this reproducible?

> • It looks like it tried to install several packages and failed due to conflicts. It then tried to uninstall packages, which also failed because I have integration policies using these packages.
> • There is a connection reset error but I'm not sure why

We should be resilient to these connection reset errors. This should have been handled by the changes in #118587, but I wonder if we missed a call site? However, the Elasticsearch client also has automatic retry logic for this class of errors. This makes me think that this error may have been encountered repeatedly, which would point to an orchestration issue.
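
For illustration, a minimal sketch of the kind of retry-on-transient-error wrapper being discussed. The ECONNRESET string check is an assumption about the error shape, and this only approximates the Elasticsearch client's built-in maxRetries behavior rather than reproducing #118587:

```ts
// Sketch: retry a transient connection failure with exponential backoff.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: unknown) {
      const message = err instanceof Error ? err.message : String(err);
      // Assumption: connection resets surface with ECONNRESET in the message.
      const isTransient = message.includes('ECONNRESET');
      if (!isTransient || attempt >= maxRetries) {
        throw err;
      }
      // Back off exponentially: 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```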

The real shame here is that the package rollback logic fails when the previous package is in use, which is pretty much the only case that matters. We need to address this, I think. EDIT: I've opened #126190

> I'm unclear why it's installing packages on upgrade. Isn't that normally done when the user clicks the upgrade integration button in Kibana? Also, how do I identify which conflicts occurred and how to fix them?

"Managed packages" will auto upgrade on Stack upgrades. This was the case for Endpoint, System, Fleet Server, and Elastic Agent since at least 7.14.0 GA. In 7.16.0 we added APM and Synthetics to that behavior and in 8.1 we are removing System from it.

@mostlyjason To help me gauge the severity of this problem, did this break your cluster, or were you able to retry in any way? I see the logs say it failed with "non-fatal" errors, but I'm not 100% sure what state you were in after this.

@mostlyjason (Contributor, Author) commented:

> Do you know if any packages were installed in other Kibana Spaces? Is this reproducible?

I only have the default space. Not sure what you mean by reproducible; I only upgraded the cluster once. Are you asking if these errors will show again if I restart my upgraded version of Kibana?

> I'm unclear why it's installing packages on upgrade.
> "Managed packages" will auto-upgrade on Stack upgrades.

It looks like the packages with conflicts are docker, linux, and apache, which are not "managed packages". It sounds like they're being installed due to the Saved Object re-key migration, and not due to auto-upgrade behavior?

> Did this break your cluster, or were you able to retry in any way? I see the logs say it failed with "non-fatal" errors, but I'm not 100% sure what state you were in after this.

I am able to use the cluster and it appears that I have the older version of those 3 packages installed. I'm not sure how to check if the Saved Object re-key migration worked. I get an error message when I try to upgrade the system integration, and I have not given it a unique name yet.

@joshdover (Contributor) commented Feb 24, 2022

> I only have the default space. Not sure what you mean by reproducible; I only upgraded the cluster once. Are you asking if these errors will show again if I restart my upgraded version of Kibana?

Thanks for the info. By reproducible, I mean what happens if you try to re-install or upgrade these packages manually from the UI? If it is reproducible, it'd be helpful to get an export of your index-pattern and alias Saved Objects. We can coordinate this privately if you're open to it.
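
For anyone following along, one way to pull such an export is Kibana's saved objects export API. A minimal sketch, assuming Node 18+ for the global fetch; the host is a placeholder and you may need to add authentication for your deployment:

```ts
// Sketch: export index-pattern saved objects via Kibana's saved objects
// export API. Adjust the URL, auth, and types for your own setup.
async function exportIndexPatterns(kibanaUrl: string): Promise<string> {
  const res = await fetch(`${kibanaUrl}/api/saved_objects/_export`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'kbn-xsrf': 'true', // Kibana requires this header on non-GET API calls
    },
    body: JSON.stringify({ type: ['index-pattern'] }),
  });
  if (!res.ok) {
    throw new Error(`Export failed: ${res.status} ${res.statusText}`);
  }
  return res.text(); // the export is returned as NDJSON, one object per line
}
```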

> It looks like the packages with conflicts are docker, linux, and apache, which are not "managed packages". It sounds like they're being installed due to the Saved Object re-key migration, and not due to auto-upgrade behavior?

Ah, for these other packages, I believe it's related to upgrade logic we have that ensures the global Fleet ingest pipeline is applied to packages: #120363. This will be addressed as part of #121099.

They're not being installed because of the Saved Object re-key migration, but they do seem to be failing to reinstall because of it.

> I am able to use the cluster and it appears that I have the older version of those 3 packages installed. I'm not sure how to check if the Saved Object re-key migration worked. I get an error message when I try to upgrade the system integration, and I have not given it a unique name yet.

If the docker, linux, and apache dashboards and viz still work, then the Saved Object migration itself succeeded. But if they still can't be upgraded, then we have an issue on the Fleet side to look at.

@mostlyjason (Contributor, Author) commented Feb 28, 2022

I was able to manually upgrade the docker and linux metrics integrations in Kibana the first time I tried. Apache was already the latest version. The dashboards work fine.

However, upgrading Prebuilt Security Detection Rules required three tries to succeed. The first two times I saw the following errors:

kibana_1         | [2022-02-28T14:13:46.657+00:00][ERROR][plugins.fleet] ConnectionError: read ECONNRESET - Local: unknown:unknown, Remote: unknown:unknown
kibana_1         |     at KibanaTransport.request (/usr/share/kibana/node_modules/@elastic/transport/lib/Transport.js:504:31)
kibana_1         |     at runMicrotasks (<anonymous>)
kibana_1         |     at processTicksAndRejections (node:internal/process/task_queues:96:5)
kibana_1         |     at KibanaTransport.request (/usr/share/kibana/src/core/server/elasticsearch/client/create_transport.js:63:16)
kibana_1         |     at Client.DeleteApi [as delete] (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/api/api/delete.js:36:12)
kibana_1         | [2022-02-28T14:13:52.685+00:00][ERROR][plugins.fleet] read ECONNRESET - Local: unknown:unknown, Remote: unknown:unknown
kibana_1         | [2022-02-28T14:16:45.802+00:00][ERROR][plugins.fleet] ConnectionError: read ECONNRESET - Local: unknown:unknown, Remote: unknown:unknown
kibana_1         |     at KibanaTransport.request (/usr/share/kibana/node_modules/@elastic/transport/lib/Transport.js:504:31)
kibana_1         |     at runMicrotasks (<anonymous>)
kibana_1         |     at processTicksAndRejections (node:internal/process/task_queues:96:5)
kibana_1         |     at KibanaTransport.request (/usr/share/kibana/src/core/server/elasticsearch/client/create_transport.js:63:16)
kibana_1         |     at Client.DeleteApi [as delete] (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/api/api/delete.js:36:12)
kibana_1         | [2022-02-28T14:16:52.082+00:00][WARN ][plugins.fleet] Failure to install package [security_detection_engine]: [ConnectionError: read ECONNRESET - Local: unknown:unknown, Remote: unknown:unknown]
kibana_1         | [2022-02-28T14:16:52.773+00:00][ERROR][plugins.fleet] rolling back to security_detection_engine-0.14.3 after error installing security_detection_engine-1.0.1
kibana_1         | [2022-02-28T14:17:02.011+00:00][ERROR][plugins.fleet] read ECONNRESET - Local: unknown:unknown, Remote: unknown:unknown

I'll email you my index pattern export.

@joshdover (Contributor) commented:

Since the root issue here appears to have been fixed in #126611, I think we can close this issue. We still don't have any information on the connection resets, but I think Fleet did the best it could in such a scenario at this time. We'll also further investigate how we can improve rollbacks in #126190.
