[Elastic Agent] Agent should be resilient to gzip errors #17915

Closed
111andre111 opened this issue Apr 22, 2020 · 10 comments · Fixed by #18876
Labels
bug Ingest Management:beta1 Group issues for ingest management beta1

Comments

@111andre111
Contributor

At the moment I am getting a message like this one:

2020-04-22T20:23:48+02:00 DEBUG	operator.go:230	running operation 'operation-install' for metricbeat.8.0.0
2020-04-22T20:23:48+02:00 ERROR	reporter.go:47	2020-04-22T20:23:48+02:00: type: 'ERROR': sub_type: 'CONFIG' message: Application: metricbeat[8fac23b0-80c4-11ea-8fef-5b4a006ca222]: operation-install: requires gzip-compressed body: gzip: invalid header
2020-04-22T20:23:48+02:00 DEBUG	periodic.go:38	Failed to read configuration, error: could not emit configuration: operator: failed to execute step sc-run, error: operation-install: requires gzip-compressed body: gzip: invalid header: operation-install: requires gzip-compressed body: gzip: invalid header
	could not emit configuration: operator: failed to execute step sc-run, error: operation-install: requires gzip-compressed body: gzip: invalid header: operation-install: requires gzip-compressed body: gzip: invalid header
2020-04-22T20:23:58+02:00 DEBUG	periodic.go:56	Adding 1 file to watch

I am not sure why that happens, but it seems that either Kibana expects a gzip header here or the agent doesn't send it the way it is supposed to?

@elasticmachine
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@ruflin ruflin added the Ingest Management:alpha1 Group issues for ingest management alpha1 label Apr 23, 2020
@ruflin ruflin assigned michalpristas and ruflin and unassigned ruflin Apr 30, 2020
@michalpristas
Contributor

@111andre111 I have been trying to reproduce this, without success so far.
What was your approach to run into this?
I saw this error just once in the past, and it was my fault: the tarball in /data/download/metricbeat was incorrectly packed and was 0 B in size.
Can you please share how you hit this error, or check whether the tarballs are correct?

@ph
Contributor

ph commented May 18, 2020

Is this still a problem?

@111andre111
Contributor Author

111andre111 commented May 18, 2020

I have rechecked this now:

2020-05-18T21:13:27+02:00 DEBUG	action_dispatcher.go:93	Failed to dispatch action 'action_id: e00f8be6-eb78-4a57-af1b-d6cd2a68a630, type: CONFIG_CHANGE', error: operator: failed to execute step sc-run, error: operation-install: requires gzip-compressed body: gzip: invalid header: operation-install: requires gzip-compressed body: gzip: invalid header
	operator: failed to execute step sc-run, error: operation-install: requires gzip-compressed body: gzip: invalid header: operation-install: requires gzip-compressed body: gzip: invalid header
	operation-install: requires gzip-compressed body: gzip: invalid header
	requires gzip-compressed body: gzip: invalid header
	gzip: invalid header

And it still happens with a freshly pulled master branch.

I did some digging and started a tcpdump.

It showed 2 requests from the Agent:

  1. This is ok.
curl -s -u elastic 'http://localhost:5601/api/ingest_manager/fleet/agent-status' -H 'Content-Type: application/json'
--> Response:  {"results":{"events":54,"total":1,"online":1,"error":0,"offline":0},"success":true}
  2. This is where the error appears, which the agent just reflects.
curl -s -u elastic 'http://localhost:5601/api/ingest_manager/fleet/agents?page=1&perPage=20&showInactive=false' -H 'Content-Type: application/json'
--> Response, jq path .list[0].current_error_events[0]:
        {
            "agent_id": "87463f9d-11f5-4fe4-b2d9-ee4bf42574ae",
            "type": "ERROR",
            "timestamp": "....+02:00",
            "subtype": "CONFIG",
            "message": "Application: filebeat[87463f9d-11f5-4fe4-b2d9-ee4bf42574ae]: operation-install: requires gzip-compressed body: gzip: invalid header",
            "config_id": "74a9d9b0-993b-11ea-9026-c7b92b1957af"
        }

I hope that helps a bit more.

@michalpristas
Contributor

We went through @111andre111's use case and found that his data path contains corrupted tar archives.
Correct archives could not be downloaded because version 8 (the working version used when building from source) is not published.
We discussed the possibility of downloading a snapshot binary from snapshots.elastic.co when the agent is built as a snapshot, which would make this use case easier.

@ph
Contributor

ph commented May 19, 2020

@michalpristas Maybe we should find a way to recover from that state, redownload archive or verify the archives?

@michalpristas
Contributor

Yes, if we have a checksum hash available we can. Otherwise we can limit the retry to 1 and then fail hard, so we don't get stuck in a loop. WDYT?

@ph ph changed the title from "[Elastic Agent] gzip error message and no connection to Ingest Manager" to "[Elastic Agent] Agent should be resilient to gzip errors" May 19, 2020
@ph
Contributor

ph commented May 19, 2020

> yes in case we have a checksum hash available we can, but in other case we can limit retry to 1 and then fail hard not to get stuck in a loop wdyt

@michalpristas Sounds good. I've changed the title of the issue to make it more actionable, and I am going to move it to beta1.

@ph ph added Ingest Management:beta1 Group issues for ingest management beta1 and removed Ingest Management:alpha1 Group issues for ingest management alpha1 elastic-agent labels May 19, 2020
@111andre111
Contributor Author

Another important thing I found: if I build with snapshots:
SNAPSHOT=true mage build
then it doesn't download from the correct URLs.
It tries to download
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.0.0-SNAPSHOT-darwin-x86_64.tar.gz
but these don't exist.
For now I worked around that like this:

  curl -s https://artifacts-api.elastic.co/v1/search/8.0-SNAPSHOT/filebeat | jq -r '.packages."filebeat-8.0.0-SNAPSHOT-darwin-x86_64.tar.gz".url' | xargs curl -O -J
    mv filebeat-8.0.0-SNAPSHOT-darwin-x86_64.tar.gz build/data/downloads
  curl -s https://artifacts-api.elastic.co/v1/search/8.0-SNAPSHOT/metricbeat | jq -r '.packages."metricbeat-8.0.0-SNAPSHOT-darwin-x86_64.tar.gz".url' | xargs curl -O -J
    mv metricbeat-8.0.0-SNAPSHOT-darwin-x86_64.tar.gz build/data/downloads
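
The lookup that the `jq` filter performs could be done in Go roughly as follows. This is only a sketch: the response shape (`packages` → package name → `url`) is inferred from the jq path above and nothing else, `packageURL` is a hypothetical helper, and the sample body with its `example.invalid` URL is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResult models only the fields the jq filter above touches;
// the rest of the response is ignored.
type searchResult struct {
	Packages map[string]struct {
		URL string `json:"url"`
	} `json:"packages"`
}

// packageURL (hypothetical) extracts the download URL for one package
// name from a search response body.
func packageURL(body []byte, name string) (string, error) {
	var res searchResult
	if err := json.Unmarshal(body, &res); err != nil {
		return "", fmt.Errorf("decoding search response: %w", err)
	}
	pkg, ok := res.Packages[name]
	if !ok || pkg.URL == "" {
		return "", fmt.Errorf("package %q not found in response", name)
	}
	return pkg.URL, nil
}

func main() {
	// Made-up sample body, not a real API response.
	body := []byte(`{"packages":{"filebeat-8.0.0-SNAPSHOT-darwin-x86_64.tar.gz":{"url":"https://example.invalid/filebeat.tar.gz"}}}`)

	url, err := packageURL(body, "filebeat-8.0.0-SNAPSHOT-darwin-x86_64.tar.gz")
	fmt.Println(url, err)
}
```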

And as a last point:
It matters whether you start the agent from the main master directory and execute
build/elastic-agent run
or you are in the build directory and run it there:
/.....build/$ ./elastic-agent run

In the second case it uses the data directory under the build path; in the first case it expects the data directory one level higher. So it resolves the path relative to the current working directory, not relative to where the executable lives.

@111andre111
Contributor Author

This should be resolved by the upcoming #18685.


5 participants