Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Heartbeat] Adjust State loader to only retry for failed requests and not for 4xx #37424

Closed
vigneshshanmugam opened this issue Dec 12, 2023 · 5 comments · Fixed by #37981
Closed
Labels
8.12 candidate bug good first issue Indicates a good issue for first-time contributors Heartbeat Team:obs-ds-hosted-services Label for the Observability Hosted Services team

Comments

@vigneshshanmugam
Copy link
Member

Summary

Heartbeat uses state loader to get the last status from the ES cluster and loads the current monitor state after the monitor has been successfully ran. The ES loader has a backoff of 3 retries which will be attempted when the request fails due to connection failures or ES being unavailable. But currently, we are also retrying for 4xx where the API key has limited permissions to read the ES state.

This consequently affects the time a particular execution is taking when we are running inside the SAAS service.

Proposal

Retry ES state loader requests only for network connection failures or 5xx and avoid retrying for 403 and 4xx when we knows its a valid status from the ES side.

@vigneshshanmugam vigneshshanmugam added bug Heartbeat good first issue Indicates a good issue for first-time contributors Team:obs-ds-hosted-services Label for the Observability Hosted Services team 8.12 candidate labels Dec 12, 2023
@elasticmachine
Copy link
Collaborator

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@vigneshshanmugam
Copy link
Member Author

Notes:

  • ES wont return 404 for the state requests since we are using _search query.
  • We need a mechanism to parse the response bytes for these unauthorized requests and short circuit the retried requests.
  • We also need to do perform this _search query only once for failed requests, We should not be updating the monitor state when we already know the user is not authorized.

@martinscholz83
Copy link
Contributor

@vigneshshanmugam, I could help if you want? Currently the error is a json string which could be unmarshalled and then check the status. For example

"401 Unauthorized: {\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"missing authentication credentials for REST request [/]\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}}],\"type\":\"security_exception\",\"reason\":\"missing authentication credentials for REST request [/]\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}},\"status\":401}"

But what about errors e.g. when the server is offline or any certificate issues. Then the error message has a different structure.

"couldn't connect to any of the configured Elasticsearch hosts. Errors: [error connecting to Elasticsearch at https://localhost:9200: Get \"https://localhost:9200\": dial tcp 127.0.0.1:9200: connect: connection refused]"

@martinscholz83
Copy link
Contributor

martinscholz83 commented Jan 15, 2024

I guess you had this attempts in mind?
https://github.com/elastic/beats/blob/main/heartbeat/beater/heartbeat.go#L355

@devcorpio
Copy link
Contributor

Hi @martinscholz83,

We started progressing in this issue a while ago. We overlooked your comments. Sorry about this.

We strongly appreciate your interest in solving this!

I guess you had this attempts in mind?

The attempts in mind were related to the query that extracts the state.

The one you mentioned was related to the "initial" connection to ES.

Thanks again,
Alberto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.12 candidate bug good first issue Indicates a good issue for first-time contributors Heartbeat Team:obs-ds-hosted-services Label for the Observability Hosted Services team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants