Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

querier: fix remote read error translation #7487

Merged

Conversation

dimitarvdimitrov
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov commented Feb 27, 2024

Problem

Currently remote-read has conflicting behaviours:

  • when returning samples all internal errors are translated to HTTP 400
  • when returning chunks all internal errors are translated to HTTP 500

PR description

This PR unifies error handling such that:

  • all validation errors will return HTTP 400
  • all promQL storage errors and all other errors will return HTTP 500

It also adds tests for this in two places

  • integration tests: this comes with moving of some code to query remote read from an endpoint
  • unit tests in pkg/querier

The PR also moves the content type string to pkg/querier/api since it's a complicated string that was repeated in a few places.

Note to reviewers

I couldn't really understand the Prometheus docs in terms of the expected return codes and content type. The docs give a JSON example as the structure for the API, but the remote read API is based on protocol buffers. At the same time the protocol buffers do not have fields for errors. So I'm still not sure if we're compliant with the docs page.

Because of the same there are some discrepancies with HTTP status codes on the regular query and query_range APIs - they return HTTP 422 for hit limits whereas remote read returns HTTP 400

I'm investigating a case where the remote-read errors for query length don't get correctly translated.

I tried adding an integration test which surprisingly passed. This means I still haven't found the problem but decided to submit the integration test anyway.

Signed-off-by: Dimitar Dimitrov <[email protected]>
@dimitarvdimitrov dimitarvdimitrov requested a review from a team as a code owner February 27, 2024 16:54
@dimitarvdimitrov dimitarvdimitrov changed the title query-frontend: add tests for remote read error translation querier: fix remote read error translation Feb 27, 2024
@dimitarvdimitrov dimitarvdimitrov added bug Something isn't working component/querier labels Feb 27, 2024
Signed-off-by: Dimitar Dimitrov <[email protected]>
@dimitarvdimitrov
Copy link
Contributor Author

this PR started just as added integration tests, but I ended up finding the bug and fixing it. The description and title are up to date now

@pracucci pracucci self-requested a review February 28, 2024 07:27
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't really understand the Prometheus docs in terms of the expected return codes and content type. The docs give a JSON example as the structure for the API, but the remote read API is based on protocol buffers. At the same time the protocol buffers do not have fields for errors. So I'm still not sure if we're compliant with the docs page.

Looking at the remote read handler implementation in Prometheus looks like they're just sending errors as plain text messages (see code).

Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job spotting the bug and fixing it! 👏

dimitarvdimitrov and others added 3 commits February 28, 2024 14:40
Co-authored-by: Marco Pracucci <[email protected]>
Signed-off-by: Dimitar Dimitrov <[email protected]>
Signed-off-by: Dimitar Dimitrov <[email protected]>
@dimitarvdimitrov
Copy link
Contributor Author

thanks for the review @pracucci!

@dimitarvdimitrov dimitarvdimitrov enabled auto-merge (squash) February 28, 2024 13:43
@dimitarvdimitrov
Copy link
Contributor Author

i'll leave this open in case anyone else wants to give it a review. Hope to merge it before Friday

@dimitarvdimitrov dimitarvdimitrov merged commit 8d88907 into main Mar 1, 2024
29 checks passed
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/query-frontend/add-tests-for-remote-read-errors branch March 1, 2024 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working component/querier
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants