Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helm: release a new helm charts to use k8s service for discover query-scheduler replicas #9477

Conversation

sandeepsukhani
Copy link
Contributor

What this PR does / why we need it:
Currently, we have a bug in our code when running Loki in SSD mode and using the ring for query-scheduler discovery. It causes queries to not be distributed to all the available read pods. I have explained the issue in detail in the PR which fixes the code.

Since this bug causes a major query performance impact and code release might take time, in this PR we are doing a new helm release which fixes the issue by using the k8s service for discovering query-scheduler replicas.

Which issue(s) this PR fixes:
Fixes #9195

@sandeepsukhani sandeepsukhani requested a review from a team as a code owner May 18, 2023 11:11
@sandeepsukhani sandeepsukhani requested a review from JStickler as a code owner May 18, 2023 11:43
@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label May 18, 2023
Copy link
Collaborator

@trevorwhitney trevorwhitney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Confirmed they scheduler clients are registering for both Loki and GEL. Thank you!

Screenshot_20230518_135506

@@ -207,6 +207,16 @@ loki:
index_gateway:
{{- tpl (. | toYaml) $ | nindent 4 }}
{{- end }}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a TODO or some other note here to remove this section once we bump the Loki version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about continuing to use k8s service for service discovery? I am going to merge this PR, for now, to get the fix out. I will take care of the feedback in PR #9471 if you think we should move back to ring-based service discovery.

@@ -302,6 +312,10 @@ loki:
# -- Optional index gateway configuration
index_gateway:
mode: ring
frontend:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we could add the comment here to set these to empty objects when we bump the Loki version.

@sandeepsukhani sandeepsukhani merged commit d10549e into grafana:main May 22, 2023
rgarcia6520 pushed a commit to rgarcia6520/grafana-loki that referenced this pull request May 26, 2023
…-scheduler replicas (grafana#9477)

**What this PR does / why we need it**:
Currently, we have a bug in our code when running Loki in SSD mode and
using the ring for query-scheduler discovery. It causes queries to not
be distributed to all the available read pods. I have explained the
issue in detail in [the PR which fixes the
code](grafana#9471).

Since this bug causes a major query performance impact and code release
might take time, in this PR we are doing a new helm release which fixes
the issue by using the k8s service for discovering `query-scheduler`
replicas.

**Which issue(s) this PR fixes**:
Fixes grafana#9195
mraboosk pushed a commit to mraboosk/loki that referenced this pull request Oct 7, 2024
…-scheduler replicas (grafana#9477)

**What this PR does / why we need it**:
Currently, we have a bug in our code when running Loki in SSD mode and
using the ring for query-scheduler discovery. It causes queries to not
be distributed to all the available read pods. I have explained the
issue in detail in [the PR which fixes the
code](grafana#9471).

Since this bug causes a major query performance impact and code release
might take time, in this PR we are doing a new helm release which fixes
the issue by using the k8s service for discovering `query-scheduler`
replicas.

**Which issue(s) this PR fixes**:
Fixes grafana#9195
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/helm size/M type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loki queries not split across queriers
2 participants