Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: Add shuffle sharding doc #6173

Merged
merged 15 commits into from
Jul 21, 2022
Merged

doc: Add shuffle sharding doc #6173

merged 15 commits into from
Jul 21, 2022

Conversation

kavirajk
Copy link
Contributor

@kavirajk kavirajk commented May 17, 2022

Signed-off-by: Kaviraj [email protected]

What this PR does / why we need it:
Add shuffle sharding doc for Loki.

  1. What is it?
  2. Why we need it on Loki?
  3. Key Configs
  4. Key metrics

And some operational advice.

Which issue(s) this PR fixes:
Fixes #NA

Special notes for your reviewer:
Markdown preview: https://github.com/grafana/loki/blob/0dcc6034ea927b16ff9a79eb71100b98a736da78/docs/sources/operations/shuffle-sharding.md

Checklist

  • Documentation added
  • Is this an important fix or new feature? Add an entry in the CHANGELOG.md.

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0.3%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@kavirajk kavirajk requested a review from trevorwhitney May 17, 2022 15:35
Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall it looks great on technical explanation. There are some suggestions by Dan and I will leave it to Karen for suggesting any corrections in writing.

The maximum number of queriers can be overridden on a per-tenant basis in the limits overrides configuration (`max_queriers_per_tenant`).

### The impact of “query of death”
In the event a tenant is repeatedly sending a “query of death” which leads the querier to crash or getting killed because of out-of-memory, the crashed querier will get disconnected from the query-frontend and a new querier will be immediately assigned to the tenant’s shard. This practically invalidates the assumption that shuffle-sharding can be used to contain the blast radius in case of a query of death.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add a line explaining a bit more why that is a problem by mentioning that the tenant would get assigned to other running queries to match the max_queriers_per_tenant and might eventually affect other tenant shards which beats the purpose of this feature.

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.6%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

-           ingester	-0.1%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
-        distributor	-0.3%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

1 similar comment
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kavirajk
Copy link
Contributor Author

kavirajk commented Jun 2, 2022

@KMiller-Grafana can you take a look at the doc when you get a chance?

@KMiller-Grafana KMiller-Grafana added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Jul 20, 2022
Signed-off-by: Kaviraj <[email protected]>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jul 21, 2022
Signed-off-by: Kaviraj <[email protected]>
Copy link
Contributor

@jeschkies jeschkies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the write up. It help me understand shuffle sharding better 🙂

Copy link
Contributor

@KMiller-Grafana KMiller-Grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small change please!

docs/sources/operations/shuffle-sharding.md Outdated Show resolved Hide resolved
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@KMiller-Grafana KMiller-Grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much.

@KMiller-Grafana KMiller-Grafana added the backport release-2.6.x Tag a PR with this label to create a PR which cherry pics it into the release-2.6.x branch label Jul 21, 2022
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@kavirajk kavirajk merged commit a1e75da into main Jul 21, 2022
@kavirajk kavirajk deleted the kavirajk/doc-shuffle-sharding branch July 21, 2022 21:05
grafanabot pushed a commit that referenced this pull request Jul 21, 2022
* doc: Add shuffle sharding doc

Signed-off-by: Kaviraj <[email protected]>

* Add query-scheduler config as well

Signed-off-by: Kaviraj <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Add line explaining what is "query-of-death" exactly means

* PR remarsk from Karen

Signed-off-by: Kaviraj <[email protected]>

* Fix image URL

Signed-off-by: Kaviraj <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Karen Miller <[email protected]>

Co-authored-by: Dan Mihai Dinu <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
(cherry picked from commit a1e75da)
vlad-diachenko pushed a commit that referenced this pull request Jul 22, 2022
* doc: Add shuffle sharding doc

Signed-off-by: Kaviraj <[email protected]>

* Add query-scheduler config as well

Signed-off-by: Kaviraj <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Add line explaining what is "query-of-death" exactly means

* PR remarsk from Karen

Signed-off-by: Kaviraj <[email protected]>

* Fix image URL

Signed-off-by: Kaviraj <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Karen Miller <[email protected]>

Co-authored-by: Dan Mihai Dinu <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
(cherry picked from commit a1e75da)

Co-authored-by: Kaviraj Kanagaraj <[email protected]>
lxwzy pushed a commit to lxwzy/loki that referenced this pull request Nov 7, 2022
* doc: Add shuffle sharding doc

Signed-off-by: Kaviraj <[email protected]>

* Add query-scheduler config as well

Signed-off-by: Kaviraj <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <[email protected]>

* Add line explaining what is "query-of-death" exactly means

* PR remarsk from Karen

Signed-off-by: Kaviraj <[email protected]>

* Fix image URL

Signed-off-by: Kaviraj <[email protected]>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Karen Miller <[email protected]>

Co-authored-by: Dan Mihai Dinu <[email protected]>
Co-authored-by: Karen Miller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport release-2.6.x Tag a PR with this label to create a PR which cherry pics it into the release-2.6.x branch size/L type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants