docs: Document Thanos Sharding #1922

Merged: 3 commits, Mar 13, 2020
63 changes: 63 additions & 0 deletions docs/sharding.md
@@ -0,0 +1,63 @@
---
title: Sharding
type: docs
menu: thanos
slug: /sharding.md
---

# Background

Currently, all components that read from object storage assume that every operation should act on **all** the blocks
present in the given bucket's root directory.

In most cases this is fine. However, over time, and because blocks from multiple `Sources` can be stored in the same bucket,
the number of objects in a bucket can grow drastically.

This means that over time you might want to scale out certain components, for example:

* Compactor: A larger number of objects does not matter much, but the compactor has to scale (CPU, network, disk, memory) with the number of Sources pushing blocks to the object storage.

TODO: Explain same for Store GW

# Relabelling

Similar to [promtail](https://github.com/grafana/loki/blob/master/docs/promtail.md#scrape-configs), this config will follow the native
[Prometheus relabel-config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) syntax.

The relabel config defines the filtering process that is applied on **every** synchronization with object storage.

We will potentially allow manipulating several inputs:

* External labels:
  * `<name>`

Output:

* If the output is empty, the block is dropped.

By default, when the relabel config is empty, all external labels are assumed.

Example usages would be:

* Drop blocks whose external labels contain `cluster=A`:

```yaml
- action: drop
  regex: "A"
  source_labels:
  - cluster
```

* Keep only blocks whose external labels contain `cluster=A`:

```yaml
- action: keep
  regex: "A"
  source_labels:
  - cluster
```

We can shard by adjusting which external labels each instance keeps, and therefore which blocks it includes.
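To make the `keep` and `drop` semantics concrete, here is a minimal Python sketch of how such rules could select blocks by their external labels. It is an illustration only, not the Thanos implementation: the `keep_block` and `select_blocks` helpers and the block names are hypothetical, and it assumes Prometheus-style behaviour where source label values are joined with `;` and the regex must match the whole joined value.

```python
import re


def keep_block(external_labels, rules, separator=";"):
    """Return True if a block with these external labels survives all rules.

    Hypothetical helper: only the `keep` and `drop` actions are modelled.
    """
    for rule in rules:
        # Join the source label values; a missing label counts as an empty string.
        value = separator.join(
            external_labels.get(name, "") for name in rule["source_labels"]
        )
        # Assumption: the regex is fully anchored, as in Prometheus relabelling.
        matched = re.fullmatch(rule["regex"], value) is not None
        if rule["action"] == "drop" and matched:
            return False
        if rule["action"] == "keep" and not matched:
            return False
    return True


def select_blocks(blocks, rules):
    """Return the sorted names of the blocks that are not dropped."""
    return sorted(
        name for name, labels in blocks.items() if keep_block(labels, rules)
    )


# Hypothetical blocks, keyed by name, with their external labels.
blocks = {
    "block-1": {"cluster": "A"},
    "block-2": {"cluster": "B"},
}

drop_a = [{"action": "drop", "regex": "A", "source_labels": ["cluster"]}]
keep_a = [{"action": "keep", "regex": "A", "source_labels": ["cluster"]}]

print(select_blocks(blocks, drop_a))  # blocks left after dropping cluster=A
print(select_blocks(blocks, keep_a))  # blocks left when keeping only cluster=A
```

Running one instance with the `keep` rule and another with the `drop` rule splits the bucket into two disjoint shards, and an empty rule list selects every block.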

# Time Partitioning

For the store gateway, we can specify the `--min-time` and `--max-time` flags to restrict which blocks it is responsible for.

For more details, see the "Time based partitioning" chapter in [Store gateway](components/store.md).
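As a rough sketch of time-based partitioning, the Python snippet below filters blocks against a time window. The `Block` type, the `in_window` helper, and the millisecond timestamps are hypothetical and for illustration only. It assumes a block is served when its time range overlaps the window with inclusive bounds; the exact boundary handling in the store gateway may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical block metadata; times are milliseconds since epoch."""
    name: str
    min_time: int
    max_time: int


def in_window(block, min_time, max_time):
    # Assumption: a block is served if its range overlaps [min_time, max_time].
    return block.max_time >= min_time and block.min_time <= max_time


blocks = [
    Block("old", 0, 100),
    Block("mid", 100, 200),
    Block("new", 200, 300),
]

# One store gateway instance handles recent data, another handles older data.
recent = [b.name for b in blocks if in_window(b, 150, 250)]
older = [b.name for b in blocks if in_window(b, 0, 50)]
print(recent, older)
```

Giving each store gateway instance a different window spreads the blocks across instances by age.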