Cleaned up getting-started docs, added more docs for Querier. (thanos-io#1541)

Signed-off-by: Bartek Plotka <[email protected]>
bwplotka authored Oct 3, 2019
1 parent 68c539c commit f1d3e63
Showing 12 changed files with 429 additions and 229 deletions.
12 changes: 7 additions & 5 deletions .github/ISSUE_TEMPLATE.md
@@ -9,7 +9,7 @@ about what components it touches e.g "query:" or ".*:"
In case of issues related to exact bucket implementation, please ping corresponded maintainer from list here: https://github.com/thanos-io/thanos/blob/master/docs/storage.md
-->

**Thanos, Prometheus and Golang version used**
**Thanos, Prometheus and Golang version used**:

<!--
Output of "thanos --version" or docker image:tag used.
@@ -18,13 +18,15 @@ Output of "thanos --version" or docker image:tag used.
If you are using custom build from master branch, have you checked out the tip of the master?
-->

**What happened**
**Object Storage Provider**:

**What you expected to happen**
**What happened**:

**What you expected to happen**:

**How to reproduce it (as minimally and precisely as possible)**:

**Full logs to relevant components**
**Full logs to relevant components**:

<!--
Uncomment if you would like to post collapsible logs:
@@ -39,7 +41,7 @@ Uncomment if you would like to post collapsible logs:
</details>
-->

**Anything else we need to know**
**Anything else we need to know**:

<!--
Uncomment and fill if you use not casual environment or if it might be relevant.
5 changes: 4 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -3,7 +3,10 @@
about what components it touches e.g "query:" or ".*:"
-->

* [] CHANGELOG entry if change is relevant to the end user.
<!-- Don't forget about CHANGELOG! -->

* [] I added CHANGELOG entry for this change.
* [] Change is not relevant to the end user.

## Changes

14 changes: 8 additions & 6 deletions README.md
@@ -12,6 +12,8 @@ Thanos is a set of components that can be composed into a highly available metri
system with unlimited storage capacity, which can be added seamlessly on top of existing
Prometheus deployments.

Thanos is a [CNCF](https://www.cncf.io/) Sandbox project.

Thanos leverages the Prometheus 2.0 storage format to cost-efficiently store historical metric
data in any object storage while retaining fast query latencies. Additionally, it provides
a global query view across all Prometheus installations and can merge data from Prometheus
@@ -23,16 +25,12 @@ Concretely the aims of the project are:
1. Unlimited retention of metrics.
1. High availability of components, including Prometheus.

## Architecture Overview

![architecture_overview](docs/img/arch.jpg)

## Getting Started

* **[Getting Started](https://thanos.io/getting-started.md/)**
* [Design](https://thanos.io/design.md/)
* [Prom Meetup Slides](https://www.slideshare.net/BartomiejPotka/thanos-global-durable-prometheus-monitoring)
* [Introduction blog post](https://improbable.io/games/blog/thanos-prometheus-at-scale)
* [Blog posts](docs/getting-started.md#blog-posts)
* [Talks](docs/getting-started.md#talks)
* [Proposals](docs/proposals)
* [Integrations](docs/integrations.md)

@@ -48,6 +46,10 @@ Concretely the aims of the project are:
* Simple gRPC "Store API" for unified data access across all metric data
* Easy integration points for custom metric providers

## Architecture Overview

![architecture_overview](docs/img/arch.jpg)

## Thanos Philosophy

The philosophy of Thanos and our community is borrowing much from UNIX philosophy and the golang programming language.
66 changes: 55 additions & 11 deletions docs/components/query.md
@@ -4,23 +4,67 @@ type: docs
menu: components
---

# Query
# Querier/Query

The query component implements the Prometheus HTTP v1 API to query data in a Thanos cluster via PromQL.
The Querier component (also known as "Query") implements the [Prometheus HTTP v1 API](https://prometheus.io/docs/prometheus/latest/querying/api/) to query data in a Thanos cluster via PromQL.

It gathers the data needed to evaluate the query from underlying StoreAPIs. See [here](../service-discovery.md)
on how to connect querier with desired StoreAPIs.
In short, it gathers the data needed to evaluate the query from underlying [StoreAPIs](../../pkg/store/storepb/rpc.proto), evaluates the query and returns the result.

Querier currently is fully stateless and horizontally scalable.
Querier is fully stateless and horizontally scalable.

Example command to run Querier:

```bash
$ thanos query \
--http-address "0.0.0.0:9090" \
--store "<store-api>:<grpc-port>" \
--store "<store-api2>:<grpc-port>"
```
## Querier use cases, why do I need this component?

Thanos Querier essentially lets you aggregate and optionally deduplicate multiple metrics backends under a single Prometheus Query endpoint.

### Global View

Since for Querier "a backend" is anything that implements the gRPC StoreAPI, we can aggregate data from any number of different storages, such as:

* Prometheus (see [Sidecar](sidecar.md))
* Object Storage (see [Store Gateway](store.md))
* Global alerting/recording rules evaluations (see [Ruler](rule.md))
* Metrics received from Prometheus remote write streams (see [Thanos Receiver](../proposals/201812_thanos-remote-receive.md))
* Another Querier (you can stack Queriers on top of each other)
* Non-Prometheus systems!
* e.g [OpenTSDB](../integrations.md#opentsdb)

Thanks to that, you can run queries (manually, from Grafana, or via an alerting rule) that aggregate metrics from a mix of those sources.

Some examples:

* `sum(cpu_used{cluster=~"cluster-(eu1|eu2|eu3|us1|us2|us3)", job="service1"})` gives you the sum of CPU used inside all listed clusters for service `service1`. This works
even if each of those clusters runs multiple Prometheus servers. Querier will know which data sources to query.

* In a single cluster you shard Prometheus functionally or run different Prometheus instances for different tenants. You can spin up Querier to have access to both within a single query evaluation.
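
Because Querier implements the Prometheus HTTP v1 API, a global-view query like the ones above can be sent to it in the same way as to a single Prometheus server. The following is a minimal sketch in Go that only builds the request URL for the instant-query endpoint (`/api/v1/query`). The address `http://localhost:9090` is an assumption based on the `--http-address` in the example command, and `instantQueryURL` is a hypothetical helper, not part of Thanos.

```go
package main

import (
	"fmt"
	"net/url"
)

// instantQueryURL is a hypothetical helper. It builds a URL for the
// Prometheus HTTP v1 instant-query endpoint (/api/v1/query) on the
// server at base, with promQL passed as the "query" parameter.
func instantQueryURL(base, promQL string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	// Assumes the API is served at the root, not on a sub-path.
	u.Path = "/api/v1/query"
	q := url.Values{}
	q.Set("query", promQL) // Encode() percent-escapes the expression.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Assumed Querier address; adjust to your deployment.
	u, err := instantQueryURL(
		"http://localhost:9090",
		`sum(cpu_used{cluster=~"cluster-(eu1|us1)", job="service1"})`,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

The resulting URL can be fetched with any HTTP client. Nothing in the request names individual clusters or Prometheus servers, since Querier decides which data sources to query.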

### Run-time deduplication of HA groups

## Deduplication
Prometheus is stateful and does not allow replicating its database. This means that increasing availability by running multiple Prometheus replicas is not straightforward.
Simple load balancing will not work: for example, after a crash a replica might be up again, but querying it will show a small gap for the period it was down. A
second replica may have been up during that time, but it could be down at another moment (e.g. during a rolling restart), so load balancing on top of those replicas does not work well.

Thanos Querier instead pulls the data from both replicas and deduplicates those signals, filling the gaps if any, transparently to the Querier consumer.
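
The gap-filling idea can be illustrated with a small Go sketch. This is not Thanos's actual deduplication algorithm. It is a simplified model that assumes both replicas report samples at identical timestamps, which real replicas generally do not. The `sample` type and `mergeReplicas` function are hypothetical names introduced for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical, simplified data point of one series.
type sample struct {
	T int64   // timestamp in milliseconds
	V float64 // value
}

// mergeReplicas returns one sample per timestamp from the union of both
// replicas. When both replicas have a sample for the same timestamp, the
// one from primary is kept; gaps in primary are filled from secondary.
func mergeReplicas(primary, secondary []sample) []sample {
	seen := make(map[int64]bool, len(primary)+len(secondary))
	out := make([]sample, 0, len(primary)+len(secondary))
	for _, replica := range [][]sample{primary, secondary} {
		for _, s := range replica {
			if !seen[s.T] {
				seen[s.T] = true
				out = append(out, s)
			}
		}
	}
	// Restore time order after appending the gap-filling samples.
	sort.Slice(out, func(i, j int) bool { return out[i].T < out[j].T })
	return out
}

func main() {
	// Replica A was down around t=3000, replica B around t=1000.
	replicaA := []sample{{T: 1000, V: 1}, {T: 2000, V: 2}, {T: 4000, V: 4}}
	replicaB := []sample{{T: 2000, V: 2}, {T: 3000, V: 3}, {T: 4000, V: 4}}
	for _, s := range mergeReplicas(replicaA, replicaB) {
		fmt.Println(s.T, s.V)
	}
}
```

Neither replica alone has a complete series here, yet the merged result has no gap. That is the property plain load balancing cannot provide.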

## Metric Query Flow Overview

<img src="../img/querier.svg" class="img-fluid" alt="querier-steps" />

Overall QueryAPI exposed by Thanos is guaranteed to be compatible with [Prometheus 2.x. API](https://prometheus.io/docs/prometheus/latest/querying/api/).
The above diagram shows what Querier does for each Prometheus query request.

See [here](../service-discovery.md) on how to connect Querier with desired StoreAPIs.

<!--- TODO explain steps --->

### Deduplication

The query layer can deduplicate series that were collected from high-availability pairs of data sources such as Prometheus.
A fixed single or multiple replica labels must be chosen for the entire cluster and can then be passed to query nodes on startup.
@@ -73,16 +117,17 @@ $ thanos query \

This logic can also be controlled via parameter on QueryAPI. More details below.

## Query API

Overall QueryAPI exposed by Thanos is guaranteed to be compatible with Prometheus 2.x.
## Query API Overview

However, for additional Thanos features, Thanos, on top of Prometheus adds
As mentioned, Query API exposed by Thanos is guaranteed to be compatible with [Prometheus 2.x. API](https://prometheus.io/docs/prometheus/latest/querying/api/).
However for additional Thanos features on top of Prometheus, Thanos adds:

* partial response behaviour
* several additional parameters listed below
* custom response fields.

Let's walk through all of those extensions:

### Partial Response

QueryAPI and StoreAPI have additional behaviour controlled via a query parameter called [PartialResponseStrategy](/pkg/store/storepb/rpc.pb.go).
@@ -169,7 +214,6 @@ type queryData struct {
An additional field, `Warnings`, contains every error that occurred and is assumed non-critical. The `partial_response`
option controls whether StoreAPI unavailability is considered critical.


## Expose UI on a sub-path

It is possible to expose thanos-query UI and optionally API on a sub-path.
1 change: 0 additions & 1 deletion docs/design.md
@@ -147,7 +147,6 @@ For example, rule sets can be divided across multiple HA pairs of rule nodes. St

Overall, first-class horizontal sharding is possible but will not be considered for the time being since there's no evidence that it is required in practical setups.


## Cost

The only extra cost Thanos adds to an existing Prometheus setup is essentially the price of storing and querying data from the object storage and running of the store node.