Update Perf Tuning page for v2 (#806)
Signed-off-by: Yuri Shkuro <[email protected]>
yurishkuro authored Nov 28, 2024
1 parent 448561c commit 67dc2a5
Showing 2 changed files with 11 additions and 8 deletions.
17 changes: 10 additions & 7 deletions content/docs/next-release-v2/performance-tuning.md
@@ -5,27 +5,30 @@
description: Tweaking your Jaeger instance to achieve a better performance
hasparent: true
---

- Jaeger was built from day 1 to be able to ingest huge amounts of data in a resilient way. To better utilize resources that might cause delays, such as storage or network communications, Jaeger buffers and batches data. When more spans are generated than Jaeger is able to safely process, spans might get dropped. However, the defaults might not fit all scenarios.
+ Jaeger was built to be able to ingest huge amounts of data in a resilient way. To better utilize resources that might cause delays, such as storage or network communications, Jaeger buffers and batches data. When more spans are generated than Jaeger is able to safely process, spans might get dropped. However, the defaults might not fit all scenarios.

+ Since Jaeger v2 is based on the OpenTelemetry Collector, most of the advice in the [Scaling the Collector documentation](https://opentelemetry.io/docs/collector/scaling/) applies to Jaeger as well.

- ## Deployment considerations

- Although performance tuning the individual components is important, the way Jaeger is deployed can be decisive in obtaining optimal performance.

- ### Scale the Collector up and down
+ ## Scale the Collector up and down

Use the auto-scaling capabilities of your platform: **jaeger-collector** is nearly horizontally scalable so that more instances can be added and removed on-demand.

Adding **jaeger-collector** instances is recommended when your platform provides auto-scaling capabilities, or when it's easier to start/stop **jaeger-collector** instances than changing existing, running instances. Scaling horizontally is also indicated when the CPU usage should be spread across nodes.
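For example, on Kubernetes the collector can be scaled with a HorizontalPodAutoscaler. The sketch below is illustrative only: it assumes a Deployment named `jaeger-collector`, and the replica bounds and CPU target are placeholder values to tune for your workload.

```yaml
# Minimal HPA sketch: scales an assumed "jaeger-collector" Deployment
# on average CPU utilization. Bounds and target are placeholder values.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: jaeger-collector
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jaeger-collector  # assumed Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```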

- ### Make sure the storage can keep up
+ ## Make sure the storage can keep up

+ {{< rawhtml >}}<!-- TODO: fix me once the latency metric is available -->{{< /rawhtml >}}
+ {{< danger >}}
+ The metric `jaeger_collector_save_latency_bucket` mentioned below is not yet available in Jaeger v2.
+ {{< /danger >}}

Each span is written to the storage by **jaeger-collector** using one worker, blocking it until the span has been stored. When the storage is too slow, the number of workers blocked by the storage might be too high, causing spans to be dropped. To help diagnose this situation, the histogram `jaeger_collector_save_latency_bucket` can be analyzed. Ideally, the latency should remain the same over time. When the histogram shows that most spans are taking longer and longer over time, it’s a good indication that your storage might need some attention.
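For instance, if the collector metrics are scraped by Prometheus, a rule along these lines could flag a slow storage. This is a sketch assuming the v1 metric name, which, per the note above, is not yet emitted by Jaeger v2; the 0.5s p99 threshold is an arbitrary placeholder.

```yaml
# Illustrative Prometheus alerting rule. Assumes the v1 histogram
# jaeger_collector_save_latency_bucket (not yet available in Jaeger v2)
# and a placeholder 0.5s p99 threshold.
groups:
  - name: jaeger-storage
    rules:
      - alert: JaegerSaveLatencyHigh
        # p99 of span save latency over the last 5 minutes, in seconds
        expr: >
          histogram_quantile(0.99,
            sum(rate(jaeger_collector_save_latency_bucket[5m])) by (le)) > 0.5
        for: 15m
        annotations:
          summary: Span save latency keeps growing; the storage may need attention.
```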

- ### Consider using Apache Kafka as intermediate buffer
+ ## Consider using Kafka as intermediate buffer

- Jaeger [can use Apache Kafka](../architecture/) as a buffer between **jaeger-collector** and the actual backing storage (Elasticsearch, Apache Cassandra). This is ideal for cases where the traffic spikes are relatively frequent (prime time traffic) but the storage can eventually catch up once the traffic normalizes. For that, the `SPAN_STORAGE_TYPE` environment variable should be set to `kafka` in **jaeger-collector**, and **jaeger-ingester** component must be used, reading data from Kafka and writing it to the storage.
+ Jaeger [can use Apache Kafka](../architecture/) as a buffer between **jaeger-collector** and the actual backing storage (Elasticsearch, Apache Cassandra). This is ideal for cases where the traffic spikes are relatively frequent (prime time traffic) but the storage can eventually catch up once the traffic normalizes. Please refer to the [Kafka page](../kafka/) for details on configuring this deployment.
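As an illustrative sketch only (the Kafka page linked above is authoritative), a collector pipeline that buffers spans through Kafka could look roughly like this, assuming the OpenTelemetry Collector's `kafka` exporter with placeholder broker and topic names:

```yaml
# Sketch of a collector that writes spans to Kafka instead of storage.
# Assumes the OTel Collector kafka exporter; broker/topic are placeholders.
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  kafka:
    brokers:
      - kafka-broker:9092
    topic: jaeger-spans
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [kafka]
```

A separate **jaeger-ingester** deployment would then consume from the same topic and write the spans to the actual storage.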

In addition to the performance aspects, having spans written to Kafka is useful for building a real-time data pipeline for aggregations and feature extraction from traces.

2 changes: 1 addition & 1 deletion content/docs/next-release-v2/tools.md
@@ -36,5 +36,5 @@
At default settings the service listens on the following port(s):

Port | Protocol | Function
----- | ------- | ---
- 17271 | gRPC | [Remote Storage API][storage.proto]
+ 17271 | gRPC | [Remote Storage API](../apis/#remote-storage-api)
17270 | HTTP | admin port: health check at `/` and metrics at `/metrics`
