
Optimizing the write path for mixed storage v1/v2 state #6474

Closed
yurishkuro opened this issue Jan 3, 2025 · 0 comments · Fixed by #6532

yurishkuro (Member) commented:

I was thinking more about this. In both the read and write paths, we want to avoid introducing obvious inefficiencies by requiring multiple data transformations. This applies not just to Jaeger v2 but also to Jaeger v1, since many users are still running it at scale, and model transformations are a major source of performance overhead (especially memory allocations).

Take the write paths:

Legacy (1):

```mermaid
graph LR
  Client -->|model| Collector
  Collector --> |model| Storage[Storage v1]
  Storage --> |dbmodel| Database[(Database)]
```

OTLP (2):

```mermaid
graph LR
  Client -->|OTLP| Receiver
  Receiver --> |model| Collector
  Collector --> |model| Storage[Storage v1]
  Storage --> |dbmodel| Database[(Database)]
```

In these two examples most model transformations are necessary, although one could argue that in the OTLP case it should be possible to bypass the model step and go directly from OTLP to dbmodel. This is what the Storage v2 API gives us:

OTLP with v2 storage (3):

```mermaid
graph LR
  Client -->|OTLP| Receiver
  Receiver --> |OTLP| Collector
  Collector --> |OTLP| Storage[Storage v2]
  Storage --> |dbmodel| Database[(Database)]
```

This change requires the v1 collector pipeline to support OTLP as the payload, which it currently does not. If we upgrade just the collector part but still use the underlying v1 storage implementations, the OTLP path still looks fine:

OTLP with v1 storage pretending to be v2 storage (4):

```mermaid
graph LR
  Client -->|OTLP| Receiver
  Receiver --> |OTLP| Collector
  Collector --> |OTLP| Storage2[Storage Adapter v2]
  Storage2 --> |model| Storage1[Storage v1]
  Storage1 --> |dbmodel| Database[(Database)]
```

(4) has the same number of transformations as (2), so there is no regression. But (1) now looks bad:

Legacy with v1 storage pretending to be v2 storage (5):

```mermaid
graph LR
  Client -->|model| Receiver
  Receiver --> |OTLP| Collector
  Collector --> |OTLP| Storage2[Storage Adapter v2]
  Storage2 --> |model| Storage1[Storage v1]
  Storage1 --> |dbmodel| Database[(Database)]
```

Here we introduced an unnecessary transformation into OTLP that makes the path less efficient. This will improve once the storage is upgraded to v2 proper, but that will take some time.

My proposal is to consider upgrading the internal pipeline to support both model and OTLP payloads simultaneously, and to take advantage of the fact that a Storage v2 instance might be an adapter over v1:

```mermaid
graph LR
  Client -->|model or OTLP| Collector
  Collector --> |model or OTLP| Processor{Processor}
  Processor --> |model| Storage1[Storage v2 Adapter over v1]
  Processor --> |OTLP| Storage2[Storage v2]
  Storage1 --> |dbmodel| Database[(Database)]
  Storage2 --> |dbmodel| Database[(Database)]
```
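
To make the dual-payload pipeline concrete, here is a minimal Go sketch; all type and field names are illustrative assumptions, not actual Jaeger types. A queue item carries exactly one of the two payloads, and the processor dispatches it to the matching writer, so neither path pays for an extra conversion:

```go
package pipeline

import (
	"context"

	"go.opentelemetry.io/collector/pdata/ptrace"

	"github.com/jaegertracing/jaeger/model"
)

// queueItem is a union: exactly one of the two fields is populated,
// so each payload travels through the queue in its native form.
type queueItem struct {
	spans  []*model.Span // legacy (model) path; nil on the OTLP path
	traces ptrace.Traces // OTLP path; zero value on the model path
}

// spanWriterV1 and traceWriterV2 mirror the shape of the v1 spanstore
// and v2 tracestore writer interfaces, simplified for this sketch.
type spanWriterV1 interface {
	WriteSpan(ctx context.Context, span *model.Span) error
}

type traceWriterV2 interface {
	WriteTraces(ctx context.Context, td ptrace.Traces) error
}

type processor struct {
	v1 spanWriterV1 // a Storage v2 adapter unwrapped to its v1 core
	v2 traceWriterV2
}

// dispatch routes each item to the writer that matches its payload,
// avoiding any model/OTLP conversion on either path.
func (p *processor) dispatch(ctx context.Context, it queueItem) error {
	if it.spans != nil {
		for _, span := range it.spans {
			if err := p.v1.WriteSpan(ctx, span); err != nil {
				return err
			}
		}
		return nil
	}
	return p.v2.WriteTraces(ctx, it.traces)
}
```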
yurishkuro added a commit that referenced this issue Jan 5, 2025
## Which problem is this PR solving?
- Part of #6474

## Description of the changes
- Extend the SpanProcessor interface to carry either v1 or v2 spans (see the interface sketch below)

## How was this change tested?
- CI

---------

Signed-off-by: Yuri Shkuro <[email protected]>
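
For illustration, the interface-level change could look roughly like this; the Batch shape is an assumption made for the sketch, not the exact signature merged in the PR:

```go
package processor

import (
	"context"

	"go.opentelemetry.io/collector/pdata/ptrace"

	"github.com/jaegertracing/jaeger/model"
)

// Batch carries exactly one of the two payloads through the pipeline.
type Batch struct {
	Spans  []*model.Span // populated by legacy (model) receivers
	Traces ptrace.Traces // populated by OTLP receivers
}

// SpanProcessor accepts either payload without forcing a conversion.
// The per-span bool results report whether each span was accepted.
type SpanProcessor interface {
	ProcessSpans(ctx context.Context, batch Batch) ([]bool, error)
}
```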
yurishkuro added a commit that referenced this issue Jan 6, 2025
## Which problem is this PR solving?
- Continuation of #6474

## Description of the changes
- In order to allow the queue to carry both the v1 and v2 data models, let's
first make the queue strongly typed by using generics (see the queue sketch below)

## How was this change tested?
- unit tests, CI

---------

Signed-off-by: Yuri Shkuro <[email protected]>
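
A minimal sketch of the "strongly typed queue" step, assuming a simple channel-backed design; Jaeger's actual BoundedQueue carries more machinery (consumer management, resizing, drop callbacks):

```go
package queue

// BoundedQueue is a generic bounded queue: the element type is fixed at
// compile time, so items no longer travel as interface{} values.
type BoundedQueue[T any] struct {
	items chan T
}

func NewBoundedQueue[T any](capacity int) *BoundedQueue[T] {
	return &BoundedQueue[T]{items: make(chan T, capacity)}
}

// Produce enqueues an item, returning false if the queue is full.
func (q *BoundedQueue[T]) Produce(item T) bool {
	select {
	case q.items <- item:
		return true
	default:
		return false
	}
}

// StartConsumers launches n goroutines that drain the queue with fn.
func (q *BoundedQueue[T]) StartConsumers(n int, fn func(item T)) {
	for i := 0; i < n; i++ {
		go func() {
			for item := range q.items {
				fn(item)
			}
		}()
	}
}
```

With this in place, the same queue type can be instantiated with an item type that carries either payload, as in the earlier pipeline sketch.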
yurishkuro added a commit that referenced this issue Jan 12, 2025
## Which problem is this PR solving?
- Part of #6487
- Part of #6474

## Description of the changes
- Swap the v1 spanWriter for the v2 traceWriter in the collector pipeline
- Currently the traceWriter is provided via a v1 adapter, so it's always a
v1 writer underneath
- And since only the v1 spans entry point is currently implemented, there is
no performance impact from additional data transformations
- However, as soon as the OTLP entry point is utilized (e.g. via the OTLP
receiver), the `ptrace.Traces` batch will be handled by the exporterhelper
queue as a single item (not broken into individual spans) and then
passed directly to the writer as a batch. Since the writer is
implemented via an adapter, the batch will be converted to spans and written
one span at a time (sketched after this commit message). There will be no
additional data transformations on this path either.

## How was this change tested?
- CI

## Outstanding
- [x] Invoking proper preprocessing, like sanitizers and collector tags,
on the OTLP path
- [x] Adequate metrics parity, ideally same as v1 collector
- [ ] Test coverage, including passing a v2-like (mock) writer that
cannot be downgraded to v1
- Idea: parameterize some tests (ideally those that also validate
pre-processing) to execute both v1 and v2 write paths

## Follow-up PRs
* Enable v2 write path from OTLP and Zipkin receivers (they currently
explicitly downgrade to v1). This will also allow adding better unit
tests.

---------

Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
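
A sketch of the adapter behavior this PR describes, with the TraceWriter type name assumed for illustration; ProtoFromTraces is the OTLP-to-model translator from opentelemetry-collector-contrib's pkg/translator/jaeger:

```go
package adapter

import (
	"context"

	jaegertranslator "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/jaeger"
	"go.opentelemetry.io/collector/pdata/ptrace"

	"github.com/jaegertracing/jaeger/storage/spanstore"
)

// TraceWriter adapts a v1 span writer to the v2 trace-writing shape.
type TraceWriter struct {
	spanWriter spanstore.Writer // the wrapped v1 writer
}

// WriteTraces converts the OTLP batch once, then fans the spans out to
// the v1 writer one span at a time, matching the behavior described above.
func (w *TraceWriter) WriteTraces(ctx context.Context, td ptrace.Traces) error {
	batches, err := jaegertranslator.ProtoFromTraces(td)
	if err != nil {
		return err
	}
	for _, batch := range batches {
		for _, span := range batch.Spans {
			if span.Process == nil {
				span.Process = batch.Process // spans inherit the batch-level process
			}
			if err := w.spanWriter.WriteSpan(ctx, span); err != nil {
				return err
			}
		}
	}
	return nil
}
```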