
Optimizing the write path for mixed storage v1/v2 state #6474

Closed
yurishkuro opened this issue Jan 3, 2025 · 0 comments · Fixed by #6532

yurishkuro (Member) commented:

I was thinking more about this. In both the read and write paths, we want to avoid introducing obvious inefficiencies by requiring multiple data transformations. This applies not just to Jaeger v2 but also to Jaeger v1, since many users are still running it at scale, and model transformations are a major source of performance overhead (especially memory allocations).

Take the write paths:

Legacy (1):

```mermaid
graph LR
  Client -->|model| Collector
  Collector --> |model| Storage[Storage v1]
  Storage --> |dbmodel| Database[(Database)]
```

OTLP (2):

```mermaid
graph LR
  Client -->|OTLP| Receiver
  Receiver --> |model| Collector
  Collector --> |model| Storage[Storage v1]
  Storage --> |dbmodel| Database[(Database)]
```

In these two examples most model transformations are necessary, although one could argue that in the OTLP case it should be possible to bypass the model step and go directly from OTLP to dbmodel. This is what the Storage v2 API gives us:

OTLP with v2 storage (3):

```mermaid
graph LR
  Client -->|OTLP| Receiver
  Receiver --> |OTLP| Collector
  Collector --> |OTLP| Storage[Storage v2]
  Storage --> |dbmodel| Database[(Database)]
```

This change requires the v1 collector pipeline to support OTLP as the payload, which it currently does not. If we upgrade just the collector part but still use the underlying v1 storage implementations, the OTLP path still looks fine:

OTLP with v1 storage pretending to be v2 storage (4):

```mermaid
graph LR
  Client -->|OTLP| Receiver
  Receiver --> |OTLP| Collector
  Collector --> |OTLP| Storage2[Storage Adapter v2]
  Storage2 --> |model| Storage1[Storage v1]
  Storage1 --> |dbmodel| Database[(Database)]
```

(4) has the same number of transformations as (2), so there is no regression. But (1) now looks bad:

Legacy with v1 storage pretending to be v2 storage (5):

```mermaid
graph LR
  Client -->|model| Receiver
  Receiver --> |OTLP| Collector
  Collector --> |OTLP| Storage2[Storage Adapter v2]
  Storage2 --> |model| Storage1[Storage v1]
  Storage1 --> |dbmodel| Database[(Database)]
```

Here we introduced an unnecessary transformation into OTLP that makes the path less efficient. This will improve once the storage is upgraded to v2 proper, but that will take some time.

My proposal is to consider upgrading the internal pipeline to support both model and OTLP payloads simultaneously, and to take advantage of the fact that a Storage v2 instance might be an adapter over v1:

```mermaid
graph LR
  Client -->|model or OTLP| Collector
  Collector --> |model or OTLP| Processor{Processor}
  Processor --> |model| Storage1[Storage v2 Adapter over v1]
  Processor --> |OTLP| Storage2[Storage v2]
  Storage1 --> |dbmodel| Database[(Database)]
  Storage2 --> |dbmodel| Database[(Database)]
```
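
To make the dual-payload pipeline concrete, here is a minimal Go sketch; all type and field names are illustrative assumptions, not actual Jaeger types. A queue item carries exactly one of the two payloads, and the processor dispatches it to the matching writer, so neither path pays for an extra conversion:

```go
package pipeline

import (
	"context"

	"go.opentelemetry.io/collector/pdata/ptrace"

	"github.com/jaegertracing/jaeger/model"
)

// queueItem is a union: exactly one of the two fields is populated,
// so each payload travels through the queue in its native form.
type queueItem struct {
	spans  []*model.Span // legacy (model) path; nil on the OTLP path
	traces ptrace.Traces // OTLP path; zero value on the model path
}

// spanWriterV1 and traceWriterV2 mirror the shape of the v1 spanstore
// and v2 tracestore writer interfaces, simplified for this sketch.
type spanWriterV1 interface {
	WriteSpan(ctx context.Context, span *model.Span) error
}

type traceWriterV2 interface {
	WriteTraces(ctx context.Context, td ptrace.Traces) error
}

type processor struct {
	v1 spanWriterV1 // a Storage v2 adapter unwrapped to its v1 core
	v2 traceWriterV2
}

// dispatch routes each item to the writer that matches its payload,
// avoiding any model/OTLP conversion on either path.
func (p *processor) dispatch(ctx context.Context, it queueItem) error {
	if it.spans != nil {
		for _, span := range it.spans {
			if err := p.v1.WriteSpan(ctx, span); err != nil {
				return err
			}
		}
		return nil
	}
	return p.v2.WriteTraces(ctx, it.traces)
}
```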
yurishkuro added a commit that referenced this issue Jan 5, 2025
## Which problem is this PR solving?
- Part of #6474

## Description of the changes
- Extend the SpanProcessor interface to carry either v1 or v2 spans (see the interface sketch below)

## How was this change tested?
- CI

---------

Signed-off-by: Yuri Shkuro <[email protected]>
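
For illustration, the interface-level change could look roughly like this; the Batch shape is an assumption made for the sketch, not the exact signature merged in the PR:

```go
package processor

import (
	"context"

	"go.opentelemetry.io/collector/pdata/ptrace"

	"github.com/jaegertracing/jaeger/model"
)

// Batch carries exactly one of the two payloads through the pipeline.
type Batch struct {
	Spans  []*model.Span // populated by legacy (model) receivers
	Traces ptrace.Traces // populated by OTLP receivers
}

// SpanProcessor accepts either payload without forcing a conversion.
// The per-span bool results report whether each span was accepted.
type SpanProcessor interface {
	ProcessSpans(ctx context.Context, batch Batch) ([]bool, error)
}
```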
yurishkuro added a commit that referenced this issue Jan 6, 2025
## Which problem is this PR solving?
- Continuation of #6474

## Description of the changes
- In order to allow the queue to carry both the v1 and v2 data models, let's
first make the queue strongly typed by using generics (see the queue sketch below)

## How was this change tested?
- unit tests, CI

---------

Signed-off-by: Yuri Shkuro <[email protected]>
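
A minimal sketch of the "strongly typed queue" step, assuming a simple channel-backed design; Jaeger's actual BoundedQueue carries more machinery (consumer management, resizing, drop callbacks):

```go
package queue

// BoundedQueue is a generic bounded queue: the element type is fixed at
// compile time, so items no longer travel as interface{} values.
type BoundedQueue[T any] struct {
	items chan T
}

func NewBoundedQueue[T any](capacity int) *BoundedQueue[T] {
	return &BoundedQueue[T]{items: make(chan T, capacity)}
}

// Produce enqueues an item, returning false if the queue is full.
func (q *BoundedQueue[T]) Produce(item T) bool {
	select {
	case q.items <- item:
		return true
	default:
		return false
	}
}

// StartConsumers launches n goroutines that drain the queue with fn.
func (q *BoundedQueue[T]) StartConsumers(n int, fn func(item T)) {
	for i := 0; i < n; i++ {
		go func() {
			for item := range q.items {
				fn(item)
			}
		}()
	}
}
```

With this in place, the same queue type can be instantiated with an item type that carries either payload, as in the earlier pipeline sketch.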
yurishkuro added a commit that referenced this issue Jan 12, 2025
## Which problem is this PR solving?
- Part of #6487
- Part of #6474

## Description of the changes
- Swap the v1 spanWriter for the v2 traceWriter in the collector pipeline
- Currently the traceWriter is provided via a v1 adapter, so it's always a
v1 writer underneath
- And since only the v1 spans entry point is currently implemented, there is
no performance impact from additional data transformations
- However, as soon as the OTLP entry point is utilized (e.g. via the OTLP
receiver), the `ptrace.Traces` batch will be handled by the exporterhelper
queue as a single item (not broken into individual spans) and then
passed directly to the writer as a batch. Since the writer is
implemented via an adapter, the batch will be converted to spans and written
one span at a time (sketched after this commit message). There will be no
additional data transformations on this path either.

## How was this change tested?
- CI

## Outstanding
- [x] Invoking proper preprocessing, like sanitizers and collector tags,
on the OTLP path
- [x] Adequate metrics parity, ideally same as v1 collector
- [ ] Test coverage, including passing a v2-like (mock) writer that
cannot be downgraded to v1
- Idea: parameterize some tests (ideally those that also validate
pre-processing) to execute both v1 and v2 write paths

## Follow-up PRs
* Enable v2 write path from OTLP and Zipkin receivers (they currently
explicitly downgrade to v1). This will also allow adding better unit
tests.

---------

Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
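
A sketch of the adapter behavior this PR describes, with the TraceWriter type name assumed for illustration; ProtoFromTraces is the OTLP-to-model translator from opentelemetry-collector-contrib's pkg/translator/jaeger:

```go
package adapter

import (
	"context"

	jaegertranslator "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/jaeger"
	"go.opentelemetry.io/collector/pdata/ptrace"

	"github.com/jaegertracing/jaeger/storage/spanstore"
)

// TraceWriter adapts a v1 span writer to the v2 trace-writing shape.
type TraceWriter struct {
	spanWriter spanstore.Writer // the wrapped v1 writer
}

// WriteTraces converts the OTLP batch once, then fans the spans out to
// the v1 writer one span at a time, matching the behavior described above.
func (w *TraceWriter) WriteTraces(ctx context.Context, td ptrace.Traces) error {
	batches, err := jaegertranslator.ProtoFromTraces(td)
	if err != nil {
		return err
	}
	for _, batch := range batches {
		for _, span := range batch.Spans {
			if span.Process == nil {
				span.Process = batch.Process // spans inherit the batch-level process
			}
			if err := w.spanWriter.WriteSpan(ctx, span); err != nil {
				return err
			}
		}
	}
	return nil
}
```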