[WIP][v2][query] Create archive reader/writer using regular factory methods #6519

Draft: wants to merge 8 commits into base: main


mahadzaryab1
Collaborator

Which problem is this PR solving?

Description of the changes

How was this change tested?

Checklist

@mahadzaryab1
Collaborator Author

mahadzaryab1 commented Jan 10, 2025

@yurishkuro should this be marked as a breaking change? The ES storage won't use the -archive alias anymore for jaeger-v2.

codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 50.27%. Comparing base (0b5f8b1) to head (d18a211).
Report is 3 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (0b5f8b1) and HEAD (d18a211): HEAD has 1 upload less than BASE.

Flag        BASE (0b5f8b1)   HEAD (d18a211)
unittests   1                0
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #6519       +/-   ##
===========================================
- Coverage   96.25%   50.27%   -45.98%     
===========================================
  Files         372      188      -184     
  Lines       21371    11411     -9960     
===========================================
- Hits        20570     5737    -14833     
- Misses        610     5219     +4609     
- Partials      191      455      +264     
Flag Coverage Δ
badger_v1 10.65% <ø> (ø)
badger_v2 2.78% <ø> (ø)
cassandra-4.x-v1-manual 16.55% <ø> (ø)
cassandra-4.x-v2-auto 2.71% <ø> (ø)
cassandra-4.x-v2-manual 2.71% <ø> (ø)
cassandra-5.x-v1-manual 16.55% <ø> (ø)
cassandra-5.x-v2-auto 2.71% <ø> (ø)
cassandra-5.x-v2-manual 2.71% <ø> (ø)
elasticsearch-6.x-v1 20.34% <ø> (+<0.01%) ⬆️
elasticsearch-7.x-v1 20.41% <ø> (ø)
elasticsearch-8.x-v1 20.57% <ø> (ø)
elasticsearch-8.x-v2 2.78% <ø> (-0.10%) ⬇️
grpc_v1 12.16% <ø> (-0.01%) ⬇️
grpc_v2 9.03% <ø> (-0.01%) ⬇️
kafka-3.x-v1 10.33% <ø> (ø)
kafka-3.x-v2 2.78% <ø> (ø)
memory_v2 2.78% <ø> (ø)
opensearch-1.x-v1 20.46% <ø> (+<0.01%) ⬆️
opensearch-2.x-v1 20.45% <ø> (ø)
opensearch-2.x-v2 2.77% <ø> (-0.01%) ⬇️
tailsampling-processor 0.51% <ø> (ø)
unittests ?


Signed-off-by: Mahad Zaryab <[email protected]>
Member

@yurishkuro yurishkuro left a comment

There is a breaking change implied here: if users do not set the config options appropriately, the new storage objects may not behave the way the archive objects did previously. I recommend handling it with a feature gate from OTEL. We can declare the feature in the Beta state (i.e. enabled right away, and marked as a breaking change), while users keep the option to turn it off manually and fall back to the old behavior, which we should preserve for now. In the next release we set the feature to Stable, where turning it off gives a runtime error (this could also be labeled a breaking change), so the new behavior is the only one possible. In the release after that we remove the feature altogether, along with the legacy code.

example in cmd/jaeger/internal/extension/remotesampling/config.go
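
To make the proposed rollout concrete, here is a minimal stdlib-only sketch of the Beta/Stable lifecycle described above. It is a model of the semantics, not the OTEL feature gate package: the `Gate` type, `NewGate`, `archiveMode`, and the gate ID are all hypothetical names invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Stage models the two gate stages mentioned in the review comment.
// These constants are local to this sketch.
type Stage int

const (
	StageBeta   Stage = iota // enabled by default, users may turn it off
	StageStable              // always enabled, turning it off is an error
)

// Gate is a hypothetical stand-in for a feature gate.
type Gate struct {
	ID      string
	Stage   Stage
	enabled bool
}

var errStable = errors.New("feature gate is stable and cannot be disabled")

// NewGate returns a gate that starts enabled in both stages.
func NewGate(id string, stage Stage) *Gate {
	return &Gate{ID: id, Stage: stage, enabled: true}
}

// Set applies a user override. Disabling a Stable gate is rejected,
// which corresponds to the runtime error described in the comment.
func (g *Gate) Set(enabled bool) error {
	if g.Stage == StageStable && !enabled {
		return fmt.Errorf("%s: %w", g.ID, errStable)
	}
	g.enabled = enabled
	return nil
}

// Enabled reports whether the new behavior is active.
func (g *Gate) Enabled() bool { return g.enabled }

// archiveMode picks the code path that the gate would guard.
func archiveMode(g *Gate) string {
	if g.Enabled() {
		return "regular-factory"
	}
	return "legacy-archive-factory"
}

func main() {
	beta := NewGate("hypothetical.archiveViaRegularFactory", StageBeta)
	fmt.Println(archiveMode(beta)) // new behavior by default
	_ = beta.Set(false)            // Beta: opting out is allowed
	fmt.Println(archiveMode(beta)) // legacy behavior

	stable := NewGate("hypothetical.archiveViaRegularFactory", StageStable)
	fmt.Println(stable.Set(false)) // Stable: opting out is an error
}
```

In the third release of the plan, both the gate and the `legacy-archive-factory` branch would be deleted, leaving only the new path.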

if ar != nil && aw != nil {
v2opts.ArchiveTraceReader = ar
v2opts.ArchiveTraceWriter = aw
if reader != nil && writer != nil {
Member

Isn't this always true now? Regular Create methods are not allowed to return nil without an error.
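
A small sketch of the point being made, assuming the usual Go `(value, error)` contract: when a factory either returns a usable value or a non-nil error, a later nil check on the value is redundant. The `Reader` interface, `createReader`, and `initArchive` are hypothetical stand-ins, not the Jaeger storage API.

```go
package main

import (
	"errors"
	"fmt"
)

// Reader is a hypothetical stand-in for a span reader interface.
type Reader interface {
	Name() string
}

type memReader struct{ name string }

func (r memReader) Name() string { return r.name }

// createReader follows the contract discussed above: it returns either
// a usable Reader or a non-nil error, never (nil, nil).
func createReader(backend string) (Reader, error) {
	if backend == "" {
		return nil, errors.New("backend name must not be empty")
	}
	return memReader{name: backend}, nil
}

// initArchive only needs to check the error. Under the contract above,
// an extra `reader != nil` test can never be false once err is nil.
func initArchive(backend string) (Reader, error) {
	reader, err := createReader(backend)
	if err != nil {
		return nil, fmt.Errorf("cannot create archive reader: %w", err)
	}
	return reader, nil
}

func main() {
	r, err := initArchive("another_storage")
	fmt.Println(r.Name(), err)

	_, err = initArchive("")
	fmt.Println(err)
}
```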

yurishkuro
yurishkuro previously approved these changes Jan 15, 2025
Member

@yurishkuro yurishkuro left a comment

lgtm

@yurishkuro yurishkuro changed the title from "[WIP][v2][query] Change jaegerquery extension to use primary reader/writer for archive storage" to "[WIP][v2][query] Create archive reader/writer using regular factory methods" on Jan 15, 2025
@yurishkuro
Member

I have a feeling we will still need to address the question of removing the create-archive methods later, because as we start the LFX project and begin implementing the v2 API in the storage directly, we will run into issues with the Elasticsearch implementation.

@yurishkuro yurishkuro dismissed their stale review January 15, 2025 02:47

We need to implement a feature gate

@yurishkuro
Member

I think this is breaking backwards compatibility, is it not? I.e. to get behavior equivalent to the previous version, the user needs to set some parameters (which ones, and in what combination?)

@mahadzaryab1
Collaborator Author

mahadzaryab1 commented Jan 16, 2025

Test Report

1. Establish Ground Truth on main

To begin, configure the storage and query settings in the all-in-one.yaml file as follows:

jaeger_storage:
  backends:
    some_storage:
      elasticsearch:
        indices:
          index_prefix: "jaeger-main"
          spans:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
          services:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
          dependencies:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
          sampling:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
    another_storage:
      elasticsearch:
        indices:
          index_prefix: "jaeger-archive"

2. Spin Up Elasticsearch Using Docker

To bring up Elasticsearch, run the following commands:

jaeger % cd docker-compose/elasticsearch/v8 
v8 % docker compose up

3. Start Jaeger

Start the Jaeger service:

jaeger % go run ./cmd/jaeger

4. Archive a Trace

From the Jaeger UI, select a trace and archive it.

traceID: 0dc3e460bd9b8e0dddfa29a2f751cfb9

Trace Screenshot

5. Update index_prefix

Stop Jaeger and change the index_prefix of the primary storage to jaeger-main-1. This ensures the trace can no longer be found in the primary storage, so a successful query must have been served from the archive storage.

6. Query for the Same Trace

curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.data[].spans[] | {traceID, operationName}'
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "/api/services"
}
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "GetService"
}
curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.result.resourceSpans[].scopeSpans[].spans[] | {traceId, name}'
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "GetService"
}
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "/api/services"
}

7. Test Changes from This PR

Stop Jaeger, check out this PR, and restart Jaeger:

gh pr checkout 6519
go run ./cmd/jaeger

8. Query for the Same Trace After PR Changes

curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq .
{
  "data": null,
  "total": 0,
  "limit": 0,
  "offset": 0,
  "errors": [
    {
      "code": 404,
      "msg": "trace not found"
    }
  ]
}
curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq .
{
  "error": {
    "httpCode": 404,
    "message": "No traces found"
  }
}

🛑 This is where the breaking change occurs 🛑
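
One way to read the 404 above is as an index-name mismatch. The sketch below reconstructs the names from what appears in this report: the archived trace lives in jaeger-archive-jaeger-span-archive, primary span indices look like jaeger-main-1-jaeger-span-2025-01-16, and with use_aliases the reader targets jaeger-archive-jaeger-span-read. The helper names are hypothetical, and the dated name for the archive backend without use_aliases is an assumption extrapolated from the primary pattern, not something confirmed by the PR.

```go
package main

import (
	"fmt"
	"time"
)

// legacyArchiveIndex mirrors the archive index name observed in this
// report before the PR: "<prefix>-jaeger-span-archive".
func legacyArchiveIndex(prefix string) string {
	return prefix + "-jaeger-span-archive"
}

// regularSpanIndex sketches the name a regular (non-archive) reader
// would target. The "-read" suffix with aliases is taken from the
// report; the dated suffix without aliases is an ASSUMPTION based on
// the primary index names listed by the _aliases call.
func regularSpanIndex(prefix string, useAliases bool, day string) string {
	base := prefix + "-jaeger-span-"
	if useAliases {
		return base + "read"
	}
	return base + day
}

func main() {
	day := time.Date(2025, time.January, 16, 0, 0, 0, 0, time.UTC).Format("2006-01-02")

	fmt.Println(legacyArchiveIndex("jaeger-archive"))
	fmt.Println(regularSpanIndex("jaeger-archive", false, day))
	fmt.Println(regularSpanIndex("jaeger-archive", true, day))
}
```

Under these assumptions neither regular name equals the legacy archive index, which would explain why the trace is not found until an alias bridges the two.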

9. Mitigation for Users

Set use_aliases for Archive Storage

To mitigate the issue, set the use_aliases configuration for your archive storage to true. Update the configuration as follows:

another_storage:
  elasticsearch:
    indices:
      index_prefix: "jaeger-archive"
    use_aliases: true

Add an Alias from the New Read Name to the Old Index

To ensure backwards compatibility, add an alias so that the name the new reader queries resolves to the old archive index. You can query the current set of aliases in Elasticsearch with:

curl -X GET "http://localhost:9200/_aliases?pretty"
{
  "jaeger-main-1-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-main-2-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-archive-jaeger-span-archive": { "aliases": {} },
  "jaeger-main-1-jaeger-service-2025-01-16": { "aliases": {} }
}

To point the new read alias (jaeger-archive-jaeger-span-read) at the old archive index (jaeger-archive-jaeger-span-archive), run the following:

curl -X POST "http://localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    {
      "add": {
        "index": "jaeger-archive-jaeger-span-archive",
        "alias": "jaeger-archive-jaeger-span-read"
      }
    }
  ]
}'

Confirm that the alias has been added:

curl -X GET "http://localhost:9200/_aliases?pretty"
{
  "jaeger-main-1-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-archive-jaeger-span-archive": { "aliases": { "jaeger-archive-jaeger-span-read": {} } },
  "jaeger-main-2-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-main-2-jaeger-service-2025-01-16": { "aliases": {} },
  "jaeger-main-1-jaeger-service-2025-01-16": { "aliases": {} }
}
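
For anyone scripting this migration rather than running curl by hand, the request body for the alias call above can be generated programmatically. This is a minimal sketch that only builds the JSON payload shown in this step; the struct and function names are hypothetical, and sending it to Elasticsearch is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The three types below mirror the JSON shape of the _aliases request
// body used in the curl command: {"actions":[{"add":{...}}]}.
type aliasAdd struct {
	Index string `json:"index"`
	Alias string `json:"alias"`
}

type aliasAction struct {
	Add aliasAdd `json:"add"`
}

type aliasRequest struct {
	Actions []aliasAction `json:"actions"`
}

// buildAliasRequest returns the JSON body that adds `alias` to `index`.
func buildAliasRequest(index, alias string) (string, error) {
	req := aliasRequest{
		Actions: []aliasAction{{Add: aliasAdd{Index: index, Alias: alias}}},
	}
	body, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	body, err := buildAliasRequest(
		"jaeger-archive-jaeger-span-archive", // existing archive index
		"jaeger-archive-jaeger-span-read",    // name the new reader queries
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```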

10. Restart Jaeger and Retry the Query

Finally, restart Jaeger and run the same trace queries again:

curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.data[].spans[] | {traceID, operationName}'
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "/api/services"
}
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "GetService"
}
curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.result.resourceSpans[].scopeSpans[].spans[] | {traceId, name}'
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "/api/services"
}
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "GetService"
}
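
The before/after comparison in steps 6, 8, and 10 could also be automated. The sketch below decodes a /api/traces response body of the shape shown in this report and returns either the operation names or the reported error. Only the fields visible in the outputs above are modeled; the type and function names are hypothetical, and fetching the body over HTTP is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Only the fields that appear in the responses in this report are modeled.
type apiError struct {
	Code int    `json:"code"`
	Msg  string `json:"msg"`
}

type span struct {
	TraceID       string `json:"traceID"`
	OperationName string `json:"operationName"`
}

type trace struct {
	Spans []span `json:"spans"`
}

type tracesResponse struct {
	Data   []trace    `json:"data"`
	Errors []apiError `json:"errors"`
}

// operationNames returns the operation name of every span in the
// response, or an error if the body is invalid or reports an API error.
func operationNames(body []byte) ([]string, error) {
	var resp tracesResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if len(resp.Errors) > 0 {
		e := resp.Errors[0]
		return nil, fmt.Errorf("api error %d: %s", e.Code, e.Msg)
	}
	var names []string
	for _, t := range resp.Data {
		for _, s := range t.Spans {
			names = append(names, s.OperationName)
		}
	}
	return names, nil
}

func main() {
	notFound := []byte(`{"data":null,"total":0,"limit":0,"offset":0,` +
		`"errors":[{"code":404,"msg":"trace not found"}]}`)
	_, err := operationNames(notFound)
	fmt.Println(err)
}
```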

Signed-off-by: Mahad Zaryab <[email protected]>