Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TransformProcessor doesn't set TraceID and SpanID within LogContext #32190

Closed
AlessandroFazio opened this issue Apr 5, 2024 · 6 comments
Closed
Labels
bug Something isn't working processor/transform Transform processor Stale

Comments

@AlessandroFazio
Copy link

AlessandroFazio commented Apr 5, 2024

Component(s)

processor/transform

What happened?

Description

Hello, I have have setup a pipeline with the transform processor to extract TraceID and SpanId data from traceparent http header and set the trace_id and span_id in the LogContext.

     processors:
      transform/envoyal:
        error_mode: ignore
        log_statements:
          - context: log
            statements:
              - set(attributes["trace-id"], Substring(attributes["traceparent"], 3, 16))
              - set(attributes["parent-id"], Substring(attributes["traceparent"], 20, 8))
              - set(trace_id.string, attributes["trace-id"])
              - set(span_id.string, attributes["parent-id"])
              # - delete_key(attributes, "traceparent")
              # - delete_key(attributes, "trace-id")
              # - delete_key(attributes, "trace-id")

The key here is:

  - set(trace_id.string, attributes["trace-id"])
  - set(span_id.string, attributes["parent-id"])

Steps to Reproduce

Expected Result

This is the payload I expect:

LogRecord #0
ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-04-05 14:38:49.070989 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str([2024-04-05T14:38:49.070Z] \"GET /api/v1/call-me HTTP/1.1\" 200 - via_upstream - \"-\" 0 55 2 2 \"-\" \"Java/1.8.0_402\" \"a64cd4db-7988-931a-a0d3-2ebde5a7735c\" \"inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc:8080\" \"10.244.0.17:8080\" outbound|8080||inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local 10.244.0.16:37036 10.96.67.20:8080 10.244.0.16:59646 - default
)
Attributes:
     -> x_forwarded_for: Str(-)
     -> authority: Str(inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc:8080)
     -> bytes_received: Str(0)
     -> bytes_sent: Str(55)
     -> downstream_local_address: Str(10.96.67.20:8080)
     -> downstream_remote_address: Str(10.244.0.16:59646)
     -> duration: Str(2)
     -> dyn_meta: Str(-)
     -> method: Str(GET)
     -> path: Str(/api/v1/call-me)
     -> user_agent: Str(Java/1.8.0_402)
     -> protocol: Str(HTTP/1.1)
     -> requested_server_name: Str(-)
     -> response_code: Str(200)
     -> response_flags: Str(-)
     -> route_name: Str(default)
     -> services: Str([{\"host\":\"inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local\",\"namespace\":\"esol-ap12345-otel-poc1-dev-namespace\",\"name\":\"inventory-service\"}])
     -> start_time: Str(2024-04-05T14:38:49.070Z)
     -> traceparent: Str(00-1e2c3bff3f92d79cf8d1140f6af32f69-c784b919dc6de35e-01)
     -> upstream_cluster: Str(outbound|8080||inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local)
     -> upstream_host: Str(10.244.0.17:8080)
     -> upstream_local_address: Str(10.244.0.16:37036)
     -> upstream_service_time: Str(2)
     -> upstream_transport_failure_reason: Str(-)
     -> trace-id: Str(1e2c3bff3f92d79c)
     -> parent-id: Str(8d1140f6)
Trace ID: Str(1e2c3bff3f92d79c)
Span ID: Str(8d1140f6)
Flags: 0

Actual Result

This is the payload I expect:

LogRecord #0
ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-04-05 14:38:49.070989 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str([2024-04-05T14:38:49.070Z] \"GET /api/v1/call-me HTTP/1.1\" 200 - via_upstream - \"-\" 0 55 2 2 \"-\" \"Java/1.8.0_402\" \"a64cd4db-7988-931a-a0d3-2ebde5a7735c\" \"inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc:8080\" \"10.244.0.17:8080\" outbound|8080||inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local 10.244.0.16:37036 10.96.67.20:8080 10.244.0.16:59646 - default
)
Attributes:
     -> x_forwarded_for: Str(-)
     -> authority: Str(inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc:8080)
     -> bytes_received: Str(0)
     -> bytes_sent: Str(55)
     -> downstream_local_address: Str(10.96.67.20:8080)
     -> downstream_remote_address: Str(10.244.0.16:59646)
     -> duration: Str(2)
     -> dyn_meta: Str(-)
     -> method: Str(GET)
     -> path: Str(/api/v1/call-me)
     -> user_agent: Str(Java/1.8.0_402)
     -> protocol: Str(HTTP/1.1)
     -> requested_server_name: Str(-)
     -> response_code: Str(200)
     -> response_flags: Str(-)
     -> route_name: Str(default)
     -> services: Str([{\"host\":\"inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local\",\"namespace\":\"esol-ap12345-otel-poc1-dev-namespace\",\"name\":\"inventory-service\"}])
     -> start_time: Str(2024-04-05T14:38:49.070Z)
     -> traceparent: Str(00-1e2c3bff3f92d79cf8d1140f6af32f69-c784b919dc6de35e-01)
     -> upstream_cluster: Str(outbound|8080||inventory-service.esol-ap12345-otel-poc1-dev-namespace.svc.cluster.local)
     -> upstream_host: Str(10.244.0.17:8080)
     -> upstream_local_address: Str(10.244.0.16:37036)
     -> upstream_service_time: Str(2)
     -> upstream_transport_failure_reason: Str(-)
     -> trace-id: Str(1e2c3bff3f92d79c)
     -> parent-id: Str(8d1140f6)
Trace ID:
Span ID:
Flags: 0

Hope you can help me with this. Thanks in advance for your time.

Collector version

0.97.0

Environment information

Os: Ubuntu 22.04

OpenTelemetry Collector configuration

receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 5s
              static_configs:
                - targets: ['0.0.0.0:8888']
                  labels:
                    apm_code: 'ap00001'
                    project: 'otel-poc'
        target_allocator:
          endpoint: http://otel-targetallocator
          interval: 30s
          collector_id: "$${POD_NAME}"

    processors:
      batch/traces:
        timeout: 500ms
        send_batch_size: 100
      batch/logs:
        timeout: 500ms
        send_batch_size: 100
      batch/metrics:
        timeout: 500ms
        send_batch_size: 100
      groupbyattrs/istiometrics:
        keys:
          - apm_code
          - project
      groupbyattrs/envoyal:
        keys:
          - apm_code
          - project
      transform/envoyal:
        error_mode: ignore
        log_statements:
          - context: log
            statements:
              - set(attributes["trace-id"], Substring(attributes["traceparent"], 3, 16))
              - set(attributes["parent-id"], Substring(attributes["traceparent"], 20, 8))
              - set(trace_id.string, attributes["trace-id"])
              - set(span_id.string, attributes["parent-id"])
              # - delete_key(attributes, "traceparent")
              # - delete_key(attributes, "trace-id")
              # - delete_key(attributes, "trace-id")

    exporters:
      logging:
        loglevel: debug
      otlphttp:
        traces_endpoint: "url"
        metrics_endpoint: "uel"
        logs_endpoint: "url"
        auth:
          authenticator: sigv4auth
        tls:
          insecure: true
        compression: none
        retry_on_failure:
          initial_interval: 10s
          max_interval: 30s
          max_elapsed_time: 1200s # (in secs) set to 0 for infinite retry
        sending_queue:
          storage: file_storage/otc
          queue_size: 5000
          num_consumers: 10
        timeout: 10s

    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
        path: "/health"
      zpages:
        endpoint: ":55679"
      sigv4auth:
        assume_role:
          arn: "role_arn"
          sts_region: "eu-central-1"
        region: "eu-central-1"
        service: "osis"
      file_storage/otc:
        directory: /var/lib/otelcol/buffers
        compaction:
          on_start: true
          on_rebound: true
          directory: /tmp/otel
          max_transaction_size: 65_536
          rebound_needed_threshold_mib: 100
          rebound_trigger_threshold_mib: 10

    service:
      extensions: [health_check, zpages, sigv4auth, file_storage/otc]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch/traces]
          exporters: [otlphttp]
        logs:
          receivers: [otlp]
          processors: [groupbyattrs/envoyal,transform/envoyal] # batch/logs
          exporters: [otlphttp] # logging
        metrics:
          receivers: [otlp,prometheus]
          processors: [batch/metrics,groupbyattrs/istiometrics]
          exporters: [otlphttp]
      telemetry:
        logs:
          level: "DEBUG"
          development: true
          encoding: "json"

Log output

{"level":"error","ts":1712330461.6687455,"caller":"logs/processor.go:54","msg":"failed processing logs","kind":"processor","name":"transform/envoyal","pipeline":"logs","error":"failed to execute statement: set(trace_id.string, attributes["trace-id"]), trace ids must be 32 hex characters","stacktrace":"github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor/internal/logs.(*Processor).ProcessLogs\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/internal/logs/processor.go:54\ngo.opentelemetry.io/collector/processor/processorhelper.NewLogsProcessor.func1\n\tgo.opentelemetry.io/collector/[email protected]/processorhelper/logs.go:48\ngo.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs\n\tgo.opentelemetry.io/collector/[email protected]/logs.go:25\ngo.opentelemetry.io/collector/processor/processorhelper.NewLogsProcessor.func1\n\tgo.opentelemetry.io/collector/[email protected]/processorhelper/logs.go:56\ngo.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs\n\tgo.opentelemetry.io/collector/[email protected]/logs.go:25\ngo.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs\n\tgo.opentelemetry.io/collector/[email protected]/logs.go:25\ngo.opentelemetry.io/collector/internal/fanoutconsumer.(*logsConsumer).ConsumeLogs\n\tgo.opentelemetry.io/[email protected]/internal/fanoutconsumer/logs.go:62\ngo.opentelemetry.io/collector/receiver/otlpreceiver/internal/logs.(*Receiver).Export\n\tgo.opentelemetry.io/collector/receiver/[email protected]/internal/logs/otlp.go:41\ngo.opentelemetry.io/collector/pdata/plog/plogotlp.rawLogsServer.Export\n\tgo.opentelemetry.io/collector/[email protected]/plog/plogotlp/grpc.go:88\ngo.opentelemetry.io/collector/pdata/internal/data/protogen/collector/logs/v1._LogsService_Export_Handler.func1\n\tgo.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/logs/v1/logs_service.pb.go:311\ngo.opentelemetry.io/collector/config/configgrpc.(*ServerConfig).toServerOption.enhanceWithClientInformation.func9\n\tgo.opentelemetry.io/collector/config/[email protected]/configgrpc.go:398\ngo.opentelemetry.io/collector/pdata/internal/data/protogen/collector/logs/v1._LogsService_Export_Handler\n\tgo.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/logs/v1/logs_service.pb.go:313\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\tgoogle.golang.org/[email protected]/server.go:1386\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/[email protected]/server.go:1797\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\tgoogle.golang.org/[email protected]/server.go:1027"}

Additional context

@AlessandroFazio AlessandroFazio added bug Something isn't working needs triage New item requiring triage labels Apr 5, 2024
@github-actions github-actions bot added the processor/transform Transform processor label Apr 5, 2024
Copy link
Contributor

github-actions bot commented Apr 5, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1 crobert-1 changed the title TransformProcessor doens't set TraceID and SpanID within LogContext TransformProcessor doesn't set TraceID and SpanID within LogContext Apr 5, 2024
@AlessandroFazio
Copy link
Author

AlessandroFazio commented Apr 5, 2024

After looking more in details w3c tracing protocol I figured out that the version part in the traceparent header value is just one byte, so assuming another byte is taken for '-', the substring function should start reading from byte at index 2 until 16 bytes are read. I tested with 2 as starting index instead of 3, but this results in the same error.

My guess is that the header traceparent itself when set as attribute by envoy access log service in the telemetry payload is represented with wrong encoding and so the error propagate downstream in data processing. I didn't find anything about hex encoding conversion in the util funcitons in OTTL.

@TylerHelmuth
Copy link
Member

TylerHelmuth commented Apr 5, 2024

We have #31929 to track a Hex converter, but I believe all you'll need to do is adjust your Substring command:

processors:
      transform/envoyal:
        error_mode: ignore
        log_statements:
          - context: log
            statements:
              - set(attributes["trace-id"], Substring(attributes["traceparent"], 3, 32))
              - set(attributes["parent-id"], Substring(attributes["traceparent"], 36, 16))
              - set(trace_id.string, attributes["trace-id"])
              - set(span_id.string, attributes["parent-id"])

That should capture 1e2c3bff3f92d79cf8d1140f6af32f69 as the trace ID and c784b919dc6de35e as the span ID.

@JaredTan95 JaredTan95 removed the needs triage New item requiring triage label Apr 8, 2024
Copy link
Contributor

github-actions bot commented Jun 7, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Aug 12, 2024
@TylerHelmuth
Copy link
Member

I believe this issue is solved by the new Hex function #31929. Please ping me if you disagree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working processor/transform Transform processor Stale
Projects
None yet
Development

No branches or pull requests

3 participants