OTEL collector crashes when using googlecloudpubsub receiver with encoding set to cloud_logging #32007

Open
ZachTB123 opened this issue Mar 27, 2024 · 10 comments
Labels: bug, receiver/googlecloudpubsub

Comments

@ZachTB123

Component(s)

receiver/googlecloudpubsub

What happened?

Description

I'm trying to use the googlecloudpubsub receiver to receive Cloud Logs. I have configured a log router to route all my logs to a pub/sub topic. The inclusion filter on the sink is resource.type = ("cloud_run_revision") OR log_id("dialogflow-runtime.googleapis.com/requests"). I have no exclusion filter. After some time, the collector crashes with the log output below.

Setting encoding to raw_text works without issue.

Steps to Reproduce

  1. Create a log router described above.
  2. Run the collector with the configuration below.

Expected Result

The collector does not crash.

Actual Result

The collector crashes.

Collector version

v0.97.0

Environment information

No response

OpenTelemetry Collector configuration

receivers:
  googlecloudpubsub:
    project: my-project
    subscription: my-subscription
    encoding: cloud_logging

processors: {}

exporters:
  logging/debug:
    loglevel: debug
  logging/error:
    loglevel: error

service:
  telemetry:
    logs:
      level: DEBUG
  pipelines:
    logs:
      receivers: [googlecloudpubsub]
      processors: []
      exporters: [logging/debug]

Log output

panic: runtime error: index out of range [8] with length 8

goroutine 59 [running]:
encoding/hex.Decode({0xc002c69968?, 0x0?, 0xc001cb54d0?}, {0xc001ca2a80?, 0xc001e28c80?, 0xc001cb54d0?})
	encoding/hex/hex.go:101 +0x130
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver/internal.spanIDStrToSpanIDBytes({0xc001ca2a80?, 0xc001c93230?})
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver@v0.97.0/internal/log_entry.go:59 +0x4b
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver/internal.TranslateLogEntry({0x58b?, 0x58b?}, 0xc002c69c60?, {0xc002d45680, 0x45b, 0x480})
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver@v0.97.0/internal/log_entry.go:231 +0x43d
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver.(*pubsubReceiver).handleCloudLoggingLogEntry(0xc0028056b0, {0x948ef20, 0xef7b1c0}, 0xa43440?)
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver@v0.97.0/receiver.go:145 +0x56
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver.(*pubsubReceiver).createReceiverHandler.func1({0x948ef20, 0xef7b1c0}, 0xc002d68050)
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver@v0.97.0/receiver.go:299 +0x186
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver/internal.(*StreamHandler).responseStream(0xc0028a86e0, {0x94904b8, 0xc001c8e6e0}, 0xc001c8c530)
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver@v0.97.0/internal/handler.go:193 +0x65d
created by github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver/internal.(*StreamHandler).recoverableStream in goroutine 46
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudpubsubreceiver@v0.97.0/internal/handler.go:109 +0x1cb

Additional context

No response

@ZachTB123 added the bug and needs triage labels Mar 27, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Member

It looks like this panic is happening because a spanId is longer than expected. From the spec, spanId must be an 8-byte array.

Can you provide a sample log that's causing this panic to happen so we can confirm this to be the case?

Incoming data being the wrong format shouldn't cause the collector to panic. The receiver should log an error and drop data instead.
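
A minimal sketch of the behaviour described above, assuming the span ID arrives as a hex string: check the length before decoding and return an error that the caller can log before dropping the entry. `parseSpanID` is a hypothetical helper written for illustration, not the receiver's actual function.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// parseSpanID is a hypothetical helper (not the receiver's code). It converts
// a hex-encoded span ID into its 8-byte form and returns an error, instead of
// panicking, when the input has the wrong length or is not valid hex.
func parseSpanID(s string) ([8]byte, error) {
	var id [8]byte
	// 8 bytes are encoded as exactly 16 hex characters.
	want := hex.EncodedLen(len(id))
	if len(s) != want {
		return id, fmt.Errorf("span ID %q: got %d characters, want %d", s, len(s), want)
	}
	if _, err := hex.Decode(id[:], []byte(s)); err != nil {
		// Return a zero ID so no partially decoded bytes leak out.
		return [8]byte{}, fmt.Errorf("span ID %q: %w", s, err)
	}
	return id, nil
}

func main() {
	inputs := []string{"00f067aa0ba902b7", "15426074336963245120", "zzzzzzzzzzzzzzzz"}
	for _, s := range inputs {
		id, err := parseSpanID(s)
		if err != nil {
			// A receiver could log this and drop the entry here.
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Printf("parsed: %x\n", id)
	}
}
```

With the length check in place, an over-long value is rejected before `hex.Decode` ever writes into the fixed-size buffer.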

@ZachTB123
Author

ZachTB123 commented Mar 27, 2024

I believe this is coming from log entries where logName is equal to projects/project-id/logs/run.googleapis.com%2Frequests. Based on some previous logs that I've ingested by setting encoding to raw_text, the value for spanId is 20 characters long. For example:

{
    "spanId": "15426074336963245120"
}
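
The panic can likely be reproduced outside the collector with that sample value. The sketch below assumes the receiver decodes the string straight into a fixed 8-byte buffer, which is what the stack trace suggests; `decodeInto8Bytes` is a hypothetical stand-in, not the receiver's code, and it recovers from the panic only so the message can be inspected.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// decodeInto8Bytes is a hypothetical stand-in for decoding a span ID string
// into a fixed 8-byte buffer without checking its length first. It recovers
// from any panic and reports the panic message to the caller.
func decodeInto8Bytes(s string) (id [8]byte, panicMsg string) {
	defer func() {
		if r := recover(); r != nil {
			panicMsg = fmt.Sprint(r)
		}
	}()
	// hex.Decode needs room for len(src)/2 bytes. A 20-character source
	// needs 10 bytes, so writing the ninth byte indexes past the buffer.
	_, _ = hex.Decode(id[:], []byte(s))
	return id, ""
}

func main() {
	_, msg := decodeInto8Bytes("15426074336963245120")
	fmt.Println("recovered:", msg)
}
```

Every character in the sample is a decimal digit and therefore also a valid hex digit, so `hex.Decode` reports no error and simply keeps writing until it runs off the end of the buffer.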

@alexvanboxel
Contributor

I will have a look; this ticket can be assigned to me.

@alexvanboxel
Contributor

This issue is reproducible, but I've logged an issue with Google Cloud as it's a bug on their side:
https://issuetracker.google.com/issues/338634230

I will make the parsing safer so the collector doesn't crash, but I will not detect decimals; I will handle it as a too-large HEX.
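
For context on the decimal-versus-hex distinction: the 20-digit sample value reported earlier fits in an unsigned 64-bit integer, so one plausible reading (an assumption, not confirmed in this thread) is that it is an 8-byte span ID printed in decimal. The sketch below shows that interpretation only to illustrate the ambiguity. It is not what the planned fix does, the big-endian byte order is also an assumption, and `decimalSpanID` and `asUint64` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// decimalSpanID is a hypothetical helper: it reads s as a base-10 uint64 and
// lays the value out as 8 big-endian bytes (byte order is an assumption).
func decimalSpanID(s string) ([8]byte, error) {
	var id [8]byte
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return id, fmt.Errorf("span ID %q is not a decimal uint64: %w", s, err)
	}
	binary.BigEndian.PutUint64(id[:], n)
	return id, nil
}

// asUint64 converts the bytes back to an integer to check the round trip.
func asUint64(id [8]byte) uint64 {
	return binary.BigEndian.Uint64(id[:])
}

func main() {
	id, err := decimalSpanID("15426074336963245120")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("hex: %x, round trip: %d\n", id, asUint64(id))
}
```

Detection by inspection is unreliable in general: a 16-character string made only of digits, such as `1234567890123456`, is both valid hex and valid decimal, so length is the only unambiguous signal.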

Contributor

github-actions bot commented Jul 4, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Jul 4, 2024
@geekflyer

Can we do a fix/workaround on the OTel Collector side for this? I bet GCP is going to take a while to change this in Cloud Run.

@github-actions github-actions bot removed the Stale label Jul 11, 2024
@tjun

tjun commented Aug 19, 2024

@alexvanboxel
Hi, thank you for your PR! Would it be possible to reopen #33247 and have it merged?
We have been trying to use googlecloudpubsub receiver with cloud_logging encoding and have frequently encountered this crashing issue, which has been troubling us. However, when we incorporated the code from your PR and tested it, the problem no longer occurred. We would be very happy if your PR could be merged and made available for use.

@github-actions github-actions bot added the Stale label Oct 21, 2024
@alexvanboxel
Contributor

Work has started on extracting the Cloud Logging encoding from the receiver. This will happen in 3 steps:

  1. Start support for encoding extensions
  2. Add a deprecation warning for the Cloud Logging encoding
  3. Extract the Cloud Logging encoding into a branch (a code owner is being sought so that it can be developed independently)

@github-actions github-actions bot removed the Stale label Nov 27, 2024