Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpUrlConnection instrument exception #4697

Closed
wangzlei opened this issue Nov 23, 2021 · 4 comments
Closed

HttpUrlConnection instrument exception #4697

wangzlei opened this issue Nov 23, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@wangzlei
Copy link
Contributor

Describe the bug
opentelemetry-javaagent gets HttpUrlConnection exception on AWS EC2 host.

Steps to reproduce
Collector config, enable AWS XRay proxy to fetch remote sampling rules

extensions:
  awsproxy:
    endpoint: 0.0.0.0:2000

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  logging:
    loglevel: debug
  awsxray:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging, awsxray]
  extensions: [awsproxy]
  telemetry:
    logs:
      level: "debug"

docker composite, launches a simple spring server with OTel javaagent

version: "3.6"
services:
  otel:
    image: otel/opentelemetry-collector-contrib:latest
    command: --config /config/collector3.yml
    volumes:
      - /home/ec2-user:/config
    environment:
      AWS_REGION: us-west-2
    ports:
      - '4317:4317'
      - '13133:13133'
      - '2000:2000'
  app:
    image: myapp:latest
    command: java -Dotel.javaagent.debug=true -javaagent:./aws-opentelemetry-agent.jar -jar ./build/libs/spring-boot-0.0.1-SNAPSHOT.jar Application
    environment:
      OTEL_SERVICE_NAME: MyEC2App
      OTEL_TRACES_SAMPLER: xray
      OTEL_TRACES_SAMPLER_ARG: 'endpoint=http://otel:2000'
      OTEL_TRACES_EXPORTER: otlp
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel:4317
    ports:
      - '8080:8080'
    depends_on:
      - otel

What did you expect to see?
No exception for HttpUrlConnection.
Collector should relay SamplingRules and SamplingTargets requests from application to AWS XRay

What did you see instead?
application error

[otel.javaagent 2021-11-23 02:02:52:862 +0000] [xray-rules-poller] DEBUG io.opentelemetry.sdk.internal.JavaVersionSpecific - Using the APIs optimized for: Java 9+
[otel.javaagent 2021-11-23 02:02:52:870 +0000] [xray-rules-poller] DEBUG io.opentelemetry.javaagent.bootstrap.ExceptionLogger - Failed to handle exception in instrumentation for sun.net.www.protocol.http.HttpURLConnection on io.opentelemetry.javaagent.bootstrap.AgentClassLoader@1324409e
java.lang.IllegalStateException: Already connected
	at java.base/sun.net.www.protocol.http.HttpURLConnection.setRequestProperty(HttpURLConnection.java:3104)
	at io.opentelemetry.javaagent.instrumentation.httpurlconnection.HeadersInjectAdapter.set(HeadersInjectAdapter.java:17)
	at io.opentelemetry.javaagent.instrumentation.httpurlconnection.HeadersInjectAdapter.set(HeadersInjectAdapter.java:11)
	at io.opentelemetry.javaagent.shaded.io.opentelemetry.extension.aws.AwsXrayPropagator.inject(AwsXrayPropagator.java:157)
	at io.opentelemetry.javaagent.shaded.io.opentelemetry.context.propagation.MultiTextMapPropagator.inject(MultiTextMapPropagator.java:52)
	at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.ClientInstrumenter.start(ClientInstrumenter.java:27)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1493)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:3069)
	at java.base/java.net.HttpURLConnection.getResponseCode(Unknown Source)
	at io.opentelemetry.contrib.awsxray.JdkHttpClient.fetchString(JdkHttpClient.java:83)
	at io.opentelemetry.contrib.awsxray.XraySamplerClient.executeJsonRequest(XraySamplerClient.java:85)
	at io.opentelemetry.contrib.awsxray.XraySamplerClient.getSamplingRules(XraySamplerClient.java:68)
	at io.opentelemetry.contrib.awsxray.AwsXrayRemoteSampler.getAndUpdateSampler(AwsXrayRemoteSampler.java:123)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Thread.java:834)
[otel.javaagent 2021-11-23 02:02:52:873 +0000] [xray-rules-poller] DEBUG io.opentelemetry.contrib.awsxray.JdkHttpClient - JdkHttpClient fetch string failed.
java.net.SocketTimeoutException: Read timed out
	at java.base/java.net.SocketInputStream.socketRead0(Native Method)
	at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
	at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
	at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
	at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
	at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:746)
	at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:689)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
	at java.base/java.net.HttpURLConnection.getResponseCode(Unknown Source)
	at io.opentelemetry.contrib.awsxray.JdkHttpClient.fetchString(JdkHttpClient.java:83)
	at io.opentelemetry.contrib.awsxray.XraySamplerClient.executeJsonRequest(XraySamplerClient.java:85)
	at io.opentelemetry.contrib.awsxray.XraySamplerClient.getSamplingRules(XraySamplerClient.java:68)
	at io.opentelemetry.contrib.awsxray.AwsXrayRemoteSampler.getAndUpdateSampler(AwsXrayRemoteSampler.java:123)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Thread.java:834)
[otel.javaagent 2021-11-23 02:02:52:875 +0000] [xray-rules-poller] DEBUG io.opentelemetry.contrib.awsxray.AwsXrayRemoteSampler - Failed to update sampler
java.io.UncheckedIOException: Failed to deserialize response.
	at io.opentelemetry.contrib.awsxray.XraySamplerClient.executeJsonRequest(XraySamplerClient.java:90)
	at io.opentelemetry.contrib.awsxray.XraySamplerClient.getSamplingRules(XraySamplerClient.java:68)
	at io.opentelemetry.contrib.awsxray.AwsXrayRemoteSampler.getAndUpdateSampler(AwsXrayRemoteSampler.java:123)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: No content to map due to end-of-input
 at [Source: (String)""; line: 1, column: 0]
	at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59)
	at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4766)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4668)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3630)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3598)
	at io.opentelemetry.contrib.awsxray.XraySamplerClient.executeJsonRequest(XraySamplerClient.java:88)
	... 8 more

collector error

otel_1  | 2021-11-23T02:02:50.749Z	debug	[email protected]/server.go:91	Received request on X-Ray receiver TCP proxy server	{"kind": "extension", "name": "awsproxy", "URL": "/GetSamplingRules"}
otel_1  | 2021-11-23T02:02:52.823Z	error	[email protected]/server.go:116	Unable to sign request	{"kind": "extension", "name": "awsproxy", "error": "RequestCanceled: request context canceled\ncaused by: context canceled"}
otel_1  | github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/proxy.NewServer.func1
otel_1  | 	github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/[email protected]/server.go:116
otel_1  | net/http/httputil.(*ReverseProxy).ServeHTTP
otel_1  | 	net/http/httputil/reverseproxy.go:251
otel_1  | net/http.serverHandler.ServeHTTP
otel_1  | 	net/http/server.go:2878
otel_1  | net/http.(*conn).serve
otel_1  | 	net/http/server.go:1929
otel_1  | 2021/11/23 02:02:52 http: proxy error: context canceled

What version are you using?
v1.7.2

Environment
Compiler/Runtime: OpenJDK 11
OS: Amazon Linux 2 amd64

Additional context
The issue does not happen in Mac. Not sure if happens in other Linux OS.

@wangzlei wangzlei added the bug Something isn't working label Nov 23, 2021
@laurit
Copy link
Contributor

laurit commented Nov 23, 2021

Firstly I believe that the SocketTimeoutException actually happens before IllegalStateException. HttpURLConnection.getResponseCode stashes the exception from first connection attempt and rethrows it a bit later if the call to getHeaderField doesn't provide a result. IllegalStateException is from the call to getHeaderField. Currently I fail to see how we could cause SocketTimeoutException, though I guess anything is possible. I'd start from verifying that agent can connect to wherever it is connecting with the timeout specified in https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/aws-xray/src/main/java/io/opentelemetry/contrib/awsxray/JdkHttpClient.java
I suspect that the IllegalStateException happens because AwsXrayRemoteSampler starts before the initial retransformation that sets up integrations for already loaded classes. Instrumentation for HttpURLConnection happens while it is already executing so we'll end up with inconsistent state where the first call to getInputStream runs the version that is not instrumented (instrumented version would attach state) and the second call to getInputStream runs instrumented version (it would skip the call that causes the exception if it had the state attached from the first call). Notice that some methods have (Unknown Source) in stack trace, I believe that these are obsolete methods (methods that have been redefined, but still executing old code). It would be best not to start any threads before the initial retransformation is done.

@anuraaga
Copy link
Contributor

It would be best not to start any threads before the initial retransformation is done

@laurit Is this something that can be reordered within agent initialization logic itself in a generic way to solve? Or is it essentially the same as #4642, recognizing threads started by the agent to not instrument them?

@laurit
Copy link
Contributor

laurit commented Nov 25, 2021

It would be best not to start any threads before the initial retransformation is done

@laurit Is this something that can be reordered within agent initialization logic itself in a generic way to solve? Or is it essentially the same as #4642, recognizing threads started by the agent to not instrument them?

@anuraaga You could try changing beforeAgent to afterAgent in https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/javaagent-tooling/src/main/java/io/opentelemetry/javaagent/tooling/OpenTelemetryInstaller.java I don't understand this well enough to predict what other side effects this might have. I only ran a single tomcat smoke test and that passed with afterAgent. If this doesn't work then it will get complicated as the sampler is initialized by sdk. I'm sure you'd hate the idea of splitting sdk initialization into stuff that can be done/can't be done before agent.
I believe ignoring agent threads could help and is a good alternative if we can't use afterAgent. If we cut processing in instrumenter().shouldStart() then the code that causes IllegalStateException won't be executed. Nevertheless avoiding races like this is preferable to working around them.

@trask
Copy link
Member

trask commented Jan 19, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants