Dash0 Node IP does not work on Docker-desktop to report telemetry from the OTel Java agent #229

Closed
mmanciop opened this issue Jan 6, 2025 · 4 comments

Comments

@mmanciop
Member

mmanciop commented Jan 6, 2025

We use the Downward API's status.hostIP field to populate the DASH0_NODE_IP environment variable, which is then referenced in DASH0_OTEL_COLLECTOR_BASE_URL (and, in #228, also in OTEL_EXPORTER_OTLP_ENDPOINT) to point the OTel SDKs to the collector running in the daemonset:

env:
  - name: DASH0_OTEL_COLLECTOR_BASE_URL
    value: http://$(DASH0_NODE_IP):40318
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://$(DASH0_NODE_IP):40318
  - name: DASH0_NAMESPACE_NAME

On Minikube, everything works fine. On Docker Desktop, the OTel Java agent running in a pod cannot connect to the collector (192.168.65.3 is the node's internal IP, confirmed with kubectl get nodes):

[otel.javaagent 2025-01-06 11:18:01:303 +0000] [OkHttp http://192.168.65.3:40318/...] ERROR io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export spans. The request could not be executed. Full error message: No route to host
java.net.NoRouteToHostException: No route to host
        at java.base/sun.nio.ch.Net.pollConnect(Native Method)
        at java.base/sun.nio.ch.Net.pollConnectNow(Unknown Source)
        at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(Unknown Source)
        at java.base/sun.nio.ch.NioSocketImpl.connect(Unknown Source)
        at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
        at java.base/java.net.Socket.connect(Unknown Source)
        at okhttp3.internal.platform.Platform.connectSocket(Platform.kt:128)
        at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:295)
        at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:207)
        at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:226)
        at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106)
        at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74)
        at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at io.opentelemetry.exporter.sender.okhttp.internal.RetryInterceptor.intercept(RetryInterceptor.java:91)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
        at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

mmanciop changed the title from "Dash0 Node IP does not work on Docker-desktop to report telemetry" to "Dash0 Node IP does not work on Docker-desktop to report telemetry from the OTel Java agent" on Jan 6, 2025

@mmanciop
Member Author

mmanciop commented Jan 6, 2025

Uhm, it seems to be something specific to the OTel Java agent. From an ubuntu pod, I can curl the OTel collector on the daemonset:

# curl -v http://192.168.65.3:4318/v1/traces
*   Trying 192.168.65.3:4318...
* Connected to 192.168.65.3 (192.168.65.3) port 4318
> GET /v1/traces HTTP/1.1
> Host: 192.168.65.3:4318
> User-Agent: curl/8.5.0
> Accept: */*
> 
< HTTP/1.1 405 Method Not Allowed
< Content-Type: text/plain
< Date: Mon, 06 Jan 2025 11:45:49 GMT
< Content-Length: 41
< 
* Connection #0 to host 192.168.65.3 left intact
405 method not allowed, supported: [POST]

(The 405 Method Not Allowed response is expected: the OTLP API only accepts POST as the HTTP method.)
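
For anyone who wants to repeat the curl check above with the HTTP method the endpoint accepts, here is a minimal sketch in Python. The helper name probe_otlp_http is hypothetical, and the empty resourceSpans payload is only an assumption about the smallest body an OTLP/HTTP JSON receiver will take; it is not something taken from the operator or the Java agent. The sketch only reports whether the endpoint answered at all, which is the part that matters for a "No route to host" error.

```python
import json
import urllib.error
import urllib.request


def probe_otlp_http(host: str, port: int, timeout: float = 3.0) -> str:
    """POST an empty OTLP/JSON trace export request and summarize the outcome.

    Hypothetical helper for illustration; the payload is an assumed minimal body.
    """
    url = f"http://{host}:{port}/v1/traces"
    body = json.dumps({"resourceSpans": []}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Ignore any proxy settings from the environment so the probe talks
    # to the node IP directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as response:
            return f"{url}: HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, so the network path works even though
        # the request itself was rejected.
        return f"{url}: HTTP {err.code}"
    except OSError as err:
        # Covers URLError, refused connections, timeouts and
        # "No route to host".
        return f"{url}: unreachable ({err})"
```

Calling print(probe_otlp_http("192.168.65.3", 4318)) from a pod on the node would exercise the same address as the curl command. The port is a parameter so the same check can be pointed at whichever port the SDK is actually configured to use.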

@mmanciop
Member Author

mmanciop commented Jan 6, 2025

I suspect that it has something to do with the buildpack I am using. It's a Spring Boot app image created with ./mvnw package spring-boot:build-image using this Maven pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>3.4.1</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.example</groupId>
	<artifactId>demo</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>demo</name>
	<description>Demo project for Spring Boot</description>
	<properties>
		<java.version>17</java.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>

		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-actuator</artifactId>
		</dependency>

		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
				<configuration>
					<image>
						<buildpacks>
							<buildpack>gcr.io/paketo-buildpacks/java</buildpack>
							<buildpack>gcr.io/paketo-buildpacks/opentelemetry</buildpack>
						</buildpacks>
						<env>
							<BP_OPENTELEMETRY_ENABLED>true</BP_OPENTELEMETRY_ENABLED>
						</env>
					</image>
				</configuration>
			</plugin>
		</plugins>
	</build>

</project>

However, I have yet to find a way to get a shell inside the container (the Java buildpack sets the image up as effectively distroless, so no shell for me). Other pods on the same node do not have these networking issues.

@joschi
Contributor

joschi commented Jan 6, 2025

OkHttp http://192.168.65.3:40318

Are you sure that the port (40318) is correct? You've been using cURL with port 4318.

@mmanciop
Member Author

This is a transient problem in Docker Desktop: sometimes the networking gets hosed, and the only known solution is to restart the Docker daemon.
