Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce otlp/http support #5322

Draft
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

TomasLongo
Copy link

Description

Introduce http endpoint to support otlp/http for the otel trace source. So far, this draft PR contains:

  • A single armeria server listening on a port
  • A HTTP Service
    • listening under /opentelemetry.proto.collector.trace.v1.TraceService/Export
    • processing ExportTraceServiceRequest
    • supporting basic auth/tls
  • (not yet complete)Testing of the http service
  • updated otel-proto specs

This draft PR serves as starting for further discussions

Configuration of HTTP and gRPC Service

Currently, the source config is responsible for setting up a working gRPC service. Now, an HTTP Service has to be configured as well.

As far as this PR is concerned, we can assign the current config items into two groups:

  • Config items for gRPC specific features (e.g. unframed_requests, proto_reflection_service)
  • Config items relevant for both services (e.g. compression, authentication) and should/could

This leads to the questions how the structure of the config should look like in the future

Create distinct sections for every endpoint

port: 123  
thread_count: 123  
...  
http:  
  path: /path  compression: gzip  authentication:    http_basic:grpc:  
  compression: gzip  proto_reflection_service: true  authentication:    http_basic:  

Let the endpoint share as much of the current config as possible

e.g. features like compression, authentication

Do we want to keep unframed requests

My understanding is that unframed requests enable clients to send plain http requests to the gRPC endpoint and would be rendered deprecated by this PR. However, there might be more to this feature which I'm currently unaware of.

Usage of the plugin mechanism to handle configs

The Method HttpService.createAuthenticationProvider contains a comment on how the handling of the basic auth config could be made less complex.

The basic idea is to not treat simple configs as plugins. The overall developer experience could benefit from using the simple parsing mechanism that is already used for other configs (like RetryInfo).

Again, there could be reasons for using the plugin mechanism which I'm not aware of.

Issues Resolved

Resolves #4983

Related to #5259

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Tomas Longo added 18 commits November 5, 2024 15:51
Signed-off-by: Tomas Longo <[email protected]>
Signed-off-by: Tomas Longo <[email protected]>
Signed-off-by: Tomas Longo <[email protected]>
Copy link
Collaborator

@KarstenSchnitter KarstenSchnitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for providing this PR. This is intended to start a discussion about this change. @dlvenable we would like your opinion as well as of all other maintainers. There are still a couple of TODOs in the comment, that need to be addressed. I think, they mark points waiting for clarification.

Tomas, can you explain a little bit more about the upgrade of the otlp-proto classes. Did you need to remove the instrumentation library support because of that? Do you know, how this change behaves if the old format is ingested?

ResourceSpans resourceSpans = ResourceSpans.newBuilder()
.setResource(resource)
.addScopeSpans(scopeSpans)
.addInstrumentationLibrarySpans(ilSpans)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, that instrumentation library was migrated to instrumentation scope. Is this removal required to address the changed data model?

Copy link
Author

@TomasLongo TomasLongo Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. InstrumentaionLibrarySpans have been removed.

Basically the hierarchy has been changed. In 0.16.x InstrumentationLibrarySpans carried the span information. Now instrumentation information lives next to the span information inside the ScopeSpan object

@@ -71,60 +69,6 @@ void init() {
rawProcessor = new OTelMetricsRawProcessor(testsettings, new OtelMetricsRawProcessorConfig());
}

@Test
void testInstrumentationLibrary() throws JsonProcessingException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this test still be required for backward compatibility?

@@ -350,7 +351,7 @@ void testHttpFullJsonWithCustomPathAndUnframedRequests() throws InvalidProtocolB
.join();
}


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check you IDE settings for the correct formatter. There should be no whitespace changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines -201 to -211
Stream<OpenTelemetryLog> mappedInstrumentationLibraryLogs = rs.getInstrumentationLibraryLogsList()
.stream()
.map(ils ->
processLogsList(ils.getLogRecordsList(),
serviceName,
OTelProtoCodec.getInstrumentationLibraryAttributes(ils.getInstrumentationLibrary()),
resourceAttributes,
schemaUrl,
timeReceived))
.flatMap(Collection::stream);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this break backward compatibility with old OTel data models?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in a sense that requests carrying only InstrumentationLibrary*** data will not result in a failure (this change will return a 200 in that case), but won't see their data forwarded to the sink

@TomasLongo
Copy link
Author

TomasLongo commented Jan 17, 2025

Tomas, can you explain a little bit more about the upgrade of the otlp-proto classes. Did you need to remove the instrumentation library support because of that?

Yes: According to the deprecation message in 0.16.0 InstrumentationLibrary was removed in June 15, 2022 (pretty specific). The now used version (1.3.2-alpha) does indeed not carry it anymore.

Do you know, how this change behaves if the old format is ingested?

My Tests so far:

  • message carrying scopeSpans along with instrumentationLibrarySpans
    -- instrumentationLibrarySpans are simply ignored. The sink will receive the data from the scope spans
  • message carrying only instrumentationLibrarySpans
    -- Sink will receive nothing, since the current implementation looks for scopeSpans inside the resourceSpans and returns an empty list if it can find none

All the tests have been made against the new http endpoint (using json as input). I have yet to test behaviour of the gRPC endpoint (I don't expect a different behaviour, though, since both endpoints use the same decoding functionality)

[UPDATE ON TESTING 2025/01/17]
Testing the gRPC endpoint brought no surprises. Clients with outdated protobuf specifications will still be able to use data-prepper as long as their messages come packed with ScopeSpans

The spec of 0.16.0 carried details on how senders and receivers should behave during the grace period:

During the grace period the following rules SHOULD be followed:

For Binary Protobufs

Binary Protobuf senders SHOULD NOT set instrumentation_library_spans. Instead
scope_spans SHOULD be set.

Binary Protobuf receivers SHOULD check if instrumentation_library_spans is set
and scope_spans is not set then the value in instrumentation_library_spans
SHOULD be used instead by converting InstrumentationLibrarySpans into ScopeSpans.
If scope_spans is set then instrumentation_library_spans SHOULD be ignored.

For JSON

JSON senders that set instrumentation_library_spans field MAY also set
scope_spans to carry the same spans, essentially double-publishing the same data.
Such double-publishing MAY be controlled by a user-settable option.
If double-publishing is not used then the senders SHOULD set scope_spans and
SHOULD NOT set instrumentation_library_spans.

JSON receivers SHOULD check if instrumentation_library_spans is set and
scope_spans is not set then the value in instrumentation_library_spans
SHOULD be used instead by converting InstrumentationLibrarySpans into ScopeSpans.
If scope_spans is set then instrumentation_library_spans field SHOULD be ignored.

According to that comment, the data-prepper behaves correctly now, and did adhere to the specfication in the past (by using either ScopeSpans or InstrumentationLibrarySpans).

IMO, if Data-Prepper clients followed the recommendations, there should be no friction upgrading the proto classes.

@TomasLongo
Copy link
Author

I discovered a comment that could be important for backward compatibility in the 0.16.0 deprecation notice:

InstrumentationLibrarySpans is wire-compatible with ScopeSpans for binary Protobuf format

I'm not that fluent in protbuf speech yet to really grasp the implication of that statement. Does wire-compatible mean, that the binary representation of InstrumentationLibrarySpans and ScopeSpans does look the same and that receivers can therefore simply decode incoming InstrumentationLibrarySpans into ScopeSpans in their code without any further effort?

Signed-off-by: Tomas Longo <[email protected]>
@KarstenSchnitter
Copy link
Collaborator

@TomasLongo I will check the wire compatibility. Can you have a look at the OTelTraceSource_*Test test cases. For me the first test-case in ./gradlew ":data-prepper-plugins:otel-trace-source:test" fails causing all subsequent tests to fail as well:

./gradlew ":data-prepper-plugins:otel-trace-source:test"

> Task :data-prepper-plugins:otel-trace-source:test

OTelTraceSourceTest > testOptionalHttpAuthServiceInPlaceWithUnauthenticatedDisabled() FAILED
    java.lang.NullPointerException: Cannot invoke "Object.toString()" because the return value of "java.util.Map.get(Object)" is null

OTelTraceSourceTest > testHealthCheckUnauthAllowed() FAILED
    java.lang.AssertionError: Http Status
    Expected: <200 OK>
         but: was <401 Unauthorized>

OTelTraceSourceTest > testServerConnectionsMetric() FAILED
    java.lang.RuntimeException: java.util.concurrent.ExecutionException: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use

        Caused by:
        java.util.concurrent.ExecutionException: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use

            Caused by:
            io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use

OTelTraceSourceTest > testOptionalHttpAuthServiceInPlace() FAILED
    java.lang.NullPointerException: Cannot invoke "Object.toString()" because the return value of "java.util.Map.get(Object)" is null

OTelTraceSourceTest > testHttpFullJsonWithCustomPathAndAuthHeader_with_successful_response() FAILED
    java.lang.RuntimeException: java.util.concurrent.ExecutionException: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use

        Caused by:
        java.util.concurrent.ExecutionException: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use

            Caused by:
            io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address already in use

[...]

This is also causing the failing tests on the build system.

// - it becomes testable
// cons:
// - currently tied to one impl by using 'new'.
return new HttpBasicArmeriaHttpAuthenticationProvider(new HttpBasicAuthenticationConfig(pluginSettings.get("username").toString(), pluginSettings.get("password").toString()));
Copy link
Collaborator

@KarstenSchnitter KarstenSchnitter Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check this line. This causes an NPE in testing, when pluginsSettings is an empty map. Run test case OTelTraceSourceTest.testOptionalHttpAuthServiceInPlaceWithUnauthenticatedDisabled.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the two tests (the other beeing testOptionalHttpAuthServiOceInPlace) that causes NPEs. I disabled them for now! However, we should discuss if they are still needed.

The reason is, that the test assumes an invalid auth-config, which data-prepper refuses to ingest in production. The config in the test looks as follows:

otel_trace_source:
authentication:
test:
The tests are nonetheless green in main, because the critical code, the PluginFactory, is completely mocked away in the test. This branch produces NPEs, because the new HTTPService does not use the Pluginfactory to instantiate the AuthProvider.

IMO, this tests could be removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlvenable what is your take on that?

…ccepts unframed requests

Signed-off-by: Tomas Longo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support OpenTelemetry OTLP/HTTP as addition to OTLP/gRPC
2 participants