Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Micrometer observation #1017

Merged
merged 20 commits into from
Jul 17, 2023
Merged

Conversation

acogoluegnes
Copy link
Contributor

No description provided.

@ikavgo
Copy link

ikavgo commented May 4, 2023

Please note this OT integration - rabbitmq/rabbitmq-dotnet-client#1261. We should aim for same semantic and tags.

@stebet
Copy link

stebet commented May 4, 2023

Please note this OT integration - rabbitmq/rabbitmq-dotnet-client#1261. We should aim for same semantic and tags.

Awesome! Don't hesitate to ping me in the .NET PR as the spec has probably changed a few times (although I've tried to keep up) since I started this work, and let's keep things in sync if you feel the .NET version should name things differently or vice/versa. Would also be great to get some feedback on tags and operation names from @lmolkova @trask and hopefully more people with OTel knowledge if they have the time.

@ikavgo
Copy link

ikavgo commented May 4, 2023

Hi, I created an md file to synchronize and track - https://github.com/rabbitmq/contribute/blob/main/observability/clients.md. Please take a look and change/add missing parts

@trask
Copy link

trask commented May 4, 2023

Please note this OT integration - rabbitmq/rabbitmq-dotnet-client#1261. We should aim for same semantic and tags.

Awesome! Don't hesitate to ping me in the .NET PR as the spec has probably changed a few times (although I've tried to keep up) since I started this work, and let's keep things in sync if you feel the .NET version should name things differently or vice/versa. Would also be great to get some feedback on tags and operation names from @lmolkova @trask and hopefully more people with OTel knowledge if they have the time.

hi! @lmolkova and I would of course recommend instrumenting with OpenTelemetry API 😄

@acogoluegnes
Copy link
Contributor Author

@stebet Thanks for chiming in. I had a look at the .NET client PR. It's similar in concepts to this one: we are using an abstraction (ActivitySource / Micrometer) and we are using OTel naming conventions. I just want to check something with you: is the receive activity "around" the process activity? It seems to me it is and that they are not one after the other. OTel is not super clear on this part.

@acogoluegnes
Copy link
Contributor Author

@stebet I had a look at the OTel Java instrumentation agent and it uses receive for basic.get and process for asynchronous deliveries (from basic.consume).

@trask
Copy link

trask commented May 5, 2023

It's similar in concepts to this one: we are using an abstraction (ActivitySource / Micrometer)

the OpenTelemetry and .NET community aligned and ActivitySource is (part of) the OpenTelemetry API for .NET: https://opentelemetry.io/docs/instrumentation/net/manual/#add-tags-to-an-activity

(while Micrometer Tracing is Spring-specific)

@lmolkova
Copy link

lmolkova commented May 5, 2023

I might miss something, but my understanding was that micrometer-tracing is an abstraction layer over otel or brave. With brave slowly becoming legacy, what's the benefit of having an extra layer of abstraction?

@lmolkova
Copy link

lmolkova commented May 5, 2023

@stebet Thanks for chiming in. I had a look at the .NET client PR. It's similar in concepts to this one: we are using an abstraction (ActivitySource / Micrometer) and we are using OTel naming conventions. I just want to check something with you: is the receive activity "around" the process activity? It seems to me it is and that they are not one after the other. OTel is not super clear on this part.

The receive operation over process is a trick to correlate them - receive operation does not necessarily know what it's going to be received.
OTel messaging SIG is working on a better version of the spec (open-telemetry/oteps#220) - here's we'll have two receive models:

  • receive (poll) - it will be a single span. It should have a link to the received message. It's not yet possible to add link after span starts, so it should be started later with a fake start time.
  • deliver (consume) - it mostly matches the current process and can be a child of context in the message.

It does not contradict current spec (which is indeed quite vague), so can be done right away. And I hope the OTEP above gives more clarity on implementation

@marcingrzejszczak
Copy link
Contributor

I might miss something, but my understanding was that micrometer-tracing is an abstraction layer over otel or brave.

Yes it is

With brave slowly becoming legacy

Where this assumption comes from?

what's the benefit of having an extra layer of abstraction?

That we control the api and can replace one tracer into another. Micrometer is very serious about not breaking backward compatibility and the history of current tracing projects such as open tracing or open census warns about what can happen when you rely directly on an api.

Of course you can use micrometer observation with otels api directly - you don't have to use micrometer tracing

@@ -111,6 +120,17 @@ private abstract static class IntegrationTest extends SampleTestRunner {
public TracingSetup[] getTracingSetup() {
return new TracingSetup[] {TracingSetup.IN_MEMORY_BRAVE, TracingSetup.ZIPKIN_BRAVE};
}

void runWithNullingObservation(ObservationRegistry registry, Tracer tracer, Observation.CheckedRunnable<?> runnable) {
Observation noParentObservation = Observation.createNotStarted("null_observation", registry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Micrometer 1.10.8 I think we can change this whole method to become new NullObservation(registry).observeChecked(runnable)

@acogoluegnes acogoluegnes force-pushed the marcingrzejszczak-observation branch from aebc325 to c4d656f Compare June 23, 2023 07:10
headers,
response.getBody() == null ? 0 : response.getBody().length);
Observation observation =
RabbitMqObservationDocumentation.RECEIVE_OBSERVATION.observation(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is its really intentional to start another observation over here?
You already got that Observation.createNotStarted("rabbitmq.receive") and its observe().
Why do we need this child one, please?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we have 2 observations here.

  • rabbitmq.receive is the observation that is responsible for polling. Some trace (e.g. with id X) might have been initiated earlier and suddenly someone calls basicGet the rabbitmq.receive observation will warp an operation of polling a message from the broker
  • RECEIVE_OBSERVATION is the ReceiverContext based operation that will continue a trace that got propagated through the messaging headers. That means that in the RabbitMq message there could be another trace (e.g. with id Y)

Let's look at it in the following way:

  • Application A initiates trace with id X -> calls some logic -> polls for the message from RabbitMq (rabbitmq.receive)
  • Application B initiates trace with id Y -> calls some logic -> sends a message to RabbitMQ with tracing header containing Y -> Application A takes the message from the queue and then continues trace id Y (RECEIVE_OBSERVATION)

Since basicGet is a synchronous operation we were thinking of giving the user a switch to leave the RECEIVE_OBSERVATION started, and a scope would be opened and then it's up to the user to stop it and stop the scope. That way they will be able to continue the trace id Y

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you do that manually here just like this a bit bellow:

                observation.start();
                observation.stop();

So, yeah, you connect this RECEIVE_OBSERVATION to the propagated one, but it is closed immediately.
Whatever end-user would do with the propagated trace over header afterward would be just another, parallel child span, not connected to this one. Plus looks this one is going to be a child of that rabbitmq.receive in the beginning of this method. No? Is the parent overridden from the value from headers?

I may guess that it is OK to have a simple span connected to that propagated trace just to fulfill the the whole picture of services interaction, but is it really an intention? Wouldn't it be better to leave such a propagation up to end-user with respective API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, yeah, you connect this RECEIVE_OBSERVATION to the propagated one, but it is closed immediately.
Whatever end-user would do with the propagated trace over header afterward would be just another, parallel child span, not connected to this one.

As I said this code

 observation.start();
                observation.stop();

would have to change to sth like

observation.start();
if (someOptionIsTurnedOn) {
  observation.openScope();
} else {
  observation.stop();
}

and then the user will have to do sth like this after some listener code gets executed

Scope scope = observationRegistry.getCurrentObservationScope();
if (scope != null) {
  scope.close();
  scope.getCurrentObservation().stop();
}

Plus looks this one is going to be a child of that rabbitmq.receive in the beginning of this method. No? Is the parent overridden from the value from headers?

Yes, the child will be overridden. So if you started trace X then due to the fact that the message contains X headers, the trace id will continue.

I may guess that it is OK to have a simple span connected to that propagated trace just to fulfill the the whole picture of services interaction, but is it really an intention? Wouldn't it be better to leave such a propagation up to end-user with respective API?

I don't follow, what do you mean exactly?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow, what do you mean exactly?

When we propagate a trace over the broker simple start()-stop() would just add extra span into a trace, but that wouldn't give a chance for end-user to continue that trace downstream when we step into our business logic.
I think your if (someOptionIsTurnedOn) { answers to my question.
Thanks

marcingrzejszczak and others added 13 commits July 12, 2023 10:14
Bumps [jackson-databind](https://github.com/FasterXML/jackson) from 2.14.1 to 2.15.0.
- [Release notes](https://github.com/FasterXML/jackson/releases)
- [Commits](https://github.com/FasterXML/jackson/commits)

---
updated-dependencies:
- dependency-name: com.fasterxml.jackson.core:jackson-databind
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
For Micrometer Observation collector implementation. This is based
on what the OTel Java agent exports. The guidelines for
attributes may change, but that is a good starting point for now.

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/messaging.md
Due to the fact that in the tests we have both the producer and the consumer and they have the same parent, the zipkin graph looks bizarre.

With these changes we're changing the test sending code to simulate sending a message from a different service that will result in a creation of a new trace identifier. Due to this we will have 2 sets of trace ids created, one for sending and one for polling.

Sending: null_observation -> send -> receive message
Polling: test_span -> receive (very short)
@acogoluegnes acogoluegnes force-pushed the marcingrzejszczak-observation branch from a2e7dc5 to afb39ff Compare July 12, 2023 08:59
This way it is possible to keep the second observation
for the application processing of the message.

The observation will have to be closed by the application
in this case.
@acogoluegnes acogoluegnes marked this pull request as ready for review July 13, 2023 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants