Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration with OpenTracing api #2413

Closed
wants to merge 11 commits into from

Conversation

jkremser
Copy link

@jkremser jkremser commented Jun 22, 2017

UPDATE: I've enhanced the PR for the sequences.

This is based on #2403 that added the tracing for the invoker. However, opentracing allows to leverage the paren-child hierarchy among the spans so this PR also adds the support for sequences.

When starting a sequence action a new (parent) span is created and the member actions starts sub-spans. This works also if the sequence itself is member of another sequence. Using the open tracing api, it is easy to change the tracer implementation, simply by providing other jar with tracer impl that is compliant and implements correctly the SPI mechanism (META-INF/services). By default, the jeager is used.

Hopefully, the code isn't too invasive. I had to change couple of method signatures, because it is needed to propagate the span context from the controller to the invokers so that the parent-child link can be created. In theory, the cause or activationId could have been used as a correlation id between child and parent, but to make it implementation agnostic, the opentracing API requires to use either text map or byte array as a "carrier".

a pic:
gnome-shell-screenshot-1qmi2y

@jkremser jkremser force-pushed the opentracing-integration branch 2 times, most recently from 979aa5d to 10f89d1 Compare June 22, 2017 16:56
* Starts the tracing.
*/
private def startSpan(msg: ActivationMessage): Option[Span] = {
val tracerOption = if (GlobalTracer.isRegistered) Option.empty else Some(TracerResolver.resolveTracer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala tip: Option(GlobalTracer.isRegistered).exists(identity).map(TracerResolver.resolveTracer)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, this doesn't work because

Option(GlobalTracer.isRegistered).exists(identity)

returns either true or false and I can't call map on Boolean

this works:

Option(GlobalTracer.isRegistered).flatMap(if (_) Option.empty else Option(TracerResolver.resolveTracer))

but perhaps it's too much "flatmappy" :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, s/exists/filter/ just worked 👍

@rabbah
Copy link
Member

rabbah commented Jun 22, 2017

no null! 👍

Copy link

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code made me want to learn Scala :)

*/
private def startSpan(msg: ActivationMessage): Option[Span] = {
val tracerOption = if (GlobalTracer.isRegistered) Option.empty else Some(TracerResolver.resolveTracer)
tracerOption foreach GlobalTracer.register

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be that tracerOption is null at this point:

https://travis-ci.org/apache/incubator-openwhisk/builds/245871954#L6909-L6913

    java.lang.NullPointerException: Cannot register GlobalTracer <null>.
        at io.opentracing.util.GlobalTracer.register(GlobalTracer.java:83)
        at whisk.core.container.WhiskContainer$$anonfun$startSpan$1.apply(WhiskContainer.scala:104)
        at whisk.core.container.WhiskContainer$$anonfun$startSpan$1.apply(WhiskContainer.scala:104)
        at scala.Option.foreach(Option.scala:257)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I didn't realize the TracerResolver.resolveTracer() can return also null and when using Some(null) it actually creates the Some(null) what I need here is the Option(null) that results in None

@jkremser jkremser force-pushed the opentracing-integration branch from 10f89d1 to f0c3ebb Compare June 23, 2017 15:15
@rabbah
Copy link
Member

rabbah commented Jun 23, 2017 via email

@rabbah
Copy link
Member

rabbah commented Jun 23, 2017 via email

@jkremser jkremser force-pushed the opentracing-integration branch 2 times, most recently from 9678f13 to 85debb1 Compare June 26, 2017 13:13
@jpkrohling
Copy link

@Jiri-Kremser , may I close #2403 in favor of this one?

@jkremser
Copy link
Author

@Jiri-Kremser , may I close #2403 in favor of this one?

yes

jpkrohling and others added 8 commits June 30, 2017 16:57
Sequence action starts its own span and component actions are then
created as child spans so the tracing reflects what is actually
happening in the event based system, including the simple causality
by following the sequences.

It looks like the activationId or cause are good candidates for the
correlation id among the spans, however, in order to preserve the open-tracing
ideas with tracer neutrality (we don't like vendor lock-ins, right?) the
mutable Map[String,String] needs to be propagated from the parent spans
(started in the controller component) to the child spans (started in the
invokers). To do that, I had to add a new attribute the to
ActivationMessage and also change couple of signatures in the
SequenceActions.scala.

=> No vendor lock-in on the tracing solution = Profit!
@jkremser jkremser force-pushed the opentracing-integration branch from 3b0a7ca to 7b2a120 Compare June 30, 2017 15:26
@jkremser jkremser changed the title [wip] Draft 2 - Wrap runs within OpenTracing spans Integration with OpenTracing api Jun 30, 2017
@@ -56,7 +56,8 @@ case class ActivationMessage(
activationNamespace: EntityPath,
rootControllerIndex: InstanceId,
content: Option[JsObject],
cause: Option[ActivationId] = None)
cause: Option[ActivationId] = None,
tracingMetadata: Option[Map[String, String]] = None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how big might this map get?

in general this is going to be a problem - we have a standing issue that the control and data plane are conflated.
because these message pass over kafka, and carry the user payload, every bit added to the control is one less bit for the user payload; in other words, we have a bug today that a user cannot send a 1MB payload through and taking away more bits makes it worse... hence the question.

for data, I expect that message to flow externally through a redis for example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of jaeger it's one key-value pair, where key is "jaeger-id" (or similar) and value some short string, zipkin is similar. So it's couple of bytes. What can also help is setting the sampling to some lower value, like 5% so that only random 5% of all actions will be traced (which reminded me that I should probably set the default value to something more sane, like 10% and perhaps make it configurable in the wisk.properties (?))

for data, I expect that message to flow externally through a redis for example.

not sure I follow you here, are you planning to add also redis in the future?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the last point/question: #254.
as part of replicating the controller, i expect redis will be in the mix and might double multiple duties.

* the tracer implementation inserts its own ids, these ids are then propagated via the ActivationMessage into
* invoker that can add the reference from the primitive action to the parent sequence action
*/
private def startTracingHelper(cause: Option[ActivationId],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider if this is fully going to subsume the markers generated in the logs if we want to have two way of tracing or just one because the information is going to be duplicated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware of the markers in the akka logging system. Do I get it right that you use it for calculating how long some execution took?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jiri-Kremser yes, that's correct. We basically use a start and a finish marker. These markers are set in the code using a log marker token, i.e. activationRun. Setting the finish marker requires the start marker object. The context of this marker pair is the current transaction, i.e. user request.

In the code you basically instrument a log marker as for the activation execution, i.e. activationRun:

        implicit transid: TransactionId
        ....
        val start = transid.started(this, LoggingMarkers.INVOKER_ACTIVATION_RUN, s"sending arguments to $actionName at $id $ip")
        ....
       transid.finished(this, start.copy(start = r.interval.start), s"running result: ${r.toBriefString}", endTime = r.interval.end)

example here

For both markers we write a log entry. The log entry for the start marker contains the time since transaction start. The log entry for the finish marker contains time since transaction start and the time since it's corresponding start marker.

start marker (141 ms since transaction start)

... sending arguments to ...[marker:invoker_activationRun_start:141]

finish marker (516 ms since transaction start, and time of activationRun took 400ms)

... running result: ok [marker:invoker_activationRun_finish:516:400] 

Based on this it's possible to calculate the times within a transaction between start and finish markers of the same log marker token and also between log marker tokens.

@jkremser
Copy link
Author

@rabbah is there anything else you want me to improve on this PR? Is it ok to have the open tracing code next to the markers in the logs as the first step? The tracing can be turned off or set to 10% using the env properties, so that only the fraction of the calls will be traced.

@rabbah
Copy link
Member

rabbah commented Jul 11, 2017

There is a second tracing pr #2282 - does it make sense to consider consolidating efforts?

@jpkrohling
Copy link

jpkrohling commented Jul 11, 2017

I don't think so: there has been a thread on the mailing list (subject: "Adding OpenTracing support"), and it looks like that one has a different goal.

@rabbah
Copy link
Member

rabbah commented Nov 9, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants