Integration with OpenTracing api #2413

jkremser · 2017-06-22T16:55:15Z

UPDATE: I've enhanced the PR for the sequences.

This is based on #2403 that added the tracing for the invoker. However, opentracing allows to leverage the paren-child hierarchy among the spans so this PR also adds the support for sequences.

When starting a sequence action a new (parent) span is created and the member actions starts sub-spans. This works also if the sequence itself is member of another sequence. Using the open tracing api, it is easy to change the tracer implementation, simply by providing other jar with tracer impl that is compliant and implements correctly the SPI mechanism (META-INF/services). By default, the jeager is used.

Hopefully, the code isn't too invasive. I had to change couple of method signatures, because it is needed to propagate the span context from the controller to the invokers so that the parent-child link can be created. In theory, the cause or activationId could have been used as a correlation id between child and parent, but to make it implementation agnostic, the opentracing API requires to use either text map or byte array as a "carrier".

a pic:

rabbah · 2017-06-22T17:02:53Z

core/invoker/src/main/scala/whisk/core/container/WhiskContainer.scala

+     * Starts the tracing.
+     */
+    private def startSpan(msg: ActivationMessage): Option[Span] = {
+        val tracerOption = if (GlobalTracer.isRegistered) Option.empty else Some(TracerResolver.resolveTracer)


scala tip: Option(GlobalTracer.isRegistered).exists(identity).map(TracerResolver.resolveTracer)

actually, this doesn't work because

Option(GlobalTracer.isRegistered).exists(identity)

returns either true or false and I can't call map on Boolean

this works:

Option(GlobalTracer.isRegistered).flatMap(if (_) Option.empty else Option(TracerResolver.resolveTracer))

but perhaps it's too much "flatmappy" :)

Right, s/exists/filter/ just worked 👍

rabbah · 2017-06-22T17:04:06Z

no null! 👍

jpkrohling

Your code made me want to learn Scala :)

jpkrohling · 2017-06-23T03:57:49Z

core/invoker/src/main/scala/whisk/core/container/WhiskContainer.scala

+     */
+    private def startSpan(msg: ActivationMessage): Option[Span] = {
+        val tracerOption = if (GlobalTracer.isRegistered) Option.empty else Some(TracerResolver.resolveTracer)
+        tracerOption foreach GlobalTracer.register


It might be that tracerOption is null at this point:

https://travis-ci.org/apache/incubator-openwhisk/builds/245871954#L6909-L6913

java.lang.NullPointerException: Cannot register GlobalTracer <null>. at io.opentracing.util.GlobalTracer.register(GlobalTracer.java:83) at whisk.core.container.WhiskContainer$$anonfun$startSpan$1.apply(WhiskContainer.scala:104) at whisk.core.container.WhiskContainer$$anonfun$startSpan$1.apply(WhiskContainer.scala:104) at scala.Option.foreach(Option.scala:257)

right, I didn't realize the TracerResolver.resolveTracer() can return also null and when using Some(null) it actually creates the Some(null) what I need here is the Option(null) that results in None

rabbah · 2017-06-23T15:23:25Z

Option(null) == None.

rabbah · 2017-06-23T15:24:10Z

oops, use filter instead of exists.

jpkrohling · 2017-06-26T13:42:22Z

@Jiri-Kremser , may I close #2403 in favor of this one?

jkremser · 2017-06-26T13:56:05Z

@Jiri-Kremser , may I close #2403 in favor of this one?

yes

Sequence action starts its own span and component actions are then created as child spans so the tracing reflects what is actually happening in the event based system, including the simple causality by following the sequences. It looks like the activationId or cause are good candidates for the correlation id among the spans, however, in order to preserve the open-tracing ideas with tracer neutrality (we don't like vendor lock-ins, right?) the mutable Map[String,String] needs to be propagated from the parent spans (started in the controller component) to the child spans (started in the invokers). To do that, I had to add a new attribute the to ActivationMessage and also change couple of signatures in the SequenceActions.scala. => No vendor lock-in on the tracing solution = Profit!

rabbah · 2017-06-30T16:34:02Z

common/scala/src/main/scala/whisk/core/connector/Message.scala

@@ -56,7 +56,8 @@ case class ActivationMessage(
    activationNamespace: EntityPath,
    rootControllerIndex: InstanceId,
    content: Option[JsObject],
-    cause: Option[ActivationId] = None)
+    cause: Option[ActivationId] = None,
+    tracingMetadata: Option[Map[String, String]] = None)


how big might this map get?

in general this is going to be a problem - we have a standing issue that the control and data plane are conflated.
because these message pass over kafka, and carry the user payload, every bit added to the control is one less bit for the user payload; in other words, we have a bug today that a user cannot send a 1MB payload through and taking away more bits makes it worse... hence the question.

for data, I expect that message to flow externally through a redis for example.

In case of jaeger it's one key-value pair, where key is "jaeger-id" (or similar) and value some short string, zipkin is similar. So it's couple of bytes. What can also help is setting the sampling to some lower value, like 5% so that only random 5% of all actions will be traced (which reminded me that I should probably set the default value to something more sane, like 10% and perhaps make it configurable in the wisk.properties (?))

for data, I expect that message to flow externally through a redis for example.

not sure I follow you here, are you planning to add also redis in the future?

on the last point/question: #254.
as part of replicating the controller, i expect redis will be in the mix and might double multiple duties.

rabbah · 2017-06-30T16:35:41Z

core/controller/src/main/scala/whisk/core/controller/actions/SequenceActions.scala

+     *  the tracer implementation  inserts its own ids, these ids are then propagated via the ActivationMessage into
+     *  invoker that can add the reference from the primitive action to the parent sequence action
+     */
+    private def startTracingHelper(cause: Option[ActivationId],


we should consider if this is fully going to subsume the markers generated in the logs if we want to have two way of tracing or just one because the information is going to be duplicated.

I wasn't aware of the markers in the akka logging system. Do I get it right that you use it for calculating how long some execution took?

@Jiri-Kremser yes, that's correct. We basically use a start and a finish marker. These markers are set in the code using a log marker token, i.e. activationRun. Setting the finish marker requires the start marker object. The context of this marker pair is the current transaction, i.e. user request.

In the code you basically instrument a log marker as for the activation execution, i.e. activationRun:

implicit transid: TransactionId .... val start = transid.started(this, LoggingMarkers.INVOKER_ACTIVATION_RUN, s"sending arguments to $actionName at $id $ip") .... transid.finished(this, start.copy(start = r.interval.start), s"running result: ${r.toBriefString}", endTime = r.interval.end)

example here

For both markers we write a log entry. The log entry for the start marker contains the time since transaction start. The log entry for the finish marker contains time since transaction start and the time since it's corresponding start marker.

start marker (141 ms since transaction start)

... sending arguments to ...[marker:invoker_activationRun_start:141]

finish marker (516 ms since transaction start, and time of activationRun took 400ms)

... running result: ok [marker:invoker_activationRun_finish:516:400]

Based on this it's possible to calculate the times within a transaction between start and finish markers of the same log marker token and also between log marker tokens.

jkremser · 2017-07-11T15:11:55Z

@rabbah is there anything else you want me to improve on this PR? Is it ok to have the open tracing code next to the markers in the logs as the first step? The tracing can be turned off or set to 10% using the env properties, so that only the fraction of the calls will be traced.

rabbah · 2017-07-11T15:14:43Z

There is a second tracing pr #2282 - does it make sense to consider consolidating efforts?

jpkrohling · 2017-07-11T15:34:18Z

I don't think so: there has been a thread on the mailing list (subject: "Adding OpenTracing support"), and it looks like that one has a different goal.

rabbah · 2017-11-09T14:59:03Z

Closing per discussion on the dev list https://mail-archives.apache.org/mod_mbox/incubator-openwhisk-dev/201707.mbox/%3CCAAC1_d6e_XDAVhycAeCRc6fwPrMtmmUMqcLZVNww9-TJCQEvGw%40mail.gmail.com%3E.

jkremser force-pushed the opentracing-integration branch 2 times, most recently from 979aa5d to 10f89d1 Compare June 22, 2017 16:56

rabbah reviewed Jun 22, 2017

View reviewed changes

jpkrohling reviewed Jun 23, 2017

View reviewed changes

jkremser force-pushed the opentracing-integration branch from 10f89d1 to f0c3ebb Compare June 23, 2017 15:15

jkremser force-pushed the opentracing-integration branch 2 times, most recently from 9678f13 to 85debb1 Compare June 26, 2017 13:13

jpkrohling mentioned this pull request Jun 26, 2017

[wip] Draft 1 - Wrap runs within OpenTracing spans #2403

Closed

2 tasks

jpkrohling and others added 8 commits June 30, 2017 16:57

Draft 1 - Wrap runs within OpenTracing spans

7e31e27

Adding idea files to the .gitignore

aebd20c

Making the code more Scala idiomatic.

1e1bcbf

Handling the case when the global tracer has been registered.

58113b2

Moving the tracing code into its own concern.

faa1207

More ScalaDoc for the Tracing.scala

7cdf7e7

Opentracing api version bumps

7b2a120

jkremser force-pushed the opentracing-integration branch from 3b0a7ca to 7b2a120 Compare June 30, 2017 15:26

jkremser changed the title ~~[wip] Draft 2 - Wrap runs within OpenTracing spans~~ Integration with OpenTracing api Jun 30, 2017

jkremser added 2 commits June 30, 2017 17:41

Making license header checker happy.

c213350

Fixing tests by adding a parameter to method signature.

805665d

rabbah reviewed Jun 30, 2017

View reviewed changes

Handling the case when version was None.

140c440

rabbah closed this Nov 9, 2017

jkremser mentioned this pull request Feb 21, 2018

allow parent/child transaction ids #3083

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integration with OpenTracing api #2413

Integration with OpenTracing api #2413

jkremser commented Jun 22, 2017 •

edited

Loading

rabbah Jun 22, 2017

jkremser Jun 23, 2017

jkremser Jun 26, 2017

rabbah commented Jun 22, 2017

jpkrohling left a comment

jpkrohling Jun 23, 2017

jkremser Jun 23, 2017

rabbah commented Jun 23, 2017 via email

rabbah commented Jun 23, 2017 via email

jpkrohling commented Jun 26, 2017

jkremser commented Jun 26, 2017

rabbah Jun 30, 2017

jkremser Jun 30, 2017

rabbah Jun 30, 2017

rabbah Jun 30, 2017

jkremser Jun 30, 2017

jeremiaswerner Jul 1, 2017

jkremser commented Jul 11, 2017

rabbah commented Jul 11, 2017

jpkrohling commented Jul 11, 2017 •

edited

Loading

rabbah commented Nov 9, 2017 •

edited

Loading

Integration with OpenTracing api #2413

Integration with OpenTracing api #2413

Conversation

jkremser commented Jun 22, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rabbah commented Jun 22, 2017

jpkrohling left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rabbah commented Jun 23, 2017 via email

rabbah commented Jun 23, 2017 via email

jpkrohling commented Jun 26, 2017

jkremser commented Jun 26, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jkremser commented Jul 11, 2017

rabbah commented Jul 11, 2017

jpkrohling commented Jul 11, 2017 • edited Loading

rabbah commented Nov 9, 2017 • edited Loading

jkremser commented Jun 22, 2017 •

edited

Loading

jpkrohling commented Jul 11, 2017 •

edited

Loading

rabbah commented Nov 9, 2017 •

edited

Loading