-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integration with OpenTracing api #2413
Conversation
979aa5d
to
10f89d1
Compare
* Starts the tracing. | ||
*/ | ||
private def startSpan(msg: ActivationMessage): Option[Span] = { | ||
val tracerOption = if (GlobalTracer.isRegistered) Option.empty else Some(TracerResolver.resolveTracer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scala tip: Option(GlobalTracer.isRegistered).exists(identity).map(TracerResolver.resolveTracer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, this doesn't work because
Option(GlobalTracer.isRegistered).exists(identity)
returns either true
or false
and I can't call map on Boolean
this works:
Option(GlobalTracer.isRegistered).flatMap(if (_) Option.empty else Option(TracerResolver.resolveTracer))
but perhaps it's too much "flatmappy" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, s/exists/filter/
just worked 👍
no null! 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your code made me want to learn Scala :)
*/ | ||
private def startSpan(msg: ActivationMessage): Option[Span] = { | ||
val tracerOption = if (GlobalTracer.isRegistered) Option.empty else Some(TracerResolver.resolveTracer) | ||
tracerOption foreach GlobalTracer.register |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be that tracerOption
is null at this point:
https://travis-ci.org/apache/incubator-openwhisk/builds/245871954#L6909-L6913
java.lang.NullPointerException: Cannot register GlobalTracer <null>.
at io.opentracing.util.GlobalTracer.register(GlobalTracer.java:83)
at whisk.core.container.WhiskContainer$$anonfun$startSpan$1.apply(WhiskContainer.scala:104)
at whisk.core.container.WhiskContainer$$anonfun$startSpan$1.apply(WhiskContainer.scala:104)
at scala.Option.foreach(Option.scala:257)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I didn't realize the TracerResolver.resolveTracer()
can return also null and when using Some(null)
it actually creates the Some(null)
what I need here is the Option(null)
that results in None
10f89d1
to
f0c3ebb
Compare
Option(null) == None.
|
oops, use filter instead of exists.
|
9678f13
to
85debb1
Compare
@Jiri-Kremser , may I close #2403 in favor of this one? |
yes |
Sequence action starts its own span and component actions are then created as child spans so the tracing reflects what is actually happening in the event based system, including the simple causality by following the sequences. It looks like the activationId or cause are good candidates for the correlation id among the spans, however, in order to preserve the open-tracing ideas with tracer neutrality (we don't like vendor lock-ins, right?) the mutable Map[String,String] needs to be propagated from the parent spans (started in the controller component) to the child spans (started in the invokers). To do that, I had to add a new attribute the to ActivationMessage and also change couple of signatures in the SequenceActions.scala. => No vendor lock-in on the tracing solution = Profit!
3b0a7ca
to
7b2a120
Compare
@@ -56,7 +56,8 @@ case class ActivationMessage( | |||
activationNamespace: EntityPath, | |||
rootControllerIndex: InstanceId, | |||
content: Option[JsObject], | |||
cause: Option[ActivationId] = None) | |||
cause: Option[ActivationId] = None, | |||
tracingMetadata: Option[Map[String, String]] = None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how big might this map get?
in general this is going to be a problem - we have a standing issue that the control and data plane are conflated.
because these message pass over kafka, and carry the user payload, every bit added to the control is one less bit for the user payload; in other words, we have a bug today that a user cannot send a 1MB payload through and taking away more bits makes it worse... hence the question.
for data, I expect that message to flow externally through a redis for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of jaeger it's one key-value pair, where key is "jaeger-id" (or similar) and value some short string, zipkin is similar. So it's couple of bytes. What can also help is setting the sampling to some lower value, like 5% so that only random 5% of all actions will be traced (which reminded me that I should probably set the default value to something more sane, like 10% and perhaps make it configurable in the wisk.properties (?))
for data, I expect that message to flow externally through a redis for example.
not sure I follow you here, are you planning to add also redis in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the last point/question: #254.
as part of replicating the controller, i expect redis will be in the mix and might double multiple duties.
* the tracer implementation inserts its own ids, these ids are then propagated via the ActivationMessage into | ||
* invoker that can add the reference from the primitive action to the parent sequence action | ||
*/ | ||
private def startTracingHelper(cause: Option[ActivationId], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should consider if this is fully going to subsume the markers generated in the logs if we want to have two way of tracing or just one because the information is going to be duplicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware of the markers in the akka logging system. Do I get it right that you use it for calculating how long some execution took?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jiri-Kremser yes, that's correct. We basically use a start
and a finish
marker. These markers are set in the code using a log marker token, i.e. activationRun
. Setting the finish marker requires the start marker object. The context of this marker pair is the current transaction, i.e. user request.
In the code you basically instrument a log marker as for the activation execution, i.e. activationRun
:
implicit transid: TransactionId
....
val start = transid.started(this, LoggingMarkers.INVOKER_ACTIVATION_RUN, s"sending arguments to $actionName at $id $ip")
....
transid.finished(this, start.copy(start = r.interval.start), s"running result: ${r.toBriefString}", endTime = r.interval.end)
For both markers we write a log entry. The log entry for the start
marker contains the time since transaction start. The log entry for the finish
marker contains time since transaction start and the time since it's corresponding start marker.
start marker (141 ms since transaction start)
... sending arguments to ...[marker:invoker_activationRun_start:141]
finish marker (516 ms since transaction start, and time of activationRun
took 400ms)
... running result: ok [marker:invoker_activationRun_finish:516:400]
Based on this it's possible to calculate the times within a transaction between start and finish markers of the same log marker token and also between log marker tokens.
@rabbah is there anything else you want me to improve on this PR? Is it ok to have the open tracing code next to the markers in the logs as the first step? The tracing can be turned off or set to 10% using the env properties, so that only the fraction of the calls will be traced. |
There is a second tracing pr #2282 - does it make sense to consider consolidating efforts? |
I don't think so: there has been a thread on the mailing list (subject: "Adding OpenTracing support"), and it looks like that one has a different goal. |
Closing per discussion on the dev list https://mail-archives.apache.org/mod_mbox/incubator-openwhisk-dev/201707.mbox/%3CCAAC1_d6e_XDAVhycAeCRc6fwPrMtmmUMqcLZVNww9-TJCQEvGw%40mail.gmail.com%3E. |
UPDATE: I've enhanced the PR for the sequences.
This is based on #2403 that added the tracing for the invoker. However, opentracing allows to leverage the paren-child hierarchy among the spans so this PR also adds the support for sequences.
When starting a sequence action a new (parent) span is created and the member actions starts sub-spans. This works also if the sequence itself is member of another sequence. Using the open tracing api, it is easy to change the tracer implementation, simply by providing other jar with tracer impl that is compliant and implements correctly the SPI mechanism (
META-INF/services
). By default, the jeager is used.Hopefully, the code isn't too invasive. I had to change couple of method signatures, because it is needed to propagate the span context from the controller to the invokers so that the parent-child link can be created. In theory, the
cause
oractivationId
could have been used as a correlation id between child and parent, but to make it implementation agnostic, the opentracing API requires to use either text map or byte array as a "carrier".a pic: