Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix OpenTracing trace context propagation with Fault Tolerance @Asynchronous methods #7987

Merged
merged 2 commits into from
Apr 14, 2020

Conversation

Ladicek
Copy link
Contributor

@Ladicek Ladicek commented Mar 19, 2020

There used to be a test for that, but didn't involve @Asynchronous
methods, so was in fact useless. This commit changes the test to
actually call an @Asynchronous method, which revealed an issue.
The smallrye-fault-tolerance-tracing-propagation dependency was
missing, so trace context propagation didn't actually work.

Then, the test also uncovered an issue in SmallRye Fault Tolerance
(see smallrye/smallrye-fault-tolerance#208),
which is why this commit also brings a new version of it.

In addition to that, this commit also fixes a devmode issue where
on hot reload, the Jaeger extension tried to register a gauge
that was already present. That ended up with an exception.

And finally, this commit also adds a test for tracing JDBC.

Resolves #7981.
Resolves #7982.
Resolves #7984.

@boring-cyborg boring-cyborg bot added area/dependencies Pull requests that update a dependency file area/smallrye labels Mar 19, 2020
@Ladicek Ladicek force-pushed the opentracing-tests-and-fixes branch from c599c08 to 568e411 Compare March 25, 2020 13:54
@Ladicek Ladicek marked this pull request as ready for review March 25, 2020 13:54
@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 25, 2020

Hmm, the failing test passes for me... I'll dig into it.

@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 25, 2020

Hmm, it seems the test is quite flaky :-(

@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 25, 2020

I can reproduce the test failure locally when I let the test repeat 1000 times. It typically takes cca 400 attempts for the failure to occur. Not terribly fun to debug and I suspect there's some thread safety issue buried deep in ... I don't know where :-)

@gsmet
Copy link
Member

gsmet commented Mar 25, 2020

Not sure if it makes sense to backport that one, given the tests are flaky?

@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 26, 2020

I would love to have this backported, but the test issue indeed needs to be resolved first.

@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 30, 2020

I tracked this down to a SmallRye Fault Tolerance threading issue: smallrye/smallrye-fault-tolerance#208 Will update this PR when new SmallRye Fault Tolerance is released.

@Ladicek Ladicek force-pushed the opentracing-tests-and-fixes branch from 568e411 to 7235832 Compare April 1, 2020 15:12
@Ladicek
Copy link
Contributor Author

Ladicek commented Apr 2, 2020

This is now ready.

The failing test is unrelated: io.quarkus.mongodb.reactive.legacy.AxleReactiveMongoClientTest.

The test that was previously failing (TracingTest) is now OK. I've ran it many thousands of times locally while figuring out what's wrong (there's was a concurrency issue in SmallRye Fault Tolerance which is fixed in new version).

@gsmet
Copy link
Member

gsmet commented Apr 2, 2020

@jmartisk could you have a look? Thanks!

@gsmet gsmet added this to the 1.4.0 milestone Apr 2, 2020
JaegerGauge gauge = registry.register(meta(name, MetricType.GAUGE), new JaegerGauge(), toTagArray(tags));
Tag[] tagArray = toTagArray(tags);
JaegerGauge gauge = (JaegerGauge) registry.getGauges().get(new MetricID(name, tagArray));
if (gauge == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #7018 we automatically drop and re-create all metric registries during shutdown before a hot reload, so it should not happen anymore that the gauge is still registered. Are you sure we still need this? That would mean the registry dropping is wrong?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen an exception during hot reload on this. I'll try again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is still required. When I revert this change, I get this on hot reload:

org.jboss.resteasy.spi.UnhandledException: java.lang.RuntimeException: No reflection exceptions should be thrown unless there is a fundamental error in your code set up.
        at org.jboss.resteasy.core.ExceptionHandler.handleException(ExceptionHandler.java:381)
        at org.jboss.resteasy.core.SynchronousDispatcher.writeException(SynchronousDispatcher.java:216)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:515)
        at org.jboss.resteasy.core.SynchronousDispatcher.lambda$invoke$4(SynchronousDispatcher.java:259)
        at org.jboss.resteasy.core.SynchronousDispatcher.lambda$preprocess$0(SynchronousDispatcher.java:160)
        at org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:362)
        at org.jboss.resteasy.core.SynchronousDispatcher.preprocess(SynchronousDispatcher.java:163)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:245)
        at io.quarkus.resteasy.runtime.standalone.RequestDispatcher.service(RequestDispatcher.java:73)
        at io.quarkus.resteasy.runtime.standalone.VertxRequestHandler.dispatch(VertxRequestHandler.java:122)
        at io.quarkus.resteasy.runtime.standalone.VertxRequestHandler.access$000(VertxRequestHandler.java:36)
        at io.quarkus.resteasy.runtime.standalone.VertxRequestHandler$1.run(VertxRequestHandler.java:87)
        at io.quarkus.runtime.CleanableExecutor$CleaningRunnable.run(CleanableExecutor.java:231)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
        at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:2027)
        at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1551)
        at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1442)
        at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
        at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
        at java.lang.Thread.run(Thread.java:748)
        at org.jboss.threads.JBossThread.run(JBossThread.java:479)
Caused by: java.lang.RuntimeException: No reflection exceptions should be thrown unless there is a fundamental error in your code set up.
        at io.jaegertracing.internal.metrics.Metrics.createMetrics(Metrics.java:76)
        at io.jaegertracing.internal.metrics.Metrics.<init>(Metrics.java:32)
        at io.jaegertracing.internal.metrics.Metrics.<init>(Metrics.java:28)
        at io.jaegertracing.Configuration.getTracerBuilder(Configuration.java:221)
        at io.quarkus.jaeger.runtime.QuarkusJaegerTracer.tracer(QuarkusJaegerTracer.java:40)
        at io.quarkus.jaeger.runtime.QuarkusJaegerTracer.buildSpan(QuarkusJaegerTracer.java:59)
        at io.opentracing.util.GlobalTracer.buildSpan(GlobalTracer.java:133)
        at io.quarkus.smallrye.opentracing.runtime.TracerProducer_ProducerMethod_tracer_96dadb3d6afa0cccadfe742c3e06ad433737c844_ClientProxy.buildSpan(TracerProducer_ProducerMethod_tracer_96dadb3d6afa0cccadfe742c3e06ad433737c844_ClientProxy.zig:238)
        at io.opentracing.contrib.jaxrs2.server.ServerTracingFilter.filter(ServerTracingFilter.java:69)
        at org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:310)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:439)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:400)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:374)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:67)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:488)
        ... 20 more
Caused by: java.lang.IllegalArgumentException: A metric with metricID MetricID{name='jaeger_tracer_reporter_queue_length', tags=[]} already exists
        at io.smallrye.metrics.MetricsRegistryImpl.register(MetricsRegistryImpl.java:131)
        at io.quarkus.jaeger.runtime.QuarkusJaegerMetricsFactory.createGauge(QuarkusJaegerMetricsFactory.java:48)
        at io.jaegertracing.internal.metrics.Metrics.createMetrics(Metrics.java:66)
        ... 34 more

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmartisk Any idea? Thanks! :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fishy, I'll try to reproduce this and let you know

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seriously? 😮 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, I would probably be able to do all that, but that would make this PR pretty large. I just wanted to fix the trace context propagation, found a dev mode issue, included a fix which was tiny, and now this :-/

Here's my proposal: I already have finished the conditional inclusion/exclusion of metrics based on the capability. I have also changed quarkus-jaeger to no longer depend on smallrye-metrics and microprofile-metrics-api; instead, it uses an optional dependency on quarkus-smallrye-metrics as you suggest. It seems to work (didn't try native 😁). What's is not done is addressing your 2nd comment, the one starting with "And what would be absolutely optimal" -- I'd leave that for future. Agree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's clearly not the goal of this PR.
I created #8502 for that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks! I'll update this PR in a few minutes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR rebased and updated per discussion above.

Ladicek added 2 commits April 9, 2020 13:13
…hronous methods

There used to be a test for that, but didn't involve `@Asynchronous`
methods, so was in fact useless. This commit changes the test to
actually call an `@Asynchronous` method, which revealed an issue.
The `smallrye-fault-tolerance-tracing-propagation` dependency was
missing, so trace context propagation didn't actually work.

Then, the test also uncovered an issue in SmallRye Fault Tolerance
(see smallrye/smallrye-fault-tolerance#208),
which is why this commit also brings a new version of it.

In addition to that, this commit also adds a test for tracing JDBC.
The Jaeger extension used to collect metrics unconditionally,
even when the SmallRye Metrics extension wasn't present.
With this commit, the Jaeger extension uses `NoopMetricsFactory`
if SmallRye Metrics are missing, and only uses the real
metrics factory if SmallRye Metrics are present.

This is not totally consistent with how metrics are enabled/disabled
in all other extensions. Fixing that is left for future.
@Ladicek Ladicek force-pushed the opentracing-tests-and-fixes branch from 7235832 to 1b676f3 Compare April 9, 2020 11:15
Copy link
Contributor

@jmartisk jmartisk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@gsmet gsmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, per Jan.

@gsmet gsmet merged commit 8ca1d0d into quarkusio:master Apr 14, 2020
@Ladicek Ladicek deleted the opentracing-tests-and-fixes branch April 15, 2020 06:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dependencies Pull requests that update a dependency file area/jaeger area/smallrye
Projects
None yet
7 participants