Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add support for using rust-python-jaeger-reporter #7697

Merged
merged 5 commits into from
Jun 17, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/7697.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add support for using `rust-python-jaeger-reporter` library to reduce jaeger tracing overhead.
3 changes: 3 additions & 0 deletions mypy.ini
Original file line number Diff line number Diff line change
Expand Up @@ -78,3 +78,6 @@ ignore_missing_imports = True

[mypy-authlib.*]
ignore_missing_imports = True

[mypy-rust_python_jaeger_reporter.*]
ignore_missing_imports = True
39 changes: 36 additions & 3 deletions synapse/logging/opentracing.py
Original file line number Diff line number Diff line change
Expand Up @@ -171,8 +171,9 @@ def set_fates(clotho, lachesis, atropos, father="Zues", mother="Themis"):
import re
import types
from functools import wraps
from typing import TYPE_CHECKING, Dict
from typing import TYPE_CHECKING, Dict, Optional, Type

import attr
from canonicaljson import json

from twisted.internet import defer
Expand Down Expand Up @@ -232,6 +233,30 @@ class _DummyTagNames(object):
LogContextScopeManager = None # type: ignore


try:
from rust_python_jaeger_reporter import Reporter

@attr.s(slots=True, frozen=True)
class _WrappedRustReporter:
"""Wrap the reporter to ensure `report_span` never throws.
"""

_reporter = attr.ib(type=Reporter, default=attr.Factory(Reporter))

def set_process(self, *args, **kwargs):
return self._reporter.set_process(*args, **kwargs)

def report_span(self, span):
try:
return self._reporter.report_span(span)
except Exception:
logger.exception("Failed to report span")
Comment on lines +249 to +253
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this due to lack of confidence in the underlying report_span code or a just-in-case or?

Should this be in the RustReporter-class instead?

(Also, wouldn't this be simpler as a sub-class instead of a wrapper?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called from a lot of places, so an exception here can turn fairly fatal to the entire app, so I think an abundance of caution is probably warranted here.

We have basically three options here:

  1. Make the native class not raise, which is probably mostly doable, though a) there is a layer between the python and the native code that can raise exceptions and b) means figuring out logging from the rust library (which is probably fine).
  2. Move this python class into the library so that it becomes a mixed python/native library. I thought this would turn out hard, but actually looks like the tooling now supports it (some of their docs say it doesn't, but I think they're outdated)
  3. What we've done here.

We should probably do a combination of 1 and 2 TBH.

It's probably also worth noting that the rust library currently does throw occasionally since a) it assumes something is around to catch and log exceptions and b) the python jaeger client is really inconsistent with what data types pass to the Reporter objects, and so there is a bunch of code in the rust lib to change types from the python custom class to the thrift defined classes, and they're not totally fool proof. (Plus, if you try and give a native library an integer bigger than 128 bits things explode).

(Also, wouldn't this be simpler as a sub-class instead of a wrapper?)

Maybe? I'm not a super fan of sub classing and I'm not 100% if there are any gotchas with subclassing native types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably do a combination of 1 and 2 TBH.

We could make this improvement in the future, I suppose. 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything we should log here to help identify what happened to cause the exception? Or is using logger.exception enough to provide what we need?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging the exception should tell us everything bar what span is, and sentry should tell us that.


RustReporter = _WrappedRustReporter # type: Optional[Type[_WrappedRustReporter]]
except ImportError:
RustReporter = None


logger = logging.getLogger(__name__)


Expand Down Expand Up @@ -320,11 +345,19 @@ def init_tracer(hs: "HomeServer"):

set_homeserver_whitelist(hs.config.opentracer_whitelist)

JaegerConfig(
config = JaegerConfig(
config=hs.config.jaeger_config,
service_name="{} {}".format(hs.config.server_name, hs.get_instance_name()),
scope_manager=LogContextScopeManager(hs.config),
).initialize_tracer()
)

# If we have the rust jaeger reporter available let's use that.
if RustReporter:
logger.info("Using rust_python_jaeger_reporter library")
tracer = config.create_tracer(RustReporter(), config.sampler)
opentracing.set_global_tracer(tracer)
else:
config.initialize_tracer()


# Whitelisting
Expand Down