-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic implementation of OpenCensus #1414
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Daz!
I think this is a great idea, could potentially kill off a bunch of complexity eventually by pulling everything under opencensus.
I popped a few light comments in-line, though I know you said the code's not really ready yet.
monitoring/opencensus/metrics.go
Outdated
log.Fatal(err) | ||
} | ||
// Important to invoke Flush before exiting | ||
defer sd.Flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to have this func not be init
but an exported Initialise
func which itself returns a func()
to be called at shutdown time, e.g.
func Initialize() (func(), error) {
...
return func() {
if using SD {
sd.Flush()
}
if using DD {
dd.Stop()
}
}, nil
}
then in the various server main
s:
...
var mf monitoring.MetricFactory
if *useOpencensus {
flush, err := opencensus.Initialise()
// if err etc.
defer flush()
mf = &opencensus.MetricFactory{..}
}
...
Otherwise, as it stands, I think it'll just flush anything that's been logged during the course of the init
method when the init returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed thanks to your guidance: 2706a6e
This requires changes to each server main
which would break the Prometheus implementation. So, leaving unchanged for now. Must also understand how best to reflect the configuration through Trillian's configuration.
//TODO(dazwilkin) Datadog, Prometheus & Stackdriver should be configurable
if true {
flush, err := opencensus.Initialize()
if err != nil {
glog.Exitf("Failed to initialize OpenCensus Exporter(s): %v", err)
}
defer flush()
}
monitoring/opencensus/metrics.go
Outdated
MetricPrefix: namespace, | ||
}) | ||
if err != nil { | ||
log.Fatal(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a mix of log
and glog
through here, is that because you don't want to rely on glog
being usable in init
?
monitoring/opencensus/metrics.go
Outdated
} | ||
} | ||
|
||
// createKeyMethods creates OpenCensus Tag Keys for each label name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createTagKeys
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed: c44fd34
monitoring/opencensus/metrics.go
Outdated
// -- contains non-printable ASCII | ||
// -- or len is 0 or >256 | ||
// Printable ASCII 32-126 inclusive | ||
func checkLabelNames(labelNames []string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: labelNames
-> names
or ln
to reduce stutter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed: c6436c0
monitoring/opencensus/metrics.go
Outdated
return tagKeys | ||
} | ||
|
||
// createMeasureAndView creates the OpenCensus Measure used to record stats and a View for reporting them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit missing full stops in comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed: d5ab0a5
monitoring/opencensus/metrics.go
Outdated
|
||
// forAllLabelsAValue trivially ensures the numbers of labels matches the number of values | ||
func forAllLabelsAValue(labels, values []string) error { | ||
if len(labels) != len(values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can reduce clutter and re-calculation a bit like this:
if ll, lv := len(labels), len(values); ll != lv {
return fmt.Errorf(..., ll, lv)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed: e487093
monitoring/opencensus/metrics.go
Outdated
return r | ||
} | ||
|
||
// NewHistogram creates a new Histogram object backed by Prometheus. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/Prometheus/OpenCensus/?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed: acc264c
} | ||
|
||
// Inc adds 1 to the gauge | ||
func (g *Gauge) Inc(labelVals ...string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible these are unused, or calling code could be refactored to call Set
(github search makes it hard to figure that out, vi
and go build
would do a better job :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment but I think perhaps you allude to this issue:
One challenge implementing the Trillian metric interface with OpenCensus is that the interface assumes the possibility of tracking state of the measure. Unlike with Prometheus (where this is possible), OpenCensus is a proxy (for one or more monitoring services) and the proxy simply dispatches the current value to the configured backends without recording the current measure's value.
As a result, it's possible to increment Counters, and set the value of Gauges. But it's not possible to get the value of either, and thus it's not possible to increment, decrement of add values to e.g. Gauges because this requires retaining knowledge of the current value.
If the interface is changed to reflect this, then the Prometheus tests that assume this, will break. As it is, the OpenCensus tests break because it's not possible to get the measure values.
@AlCutter -- Yes, was seeking feedback and guidance, thank you. I'm headed out on vacation but will incorporate your feedback when I'm back. |
Apologies... a further week's delay. We lost power this week during Seattle's Snowmaggedon and it's set me back |
OpenCensus now provides an Agent. Among other features (including accepting incoming Prometheus metric data), this proxies OpenCensus metric data to OpenCensus Exporters. This means that, instead of -- as currently -- having to supported multiple Exporters (Stackdriver, Prometheus, Datadog) in the Trillian code, the code may be simplified to write to a generic OpenCensus exporter that itself writes to the Agent. At run|deploy time, the operator can decide which monitoring|trace services to configure. I'll simplify the code to use the generic Exporter and re-commit. |
monitoring/opencensus/metrics.go
Outdated
) | ||
|
||
const ( | ||
namespace = "trillian" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namespace
is unused (from deadcode
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed: 30c7a29
monitoring/opencensus/metrics.go
Outdated
if strings.IndexFunc(labelName, nonPrintableASCII) != -1 { | ||
glog.Fatalf("OpenCensus label names must be printable ASCII; '%s' is not", labelName) | ||
if strings.IndexFunc(name, nonPrintableASCII) != -1 { | ||
glog.Fatalf("OpenCensus label names must be printable ASCII; '%s' is not", name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitty nit: can use %q
rather than '%s'
for the same output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's missing an interface method:
|
There's an issue in the OpenCensus Stackdriver Exporter. If buckets are defined to include a 0-valued bucket i.e. [0.0, ....], the Exporter fails and no metric data is sent to Stackdriver: census-ecosystem/opencensus-go-exporter-stackdriver#98 This Trillian OpenCensus code now ships metrics to an OpenCensus Agent. The Trillian code is unable to determine whether the Agent is configured to export to Stackdriver. To avoid this error (and assuming -- !! -- that this doesn't impact the other Exporters), the code currently logs the fact and deletes the 0-valued bucket from the list of buckets. NB The Interface doesn't support and the existing Prometheus code doesn't check for a histogram created with zero buckets. Neither does this OpenCensus code (though it does log the fact). This appears to be an error case. |
Probably need a rebase on this - Travis is failing due to an obsolete workaround still being attempted. |
Hi @DazWilkin, is this stalled on the linked exporter issue or does it just need some work to rebase it and get it merged? |
I'm just back from vacation. I am unclear what failed. I'll try to recreate this, review and update this thread by the end of this week, OK? |
@RJPercival tested and confirmed that it continues to work. Trillian will need an additional
I am unclear why this was failing CI but I've removed the tests. OpenCensus ships metrics asynchronously to exporters (in this case, the OpenCensus Agent). Thus it has no (easy) way to retrieve measurement values and thus it's unable to succeed with the monitoring tests as defined. I've revised the
var (
ocatEndpoint = flag.String("ocat_endpoint", "localhost:55678", "Endpoint for OpenCensus Agent receiver (host:port)")
)
...
mf := opencensus.MetricFactory{}
flush, err := opencensus.Initialize(*ocatEndpoint)
if err != nil {
glog.Exitf("Failed to initialize OpenCensus Agent: %v", err)
}
defer flush() NB Because it no longer defaults to (only use) Prometheus, the server's HTTP endpoint no longer includes the |
Hey Daz, we're just cleaning out old PR/issues - there's clearly a bunch of work which went into this, is it worth trying to bump this up to date and land it or is it less useful now? |
As you know, OpenCensus is superseded by OpenTelemetry which, IIRC, implements a metric format that's very close to Prometheus' exposition format (maybe even the same?). I think this PR should be dropped and Trillian may either export directly as an OpenTelemetry exporter or worst-case use the OpenTelemetry collector. |
Cool, thanks Daz! |
Implementation of OpenCensus monitoring (alternative to Prometheus-specific monitoring and a complement to the existing OpenCensus trace).
The code needs work.
I'm interested in feedback on the implementation and its usefulness.
Checklist