[metrics-generator] filter out spans based on policy #2274

Merged 44 commits on May 2, 2023
Commits
8d93769
First pass at span filtering
zalegrala Mar 29, 2023
504dc32
Validate the spanmetrics filtering config on startup
zalegrala Mar 29, 2023
d7192ab
Give some hope that we return a true match
zalegrala Mar 29, 2023
5b38fd4
Drop unused argument service name and rely on attributes
zalegrala Mar 29, 2023
85b574b
Handling a few intrinsics
zalegrala Mar 29, 2023
47257a5
Include documentation for spanmetrics filtering policies
zalegrala Mar 30, 2023
c06443a
Update docs/sources/tempo/metrics-generator/span_metrics.md
zalegrala Mar 30, 2023
815bdd9
Update docs/sources/tempo/metrics-generator/span_metrics.md
zalegrala Mar 30, 2023
df84ba5
Update docs/sources/tempo/metrics-generator/span_metrics.md
zalegrala Mar 30, 2023
0f5a60d
Update docs/sources/tempo/metrics-generator/span_metrics.md
zalegrala Mar 30, 2023
676898b
Update docs/sources/tempo/metrics-generator/span_metrics.md
zalegrala Mar 30, 2023
604b0c9
Update docs/sources/tempo/metrics-generator/span_metrics.md
zalegrala Mar 30, 2023
b34607d
Adjust filter policy to split policies during New()
zalegrala Apr 6, 2023
27f2060
Update test for intrinsic
zalegrala Apr 6, 2023
887a007
Include benchmark and supporting span generator
zalegrala Apr 6, 2023
08ff12e
Include metric for counting spans that have been filtered out
zalegrala Apr 7, 2023
d030644
Include config warning when unsupported intrinsic is used
zalegrala Apr 11, 2023
6b91f32
Relocate spanmetrics.FilterPolicy to sharedconfig package and impleme…
zalegrala Apr 11, 2023
faa54d7
Include sharedconfig package
zalegrala Apr 12, 2023
ca4c226
Update modules/generator/processor/spanmetrics/spanmetrics.go
zalegrala Apr 12, 2023
ca74049
Refactor spanfilter into its own package
zalegrala Apr 14, 2023
d5d3fdd
Include tests for spanfilter.New()
zalegrala Apr 14, 2023
abc52b0
Update spanmetrics processor to return an error for spanfilter error
zalegrala Apr 14, 2023
6ca1914
Relocate config validation to spanfilter during New
zalegrala Apr 14, 2023
a081371
Update tests for spanmetrics error return
zalegrala Apr 14, 2023
26f3f6f
Drop unused
zalegrala Apr 14, 2023
ee6981d
Update docs to include nesting of filtering config
zalegrala Apr 14, 2023
082ece9
Exit early when attributes are unmatched
zalegrala Apr 17, 2023
beb3047
Exit early when intrinsics are not matched
zalegrala Apr 17, 2023
3bc8e16
Preallocate a couple variables
zalegrala Apr 17, 2023
83521d5
Add note about use of RandomBatcher
zalegrala Apr 17, 2023
a9f7c97
Update changelog
zalegrala Apr 20, 2023
e8e8b9c
Drop TODO comment
zalegrala Apr 20, 2023
2fbdab0
Add back the lost metric during rebase
zalegrala Apr 20, 2023
bc3c9cd
Fix policy override configuration
zalegrala Apr 24, 2023
91b81ce
Include generator config test
zalegrala Apr 24, 2023
cce79ef
Migrate the metric and expand reasons
zalegrala Apr 24, 2023
df6c139
Update tests for discardCounter
zalegrala Apr 25, 2023
4f15bde
Include doc about which kinds are available for filtering
zalegrala Apr 26, 2023
de30000
Spellcheck
zalegrala Apr 26, 2023
79151bb
Perform number matching for kind and status
zalegrala May 1, 2023
ebe07dd
Rename discardCounter to filteredSpansCounter
zalegrala May 2, 2023
7b90d5a
Improve error quality
zalegrala May 2, 2023
fba090a
Update error message in test
zalegrala May 2, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@
* [ENHANCEMENT] Add synchronous read mode to vParquet and vParquet2 optionally enabled by env vars [#2165](https://github.com/grafana/tempo/pull/2165) (@mdisibio)
* [ENHANCEMENT] Add option to override metrics-generator ring port [#2399](https://github.com/grafana/tempo/pull/2399) (@mdisibio)
* [ENHANCEMENT] Add support for IPv6 [#1555](https://github.com/grafana/tempo/pull/1555) (@zalegrala)
* [ENHANCEMENT] Add span filtering to spanmetrics processor [#2274](https://github.com/grafana/tempo/pull/2274) (@zalegrala)
* [BUGFIX] tempodb integer divide by zero error [#2167](https://github.com/grafana/tempo/issues/2167) (@kroksys)
* [CHANGE] **Breaking Change** Rename s3.insecure_skip_verify [#???](https://github.com/grafana/tempo/pull/???) (@zalegrala)
93 changes: 86 additions & 7 deletions docs/sources/tempo/metrics-generator/span_metrics.md
@@ -1,7 +1,7 @@
---
aliases:
- /docs/tempo/latest/server_side_metrics/span_metrics/
- /docs/tempo/latest/metrics-generator/span_metrics/
title: Generate metrics from spans
weight: 400
---
@@ -11,8 +11,9 @@ weight: 400
The span metrics processor generates metrics from ingested tracing data, including request, error, and duration (RED) metrics.

Span metrics generate two metrics:

- A counter that computes requests
- A histogram that tracks the distribution of durations of all requests

Span metrics are of particular interest if your system is not monitored with metrics,
but it has distributed tracing implemented.
@@ -43,7 +44,7 @@ This processor is designed with the goal to mirror the implementation from the O
The following metrics are exported:

| Metric | Type | Labels | Description |
| ------------------------------ | --------- | ---------- | ---------------------------- |
| traces_spanmetrics_latency | Histogram | Dimensions | Duration of the span |
| traces_spanmetrics_calls_total | Counter | Dimensions | Total count of the span |
| traces_spanmetrics_size_total | Counter | Dimensions | Total size of spans ingested |
@@ -56,7 +57,6 @@ When a configured dimension collides with one of the default labels (e.g. `statu

If you use a ratio-based sampler, you can use the custom sampler below to avoid losing metric information. You also need to set `metrics_generator.processor.span_metrics.span_multiplier_key` to `"X-SampleRatio"`.


```go
package tracer
import (
@@ -91,6 +91,85 @@ func (ds RatioBasedSampler) Description() string {
}
```

### Filtering

In some cases, you may want to reduce the number of metrics produced by the `spanmetrics` processor. You can configure the processor to use an `include` filter to match criteria that must be present in the span in order to be included. Following the include filter, an `exclude` filter may be used to reject portions of what was previously included by the filter policy.

Currently, only filtering by resource and span attributes with the following value types is supported.

- `bool`
- `double`
- `int`
- `string`

Additionally, these intrinsic span attributes may be filtered upon:

- `name`
- `status` (code)
- `kind`

The following intrinsic kinds are available for filtering.

- `SPAN_KIND_SERVER`
- `SPAN_KIND_INTERNAL`
- `SPAN_KIND_CLIENT`
- `SPAN_KIND_PRODUCER`
- `SPAN_KIND_CONSUMER`

Intrinsic keys can be acted on directly when implementing a filter policy. For example:

```yaml
---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: strict
            attributes:
              - key: kind
                value: SPAN_KIND_SERVER
```

In this example, spans which are of `kind` "server" are included for metrics export.

When selecting spans based on non-intrinsic attributes, you must specify the scope of the attribute, similar to how it is specified in TraceQL. For example, if the `resource` contains a `location` attribute which is to be used in a filter policy, then the reference needs to be specified as `resource.location`. This requires users to know and specify the scope in which an attribute is found, and avoids the ambiguity of conflicting values at differing scopes. The following example illustrates this.

```yaml
---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: strict
            attributes:
              - key: resource.location
                value: earth
```
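
To make the scope prefix concrete, here is a minimal sketch of how a key such as `resource.location` might be separated into a scope and an attribute name. It assumes the two scopes are `resource` and `span`, as in TraceQL, and that keys without one of those prefixes are intrinsics. `splitScope` is a hypothetical helper, not a function from Tempo.

```go
package main

import (
	"fmt"
	"strings"
)

// splitScope separates a filter key such as "resource.location" into its
// scope ("resource" or "span") and attribute name. Keys without a known
// scope prefix are returned with an empty scope, as an intrinsic would be.
// Only the first dot is treated as the separator, since attribute names may
// themselves contain dots.
func splitScope(key string) (scope, name string) {
	prefix, rest, found := strings.Cut(key, ".")
	if found && (prefix == "resource" || prefix == "span") {
		return prefix, rest
	}
	return "", key
}

func main() {
	for _, key := range []string{"resource.location", "span.http.method", "kind"} {
		scope, name := splitScope(key)
		fmt.Printf("key=%q scope=%q name=%q\n", key, scope, name)
	}
}
```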

The examples above use a `match_type` of `strict`, which is a direct comparison of values. An additional option for `match_type` is `regex`, which matches values against a regular expression.

```yaml
---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: regex
            attributes:
              - key: resource.location
                value: eu-.*
        - exclude:
            match_type: regex
            attributes:
              - key: resource.tier
                value: dev-.*
```

In the above, the `include` statement first selects all spans whose `resource.location` begins with `eu-`, and the `exclude` statement then rejects those whose `resource.tier` begins with `dev-`. In this way, a flexible approach to filtering can be achieved to ensure that only metrics which are important are generated.
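
The include-then-exclude behaviour described above can be sketched in a few lines of Go. This is a simplified stand-in, not Tempo's `spanfilter` implementation: the `policy` type, its fields, and `examplePolicy` are hypothetical, attributes are modelled as a flat string map, and the patterns are anchored explicitly because the text above does not say whether the processor anchors them.

```go
package main

import (
	"fmt"
	"regexp"
)

// policy is a hypothetical, simplified filter: a span is kept when it matches
// include (if set) and does not match exclude (if set).
type policy struct {
	includeKey string
	include    *regexp.Regexp
	excludeKey string
	exclude    *regexp.Regexp
}

// keep reports whether a span with the given attributes passes the policy.
// A missing attribute is read as the empty string.
func (p policy) keep(attrs map[string]string) bool {
	if p.include != nil && !p.include.MatchString(attrs[p.includeKey]) {
		return false
	}
	if p.exclude != nil && p.exclude.MatchString(attrs[p.excludeKey]) {
		return false
	}
	return true
}

// examplePolicy mirrors the YAML example: include eu-*, then exclude dev-*.
func examplePolicy() policy {
	return policy{
		includeKey: "resource.location",
		include:    regexp.MustCompile(`^eu-.*$`),
		excludeKey: "resource.tier",
		exclude:    regexp.MustCompile(`^dev-.*$`),
	}
}

func main() {
	p := examplePolicy()
	spans := []map[string]string{
		{"resource.location": "eu-west", "resource.tier": "prod-1"},
		{"resource.location": "eu-west", "resource.tier": "dev-1"},
		{"resource.location": "us-east", "resource.tier": "prod-1"},
	}
	for _, attrs := range spans {
		fmt.Println(attrs, "keep:", p.keep(attrs))
	}
}
```

Under these assumptions only the first span would be kept: the second is rejected by the exclude pattern and the third never matches the include pattern.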

## Example

<p align="center"><img src="../span-metrics-example.png" alt="Span metrics overview"></p>
3 changes: 3 additions & 0 deletions modules/generator/config.go
@@ -73,6 +73,9 @@ func (cfg *ProcessorConfig) copyWithOverrides(o metricsGeneratorOverrides, userI
return ProcessorConfig{}, errors.Wrap(err, "fail to apply overrides")
}
}
if filterPolicies := o.MetricsGeneratorProcessorSpanMetricsFilterPolicies(userID); filterPolicies != nil {
copyCfg.SpanMetrics.FilterPolicies = filterPolicies
}

return copyCfg, nil
}
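
The nil check above means that an unset override leaves the base policies in place, while an empty but non-nil list replaces them; the `empty policy overrides` test in this PR asserts that behaviour. A minimal sketch of the distinction, with plain strings standing in for `FilterPolicy` values and a hypothetical `applyOverride` helper:

```go
package main

import "fmt"

// applyOverride mirrors the nil check in the diff above. Plain strings stand
// in for filter policies; this is a simplification for illustration.
func applyOverride(base, override []string) []string {
	if override != nil {
		return override
	}
	return base
}

func main() {
	base := []string{"include kind=SPAN_KIND_SERVER"}

	// A nil override keeps the base policies.
	fmt.Println(len(applyOverride(base, nil)))

	// An empty, non-nil override clears the policies for that tenant.
	fmt.Println(len(applyOverride(base, []string{})))
}
```

In Go a nil slice and an empty slice both have length zero, so the `!= nil` comparison is what allows a tenant override to remove all policies explicitly.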
62 changes: 62 additions & 0 deletions modules/generator/config_test.go
@@ -8,6 +8,7 @@ import (

"github.com/grafana/tempo/modules/generator/processor/servicegraphs"
"github.com/grafana/tempo/modules/generator/processor/spanmetrics"
"github.com/grafana/tempo/pkg/spanfilter/config"
)

func TestProcessorConfig_copyWithOverrides(t *testing.T) {
@@ -69,4 +70,65 @@ func TestProcessorConfig_copyWithOverrides(t *testing.T) {
_, err := original.copyWithOverrides(o, "tenant")
require.Error(t, err)
})

t.Run("nil policy overrides", func(t *testing.T) {
o := &mockOverrides{
spanMetricsFilterPolicies: nil,
}

copied, err := original.copyWithOverrides(o, "tenant")
require.NoError(t, err)

assert.Equal(t, *original, copied)
})

t.Run("empty policy overrides", func(t *testing.T) {
o := &mockOverrides{
spanMetricsFilterPolicies: []config.FilterPolicy{},
}

copied, err := original.copyWithOverrides(o, "tenant")
require.NoError(t, err)

assert.NotEqual(t, *original, copied)

assert.Equal(t, []config.FilterPolicy{}, copied.SpanMetrics.FilterPolicies)
})

t.Run("policy overrides", func(t *testing.T) {
o := &mockOverrides{
spanMetricsFilterPolicies: []config.FilterPolicy{
{
Include: &config.PolicyMatch{
MatchType: config.Strict,
Attributes: []config.MatchPolicyAttribute{
{
Key: "key",
Value: "value",
},
},
},
},
},
}

copied, err := original.copyWithOverrides(o, "tenant")
require.NoError(t, err)

assert.NotEqual(t, *original, copied)

assert.Equal(t, []config.FilterPolicy{
{
Include: &config.PolicyMatch{
MatchType: config.Strict,
Attributes: []config.MatchPolicyAttribute{
{
Key: "key",
Value: "value",
},
},
},
},
}, copied.SpanMetrics.FilterPolicies)
})
}
12 changes: 10 additions & 2 deletions modules/generator/instance.go
@@ -52,7 +52,10 @@ var (
}, []string{"tenant", "reason"})
)

const reasonOutsideTimeRangeSlack = "outside_metrics_ingestion_slack"
const (
reasonOutsideTimeRangeSlack = "outside_metrics_ingestion_slack"
reasonSpanMetricsFiltered = "span_metrics_filtered"
Contributor:

nit: maybe something like "filter_policy"? I think "span_metrics" is already in the metric name.

Contributor Author:

Yeah, this doesn't read well. It's used to indicate the number of spans rejected by the filter, so what about spans_rejected or filter_policy_rejections maybe? I was thinking that if we had another processor also filtering, differentiating between them would be nice. We could perhaps include a label for the name of the processor doing the rejecting. Not sure though.

)

type instance struct {
cfg *Config
@@ -256,9 +259,14 @@ func (i *instance) addProcessor(processorName string, cfg ProcessorConfig) error
level.Debug(i.logger).Log("msg", "adding processor", "processorName", processorName)

var newProcessor processor.Processor
var err error
switch processorName {
case spanmetrics.Name:
newProcessor = spanmetrics.New(cfg.SpanMetrics, i.registry)
filteredSpansCounter := metricSpansDiscarded.WithLabelValues(i.instanceID, reasonSpanMetricsFiltered)
newProcessor, err = spanmetrics.New(cfg.SpanMetrics, i.registry, filteredSpansCounter)
if err != nil {
return err
}
case servicegraphs.Name:
newProcessor = servicegraphs.New(cfg.ServiceGraphs, i.instanceID, i.registry, i.logger)
default:
2 changes: 2 additions & 0 deletions modules/generator/overrides.go
@@ -3,6 +3,7 @@ package generator
import (
"github.com/grafana/tempo/modules/generator/registry"
"github.com/grafana/tempo/modules/overrides"
filterconfig "github.com/grafana/tempo/pkg/spanfilter/config"
)

type metricsGeneratorOverrides interface {
@@ -14,6 +15,7 @@ type metricsGeneratorOverrides interface {
MetricsGeneratorProcessorSpanMetricsHistogramBuckets(userID string) []float64
MetricsGeneratorProcessorSpanMetricsDimensions(userID string) []string
MetricsGeneratorProcessorSpanMetricsIntrinsicDimensions(userID string) map[string]bool
MetricsGeneratorProcessorSpanMetricsFilterPolicies(userID string) []filterconfig.FilterPolicy
}

var _ metricsGeneratorOverrides = (*overrides.Overrides)(nil)
11 changes: 10 additions & 1 deletion modules/generator/overrides_test.go
@@ -1,6 +1,10 @@
package generator

import "time"
import (
"time"

filterconfig "github.com/grafana/tempo/pkg/spanfilter/config"
)

type mockOverrides struct {
processors map[string]struct{}
@@ -9,6 +13,7 @@ type mockOverrides struct {
spanMetricsHistogramBuckets []float64
spanMetricsDimensions []string
spanMetricsIntrinsicDimensions map[string]bool
spanMetricsFilterPolicies []filterconfig.FilterPolicy
}

var _ metricsGeneratorOverrides = (*mockOverrides)(nil)
@@ -48,3 +53,7 @@ func (m *mockOverrides) MetricsGeneratorProcessorSpanMetricsDimensions(userID st
func (m *mockOverrides) MetricsGeneratorProcessorSpanMetricsIntrinsicDimensions(userID string) map[string]bool {
return m.spanMetricsIntrinsicDimensions
}

func (m *mockOverrides) MetricsGeneratorProcessorSpanMetricsFilterPolicies(userID string) []filterconfig.FilterPolicy {
return m.spanMetricsFilterPolicies
}
4 changes: 4 additions & 0 deletions modules/generator/processor/spanmetrics/config.go
@@ -3,6 +3,7 @@ package spanmetrics
import (
"flag"

filterconfig "github.com/grafana/tempo/pkg/spanfilter/config"
"github.com/pkg/errors"
"github.com/prometheus/client_golang/prometheus"
)
@@ -34,6 +35,9 @@ type Config struct {
// Subprocessor options for this Processor include Latency, Count, Size
// These are metrics categories that exist under the umbrella of Span Metrics
Subprocessors map[Subprocessor]bool

// FilterPolicies is a list of policies that will be applied to spans for inclusion or exclusion.
FilterPolicies []filterconfig.FilterPolicy `yaml:"filter_policies"`
}

func (cfg *Config) RegisterFlagsAndApplyDefaults(prefix string, f *flag.FlagSet) {
23 changes: 20 additions & 3 deletions modules/generator/processor/spanmetrics/spanmetrics.go
@@ -10,10 +10,12 @@ import (
gen "github.com/grafana/tempo/modules/generator/processor"
processor_util "github.com/grafana/tempo/modules/generator/processor/util"
"github.com/grafana/tempo/modules/generator/registry"
"github.com/grafana/tempo/pkg/spanfilter"
"github.com/grafana/tempo/pkg/tempopb"
v1 "github.com/grafana/tempo/pkg/tempopb/resource/v1"
v1_trace "github.com/grafana/tempo/pkg/tempopb/trace/v1"
tempo_util "github.com/grafana/tempo/pkg/util"
"github.com/prometheus/client_golang/prometheus"
)

const (
@@ -31,11 +33,14 @@ type Processor struct {
spanMetricsDurationSeconds registry.Histogram
spanMetricsSizeTotal registry.Counter

filter *spanfilter.SpanFilter
filteredSpansCounter prometheus.Counter

// for testing
now func() time.Time
}

func New(cfg Config, registry registry.Registry) gen.Processor {
func New(cfg Config, registry registry.Registry, spanDiscardCounter prometheus.Counter) (gen.Processor, error) {
labels := make([]string, 0, 4+len(cfg.Dimensions))

if cfg.IntrinsicDimensions.Service {
@@ -68,10 +73,18 @@ func New(cfg Config, registry registry.Registry) gen.Processor {
if cfg.Subprocessors[Size] {
p.spanMetricsSizeTotal = registry.NewCounter(metricSizeTotal, labels)
}

filter, err := spanfilter.NewSpanFilter(cfg.FilterPolicies)
if err != nil {
return nil, err
}

p.Cfg = cfg
p.registry = registry
p.now = time.Now
return p
p.filteredSpansCounter = spanDiscardCounter
p.filter = filter
return p, nil
}

func (p *Processor) Name() string {
@@ -95,7 +108,11 @@ func (p *Processor) aggregateMetrics(resourceSpans []*v1_trace.ResourceSpans) {

for _, ils := range rs.ScopeSpans {
for _, span := range ils.Spans {
p.aggregateMetricsForSpan(svcName, rs.Resource, span)
if p.filter.ApplyFilterPolicy(rs.Resource, span) {
p.aggregateMetricsForSpan(svcName, rs.Resource, span)
continue
}
p.filteredSpansCounter.Inc()
}
}
}
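
The loop above either aggregates a span or increments the filtered-spans counter, never both. A simplified sketch of the same control flow, using hypothetical `span` and `aggregate` names and plain integers in place of the Prometheus counter:

```go
package main

import "fmt"

// span is a hypothetical stand-in carrying only the field this sketch needs.
type span struct{ kind string }

// aggregate walks the spans, counting those that pass the keep predicate as
// processed and the rest as filtered, following the if/continue shape of the
// loop in the diff above.
func aggregate(spans []span, keep func(span) bool) (processed, filtered int) {
	for _, s := range spans {
		if keep(s) {
			processed++
			continue
		}
		filtered++
	}
	return processed, filtered
}

func main() {
	spans := []span{{"SPAN_KIND_SERVER"}, {"SPAN_KIND_CLIENT"}, {"SPAN_KIND_SERVER"}}
	serverOnly := func(s span) bool { return s.kind == "SPAN_KIND_SERVER" }

	processed, filtered := aggregate(spans, serverOnly)
	fmt.Println("processed:", processed, "filtered:", filtered)
}
```

Every span lands in exactly one of the two counts, so their sum always equals the number of spans received.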