Skip to content

Commit

Permalink
Make intrinsic dimensions configurable and disable status_message by …
Browse files Browse the repository at this point in the history
…default (#1960)

* Add intrinsic dimenstions to spanmetrics config

* Remove deprecated linters from glangci-lint config

The linters deadcode, structcheck, and varcheck are deprecated since
golangci-lint v1.49.0. They are replace by unused which is already enabled
in our config

* Make intrinsic dimensions configurable for spanmetrics

* Disable intrinsic dimension status_message by default

* Add overrides for intrinsic dimension configuration

* Docs for the configuration of intrinsic dimensions

* Add PR to changelog

* Regenerate configuration schema in manifest.md

* Handle dimensions as conflicts when intrinsic dimension is absent

Custom dimensions from span attributs are also treated as conflicts
when the conflicting intrinsic dimension is not actually present. This
prevents those dimensions from being renamed when the intrinsic dimension
configuration changes

* Mark change #1794 as breaking in changelog

* Unit test for intrinsic dimension override

* Fail on invalid intrinsic dimension override keys

* Fix dimension prefix in changelog
  • Loading branch information
stoewer authored Dec 21, 2022
1 parent 1ea6e4f commit 0ed9ddc
Show file tree
Hide file tree
Showing 15 changed files with 249 additions and 106 deletions.
3 changes: 0 additions & 3 deletions .golangci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,6 @@ output:

linters:
enable:
- deadcode
- errcheck
- goconst
- gofmt
Expand All @@ -53,12 +52,10 @@ linters:
- megacheck
- misspell
- revive
- structcheck
- typecheck
- unconvert
- unparam
- unused
- varcheck

linters-settings:
errcheck:
Expand Down
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,8 @@ Internal types are updated to use `scope` instead of `instrumentation_library`.
* [BUGFIX] New wal file separator '+' for the NTFS filesystem and backward compatibility with the old separator ':' [#1700](https://github.com/grafana/tempo/pull/1700) (@kilian-kier)
* [ENHANCEMENT] metrics-generator: extract `status_message` field from spans [#1786](https://github.com/grafana/tempo/pull/1786), [#1794](https://github.com/grafana/tempo/pull/1794) (@stoewer)
* [ENHANCEMENT] metrics-generator: handle collisions between user defined and default dimensions [#1794](https://github.com/grafana/tempo/pull/1794) (@stoewer)
**BREAKING CHANGE** Custom dimensions colliding with intrinsic dimensions will be prefixed with `__`.
* [ENHANCEMENT] metrics-generator: make intrinsic dimensions configurable and disable `status_message` by default [#1960](https://github.com/grafana/tempo/pull/1960) (@stoewer)
* [ENHANCEMENT] distributor: Log span names when `distributor.log_received_spans.include_all_attributes` is on [#1790](https://github.com/grafana/tempo/pull/1790) (@suraciii)
* [ENHANCEMENT] metrics-generator: truncate label names and values exceeding a configurable length [#1897](https://github.com/grafana/tempo/pull/1897) (@kvrhdn)
* [ENHANCEMENT] Add parquet WAL [#1878](https://github.com/grafana/tempo/pull/1878) (@joe-elliott, @mdisibio)
Expand Down
24 changes: 21 additions & 3 deletions docs/tempo/website/configuration/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -284,10 +284,26 @@ metrics_generator:
# Buckets for the latency histogram in seconds.
[histogram_buckets: <list of float> | default = 0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.02, 2.05, 4.10]
# Additional dimensions to add to the metrics along with the default dimensions
# (service, span_name, span_kind, status_code, and status_message). Dimensions are searched
# for in the resource and span attributes and are added to the metrics if present.
# Configure intrinsic dimensions to add to the metrics. Intrinsic dimensions are taken
# directly from the respective resource and span properties.
intrinsic_dimensions:
# Whether to add the name of the service the span is associated with.
[service: <bool> | default = true]
# Whether to add the name of the span.
[span_name: <bool> | default = true]
# Whether to add the span kind describing the relationship between spans.
[span_kind: <bool> | default = true]
# Whether to add the span status code.
[status_code: <bool> | default = true]
# Whether to add a status message. Important note: The span status message may
# contain arbitrary strings and thus have a very high cardinality.
[status_message: <bool> | default = false]
# Additional dimensions to add to the metrics along with the intrinsic dimensions.
# Dimensions are searched for in the resource and span attributes and are added to
# the metrics if present.
[dimensions: <list of string>]
# Registry configuration
registry:
Expand Down Expand Up @@ -1152,6 +1168,8 @@ overrides:
[metrics_generator_processor_service_graphs_histogram_buckets: <list of float>]
[metrics_generator_processor_service_graphs_dimensions: <list of string>]
[metrics_generator_processor_span_metrics_histogram_buckets: <list of float>]
# Allowed keys for intrinsic dimensions are: service, span_name, span_kind, status_code, and status_message.
[metrics_generator_processor_span_metrics_intrinsic_dimensions: <map string to bool>]
[metrics_generator_processor_span_metrics_dimensions: <list of string>]

# Maximum number of active series in the registry, per instance of the metrics-generator. A
Expand Down
64 changes: 33 additions & 31 deletions docs/tempo/website/configuration/manifest.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ go run ./cmd/tempo --storage.trace.backend=local --storage.trace.local.path=/tmp

## Complete configuration

> **Note**: This manifest was generated on 2022-09-13.
> **Note**: This manifest was generated on 2022-12-19.
```yaml
target: all
Expand All @@ -31,6 +31,8 @@ server:
grpc_listen_address: ""
grpc_listen_port: 9095
grpc_listen_conn_limit: 0
tls_cipher_suites: ""
tls_min_version: ""
http_tls_config:
cert_file: ""
key_file: ""
Expand Down Expand Up @@ -86,6 +88,8 @@ distributor:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
username: ""
password: ""
multi:
Expand All @@ -104,6 +108,7 @@ distributor:
receivers: {}
override_ring_key: distributor
log_received_traces: false
forwarders: []
extend_writes: true
search_tags_deny_list: []
ingester_client:
Expand All @@ -129,6 +134,8 @@ ingester_client:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
metrics_generator_client:
pool_config:
checkinterval: 15s
Expand All @@ -152,6 +159,8 @@ metrics_generator_client:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
querier:
search:
query_timeout: 30s
Expand All @@ -163,7 +172,6 @@ querier:
max_concurrent_queries: 5
frontend_worker:
frontend_address: 127.0.0.1:9095
scheduler_address: ""
dns_lookup_duration: 10s
parallelism: 2
match_max_concurrent: true
Expand All @@ -185,39 +193,13 @@ querier:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
query_relevant_ingesters: false
query_frontend:
log_queries_longer_than: 0s
max_body_size: 0
query_stats_enabled: false
max_outstanding_per_tenant: 100
querier_forget_delay: 0s
scheduler_address: ""
scheduler_dns_lookup_period: 0s
scheduler_worker_concurrency: 0
grpc_client_config:
max_recv_msg_size: 0
max_send_msg_size: 0
grpc_compression: ""
rate_limit: 0
rate_limit_burst: 0
backoff_on_ratelimits: false
backoff_config:
min_period: 0s
max_period: 0s
max_retries: 0
tls_enabled: false
tls_cert_path: ""
tls_key_path: ""
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
instance_interface_names: []
address: ""
port: 0
downstream_url: ""
max_retries: 2
query_shards: 20
search:
concurrent_jobs: 50
target_bytes_per_job: 10485760
Expand All @@ -227,6 +209,7 @@ query_frontend:
query_backend_after: 15m0s
query_ingesters_until: 1h0m0s
trace_by_id:
query_shards: 20
hedge_requests_at: 2s
hedge_requests_up_to: 2
compactor:
Expand All @@ -252,6 +235,8 @@ compactor:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
username: ""
password: ""
multi:
Expand Down Expand Up @@ -307,6 +292,8 @@ ingester:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
username: ""
password: ""
multi:
Expand Down Expand Up @@ -342,6 +329,7 @@ ingester:
max_block_bytes: 1073741824
complete_block_timeout: 15m0s
override_ring_key: ring
use_flatbuffer_search: false
metrics_generator:
ring:
kvstore:
Expand All @@ -365,6 +353,8 @@ metrics_generator:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
username: ""
password: ""
multi:
Expand Down Expand Up @@ -410,10 +400,17 @@ metrics_generator:
- 4.096
- 8.192
- 16.384
intrinsic_dimensions:
service: true
span_name: true
span_kind: true
status_code: true
dimensions: []
registry:
collection_interval: 15s
stale_duration: 15m0s
max_label_name_length: 1024
max_label_value_length: 2048
storage:
path: ""
wal:
Expand All @@ -437,13 +434,14 @@ storage:
blocksfilepath: /tmp/tempo/wal/blocks
encoding: snappy
search_encoding: none
version: v2
ingestion_time_range_slack: 2m0s
block:
index_downsample_bytes: 1048576
index_page_size_bytes: 256000
bloom_filter_false_positive: 0.01
bloom_filter_shard_size_bytes: 102400
version: v2
version: vParquet
encoding: zstd
search_encoding: snappy
search_page_size_bytes: 1048576
Expand Down Expand Up @@ -518,6 +516,7 @@ overrides:
max_traces_per_user: 10000
max_global_traces_per_user: 0
max_search_bytes_per_trace: 5000
forwarders: []
metrics_generator_ring_size: 0
metrics_generator_processors: null
metrics_generator_max_active_series: 0
Expand All @@ -529,6 +528,7 @@ overrides:
metrics_generator_processor_service_graphs_dimensions: []
metrics_generator_processor_span_metrics_histogram_buckets: []
metrics_generator_processor_span_metrics_dimensions: []
metrics_generator_processor_span_metrics_intrinsic_dimensions: {}
block_retention: 0s
max_bytes_per_tag_values_query: 5000000
max_search_duration: 0s
Expand Down Expand Up @@ -569,6 +569,8 @@ memberlist:
tls_ca_path: ""
tls_server_name: ""
tls_insecure_skip_verify: false
tls_cipher_suites: ""
tls_min_version: ""
usage_report:
reporting_enabled: true
backoff:
Expand Down
12 changes: 10 additions & 2 deletions modules/generator/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ import (
"flag"
"time"

"github.com/pkg/errors"

"github.com/grafana/tempo/modules/generator/processor/servicegraphs"
"github.com/grafana/tempo/modules/generator/processor/spanmetrics"
"github.com/grafana/tempo/modules/generator/registry"
Expand Down Expand Up @@ -50,7 +52,7 @@ func (cfg *ProcessorConfig) RegisterFlagsAndApplyDefaults(prefix string, f *flag
}

// copyWithOverrides creates a copy of the config using values set in the overrides.
func (cfg *ProcessorConfig) copyWithOverrides(o metricsGeneratorOverrides, userID string) ProcessorConfig {
func (cfg *ProcessorConfig) copyWithOverrides(o metricsGeneratorOverrides, userID string) (ProcessorConfig, error) {
copyCfg := *cfg

if buckets := o.MetricsGeneratorProcessorServiceGraphsHistogramBuckets(userID); buckets != nil {
Expand All @@ -65,6 +67,12 @@ func (cfg *ProcessorConfig) copyWithOverrides(o metricsGeneratorOverrides, userI
if dimensions := o.MetricsGeneratorProcessorSpanMetricsDimensions(userID); dimensions != nil {
copyCfg.SpanMetrics.Dimensions = dimensions
}
if dimensions := o.MetricsGeneratorProcessorSpanMetricsIntrinsicDimensions(userID); dimensions != nil {
err := copyCfg.SpanMetrics.IntrinsicDimensions.ApplyFromMap(dimensions)
if err != nil {
return ProcessorConfig{}, errors.Wrap(err, "fail to apply overrides")
}
}

return copyCfg
return copyCfg, nil
}
32 changes: 24 additions & 8 deletions modules/generator/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ import (
"testing"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"

"github.com/grafana/tempo/modules/generator/processor/servicegraphs"
"github.com/grafana/tempo/modules/generator/processor/spanmetrics"
Expand All @@ -16,20 +17,23 @@ func TestProcessorConfig_copyWithOverrides(t *testing.T) {
Dimensions: []string{},
},
SpanMetrics: spanmetrics.Config{
HistogramBuckets: []float64{1, 2},
Dimensions: []string{"namespace"},
HistogramBuckets: []float64{1, 2},
Dimensions: []string{"namespace"},
IntrinsicDimensions: spanmetrics.IntrinsicDimensions{Service: true},
},
}

t.Run("overrides buckets and dimension", func(t *testing.T) {
o := &mockOverrides{
serviceGraphsHistogramBuckets: []float64{1, 2},
serviceGraphsDimensions: []string{"namespace"},
spanMetricsHistogramBuckets: []float64{1, 2, 3},
spanMetricsDimensions: []string{"cluster", "namespace"},
serviceGraphsHistogramBuckets: []float64{1, 2},
serviceGraphsDimensions: []string{"namespace"},
spanMetricsHistogramBuckets: []float64{1, 2, 3},
spanMetricsDimensions: []string{"cluster", "namespace"},
spanMetricsIntrinsicDimensions: map[string]bool{"status_code": true},
}

copied := original.copyWithOverrides(o, "tenant")
copied, err := original.copyWithOverrides(o, "tenant")
require.NoError(t, err)

assert.NotEqual(t, *original, copied)

Expand All @@ -38,19 +42,31 @@ func TestProcessorConfig_copyWithOverrides(t *testing.T) {
assert.Equal(t, []string{}, original.ServiceGraphs.Dimensions)
assert.Equal(t, []float64{1, 2}, original.SpanMetrics.HistogramBuckets)
assert.Equal(t, []string{"namespace"}, original.SpanMetrics.Dimensions)
assert.Equal(t, spanmetrics.IntrinsicDimensions{Service: true}, original.SpanMetrics.IntrinsicDimensions)

// assert overrides were applied
assert.Equal(t, []float64{1, 2}, copied.ServiceGraphs.HistogramBuckets)
assert.Equal(t, []string{"namespace"}, copied.ServiceGraphs.Dimensions)
assert.Equal(t, []float64{1, 2, 3}, copied.SpanMetrics.HistogramBuckets)
assert.Equal(t, []string{"cluster", "namespace"}, copied.SpanMetrics.Dimensions)
assert.Equal(t, spanmetrics.IntrinsicDimensions{Service: true, StatusCode: true}, copied.SpanMetrics.IntrinsicDimensions)
})

t.Run("empty overrides", func(t *testing.T) {
o := &mockOverrides{}

copied := original.copyWithOverrides(o, "tenant")
copied, err := original.copyWithOverrides(o, "tenant")
require.NoError(t, err)

assert.Equal(t, *original, copied)
})

t.Run("invalid overrides", func(t *testing.T) {
o := &mockOverrides{
spanMetricsIntrinsicDimensions: map[string]bool{"invalid": true},
}

_, err := original.copyWithOverrides(o, "tenant")
require.Error(t, err)
})
}
5 changes: 4 additions & 1 deletion modules/generator/instance.go
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,10 @@ func (i *instance) watchOverrides() {

func (i *instance) updateProcessors() error {
desiredProcessors := i.overrides.MetricsGeneratorProcessors(i.instanceID)
desiredCfg := i.cfg.Processor.copyWithOverrides(i.overrides, i.instanceID)
desiredCfg, err := i.cfg.Processor.copyWithOverrides(i.overrides, i.instanceID)
if err != nil {
return err
}

i.processorsMtx.RLock()
toAdd, toRemove, toReplace, err := i.diffProcessors(desiredProcessors, desiredCfg)
Expand Down
Loading

0 comments on commit 0ed9ddc

Please sign in to comment.