Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/tailsamplingprocessor] config allows duplicate policy names resulting in metrics collision #26726

Closed
jmsnll opened this issue Sep 18, 2023 · 6 comments
Labels
bug Something isn't working good first issue Good for newcomers priority:p2 Medium processor/tailsampling Tail sampling processor

Comments

@jmsnll
Copy link
Contributor

jmsnll commented Sep 18, 2023

Component(s)

processor/tailsampling

What happened?

Description

As noted in: #25882 (comment)

The name of each sampling policy is attached to the count_traces_sampled metric as an attribute which records how many traces each policy is responsible for sampling.

If a user provides two policies with identical names then a collision occurs and the counts for the two policies are combined and reported under one metric.

Steps to Reproduce

Given any two or more policies with identical names:

    policies:
      - name: policy-a
        type: status_code
        status_code:
          status_codes:
            - ERROR
      - name: policy-a
        type: probabilistic
        probabilistic:
          sampling_percentage: 25

The count_traces_sampled metric will report a combined count of both policies:

otelcol_processor_tail_sampling_count_traces_sampled{policy="policy-a",sampled="false"} 249
otelcol_processor_tail_sampling_count_traces_sampled{policy="policy-a",sampled="true"} 151

Suggested fix

func (c *Config) Validate() error {
	policyNames := make(map[string]bool)

	// checks each policy is uniquely named to ensure accuracy of `count_traces_sampled` metric
	// sub policies contained within `and` or `composite` policies do not need to be checked
	for _, p := range c.PolicyCfgs {
		if _, exists := policyNames[p.Name]; exists {
			return fmt.Errorf("sampling policies must have unique names")
		}
		policyNames[p.Name] = true
	}

	return nil
}

Expected Result

Config validation checks each sampling policy's name is unique and returns an error, preventing the collector from starting.

Actual Result

Collector starts.

Collector version

v0.85.0

Environment information

Environment

OS: Mac OS Ventura 13.5.2 (22G91)
Compiler(if manually compiled): go version go1.21.1 darwin/arm64

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  tail_sampling:
    decision_wait: 5s
    policies:
      - name: policy-a
        type: status_code
        status_code:
          status_codes:
            - ERROR
      - name: policy-a
        type: probabilistic
        probabilistic:
          sampling_percentage: 25
exporters:
  logging:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [logging]

Log output

No response

Additional context

No response

@jmsnll jmsnll added bug Something isn't working needs triage New item requiring triage labels Sep 18, 2023
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Sep 18, 2023
@github-actions
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jmsnll
Copy link
Contributor Author

jmsnll commented Sep 18, 2023

Although the sub-policies in composite policies aren't used for the count_traces_sampled metric, I'm realising the names still do need to be unique due to how rate allocations are assigned.

@bryan-aguilar
Copy link
Contributor

Enforcing policy names to be unique sounds like a reasonable solution to this. I think we would need to gradually roll this out with a feature gate because it could be a breaking change for existing configurations.

@bryan-aguilar bryan-aguilar added good first issue Good for newcomers priority:p2 Medium and removed needs triage New item requiring triage labels Sep 18, 2023
@bryan-aguilar
Copy link
Contributor

@jpkrohling what do you think? Could there be another option that is less impactful to existing configurations?

@jmsnll
Copy link
Contributor Author

jmsnll commented Sep 18, 2023

At the time, the only alternative solution I could think of was to append a suffix to duplicate policy names (determined by the policies position in the list) when the configuration is initially loaded into memory. So for the following:

processors:
    policies:
      - name: colliding-policy
        ...
      - name: rate-limit
        ...
      - name: error-span-filter
        ...
      - name: colliding-policy
        ...

You could choose from either of these approaches:

  1. Auto-incrementing: The first occurrence of colliding-policy would become colliding-policy-1, the second colliding-policy-2, and so on.

  2. Policy Index: Alternatively, you could use the policy's index instead of tracking a specific count for each name. For example, colliding-policy-0 for the first occurrence and colliding-policy-3 for the second.

Existing configurations would still work, although with a reduced emphasis on correctness. One potential drawback I can see to this approach is that if users were to add yet another policy with the same name to an existing configuration, the exported names of policies could potentially change depending on where they position the new policy in the list. But I'd hope at that stage they would fix the issue of having duplicate names.

@jmsnll
Copy link
Contributor Author

jmsnll commented Sep 20, 2023

Marking as closed as covered by #27016

@jmsnll jmsnll closed this as completed Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers priority:p2 Medium processor/tailsampling Tail sampling processor
Projects
None yet
Development

No branches or pull requests

2 participants