Thanos/Prometheus Receiver unable to process "Error on series with out-of-order labels" #11931

Closed
Moep90 opened this issue Oct 4, 2022 · 4 comments
Labels: bug (unexpected problem or unintended behavior)

Comments

Moep90 commented Oct 4, 2022

### Relevant telegraf.conf

# Telegraf config

[global_tags]
  hostname         = "myhostname"
  host_ip          = "__ip__"
  host_network     = "__ip__"
  os               = "debian"
  os_major         = "11"
  telegraf_version = "1.24"

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "1s"
  logfile = "/dev/null"
  omit_hostname = false

[[outputs.http]]
  url = "https://thanos-dev-receive.example.com/api/v1/receive"
  non_retryable_statuscodes = [409, 413]
  use_batch_format = false  # enabled/disabled doesn't matter
  data_format = "prometheusremotewrite"
  [outputs.http.headers]
    Content-Type = "application/x-protobuf"
    Content-Encoding = "snappy"
    X-Prometheus-Remote-Write-Version = "0.1.0"

# -----------------------------------------------
# INPUTS
# -----------------------------------------------
[[inputs.bcache]]
  bcachePath = "/sys/fs/bcache"
[[inputs.bond]]
[[inputs.conntrack]]
  dirs = ["/proc/sys/net/netfilter"]
[[inputs.cpu]]
[[inputs.diskio]]
  devices = ["sd*", "vd*"]
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.ipvs]]
[[inputs.processes]]
[[inputs.mdstat]]
[[inputs.mem]]
[[inputs.net]]
[[inputs.netstat]]
[[inputs.nfsclient]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.ntpq]]
  options = "-p"
[[inputs.kernel_vmstat]]

### Logs from Telegraf

# Telegraf start

2022-09-30T13:37:45Z I! Loaded inputs: bcache bond conntrack cpu disk diskio ipvs kernel_vmstat mdstat mem net netstat nfsclient ntpq processes swap system
2022-09-30T13:37:45Z I! Loaded aggregators: 
2022-09-30T13:37:45Z I! Loaded processors: 
2022-09-30T13:37:45Z I! Loaded outputs: http


### System info

telegraf 1.24.1-1, Debian 10 + 11

### Docker

# Thanos Image
`image: docker.io/bitnami/thanos:0.28.0-scratch-r0`

### Steps to reproduce

1. Deploy Bitnami Thanos with the receiver component (see the sketch below).
2. Point Telegraf (config above) at the receiver instance.
3. Start Telegraf and follow the logs of the thanos-receiver.
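
For reference, a standalone receiver can be started roughly like this. This is only a sketch of the relevant flags, not the Bitnami chart values actually used; paths, ports, and labels are placeholders:

```shell
# Minimal Thanos receiver sketch (placeholder paths/ports/labels)
docker run --rm \
  -p 10901:10901 -p 10902:10902 -p 19291:19291 \
  docker.io/bitnami/thanos:0.28.0-scratch-r0 \
  receive \
  --tsdb.path=/tmp/thanos-receive \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902 \
  --remote-write.address=0.0.0.0:19291 \
  --label='receive_cluster="dev"'
```

Telegraf then writes to the receiver's remote-write port at /api/v1/receive.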


### Expected behavior

Telegraf should send its metrics to the Prometheus/Thanos receiver, where they get processed as usual.

### Actual behavior

The Thanos receiver logs errors like the one below:

# Thanos receiver logs
```json
{
  "caller": "writer.go:163",
  "component": "receive-writer",
  "level": "warn",
  "msg": "Error on series with out-of-order labels",
  "numDropped": 825,
  "tenant": "default-tenant",
  "ts": "2022-09-30T13:19:56.282263846Z"
}
```

### Additional info

As far as I know this should have been fixed by #9365, but apparently it wasn't.
I can't figure out why this happens.
When writing the metrics to a file, everything looks fine.
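
For context, as far as I understand it, "out-of-order labels" means the label names within a single series of the remote-write request are not sorted lexicographically, which the receiver rejects. Purely illustrative, with made-up values:

```
# rejected: "os" sorts after "host_ip", so the label names are out of order
cpu_usage_idle{os="debian", host_ip="192.0.2.10", cpu="cpu-total"}

# accepted: label names in lexicographic order
cpu_usage_idle{cpu="cpu-total", host_ip="192.0.2.10", os="debian"}
```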

Moep90 added the bug label on Oct 4, 2022

Moep90 commented Oct 4, 2022

Update:
Reproducible with Telegraf 1.24.2-1.


Moep90 commented Oct 8, 2022

Any suggestions on what to look at?

powersj (Contributor) commented Oct 11, 2022

Hi,

Can you narrow down which input we can use to reproduce this? Once you have that, can you enable the outputs.file output with the 'prometheus' data_format and see what gets printed to stdout?
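
A minimal sketch of that debug output config, assuming you keep only the suspect input enabled:

```toml
# Debugging sketch: print serialized metrics instead of sending them
[[outputs.file]]
  files = ["stdout"]
  data_format = "prometheus"
```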

I'm curious whether one of these inputs is creating multiple metrics with the same name.

Thanks!

powersj added the waiting for response label on Oct 11, 2022

Moep90 commented Oct 12, 2022

OK, I think I found the issue.

Multiple things came together here:

  1. I had a client with an older configuration still running, which produced the error in the Thanos receiver.
  2. I forgot/wasn't aware that the thanos-receiver had to be added to the Thanos Query endpoints (rough example below).

After fixing both, it worked as expected.
(screenshot attached)
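
For anyone who runs into the same thing: the receiver's gRPC StoreAPI has to be registered as an endpoint of Thanos Query, roughly like the following sketch; the hostname and port are placeholders and the exact flags depend on your deployment:

```shell
# Register the receiver as a query endpoint (placeholder host/port)
thanos query \
  --http-address=0.0.0.0:9090 \
  --endpoint=thanos-receive.example.com:10901
```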

Moep90 closed this as completed on Oct 12, 2022
The telegraf-tiger bot removed the waiting for response label on Oct 12, 2022