
constant bucket operation failures #1393

Closed

caarlos0 opened this issue Aug 9, 2019 · 3 comments

caarlos0 commented Aug 9, 2019

Thanos, Prometheus and Golang version used

Thanos 0.6.0, Prometheus v2.11.1, official docker images.

What happened

When querying, for example, the last 7 days of some metrics, it's pretty common for alerts like the following to fire:

rate(thanos_objstore_bucket_operation_failures_total{job="thanos-store-http"}[5m]) > 0

rate(grpc_server_handled_total{grpc_code=~"Unknown|ResourceExhausted|Internal|Unavailable",job="thanos-store-http"}[5m]) > 0

On the first, it's usually the get_range operation that fails.
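
To narrow down which operation is failing, the failure rate can be broken down by the operation label via the Prometheus HTTP API (a sketch; assumes Prometheus is reachable at localhost:9090):

curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum by (operation) (rate(thanos_objstore_bucket_operation_failures_total{job="thanos-store-http"}[5m]))'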

In the store logs, the only thing I found was:

thanos-store-6cf74bfd65-sf8wd thanos level=warn ts=2019-08-09T03:35:00.076704568Z caller=bucket.go:296 msg="loading block failed" id=01DHT734F14R83AAG283RY0SYK err="new bucket block: load meta: download meta.json: get file: storage: object doesn't exist"

But the file is there:

λ gsutil ls -r -l gs://REDACTED/01DHT734F14R83AAG283RY0SYK

gs://REDACTED/01DHT734F14R83AAG283RY0SYK/:
  15823448  2019-08-09T03:34:59Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/index
   1530690  2019-08-09T03:35:00Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/index.cache.json
      1079  2019-08-09T03:35:00Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/meta.json

gs://REDACTED/01DHT734F14R83AAG283RY0SYK/chunks/:
  65074532  2019-08-09T03:34:59Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/chunks/000001
TOTAL: 4 objects, 82429749 bytes (78.61 MiB)
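
As a further sanity check, the meta.json can be fetched and parsed directly to rule out a truncated or otherwise invalid object (a sketch; gsutil cat streams the object, and python -m json.tool fails if the content isn't valid JSON):

gsutil stat gs://REDACTED/01DHT734F14R83AAG283RY0SYK/meta.json
gsutil cat gs://REDACTED/01DHT734F14R83AAG283RY0SYK/meta.json | python -m json.tool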

What you expected to happen

Not sure... I guess more detailed logs?

How to reproduce it (as minimally and precisely as possible):

I'm not sure what causes it.

I'm thinking that maybe it's the same corrupt-upload issue reported in other issues.
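
If it is a partial or corrupt upload, the bucket contents can be checked with the bucket verify subcommand (a sketch; assumes a Thanos 0.6.x binary and an object-store config in bucket.yml):

thanos bucket verify --objstore.config-file=bucket.yml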

Full logs to relevant components

Anything else we need to know

@bwplotka (Member)

Are you maybe hitting #564? (:

@caarlos0 (Author)

Hmmm... maybe. Any workarounds?

@bwplotka (Member)

Store gateway now handles partial uploads correctly (:
