
constant bucket operation failures #1393

Closed

caarlos0 opened this issue Aug 9, 2019 · 3 comments

caarlos0 commented Aug 9, 2019

Thanos, Prometheus and Golang version used

Thanos 0.6.0, Prometheus v2.11.1, official docker images.

What happened

When querying, for example, the last 7 days of some metrics, it's pretty common for alerts like the following to fire:

rate(thanos_objstore_bucket_operation_failures_total{job="thanos-store-http"}[5m]) > 0

rate(grpc_server_handled_total{grpc_code=~"Unknown|ResourceExhausted|Internal|Unavailable",job="thanos-store-http"}[5m]) > 0

On the first, it's usually the get_range operation that fails.
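
To narrow down which operation is failing, the failure rate can be broken down by the operation label via the Prometheus HTTP API (a sketch; assumes Prometheus is reachable at localhost:9090):

curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum by (operation) (rate(thanos_objstore_bucket_operation_failures_total{job="thanos-store-http"}[5m]))'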

In the store logs, the only thing I found was:

thanos-store-6cf74bfd65-sf8wd thanos level=warn ts=2019-08-09T03:35:00.076704568Z caller=bucket.go:296 msg="loading block failed" id=01DHT734F14R83AAG283RY0SYK err="new bucket block: load meta: download meta.json: get file: storage: object doesn't exist"

But the file is there:

λ gsutil ls -r -l gs://REDACTED/01DHT734F14R83AAG283RY0SYK

gs://REDACTED/01DHT734F14R83AAG283RY0SYK/:
  15823448  2019-08-09T03:34:59Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/index
   1530690  2019-08-09T03:35:00Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/index.cache.json
      1079  2019-08-09T03:35:00Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/meta.json

gs://REDACTED/01DHT734F14R83AAG283RY0SYK/chunks/:
  65074532  2019-08-09T03:34:59Z  gs://REDACTED/01DHT734F14R83AAG283RY0SYK/chunks/000001
TOTAL: 4 objects, 82429749 bytes (78.61 MiB)
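
As a further sanity check, the meta.json can be fetched and parsed directly to rule out a truncated or otherwise invalid object (a sketch; gsutil cat streams the object, and python -m json.tool fails if the content isn't valid JSON):

gsutil stat gs://REDACTED/01DHT734F14R83AAG283RY0SYK/meta.json
gsutil cat gs://REDACTED/01DHT734F14R83AAG283RY0SYK/meta.json | python -m json.tool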

What you expected to happen

Not sure... I guess more detailed logs?

How to reproduce it (as minimally and precisely as possible):

I'm not sure what causes it.

I'm thinking that maybe it's the same corrupt-upload issue reported in other issues.
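
If it is a partial or corrupt upload, the bucket contents can be checked with the bucket verify subcommand (a sketch; assumes a Thanos 0.6.x binary and an object-store config in bucket.yml):

thanos bucket verify --objstore.config-file=bucket.yml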

Full logs to relevant components

Anything else we need to know

@bwplotka (Member)

Are you maybe hitting #564? (:

@caarlos0 (Author)

Hmmm... maybe. Any workarounds?

@bwplotka (Member)

Store gateway now handles partial uploads correctly (:
