
store: take initialization out to the background goroutine #538

Closed
wants to merge 1 commit

Conversation

xjewer
Contributor

@xjewer xjewer commented Sep 26, 2018

Initialization is a lingering process and can take minutes to sync meta and index files. Move it to a background goroutine so the metrics/debug handlers are ready right after the application starts. This lets admins set up liveness and readiness probes in k8s.

Recommendation: use the HTTP port for liveness and the gRPC port for readiness.

Fixes #532

@bwplotka

Verification

Run thanos store and check that thanos-store-host:port/metrics is serving metrics.
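
For illustration only, a minimal self-contained sketch of the approach described above (this is not the PR's diff; initialSync, the port numbers, and the channel wiring are placeholders for the real store startup code):

package main

import (
	"context"
	"log"
	"net"
	"net/http"
	"time"

	"google.golang.org/grpc"
)

// initialSync stands in for the real meta/index sync, which can take minutes.
func initialSync(ctx context.Context) error {
	select {
	case <-time.After(2 * time.Minute):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Start the HTTP server (where the metrics/debug handlers would be
	// registered) immediately, so a liveness probe on the HTTP port
	// succeeds while initialization is still running.
	go func() {
		log.Fatal(http.ListenAndServe(":10902", nil))
	}()

	// Run the long initial sync in a background goroutine and signal
	// completion over a channel.
	readyToServe := make(chan struct{}, 1)
	go func() {
		if err := initialSync(ctx); err != nil {
			log.Fatalf("initial sync failed: %v", err)
		}
		readyToServe <- struct{}{}
	}()

	// Only start the gRPC StoreAPI server once the sync has finished, so a
	// readiness probe pointed at the gRPC port reflects actual readiness.
	select {
	case <-readyToServe:
	case <-ctx.Done():
		return
	}
	lis, err := net.Listen("tcp", ":10901")
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(grpc.NewServer().Serve(lis))
}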

@bwplotka
Member

But are you aware that this PR allows the compactor to be completely NOT ready to work while it responds 200 on the readiness probe? (:

I guess this is OK for the compactor, since it is not a web service - it just starts its job to operate on storage... so maybe it's fine.

Member

@bwplotka bwplotka left a comment


This is quite wrong.

The title says compact but it is actually the store gateway. You confused me ;p

This is wrong, as the store gateway is NOT ready to take requests before the initial sync, so that will break any rolling update in Kubernetes; we cannot mark it ready in the readiness probe.

@xjewer xjewer changed the title from "compact: take initialization out to the background goroutine" to "store: take initialization out to the background goroutine" Sep 27, 2018
@xjewer
Contributor Author

xjewer commented Sep 27, 2018

The title says compact but it is actually the store gateway. You confused me ;p

ahh, the late-night work is taking its toll ...

This is wrong, as the store gateway is NOT ready to take requests before the initial sync, so that will break any rolling update in Kubernetes; we cannot mark it ready in the readiness probe.

I don't understand you - what's wrong? Why wouldn't rolling updates work properly? You can see here https://github.com/improbable-eng/thanos/pull/538/files#diff-0f89c5ba9c5a93c3de701595df39ecdaR125 that gRPC will only become ready once initialization is done, so it is the same behaviour as before; the only difference here is to start the HTTP server serving metrics requests immediately after the store is run, to allow k8s to use it as the liveness endpoint.

@bwplotka
Member

the only difference here is to start the HTTP server serving metrics requests immediately after the store is run, to allow k8s to use it as the liveness endpoint.

Yes, and that is exactly the point.

The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

(https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/)

Currently the /metrics page is the recommended way to test for readiness. And we cannot be ready before this init step, unless we change the internal logic and enable the StoreAPI without the cache filled (the store not being aware of all blocks), but that is wrong, as a multi-replica rollout would be broken:

  • imagine 2 store gateways, and you want to update the Thanos version, so you do a rollout
  • k8s deletes one pod and creates a new one; since we immediately start metrics and readiness returns 200, Kubernetes redirects traffic to it, removes the second replica and creates the new version
  • for some time you cannot query most of the old metrics.

Of course for 1 replica it does not matter (:

So options here:
a. Leave it like it is right now. No metrics before the pod is ready, but I think Kubernetes SD will not return not-ready pods anyway.
b. For a single replica we could allow traffic before the init sync, but that is a special case for 1 replica that has other problems of its own.

I would rather focus on making init faster, not on removing init itself (: Thoughts?

@xjewer
Contributor Author

xjewer commented Sep 27, 2018

Maybe this was unclear, but my suggestion is to use the gRPC port as the readiness endpoint: only when initialization is done will gRPC start, and k8s will know it's ready.

Having readiness on http for StoreAPI is useless, IMHO

@bwplotka
Member

bwplotka commented Sep 27, 2018

Yeah, sorry, my bad - maybe I misunderstood that part.

Sure, but what is the goal/use case here? (: Do you want to have the metrics endpoint available earlier? Why? Ad-hoc debugging? Prometheus will not scrape this one anyway (am I wrong?)

Is it just for clarity?

@xjewer
Contributor Author

xjewer commented Sep 27, 2018

I use /metrics as a liveness probe, as I mentioned earlier ;) It's also handy to see what's happening during the initialization.

Recommendation: use the HTTP port for liveness and the gRPC port for readiness.

@xjewer
Contributor Author

xjewer commented Sep 27, 2018

Regarding having both probes, from the link above:

As you can see, configuration for a TCP check is quite similar to an HTTP check. This example uses both readiness and liveness probes. The kubelet will send the first readiness probe 5 seconds after the container starts. This will attempt to connect to the goproxy container on port 8080. If the probe succeeds, the pod will be marked as ready. The kubelet will continue to run this check every 10 seconds.

In addition to the readiness probe, this configuration includes a liveness probe. The kubelet will run the first liveness probe 15 seconds after the container starts. Just like the readiness probe, this will attempt to connect to the goproxy container on port 8080. If the liveness probe fails, the container will be restarted.

So the main point of using both is to be sure your container has started and is doing some work to become ready. E.g. if you forgot to specify a proper network policy (ingress in this case), you will find out at runtime that k8s does not see the container as alive (and it stays not ready).

Liveness controls restarts; readiness controls traffic routing.

@abursavich
Contributor

I was about to submit a similar PR and I found this. Can we get some movement here?

I use tcpProbes on the http port as liveness checks on all thanos components. I didn't notice the problem with store until my bucket grew sufficiently large and the stores started crash looping (failed liveness checks) after being rescheduled. It also causes Prometheus to think the job is down.

defer runutil.CloseWithLogOnErr(logger, bkt, "bucket client")

// Ready to start Grpc server.
readyToServe <- struct{}{}
Contributor


It's usually better to just close the channel. That way you won't have to know how many other goroutines depend on being ready (and need to send a signal for each).

If you change it, you can drop the channel buffer too.
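
For illustration, a tiny self-contained sketch of the close-based signalling suggested here (the names are made up, not taken from the PR):

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	// Closing the channel broadcasts readiness to every waiter, so the
	// initializing goroutine does not need to know how many goroutines
	// depend on it, and the channel needs no buffer.
	readyToServe := make(chan struct{})

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-readyToServe // all waiters unblock on the single close below
			fmt.Println("worker", id, "serving")
		}(i)
	}

	time.Sleep(100 * time.Millisecond) // stand-in for the initial sync
	close(readyToServe)
	wg.Wait()
}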

@@ -148,6 +152,7 @@ func runStore(
storepb.RegisterStoreServer(s, bs)

g.Add(func() error {
<-readyToServe
Contributor

@abursavich abursavich Nov 7, 2018


If the initial sync fails, this will block forever.

select {
case <-readyToServe:
case <-ctx.Done():
	return ctx.Err()
}
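
For context, a sketch of how that select could sit inside the run-group actor from the hunk above (assuming s is the gRPC server, l its listener, and ctx a cancellable context, as in the surrounding code; this is not the exact PR code):

g.Add(func() error {
	// Wait for the initial sync, but give up if the surrounding context is
	// cancelled (e.g. because the sync itself failed), instead of blocking forever.
	select {
	case <-readyToServe:
	case <-ctx.Done():
		return ctx.Err()
	}
	return s.Serve(l)
}, func(error) {
	s.Stop()
})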

@bwplotka
Member

bwplotka commented Dec 3, 2018

@abursavich

I didn't notice the problem with store until my bucket grew sufficiently large and the stores started crash looping (failed readiness checks) after being rescheduled. It also causes Prometheus to think the job is down.

I think it is just a matter of increasing the timeout on the readiness probes -> the Store needs some time for initialization.

@abursavich
Contributor

@bwplotka

I think it is just a matter of increasing the timeout on the readiness probes -> the Store needs some time for initialization.

The liveness and readiness probes serve different purposes. If a container passes liveness, it will not be restarted. If a container passes readiness, it will be included in service endpoints.

The time it takes for the store to become ready is a function of the size of its dataset. Requiring users to manually tune their liveness probes based on this moving target isn't a good solution. It should pass liveness tests as soon as possible, whether or not it's ready to serve queries.

@FUSAKLA
Member

FUSAKLA commented Apr 14, 2019

I'm with @abursavich on this one.
You should not be forced to tune the liveness timeout or initial back-off depending on the amount of data. So plus one from me for liveness passing right from the start and readiness only after the initialization finishes.

Also linking this #656

@bwplotka
Member

Thanks @abursavich that makes sense - sure, we can set liveness

@xjewer
Contributor Author

xjewer commented Apr 17, 2019

should we close this PR then?

@xjewer xjewer closed this May 1, 2019

Successfully merging this pull request may close these issues.

Store: initialisation time