
store: take initialization out to the background goroutine #538

Closed
wants to merge 1 commit

Conversation

xjewer
Contributor

@xjewer xjewer commented Sep 26, 2018

Initialization is a lingering process and can take minutes to sync meta and index files. Move it to a background goroutine so the metrics/debug handlers are ready right after the application starts. This lets admins set up liveness and readiness probes in k8s.

Recommendation: use the HTTP port for liveness and the gRPC port for readiness.

Fixes #532

@bwplotka

Verification

Run thanos store and check that thanos-store-host:port/metrics is serving metrics.
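
For illustration only, a minimal self-contained sketch of the approach described above (this is not the PR's diff; initialSync, the port numbers, and the channel wiring are placeholders for the real store startup code):

package main

import (
	"context"
	"log"
	"net"
	"net/http"
	"time"

	"google.golang.org/grpc"
)

// initialSync stands in for the real meta/index sync, which can take minutes.
func initialSync(ctx context.Context) error {
	select {
	case <-time.After(2 * time.Minute):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Start the HTTP server (where the metrics/debug handlers would be
	// registered) immediately, so a liveness probe on the HTTP port
	// succeeds while initialization is still running.
	go func() {
		log.Fatal(http.ListenAndServe(":10902", nil))
	}()

	// Run the long initial sync in a background goroutine and signal
	// completion over a channel.
	readyToServe := make(chan struct{}, 1)
	go func() {
		if err := initialSync(ctx); err != nil {
			log.Fatalf("initial sync failed: %v", err)
		}
		readyToServe <- struct{}{}
	}()

	// Only start the gRPC StoreAPI server once the sync has finished, so a
	// readiness probe pointed at the gRPC port reflects actual readiness.
	select {
	case <-readyToServe:
	case <-ctx.Done():
		return
	}
	lis, err := net.Listen("tcp", ":10901")
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(grpc.NewServer().Serve(lis))
}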

@bwplotka
Member

But are you aware that this PR allows the compactor to be completely NOT ready to work while it responds 200 on the readiness probe? (:

I guess this is OK for the compactor, since it is not a web service - it just starts its job to operate on storage... so maybe it's fine.

Member

@bwplotka bwplotka left a comment


This is quite wrong.

The title says compact but it is actually the store gateway. You confused me ;p

This is wrong, as the store gateway is NOT ready to take requests before the initial sync, so that will break any rolling update in Kubernetes; we cannot mark it ready in the readiness probe.

@xjewer xjewer changed the title from "compact: take initialization out to the background goroutine" to "store: take initialization out to the background goroutine" Sep 27, 2018
@xjewer
Contributor Author

xjewer commented Sep 27, 2018

The title says compact but it is actually the store gateway. You confused me ;p

ahh, the late-night work is taking its toll ...

This is wrong, as the store gateway is NOT ready to take requests before the initial sync, so that will break any rolling update in Kubernetes; we cannot mark it ready in the readiness probe.

I don't understand you - what's wrong? Why wouldn't rolling updates work properly? You can see here https://github.com/improbable-eng/thanos/pull/538/files#diff-0f89c5ba9c5a93c3de701595df39ecdaR125 that gRPC will only become ready once initialization is done, so it is the same behaviour as before; the only difference here is to start the HTTP server serving metrics requests immediately after the store is run, to allow k8s to use it as the liveness endpoint.

@bwplotka
Member

the only difference here is to start the HTTP server serving metrics requests immediately after the store is run, to allow k8s to use it as the liveness endpoint.

Yes, and that is exactly the point.

The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

(https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/)

Currently the /metrics page is the recommended way to test for readiness. And we cannot be ready before this init step, unless we change the internal logic and enable the StoreAPI without the cache filled (the store not being aware of all blocks), but that is wrong, as a multi-replica rollout would be broken:

  • imagine 2 store gateways, and you want to update the Thanos version, so you do a rollout
  • k8s deletes one pod and creates a new one; since we immediately start metrics and readiness returns 200, Kubernetes redirects traffic to it, removes the second replica and creates the new version
  • for some time you cannot query most of the old metrics.

Of course for 1 replica it does not matter (:

So options here:
a. Leave it like it is right now. No metrics before the pod is ready, but I think Kubernetes SD will not return not-ready pods anyway.
b. For a single replica we could allow traffic before the init sync, but that is a special case for 1 replica that has other problems of its own.

I would rather focus on making init faster, not on removing init itself (: Thoughts?

@xjewer
Contributor Author

xjewer commented Sep 27, 2018

Maybe this was unclear, but my suggestion is to use the gRPC port as the readiness endpoint: only when initialization is done will gRPC start, and k8s will know it's ready.

Having readiness on http for StoreAPI is useless, IMHO

@bwplotka
Member

bwplotka commented Sep 27, 2018

Yeah, sorry, my bad - maybe I misunderstood that part.

Sure, but what is the goal/use case here? (: Do you want to have the metrics endpoint available earlier? Why? Ad-hoc debugging? Prometheus will not scrape this one anyway (am I wrong?)

Is it just for clarity?

@xjewer
Contributor Author

xjewer commented Sep 27, 2018

I use /metrics as a liveness probe, as I mentioned earlier ;) It's also handy to see what's happening during the initialization.

Recommendation: use the HTTP port for liveness and the gRPC port for readiness.

@xjewer
Contributor Author

xjewer commented Sep 27, 2018

Regarding having both probes, from the link above:

As you can see, configuration for a TCP check is quite similar to an HTTP check. This example uses both readiness and liveness probes. The kubelet will send the first readiness probe 5 seconds after the container starts. This will attempt to connect to the goproxy container on port 8080. If the probe succeeds, the pod will be marked as ready. The kubelet will continue to run this check every 10 seconds.

In addition to the readiness probe, this configuration includes a liveness probe. The kubelet will run the first liveness probe 15 seconds after the container starts. Just like the readiness probe, this will attempt to connect to the goproxy container on port 8080. If the liveness probe fails, the container will be restarted.

So the main point of using both is to be sure your container has started and is doing some work to become ready. E.g. if you forgot to specify a proper network policy (ingress in this case), you will find out at runtime that k8s does not see the container as alive (and it stays not ready).

Liveness controls restarts; readiness controls traffic routing.

@abursavich
Contributor

I was about to submit a similar PR and I found this. Can we get some movement here?

I use tcpProbes on the http port as liveness checks on all thanos components. I didn't notice the problem with store until my bucket grew sufficiently large and the stores started crash looping (failed liveness checks) after being rescheduled. It also causes Prometheus to think the job is down.

defer runutil.CloseWithLogOnErr(logger, bkt, "bucket client")

// Ready to start Grpc server.
readyToServe <- struct{}{}
Contributor


It's usually better to just close the channel. That way you won't have to know how many other goroutines depend on being ready (and need to send a signal for each).

If you change it, you can drop the channel buffer too.
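
For illustration, a tiny self-contained sketch of the close-based signalling suggested here (the names are made up, not taken from the PR):

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	// Closing the channel broadcasts readiness to every waiter, so the
	// initializing goroutine does not need to know how many goroutines
	// depend on it, and the channel needs no buffer.
	readyToServe := make(chan struct{})

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-readyToServe // all waiters unblock on the single close below
			fmt.Println("worker", id, "serving")
		}(i)
	}

	time.Sleep(100 * time.Millisecond) // stand-in for the initial sync
	close(readyToServe)
	wg.Wait()
}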

@@ -148,6 +152,7 @@ func runStore(
storepb.RegisterStoreServer(s, bs)

g.Add(func() error {
<-readyToServe
Contributor

@abursavich abursavich Nov 7, 2018


If the initial sync fails, this will block forever.

select {
case <-readyToServe:
case <-ctx.Done():
	return ctx.Err()
}
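
For context, a sketch of how that select could sit inside the run-group actor from the hunk above (assuming s is the gRPC server, l its listener, and ctx a cancellable context, as in the surrounding code; this is not the exact PR code):

g.Add(func() error {
	// Wait for the initial sync, but give up if the surrounding context is
	// cancelled (e.g. because the sync itself failed), instead of blocking forever.
	select {
	case <-readyToServe:
	case <-ctx.Done():
		return ctx.Err()
	}
	return s.Serve(l)
}, func(error) {
	s.Stop()
})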

@bwplotka
Member

bwplotka commented Dec 3, 2018

@abursavich

I didn't notice the problem with store until my bucket grew sufficiently large and the stores started crash looping (failed readiness checks) after being rescheduled. It also causes Prometheus to think the job is down.

I think it is just a matter of increasing the timeout on the readiness probes -> the Store needs some time for initialization.

@abursavich
Contributor

@bwplotka

I think it is just a matter of increasing the timeout on the readiness probes -> the Store needs some time for initialization.

The liveness and readiness probes serve different purposes. If a container passes liveness, it will not be restarted. If a container passes readiness, it will be included in service endpoints.

The time it takes for the store to become ready is a function of the size of its dataset. Requiring users to manually tune their liveness probes based on this moving target isn't a good solution. It should pass liveness tests as soon as possible, whether or not it's ready to serve queries.

@FUSAKLA
Member

FUSAKLA commented Apr 14, 2019

I'm with @abursavich on this one.
You should not be forced to tune the liveness timeout or initial back-off depending on the amount of data. So plus one from me for liveness passing right from the start and readiness only after the initialization finishes.

Also linking this #656

@bwplotka
Member

Thanks @abursavich that makes sense - sure, we can set liveness

@xjewer
Contributor Author

xjewer commented Apr 17, 2019

should we close this PR then?

@xjewer xjewer closed this May 1, 2019

Successfully merging this pull request may close these issues.

Store: initialisation time