Galley performance tuning with k8s api #8338
@irisdingbj Can you please take a look at the metrics from the k8s API server and see if we should be using a more recent version of k8s client-go? Informers should only be triggered when something changes. We should ensure that we are using informers properly and only watch APIs that are absolutely necessary. Given the comment about autoscaling not working, Kuat's suggestion that there is a non-informer call somewhere is quite plausible.
/cc
@cmluciano will take a look at this.
Galley is using the standard informer machinery to watch the related Istio CRD resources.
@ayj see above. I think Galley is safe with respect to the k8s API server. Any comments?
The issue with Mixer was that it was watching too many CRD “kinds”. We have moved Mixer to a model that uses far fewer CRDs, so the load will go down. If Galley uses the same mechanism, where every kind is its own watch channel to the API server, it will face the same problem as the total number of CRDs in the system grows. We have to actually measure the load.
We need to work on this on the Galley side as well, since we will use a separate watcher for every kind of Istio CRD.
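For context, here is a minimal sketch of what a one-watch-per-kind setup looks like with the client-go dynamic informer factory: each GroupVersionResource gets its own informer, and therefore its own watch connection to the API server, which is why the load scales with the number of CRD kinds. The GVRs, handlers, and kubeconfig wiring below are illustrative only, not Galley's actual code:

```go
// Illustrative sketch: one informer (and hence one watch) per CRD kind,
// using the client-go dynamic informer factory. GVRs are examples only.
package main

import (
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Example CRD kinds; every GVR in this list gets its own informer,
	// i.e. its own long-lived watch against the API server.
	gvrs := []schema.GroupVersionResource{
		{Group: "networking.istio.io", Version: "v1alpha3", Resource: "virtualservices"},
		{Group: "networking.istio.io", Version: "v1alpha3", Resource: "destinationrules"},
	}

	factory := dynamicinformer.NewDynamicSharedInformerFactory(dc, 30*time.Minute)
	for _, gvr := range gvrs {
		informer := factory.ForResource(gvr).Informer()
		informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
			AddFunc:    func(obj interface{}) { /* handle add */ },
			UpdateFunc: func(oldObj, newObj interface{}) { /* handle update */ },
			DeleteFunc: func(obj interface{}) { /* handle delete */ },
		})
	}

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // run until killed
}
```

With this pattern the event handlers only fire when something actually changes; the per-kind cost is the number of open watches, not repeated list calls.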
@ayj I believe there was a naked call loop in the validation webhook some time ago that you were planning to replace with a watcher. Did you get a chance to do that?
Not yet. The validation webhook currently polls a specific resource instance (not collection) every ~5 seconds. #6451 is the tracking issue for the injector and validation webhooks. |
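For illustration, here is a rough sketch of the difference between that fixed-interval poll and an informer-based watch scoped to a single named object. The resource name, types, and client-go version used here are assumptions for the sketch, not the actual webhook code:

```go
// Illustrative sketch: polling a single resource every ~5s vs. watching it
// with an informer restricted by field selector. Names are hypothetical.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

const webhookConfigName = "istio-galley" // hypothetical object name

// pollConfig mirrors the current behaviour: one GET per interval,
// regardless of whether anything changed.
func pollConfig(cs kubernetes.Interface) {
	for range time.Tick(5 * time.Second) {
		cfg, err := cs.AdmissionregistrationV1().ValidatingWebhookConfigurations().
			Get(context.TODO(), webhookConfigName, metav1.GetOptions{})
		if err != nil {
			continue
		}
		fmt.Println("polled", cfg.Name)
	}
}

// watchConfig is the watcher alternative: the API server pushes changes,
// and the field selector limits the list/watch to the one named object.
func watchConfig(cs kubernetes.Interface, stop <-chan struct{}) {
	factory := informers.NewSharedInformerFactoryWithOptions(cs, 0,
		informers.WithTweakListOptions(func(o *metav1.ListOptions) {
			o.FieldSelector = fields.OneTermEqualSelector("metadata.name", webhookConfigName).String()
		}))
	inf := factory.Admissionregistration().V1().ValidatingWebhookConfigurations().Informer()
	inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, _ interface{}) { fmt.Println("webhook config changed") },
	})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	stop := make(chan struct{})
	watchConfig(cs, stop) // or pollConfig(cs) for the polling behaviour
	select {}
}
```

The field selector keeps the informer's list/watch restricted to the one configuration object, so the API server sees a single long-lived watch instead of a GET every ~5 seconds.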
Here is the situation, to the best of my understanding. Please keep me honest. Taking a 1 Mixer, 1 Pilot, 1 Galley setup as a baseline:
Apart from #6451, I don't see any low-hanging fruit or obvious fixes that we can apply to the Galley code. (If I am missing something, please speak up.) Assuming Galley has good fanout, introducing Galley-based config distribution should improve our perf footprint on the host system: in the old model, scaling Mixer and Pilot would linearly increase the load on the API server. With the new model, the load will be on Galley, and API server load should stay constant (i.e. assuming the number of Galleys stays constant). This doesn't mean we won't run into issues like Azure/AKS#620, but it means introducing Galley (along with reducing Mixer CRDs) should help. If we do run into such issues, we will need to fix them within the scope of Galley. Based on this, to tune Galley performance, it is probably beneficial to understand:
cc @Nino-K who was also looking at scaling MCP client/server for CF use cases. |
@ayj You've been looking at Galley performance for 1.1. Any blockers for 1.1 release? Can you share some numbers? |
I think we can close this for now. The original issue was motivated by Mixer opening unnecessary duplicate CRD watches, which was exacerbated by HPA. Galley only creates one watch per CRD and does not have the same scaling characteristics as Mixer. We can open follow-up issues if necessary when specific problems arise.