Add pagination support to clients #532
/kind feature

sure, sounds good
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle frozen
/help We'll need to provide a new method called something like …
@vincepri: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed.

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi all, I'm writing with regard to the issue (linked above) that would benefit from the introduction of pagination as a configurable option in the client's implementation (that's where I understand this is needed; if I'm mistaken please correct me). I can see that there hasn't been much traction on this issue and I'm willing to help if there's a will to move forward with it. I'm kinda new to controller-runtime, so any tips on where the actual change needs to happen are welcome.
Hi @pmalek, I looked through the issue you mentioned. IMHO it is different from this issue. This issue, like @vincepri said, is about adding a new method called something like …

However, the Kong/kubernetes-ingress-controller#2382 you mentioned is using the controller-runtime cache, which actually calls a client-go informer to list and watch objects. Also I don't think it can be solved by a paged list; for now the reflector already uses …
Hi @FillZpp 👋 Thanks for your response. So what you're saying is that even if we'd like to make a patch that would introduce pagination to the initial …

By this you mean that there's a plan to change the implementation sometime soon?
@pmalek-sumo Aha, I'm making two points:
Other than changing the namespaces that are watched, do you have any suggestions for how we can make this initial listing of objects faster, or break it up so that it doesn't trip the apiserver limits?
@pmalek-sumo @shaneutt I read your issue again; I'm not sure what actually caused the …

If it is caused by the apiserver memory limit, you may try a paged list with the client first and see if it solves your problem (even though it gets objects from etcd):

```go
secretList := v1.SecretList{}

// 1. List with resourceVersion="0", so that all objects come from the apiserver cache.
cli.List(context.TODO(), &secretList, &client.ListOptions{Raw: &metav1.ListOptions{ResourceVersion: "0"}})

// 2. List with a limit, so that a limited number of objects comes from etcd.
cli.List(context.TODO(), &secretList, &client.ListOptions{Limit: 500})
```

If this works for you, I would like to help you guys find out how to enable the initial list with a page limit in client-go and controller-runtime :)
Thanks @FillZpp.
We have not, yet.
The memory consumption might also be an issue, since the OP in Kong/kubernetes-ingress-controller#2382 did mention OOMs, but the general consensus was that the culprit is most likely the long processing time and hence the timeout.
Do you suggest then that we basically create a client from …
The exit of your controller might be caused by the cache sync timeout, but that in turn was caused by the list request error from the apiserver. BTW, have you ever tried setting CacheSyncTimeout to a longer value? But I'm afraid the timeout will always be there if we don't solve the list problem.
Yeah, you could reproduce the initial list of the informer with client-go or the controller-runtime client (on a cluster where there are enough objects to cause an issue), which can help us figure out whether a paged list can solve the problem or not. I gave a simple example with the controller-runtime client, but it's ok to use client-go. Besides, you'd better use a loop to page-list all objects in batches, like this:

```go
// 2. List with a limit, so that a limited number of objects comes from etcd.
startTime := time.Now()
opts := &client.ListOptions{Limit: 500}
for {
	err = cli.List(context.TODO(), &secretList, opts)
	if err != nil {
		return err
	}
	klog.Infof("Get objects number: %v", len(secretList.Items))
	opts.Continue = secretList.Continue
	// There are no more objects.
	if secretList.Continue == "" {
		break
	}
}
klog.Infof("Time cost for page list: %v", time.Since(startTime))
```
Alright, so I was able to prepare a script to create/delete secrets and then list them with controller-runtime's client (basically what @FillZpp proposed). Listing 256 secrets in the namespace, each with a 1 MB payload, I got the following results:

```
$ go build -o main && ./main
I0530 14:45:10.203416 50289 main.go:29] Starting to list secrets
I0530 14:45:22.895777 50289 main.go:43] Get objects number: 256
I0530 14:45:22.895815 50289 main.go:51] Time cost for page list: 12.692023625s
Execution time: 0h:00m:14s sec

# pagination using limit=100
$ go build -o main && ./main
I0530 14:45:29.759525 50318 main.go:29] Starting to list secrets
I0530 14:45:34.363228 50318 main.go:43] Get objects number: 100
I0530 14:45:38.881193 50318 main.go:43] Get objects number: 100
I0530 14:45:41.502130 50318 main.go:43] Get objects number: 56
I0530 14:45:41.502155 50318 main.go:51] Time cost for page list: 11.742369583s
```

Using 350 secrets:

```
# pagination using limit=100
$ go build -o main && ./main
I0530 14:59:05.627028 50837 main.go:29] Starting to list secrets
I0530 14:59:21.869458 50837 main.go:43] Get objects number: 100
I0530 14:59:31.968267 50837 main.go:43] Get objects number: 100
I0530 14:59:38.376587 50837 main.go:43] Get objects number: 100
I0530 14:59:41.561738 50837 main.go:43] Get objects number: 50
I0530 14:59:41.561762 50837 main.go:51] Time cost for page list: 35.934283084s
Execution time: 0h:00m:36s sec

# no pagination
$ go build -o main && ./main
I0530 14:59:54.457833 50929 main.go:29] Starting to list secrets
I0530 15:00:25.470387 50929 main.go:43] Get objects number: 350
I0530 15:00:25.470412 50929 main.go:51] Time cost for page list: 31.012180292s
```

Secrets can be created as follows (using the secret creation script below):

```
./main -secret-count 256 -payload-size $((1024*1024))
```

What I've managed to observe as well is that when I created a few more secrets (e.g. 350 or 500, also with 1 MB of payload each), my kind cluster becomes unresponsive, eating up a lot of resources. It seems that …

```
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10101 root      20   0 5383416   4.2g  32752 S 143.0 43.0   2:30.45 kube-apiserver
10107 root      20   0   13.9g   3.4g 194900 S   3.0 34.9   0:33.83 etcd
10091 root      20   0 2186904   1.1g  16812 S   3.3 11.4   0:19.30 kube-controller
```

Secret creation script:

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"errors"
	"flag"
	"fmt"

	"github.com/gammazero/workerpool"
	corev1 "k8s.io/api/core/v1"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/klog/v2"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

// randomHex was missing from the original paste; this is a minimal
// implementation returning n random bytes hex-encoded (2n characters).
func randomHex(n int) []byte {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	dst := make([]byte, hex.EncodedLen(len(b)))
	hex.Encode(dst, b)
	return dst
}

func main() {
	var (
		secretCount int
		payloadSize int
		namespace   string
		clean       bool
	)
	flag.IntVar(&secretCount, "secret-count", 1, "number of secrets to generate")
	flag.IntVar(&payloadSize, "payload-size", 128, "payload size in bytes")
	flag.StringVar(&namespace, "namespace", "secrets", "namespace name")
	flag.BoolVar(&clean, "clean", false, "whether to clean afterwards")
	flag.Parse()

	cli, err := client.New(config.GetConfigOrDie(), client.Options{})
	if err != nil {
		panic(err)
	}

	if clean {
		defer func() {
			err := cli.DeleteAllOf(context.Background(),
				&corev1.Secret{},
				client.InNamespace(namespace),
				client.MatchingLabels{"secrets": "experiment"},
			)
			if err != nil {
				klog.ErrorS(err, "error deleting secrets")
			}
		}()
	}

	wp := workerpool.New(16)
	for i := 0; i < secretCount; i++ {
		i := i
		wp.Submit(func() {
			secretName := fmt.Sprintf("secret-%d", i)
			klog.Infof("handling %s...\n", secretName)
			secret := corev1.Secret{
				ObjectMeta: metav1.ObjectMeta{
					Name:      secretName,
					Namespace: namespace,
					Labels: map[string]string{
						"secrets": "experiment",
					},
				},
				Data: map[string][]byte{
					"payload": randomHex(payloadSize / 2), // div by 2 because we encode in hex
				},
			}
			klog.InfoS("creating secret...", "secret", secretName)
			err := cli.Create(context.Background(), &secret, &client.CreateOptions{})
			if err != nil {
				var statusErr *k8serrors.StatusError
				if errors.As(err, &statusErr) {
					if statusErr.ErrStatus.Reason == metav1.StatusReasonAlreadyExists {
						klog.InfoS("secret already exists, updating it...", "secret", secretName)
						err = cli.Update(context.Background(), &secret, &client.UpdateOptions{})
						if err != nil {
							klog.ErrorS(err, "error updating secret", "secret", secretName)
						}
						return
					}
					klog.ErrorS(err, "error creating secret", "secret", secretName)
					return
				}
				panic(err)
			}
		})
	}
	wp.StopWait()
}
```

Listing script:

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/klog/v2"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	var namespace string
	flag.StringVar(&namespace, "namespace", "secrets", "namespace name")
	flag.Parse()

	cli, err := client.New(config.GetConfigOrDie(), client.Options{})
	if err != nil {
		fmt.Println("failed to create client")
		os.Exit(1)
	}

	secretList := corev1.SecretList{}
	klog.Infof("Starting to list secrets")

	// List with a limit, so that a limited number of objects comes from etcd.
	startTime := time.Now()
	opts := &client.ListOptions{
		Limit:     100,
		Namespace: namespace,
	}
	for {
		err = cli.List(context.TODO(), &secretList, opts)
		if err != nil {
			klog.ErrorS(err, "Failed listing secrets")
			os.Exit(1)
		}
		klog.Infof("Get objects number: %v", len(secretList.Items))
		opts.Continue = secretList.Continue
		// There are no more objects.
		if secretList.Continue == "" {
			break
		}
	}
	klog.Infof("Time cost for page list: %v", time.Since(startTime))
}
```
Anyway, since it was already pointed out that what I'm describing might be a different issue, let me summarize what I wrote above:

If you feel that based on the above we can create a new issue, I'll gladly do that and we can take it from there (to stop hijacking this one).
@pmalek So it is caused by the resource limits of kube-apiserver, which means you cannot handle it regardless of pagination?

Yeah, you can create a new issue for supporting an optional paged list when the cache initializes. It is also related to client-go, which has not yet provided any external option to let people enable the paged list.
I don't know enough to weigh in here, but it might be that handling Secrets is not as efficient as it could be? Maybe that's a known fact and expected behavior; unfortunately I don't know whether that's the case. I'm not actively working on this, but if I find some time and enough context for the problem I will raise a separate issue for it.
I would disagree with this statement, as I can observe that even numerous Secrets with smaller payloads will cause this issue. However, I cannot verify that it is CPU pressure.
I have tried this in a hack-ish way by setting the …
@krish7919 Not sure if you hardcoded the client-go change correctly, but you can simply test it as I suggested in (1) and (2) above in your cluster. It reproduces the informer's initial list clearly.
@FillZpp @pmalek I have pasted my analysis in Kong/kubernetes-ingress-controller#2382 (comment) if you would want to take a look.
I had the same kind of error …

The error was raised by the controller while it was doing its initial listing of the configmaps. This is done by a lister that uses parameter … I followed that thread and the proposal by @FillZpp here.

Because of this I turned my investigation back to the apiserver side. I could see that the error was raised exactly 1 minute after the List call. I found the parameter …

So I restarted the apiservers of my cluster with this parameter set to 5m and ran the test again: no more errors, and after 2m the list of all configmaps was available on my client side. So in my case the …

Wondering if KEP-3157-watch-list could help in that particular case, because just bumping the timeout on the apiserver is not a great solution.
Once #341 merges, it would be nice to add implicit support for list pagination to both typedClient and unstructuredClient. I'd be happy to take a stab at this.