This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

paictl.py service delele hangs on Unable to retrieve pull secret #1166

Closed
mzmssg opened this issue Aug 27, 2018 · 4 comments

@mzmssg
Member

mzmssg commented Aug 27, 2018

paictl.py service delete hangs because the services were never actually started, yet we call service delete without checking whether they were.

Details:
Our deployment occasionally fails. A typical error is:

2018-08-27 10:59:16,887 [INFO] - k8sPaiLibrary.maintainlib.kubectl_install : Successfully install kubectl and configure it!

2018-08-27 10:59:16,888 [INFO] - k8sPaiLibrary.maintainlib.deploy : Create kube-proxy daemon for kuberentes cluster.

error: unable to recognize "kube-proxy.yaml": no matches for kind "DaemonSet" in version "apps/v1"

2018-08-27 10:59:17,194 [ERROR] - k8sPaiLibrary.maintainlib.common : Failed to create kube-proxy

The deployment then failed, and the PAI services were never created.

Jenkins then runs a cleanup job, which invokes paictl.py service delete.
However, paictl.py service delete assumes that the imagePullSecrets were created successfully. If they were not, it hangs because the image pull fails:

E0827 11:45:27.404029   60205 pod_workers.go:186] Error syncing pod 5d56b9ae-a9e9-11e8-8499-000d3a10c86f ("delete-batch-job-zookeeper-q5ct7_default(5d56b9ae-a9e9-11e8-8499-000d3a10c86f)"), skipping: failed to "StartContainer" for "cleaning-one-shot" with ImagePullBackOff: "Back-off pulling image \"openpai.azurecr.io/paiclusterint/cleaning-image:zimiao-jenkins_fix-8a13ebe-10\""
W0827 11:45:40.402114   60205 kubelet_pods.go:875] Unable to retrieve pull secret default/pai-secret for default/delete-batch-job-zookeeper-q5ct7 due to secrets "pai-secret" not found.  The image pull may not succeed.
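One way to avoid this hang is to check for the pull secret before the cleanup job invokes the delete. The sketch below is a minimal illustration, assuming kubectl is on the PATH and the secret is default/pai-secret as in the kubelet log above; the helper names (kubectl_runner, pull_secret_exists) are hypothetical and not part of paictl.py.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical type: a function that runs a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def kubectl_runner(cmd: Sequence[str]) -> int:
    """Run cmd, discard its output, and return the exit code (127 if not found)."""
    try:
        return subprocess.run(
            list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode
    except FileNotFoundError:
        # The binary (e.g. kubectl) is not installed or not on the PATH.
        return 127


def pull_secret_exists(name: str = "pai-secret", namespace: str = "default",
                       run: Runner = kubectl_runner) -> bool:
    """Hypothetical pre-check: True if the image pull secret exists.

    `kubectl get secret` exits non-zero when the secret is not found, in
    which case the cleaning pods could not pull their image and a
    subsequent `paictl.py service delete` would hang.
    """
    return run(["kubectl", "get", "secret", name, "--namespace", namespace]) == 0
```

The runner is injectable so the check can be exercised without a cluster. A cleanup step would skip (or fail fast on) service delete when pull_secret_exists() returns False.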

@mzmssg
Member Author

mzmssg commented Aug 27, 2018

Because we don't set a timeout, the hang blocks all subsequent deployments. To skip such a build, one workaround is to SSH into the Jenkins test bed and run paictl.py service start cluster-configuration.
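Setting a timeout on the cleanup command would keep one hung build from blocking the pipeline. This is a minimal sketch using Python's subprocess module; run_with_timeout is a hypothetical helper, not something paictl.py or the Jenkins job provides.

```python
import subprocess
import sys
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> bool:
    """Run cmd; return True on exit code 0, False on failure or timeout.

    When the timeout expires, subprocess.run kills the child process, waits
    for it, and then raises TimeoutExpired, so a hung cleanup cannot block
    the caller indefinitely.
    """
    try:
        result = subprocess.run(list(cmd), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print(f"timed out after {timeout_s}s: {' '.join(cmd)}", file=sys.stderr)
        return False
    except FileNotFoundError:
        print(f"command not found: {cmd[0]}", file=sys.stderr)
        return False
    return result.returncode == 0
```

In the cleanup step, cmd would be the existing paictl.py service delete invocation with whatever arguments the pipeline already passes (they are not shown in this thread), and the timeout value is a judgment call for the test bed.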

@hao1939 hao1939 changed the title paictl.py service delele hangs in jenkins pipeline. paictl.py service delele hangs on Unable to retrieve pull secret Aug 28, 2018
@hao1939 hao1939 removed their assignment Aug 28, 2018
@ydye
Contributor

ydye commented Aug 29, 2018

paictl service delete only cleans up the service deployment. I don't think the command should be invoked at this point, so this is an operational mistake.

@ydye ydye added the won't fix label Aug 29, 2018
@fanyangCS
Contributor

@ydye, if we won't fix it, shall we close this issue?

@ydye
Contributor

ydye commented Aug 29, 2018

@fanyangCS Yes, I will close it if there are no more comments within 24 hours.

@ydye ydye closed this as completed Aug 30, 2018