Running builds get stuck after Jenkins master pod deletion and recreation #542
- The operator currently has a big issue due to its backup mechanism: when deleting the Jenkins master pod while some builds are running, these builds get stuck after the new Jenkins master pod gets up and running. See jenkinsci/kubernetes-operator#542.
- The chart seems more "standard", relying on a StatefulSet.
- It's more complete: it includes an Ingress template, for instance.
- The plugin descriptor is standard, allowing, for instance, not specifying the version and always fetching the latest.
- It still provides CasC hot reload through a sidecar container.
Hello. Thank you for bringing this issue to our attention. Fixing it is a complex task that would require a lot of thought and work to align with our core architectural concepts (e.g. Jenkins immutability). We are currently concentrating our efforts on introducing a new schema that will allow us to resolve the most pressing and long-known issues in our community. While this issue definitely needs to be addressed, this is the first time anyone has brought it up. We will have to postpone handling it, as working on it now would mean an even longer waiting period for matters that have already been raised multiple times.
Hi @SylwiaBrant, thanks for your answer. I understand perfectly well that there are other priorities. Congrats on your work on the operator: apart from this issue, it has been working very well and the documentation is clear!
I'm also hitting this issue at the moment. Is there any workaround that would let us keep using the operator? Or is the recommendation to just use the Jenkins Helm Chart?
Hello @NesManrique, thanks for the update. We acknowledge the severity of this issue, but we currently have no capacity to work on it: fixing it properly would require significant architectural changes that are beyond the scope of the next release, with the new API schema taking priority for the time being.
Hi @Sig00rd!
Describe the bug
When deleting the Jenkins master pod while some builds are running, these builds get stuck after the new Jenkins master pod gets up and running.
I'm not sure if this is expected, nor how it could be solved.
Thanks for your help.
To Reproduce
1. Install the Jenkins operator with the Helm chart, keeping `jenkins.enabled=true` to install a Jenkins CR. Here's a minimal set of values:
2. Configure the Kubernetes plugin and add a pod template using CasC.
3. Add a multibranch pipeline job running a Jenkinsfile that uses this pod template (`agent` in the Jenkinsfile).
4. Run a build on the master branch of this job, let the build start and let the Kubernetes plugin provision an agent based on the above-mentioned pod template.
5. Verify that a new pod has been created for this agent in the current namespace.
6. Delete the Jenkins master pod, then wait for the operator to recreate it and for the Jenkins master to be up.
7. Check the build logs in the Jenkins UI; the build is stuck with:
8. Finally, the build ends with:
See the related logs below.
Why does it happen?
From what I understand, once the Jenkins master pod is recreated and is up and running, the agent (aka node, aka pod, here: `nuxeo-platform-11-pppcs`) on which the build was running doesn't exist anymore in Jenkins, thus the error in the JNLP container:
Fix attempt
I figured out that the backup image was only backing up the jobs but not the nodes, see the tar command line.
So I tried to override the backup image with a `backup.sh` script that also backs up the `nodes` directory, along with the `jobs` one:
This works in the sense that when the backup is restored, once Jenkins is up and running, the node appears in the node list of the Jenkins master.
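The idea behind the modified backup (archive `nodes` next to `jobs`) can be illustrated with a minimal Python sketch. This is not the operator's backup image or the actual `backup.sh` from this issue; the function name `create_backup`, the archive name and the job name `my-job` are hypothetical, and the node's `config.xml` is only a placeholder file.

```python
import os
import tarfile
import tempfile
from typing import List

# Directories under JENKINS_HOME to archive. The fix attempt in this issue adds
# "nodes" next to "jobs"; everything else here is a hypothetical sketch.
BACKUP_DIRS = ("jobs", "nodes")


def create_backup(jenkins_home: str, archive_path: str) -> List[str]:
    """Write a gzipped tarball of BACKUP_DIRS and return the archived member names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in BACKUP_DIRS:
            path = os.path.join(jenkins_home, name)
            if os.path.isdir(path):
                # arcname keeps paths relative to JENKINS_HOME, similar to `tar -C "$JENKINS_HOME"`.
                tar.add(path, arcname=name)
    # Re-open the archive to list what was actually stored.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Usage: a throwaway JENKINS_HOME with one (hypothetical) job and one agent node entry.
with tempfile.TemporaryDirectory() as home:
    os.makedirs(os.path.join(home, "jobs", "my-job"))
    node_dir = os.path.join(home, "nodes", "nuxeo-platform-11-pppcs")
    os.makedirs(node_dir)
    with open(os.path.join(node_dir, "config.xml"), "w") as f:
        f.write("<placeholder/>")  # placeholder, not a real Jenkins node config
    members = create_backup(home, os.path.join(home, "backup.tar.gz"))

print(sorted(members))
```

As described above, archiving the node entry makes it reappear after a restore, but it does not address when the restore happens relative to the agent reconnecting.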
Yet the restore only happens after the Jenkins master is up and running (operator logs: 2021-04-12T08:47:47.236Z), whereas the JNLP container tries to connect to the Jenkins master earlier (JNLP container logs: 2021-04-12 08:46:47.647+0000), so the agent isn't there yet when it connects.
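To make the ordering explicit, here is a small Python check of the two timestamps quoted above. It only does the arithmetic and assumes both log timestamps are in UTC, as their `Z` and `+0000` suffixes indicate.

```python
from datetime import datetime

# Timestamps quoted in the issue, both in UTC ("Z" written as "+00:00" so that
# datetime.fromisoformat also accepts it on Python versions before 3.11).
jnlp_connect_attempt = datetime.fromisoformat("2021-04-12T08:46:47.647+00:00")
backup_restored = datetime.fromisoformat("2021-04-12T08:47:47.236+00:00")

# A positive gap means the agent tries to connect before its node entry is restored.
gap = (backup_restored - jnlp_connect_attempt).total_seconds()
print(f"agent connected {gap:.3f} s before the restore")
```

The gap comes out at just under a minute, which matches the description: by the time the `nodes` backup is restored, the agent has already tried to connect to a master that doesn't know it.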
Additional information
Kubernetes version:
Jenkins Operator version: v0.5.0, installed through Helm chart 0.4.3.
Logs:
The last line is where the problem starts.
After that, the JNLP container shuts down and the pod goes into an error state.