[occm] Recent dnsPolicy default value change blocks occm / cluster from coming up #2611
The documentation of this field is quite spectacularly bad: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
To my eyes that clearly says that pods running with host networking should set `ClusterFirstWithHostNet` if they need cluster DNS. For reasons I don't understand, most likely legacy API compatibility, `Default` means: use cluster networking unless the pod uses host networking, in which case use the host's DNS.
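For illustration, here is a minimal pod spec sketch (the pod name and image are placeholders, not taken from the occm manifests) contrasting the two policies for a host-network pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-policy-example   # hypothetical name, for illustration only
spec:
  hostNetwork: true
  # dnsPolicy: Default                # old occm default: inherit the node's resolv.conf
  dnsPolicy: ClusterFirstWithHostNet  # default after #2594: query cluster DNS (coredns) first
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9
```

With `ClusterFirstWithHostNet`, the pod depends on coredns already being reachable, which is exactly what a bootstrap component like occm cannot assume.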
As an early bootstrap service, I don't think CCM can rely on cluster DNS being up. I suspect it is correct to revert this. @xinity, what was the issue you were hitting which caused you to change it? It's not clear to me from reading #2594 or #2592. I appreciate that CCM is not able to resolve service names internal to the cluster, but why did that matter?
@mdbooth occm wasn't able to query the internal coredns instance without this new value. It matters because of a specific internal DNS zone with a squid proxy that has to be resolvable from occm.
Right, but why? What was the internal DNS zone, and why was it important that CCM could resolve it?
our CI passed, so this should be a small portion of error cases, and I am also curious why the internal DNS zone is needed here
I also wondered about that. Does that mean CNI comes up on an uninitialized node, and coredns tolerates uninitialised nodes?
I've just tested the new release of OCCM on a cluster with 1.30 and have hit this issue as well. For anyone else who is struggling to understand the root cause (being this change), the nondescript error from the CCM Pod is:
I only found the underlying error when I SSH'd onto the node where the CCM had been scheduled and looked at the container logs in
@jichenjc @mdbooth the occm tests were not really executed in #2594, which is why it did not fail; only helm chart tests were executed. As we can now see, our CI is basically broken. So yes, this change should be reverted. I am testing the CI in #2620, and have also opened PR #2621, which will set the default value back to what it was.
blah, at least our CI does work with this new value.. should we still use the old default value just in case?
yes, I think we may still use the old default value ..
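Until the revert lands, one way to pin the old behaviour explicitly is a values override; this is a sketch assuming the occm Helm chart exposes a top-level `dnsPolicy` value (the filename and release name below are illustrative):

```yaml
# values-occm.yaml (hypothetical filename)
# Force the pre-#2594 behaviour so occm can start before coredns is up.
dnsPolicy: Default
```

Applied with something like `helm upgrade --install occm <chart> -f values-occm.yaml`.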
[occm] Recent change #2594 changed the default behaviour of dnsPolicy, breaking the start-up process of occm when coredns has not yet been started.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
The default value change of dnsPolicy broke the bootstrapping of occm on new clusters.
The default value of dnsPolicy was `Default`; change #2594 sets it to `ClusterFirstWithHostNet`. If coredns is pending, waiting for occm to mark nodes as initialized, occm cannot resolve DNS and won't start, blocking the creation of new clusters.
What you expected to happen:
Default behaviour not to change.
How to reproduce it:
Create a new cluster and use the default values of occm chart or the manifest.
Anything else we need to know?:
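A possible workaround sketch for a cluster that is already stuck (the namespace, resource kind, and name are assumptions; adjust to however occm is deployed in your cluster):

```bash
kubectl -n kube-system patch daemonset openstack-cloud-controller-manager \
  --type merge \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"Default"}}}}'
```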
Environment: