-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Watch node object for podCIDR instead of crash + wait for restart #329
Watch node object for podCIDR instead of crash + wait for restart #329
Conversation
ecb0595
to
72715af
Compare
scripts/install-cni.sh
Outdated
host=${KUBERNETES_SERVICE_HOST} | ||
# If host contains a colon (:), it is an IPv6 address, hence needs wrapping | ||
# with [..]. | ||
if [[ "${host}" =~ : ]]; then | ||
host="[$host]" | ||
fi | ||
node_url="https://$host:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${HOSTNAME}" | ||
response=$(curl -k -s -H "Authorization: Bearer $token" "$node_url") | ||
node_url="https://$host:${KUBERNETES_SERVICE_PORT}/api/v1/watch/nodes/${HOSTNAME}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The /api/v1/watch/nodes
path is deprecated:
watch changes to an object of kind Node. deprecated: use the 'watch' parameter with a list operation instead, filtered to a single item with the 'fieldSelector' parameter.
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#node-v1-core
Please use the recommended approach for the list with fieldSelector for metadata.name field equal to the hostname.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
scripts/install-cni.sh
Outdated
response=$(curl -k -s -H "Authorization: Bearer $token" "$node_url") | ||
node_url="https://$host:${KUBERNETES_SERVICE_PORT}/api/v1/watch/nodes/${HOSTNAME}" | ||
for _ in {1..3}; do | ||
response=$(grep -m1 . <(curl -k -s -N -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" "$node_url" | jq --unbuffered -c '.object | select(.spec.podCIDR != null)')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no timeout specified, please specify the timeout explicitly using the timeoutSeconds
request param to avoid the forever hanging call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally you can add the limit=1
parameter to end the call as soon as the first update on the node object gets returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timeout added, and no, we can't end as soon as the first update, in case the first update is not adding podCIDR.
grep -m1 . <(...)
can return after receiving the first line.
scripts/install-cni.sh
Outdated
node_url="https://$host:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${HOSTNAME}" | ||
response=$(curl -k -s -H "Authorization: Bearer $token" "$node_url") | ||
node_url="https://$host:${KUBERNETES_SERVICE_PORT}/api/v1/watch/nodes/${HOSTNAME}" | ||
for _ in {1..3}; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 1..3 is a magic number. make it a named const variable that says this is the number of retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
scripts/install-cni.sh
Outdated
fatal "Could not successfully watch node and wait for podCIDR." | ||
fi | ||
log "Node object fetched from $node_url" | ||
log "${response}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole block may be extracted into the bash function for better readability, i.e. get_k8s_node_descriptor. And additionally the variable name can be changed to something more meaningful than $response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The response is somehow used throughout the file... don't bother to change it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's not too many. Let me change it and get the opportunity to change and poor styles along the way.
cea4377
to
56d5748
Compare
scripts/install-cni.sh
Outdated
# for kubelet to retry the whole container if response is still not fetched. | ||
fetch_node_object 3 20 | ||
|
||
if [[ -z "${node_object:-}" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check and the following 2 log entries can be part of the fetch_node_object
function as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified a bit, but I still like to leave the full dump outside.
c01606a
to
36bddd4
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jingyuanliang, marqc The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
e0f0ef5
to
7600254
Compare
c9f4198
to
817c272
Compare
This can save up to about 10 seconds (initial CrashLoopBackOff delay) before Kubelet decides to restart the pod after the crash.
/lgtm |
929f2bc
into
GoogleCloudPlatform:master
This can save up to 10 seconds before Kubelet decides to restart the pod after the crash.
Also made some logging improvements.
/assign @marqc
/assign @sypakine
/hold