CoreOS - locksmithd semaphore on updates on local (non-clustered) etcd, rebooting all nodes at once. #5407
Comments
Hi @tommij, you can set the following in your kops ClusterSpec to disable locksmithd when a CoreOS image is used:
updatePolicy: external
You should be able to use FileAssets in your ClusterSpec to override the update.conf file, for example:
fileAssets:
- name: coreos-update-config
  path: /etc/coreos/update.conf
  content: |
    GROUP=stable
    REBOOT_STRATEGY=etcd-lock
    ...
Cheers @KashifSaadat - I realise that the file could be set via a fileAsset, but figured that hard-overwriting a file which may gain required fields in the future could be a bad idea. Turns out that's what the ignition config does anyway. I was unaware of the "external" updatePolicy; wouldn't that be counterproductive to running CoreOS to begin with, given that the whole idea of running CoreOS is to have nodes kept up to date at all times? All that aside, as CoreOS is considered ready for production by KOPS, shouldn't one of these solutions be the default (no updates, discovery, or locksmithctl targeting the internal master DNS records)? I wouldn't normally assume that default settings in something production-ready could make my cluster unavailable.
No worries. The addition of We could potentially look at making this option a default for CoreOS, as it would of course be undesired behaviour for kops users to have their nodes all unexpectedly restart at the same time. If you want to take a look at raising a PR that'd be great. :)
Closing this, as it's already covered elsewhere. I may make a PR with an HA documentation update, and possibly look into a CoreOS provisioning PR. Thanks @KashifSaadat
Thanks for submitting an issue! Please fill in as much of the template below as you can.
------------- BUG REPORT TEMPLATE --------------------
What kops version are you running? The command kops version will display this information.
Version 1.9.1 (git-ba77c9ca2)
kubectl version will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.
GitVersion:"v1.9.6"
AWS
Cluster came up as expected, problem elsewhere
Cluster came up as expected, problem elsewhere
kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
-v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
Can't do it, the cluster is down, and it's irrelevant as this has to do with CoreOS configuration.
From https://coreos.com/os/docs/latest/update-strategies.html:
The overarching goal of Container Linux is to secure the Internet's backend infrastructure. We believe that automatically updating the operating system is one of the best tools to achieve this goal.
The reboot strategy defaults to etcd-lock. However, without additional configuration, the etcd semaphore lock is held in a local, non-clustered etcd, potentially causing all nodes to reboot at once on upgrade.
To replicate:
Run "locksmithctl reboot" or "locksmithctl send-need-reboot" concurrently on all nodes.
Expected behavior: one node reboots at a time, using a clustered etcd semaphore.
What happens: all nodes reboot immediately, because each master node can acquire the semaphore from its own local etcd, and worker nodes don't have an etcd running.
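To illustrate the difference, here is a toy model in Python (not locksmith code). The Semaphore class and node names are made up for the example, and it only covers the master case where each node has its own etcd. It just shows why a one-slot lock only serialises reboots if every node asks the same store.

```python
class Semaphore:
    """Toy counting semaphore standing in for the etcd-backed reboot lock."""

    def __init__(self, max_holders=1):
        self.max_holders = max_holders
        self.holders = set()

    def try_acquire(self, node):
        # A node that already holds the lock keeps it.
        if node in self.holders:
            return True
        if len(self.holders) < self.max_holders:
            self.holders.add(node)
            return True
        return False


def nodes_allowed_to_reboot(nodes, semaphore_for):
    """Return the nodes that get the lock when all of them ask at once."""
    return [n for n in nodes if semaphore_for(n).try_acquire(n)]


masters = ["master-a", "master-b", "master-c"]  # hypothetical node names

# Clustered etcd: every node talks to the same semaphore.
shared = Semaphore()
clustered_result = nodes_allowed_to_reboot(masters, lambda n: shared)

# Local, non-clustered etcd: every node has its own private semaphore.
local = {n: Semaphore() for n in masters}
local_result = nodes_allowed_to_reboot(masters, lambda n: local[n])

print(clustered_result)  # ['master-a']
print(local_result)      # ['master-a', 'master-b', 'master-c']
```

With the shared store only one master gets through; with per-node stores every master gets its own lock and they all go down together.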
Attempts at adding cloud-config to the template fail, as CoreOS doesn't support Multipart/MIME, and making it coexist with the KOPS bash script is troublesome at best, because CoreOS Ignition runs in the initramfs.
Attempts to mitigate via the (deprecated) coreos-cloudinit --from-file were somewhat successful.
This could also be done via CoreOS cluster discovery, which is probably the least fiddly option:
https://coreos.com/os/docs/latest/cluster-discovery.html.
------------- FEATURE REQUEST TEMPLATE --------------------
CoreOS etcd clustering set up by KOPS, so that the default automatic updates of CoreOS can't take down all nodes in a cluster at once.