Custom partitioning with no-format doesn't work #2281
Comments
@sarg3nt thanks for all your debugging efforts. I've been using a cloud-config similar to the stripped one you sent all the time with no problems. The only difference is that I almost never have a second disk attached. If that's the problem it should be easy to reproduce in qemu with 2 disks. I'd suggest you take the minimum config that reproduces the problem (the last one you sent), remove the Also, the system boots from an ISO right? And there is a |
@jimmykarily I'm using AuroraBoot, so not booting from a CD-ROM. I think I've somewhat figured some of this out. The root problem ties back to https://github.com/kairos-io/kairos/issues/2243 Even though I'm using
Example: This is with
Where Here's my
strict: true
# enable debug logging
debug: true
install:
  no-format: true
  auto: true
  poweroff: false
  reboot: false
  grub_options:
    extra_cmdline: "rd.immucore.debug"
users:
  - name: "kairos"
    passwd: "kairos"
stages:
  kairos-install.pre.before:
    - if: '[ -e "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" ]'
      name: "Create partitions"
      commands:
        - |
          parted --script --machine -- "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" mklabel msdos
      layout:
        device:
          path: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
        expand_partition:
          size: 0 # All available space
        add_partitions:
          # all sizes below are in MB
          - fsLabel: COS_OEM
            size: 64
            pLabel: oem
          - fsLabel: COS_RECOVERY
            size: 8500
            pLabel: recovery
          - fsLabel: COS_STATE
            size: 18000
            pLabel: state
          - fsLabel: COS_PERSISTENT
            pLabel: persistent
            size: 25000
            filesystem: "ext4"
  boot:
    - systemd_firstboot:
        keymap: us
So... even though I'm specifically setting Here's some troubleshooting output so you get a lay of the land.
As you can see I'm not sure what logs you want me to give you.
Please help. This isn't stable and usable with multiple disks which is a requirement for us. |
First of all, just to get it out of the equation, in your cloud config above, the That said, I can verify that no-format doesn't work as expected.
Kairos would automatically pick I used this config:
which is very similar to @sarg3nt's config but pointing to I compiled a kairos-agent with additional output and it seems that this line overwrites my
I tried to comment it out to see what happens and indeed
(printed at this point: https://github.com/kairos-io/kairos-agent/blob/2e9c85e63acf926ab9e0a00b3dabff4927c70c4b/pkg/action/install.go#L164) and it later fails with:
where it obviously tries to install grub on |
The selection of the target device doesn't take "NoFormat" into account: https://github.com/kairos-io/kairos-agent/blob/2e9c85e63acf926ab9e0a00b3dabff4927c70c4b/internal/agent/install.go#L216-L218 I think when |
Fixes kairos-io/kairos#2281 Signed-off-by: Dimitris Karakasilis <[email protected]>
I found the offending parts of the code here: kairos-io/kairos-agent#235 Needs a proper fix. |
@jimmykarily Thank you for looking into this and finding the problem. |
I can't make predictions, sorry. With the focus being on v3.0.0 and the UKI work, this only made it below the waterline this sprint. If things go well, we may be able to start on it 🤷 |
To be used here: kairos-io/kairos#2281 Signed-off-by: Dimitris Karakasilis <[email protected]>
Peg PR to allow creating more than one disk on a test VM: spectrocloud/peg#23 |
@jimmykarily Now that Kairos v3 is out and looks fairly stabilized, do you have an estimate as to when this is going to be fixed? |
Waiting for this to be merged: #2291 . |
Finishes: kairos-io/kairos#2281 Signed-off-by: Dimitris Karakasilis <[email protected]>
@jimmykarily As per one of your posts I then tried adding the sgdisk command.
commands:
  - |
    parted --script --machine -- "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" mklabel msdos
    sgdisk -g "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
When I do that I get the following: I saw somewhere else that added
add_partitions:
  # all sizes below are in MB
  - fsLabel: COS_GRUB
    size: 64
    pLabel: bios # or efi, tried both
    filesystem: "fat"
  - fsLabel: COS_OEM
    size: 64
<snip>
But that didn't help or change anything. What am I missing? |
I was struggling to find the right combination too. I ended up doing this: https://github.com/kairos-io/kairos/pull/2291/files#diff-1ff1699e612ac7f8c508e5f9f6a784b37441b01b8cfdebd8da3b068280385247R115 for legacy BIOS mode (see how the For EFI, what worked for me was to comment out the To avoid trying things blindly, I let kairos-agent install automatically on the default disk. Then I saved the partition scheme and tried to replicate it manually, but pointing to the other disk. This way you'll know what partitions kairos-agent expects. |
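The "save the partition scheme, then replicate it" approach described above can be sketched roughly as follows. This is only an illustration, not something from the Kairos tooling: `field` and `summarize_layout` are hypothetical helper names, `/dev/sdX` is a placeholder, and the sketch assumes the layout is captured with `lsblk --pairs`. It prints one line per device with its partition label, filesystem label, and filesystem type, which are the values that go into `pLabel`, `fsLabel`, and `filesystem`.

```shell
# Hypothetical helpers (not part of Kairos).
# field KEY LINE: print the value of KEY="..." from one line of
# `lsblk --pairs` output. Prints an empty line when the value is empty.
field() {
  # A leading space is added so that " LABEL=" cannot match inside "PARTLABEL=".
  printf '%s\n' " $2" | sed -n "s/.* $1=\"\([^\"]*\)\".*/\1/p"
}

# Read `lsblk --pairs` output on stdin and print, per line:
#   NAME PARTLABEL LABEL FSTYPE     ("-" marks an empty or missing field)
summarize_layout() {
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    name=$(field NAME "$line")
    partlabel=$(field PARTLABEL "$line")
    label=$(field LABEL "$line")
    fstype=$(field FSTYPE "$line")
    printf '%s %s %s %s\n' "${name:--}" "${partlabel:--}" "${label:--}" "${fstype:--}"
  done
}

# Example usage on a machine where kairos-agent installed to the default disk
# (replace /dev/sdX with that disk):
#   lsblk --pairs --output NAME,PARTLABEL,LABEL,FSTYPE /dev/sdX | summarize_layout
```

Comparing this summary from an automatic install against the manually partitioned disk should show which partitions or labels are missing.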
@jimmykarily We want to go to production with the first cluster soon and I need to assure my team this is going to be stable. Here's my config.
strict: true
debug: true
install:
  no-format: true
  device: /dev/sda
  auto: true
  poweroff: false
  reboot: true
  grub_options:
    extra_cmdline: "rd.immucore.debug"
  bind_mounts:
    - /run/k3s
stages:
  kairos-install.pre.before:
    - if: '[ -e "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" ]'
      name: "Create partitions"
      commands:
        - |
          parted --script --machine -- "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" mklabel gpt
          # Legacy bios
          sgdisk --new=1:2048:+1M --change-name=1:'bios' --typecode=1:EF02 "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
      layout:
        device:
          path: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
        expand_partition:
          size: 0 # All available space
        add_partitions:
          # all sizes below are in MB
          - fsLabel: COS_OEM
            size: 64
            pLabel: oem
          - fsLabel: COS_RECOVERY
            size: 8500
            pLabel: recovery
          - fsLabel: COS_STATE
            size: 18000
            pLabel: state
          - fsLabel: COS_PERSISTENT
            pLabel: persistent
            size: 0
            filesystem: "ext4" |
@sarg3nt do you have the |
@jimmykarily I don't know if I'm doing this right but I'm giving it my best shot.
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 1G 1 loop /
sda 8:0 0 150G 0 disk /run/k3s
sdb 8:16 0 80G 0 disk
├─sdb1 8:17 0 1M 0 part
├─sdb2 8:18 0 64M 0 part /oem
├─sdb3 8:19 0 2.2G 0 part
├─sdb4 8:20 0 4G 0 part /run/initramfs/cos-state
└─sdb5 8:21 0 73.6G 0 part /etc/pki/tls/certs
/var/lib/wicked
/var/lib/snapd
/var/lib/rancher
/var/lib/longhorn
/var/lib/kubelet
/var/lib/extensions
/var/lib/dbus
/var/lib/containerd
/var/lib/cni
/var/lib/ca-certificates
/etc/zfs
/etc/systemd
/etc/sysconfig
/etc/ssh
/var/snap
/etc/runlevels
/etc/rancher
/etc/modprobe.d
/var/log
/usr/libexec
/etc/kubernetes
/run/k3s
/etc/iscsi
/etc/init.d
/etc/cni
/root
/opt
/home
/usr/local
sdc 8:32 0 100G 0 disk /usr/local/.state/var-lib-rancher.bind/rke2
/var/lib/rancher/rke2
To be clear, in the above When I leave out
echo "Disk 0, should be Kairos stuff"
lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
blkid "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
echo ""
echo "Disk 1, should be /var/lib/rancher/rke2"
lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0"
blkid "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0"
echo ""
echo "Disk 2, should be /run/k3s"
lsblk -f "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0"
blkid "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0"
I get this output:
Disk 0, should be Kairos stuff
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sda
└─sda1
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0: PTUUID="196e314b-07a0-45ee-b82b-419582391e6e" PTTYPE="gpt"
Disk 1, should be /var/lib/rancher/rke2
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sdb ext4 1.0 RKE2 2ee2eadc-7cb7-4231-a2c8-e79ca4ab61a7
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0: LABEL="RKE2" UUID="2ee2eadc-7cb7-4231-a2c8-e79ca4ab61a7" TYPE="ext4"
Disk 2, should be /run/k3s
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sdc
├─sdc1
├─sdc2 ext4 1.0 COS_OEM 43ca157c-cecb-4c6c-9340-ab2d1d61b765
├─sdc3 ext4 1.0 COS_RECOVERY 6f983e16-8658-4a44-a754-1cfa7883b3f0
├─sdc4 ext4 1.0 COS_STATE 78ce8c4d-92d5-4210-be27-13e20b3ec07f
└─sdc5 ext4 1.0 COS_PERSISTENT d32ad1ff-9738-4b34-ac3d-6f62110e6800
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0: PTUUID="a95f1345-adcc-4656-ae7d-17bfb3e08f5b" PTTYPE="gpt"
It seems to have installed to Disk 2? Or am I reading / interpreting this wrong? I'm confused. Why would setting I tried getting the logs you asked for. I can get something when I don't have it shut down and I ssh in to the node:
[root@lpul-vault-k8s-server-0 kairos]# journalctl -u kairos-agent
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Started kairos agent.
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: warning: skipping /oem/userdata (extension).
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Kairos Agent version=v2.8.11
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Kairos System version=v3.0.6
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF creating a runtime
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF detecting boot state
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Boot Mode boot_mode=livecd_boot
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Boot in uki mode result=false
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: warning: skipping /oem/userdata (extension).
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Kairos Agent version=v2.8.11
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Kairos System version=v3.0.6
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF creating a runtime
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF detecting boot state
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Boot Mode boot_mode=livecd_boot
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1388]: 2024-04-23T18:34:30Z INF Boot in uki mode result=false
Apr 23 18:34:30 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Deactivated successfully.
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: Started kairos agent.
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: warning: skipping /oem/userdata (extension).
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF Kairos Agent version=v2.8.11
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF Kairos System version=v3.0.6
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF creating a runtime
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF detecting boot state
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF Boot Mode boot_mode=livecd_boot
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com kairos-agent[1458]: 2024-04-23T18:34:31Z INF Boot in uki mode result=false
Apr 23 18:34:31 lpul-vault-k8s-server-0.vault.ad.selinc.com systemd[1]: kairos-agent.service: Deactivated successfully.
But that doesn't look that useful to me. A few notes to make sure you are aware of the whole setup:
Another piece of info that may or may not be useful. This is the AuroraBoot run statement. This is part of a shell file that is run as a service on the node and I haven't touched it in a while.
docker run --rm --net host \
-v "/usr/local/auroraboot-build:/tmp/auroraboot" \
-v "/etc/systemd/system/cloud_init.yaml:/cloud_init.yaml" \
-v /var/run/docker.sock:/var/run/docker.sock \
"quay.artifactory.metro.ad.selinc.com/kairos/auroraboot:${AURORABOOT_VERSION}" \
--set "container_image=$container_image" \
--cloud-config /cloud_init.yaml \
--set "state_dir=/tmp/auroraboot" \
--set netboot.cmdline="rd.neednet=1 ip=dhcp rd.cos.disable netboot nodepair.enable console=tty0 selinux=0" \
--debug \
I'm curious about the Hope this helps. |
See attached log file. |
When you are setting It's possible that when you don't set the device explicitly, for some reason the detection doesn't work and the target is left empty. But that would need @sarg3nt the logs you shared are not the installation logs. I'm not sure if those are available after rebooting to the system.
Maybe there are other ways to get the installation logs in the AuroraBoot flow but I can't think of one right now. Maybe if you set One of the 2 options should allow you to collect installation logs and that will reveal more about what actually happens. Thanks for your patience in fixing this Dave, let's hope we get it sorted out soon! |
The logs you attached show immucore
the image you use should be
Something is off... thanks @Itxaka for spotting this |
@jimmykarily I rebuilt the client OS image from the latest master and our AuroraBoot image even though we were already running AuroraBoot Your info about booting into live-cd mode and running The bigger surprise is that To explain in more detail. The target node gets our custom config from two sources. The first is the basic config that is the same for all cluster nodes and is served from AuroraBoot. This config has the I'll include redacted copies of each below, to help clarify what is happening.
From AuroraBoot:
#cloud-config
strict: true
debug: true
install:
  no-format: true
  auto: false
  poweroff: false
  reboot: false
  grub_options:
    extra_cmdline: "rd.immucore.debug"
  bind_mounts:
    - /run/k3s
users:
  - name: "kairos-auroraboot"
    passwd: "<redacted>"
    ssh_authorized_keys:
      - <redacted>
write_files:
  - encoding: b64
    content: <redacted>
    path: <redacted>
    permissions: "0444"
runcmd:
  - Some run commands here.
stages:
  kairos-install.pre.before:
    - if: '[ -e "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" ]'
      name: "Create partitions"
      commands:
        - |
          parted --script --machine -- "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" mklabel gpt
          # Legacy bios
          sgdisk --new=1:2048:+1M --change-name=1:'bios' --typecode=1:EF02 "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
      layout:
        device:
          path: "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
        expand_partition:
          size: 0 # All available space
        add_partitions:
          - fsLabel: COS_OEM
            size: 64
            pLabel: oem
          - fsLabel: COS_RECOVERY
            size: 8500
            pLabel: recovery
          - fsLabel: COS_STATE
            size: 18000
            pLabel: state
          - fsLabel: COS_PERSISTENT
            pLabel: persistent
            size: 0
            filesystem: "ext4"
  boot:
    - systemd_firstboot:
        keymap: us
    - name: "Environment Variables"
      environment:
        HTTP_PROXY: "<redacted>"
        <snip>
    - name: "Setup services"
      systemctl:
        disable:
          - dnf-makecache
    - name: "Setup NTP"
      systemctl:
        enable:
          - systemd-timesyncd
      timesyncd:
        NTP: "<redacted>"
        FallbackNTP: ""
  after-install-chroot:
    - name: "Create data directories"
      commands:
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
    - name: "Format disks"
      commands:
        - make_disk.sh "format_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" "/var/lib/rancher/rke2" "RKE2" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "format_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" "K3S" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "format_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0" "/var/lib/rancher/longhorn" "LONGHORN" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  after-reset-chroot:
    - name: "Create data directories"
      commands:
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  after-upgrade-chroot:
    - name: "Create data directories"
      commands:
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  initramfs:
    - name: "Mount disks"
      commands:
        # Making the /run/k3s directory here as well as it fixes the directory going missing bug
        - make_disk.sh "make_directory" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "mount_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0" "/var/lib/rancher/rke2" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "mount_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:2:0" "/run/k3s" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
        - make_disk.sh "mount_disk" "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:3:0" "/var/lib/rancher/longhorn" 2>&1 | tee -a /var/log/sel/make_disk.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
The file that is injected via VSphere and is running on startup: This ends up in
#cloud-config
users:
  - name: "kairos"
    passwd: "<redacted>"
    ssh_authorized_keys:
      - ssh-rsa <redacted>
write_files:
  # These files exist after startup.
  - encoding: b64
    content: '<redacted>'
    path: /etc/rancher/rke2/config.yaml
    permissions: "0644"
    owner: "root"
  - encoding: b64
    content: '<redacted>'
    path: /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
    permissions: "0644"
    owner: "root"
stages:
  initramfs:
    - name: "Set hostname"
      hostname: "lpul-vault-k8s-server-0.vault.ad.selinc.com"
    - name: "Run commands"
      commands:
        - bash /usr/bin/initramfs_scripts.sh 2>&1 | tee -a /var/log/sel/initramfs_scripts.log && if [[ $PIPESTATUS[0] -ne 0 ]]; then exit 1; fi
  boot:
    - name: "Setup services"
      systemctl:
        enable:
          - rke2-server.timer
          - vmtoolsd.timer
          - qualys-cloud-agent.timer
          - falcon-sensor.timer
        start:
          - rke2-server.timer
          - vmtoolsd.timer
          - qualys-cloud-agent.timer
          - falcon-sensor.timer
And here's the log after running
|
Multiple directories get scanned when the The files in those directories are filtered by yaml extension and valid header and they are merged into one config. You can see the result of that merge at the beginning of the installation logs. In that merged struct, I see this:
which seems to originate in the Aurora boot config you attached. This means, it's being read. This block exists too:
So, although there is a lot happening here and I could easily miss something important, it seems that both configs are merged in the final one. The installation output even shows partitions being created:
(I'm not sure where the "Setting name!" text is coming from) In the installation logs above there are some errors (not necessarily explaining the original issue). E.g.:
maybe you can tell where these are coming from? |
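The scan-and-filter step described above (keep files with a yaml extension and a valid header, then merge them) can be approximated with a small shell sketch. It is not the kairos-agent implementation: `list_cloud_configs` is a hypothetical name, the accepted extensions (`.yaml`, `.yml`) and the single accepted header (`#cloud-config`) are assumptions, and the merge itself is left out. It may still help to predict which files in a directory are likely to be picked up, for example why `/oem/userdata` is skipped for its extension in the journal output earlier in this thread.

```shell
# Hypothetical helper (not part of Kairos): print the files in the given
# directories that look like cloud-config files. Not recursive.
# Assumptions: only *.yaml / *.yml are considered, and the first line
# must be exactly "#cloud-config".
list_cloud_configs() {
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    for file in "$dir"/*.yaml "$dir"/*.yml; do
      # Skips unmatched glob patterns as well as non-regular files.
      [ -f "$file" ] || continue
      first_line=$(head -n 1 "$file")
      if [ "$first_line" = "#cloud-config" ]; then
        printf '%s\n' "$file"
      fi
    done
  done
}

# Example usage (the directory is a placeholder):
#   list_cloud_configs /oem
```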
it's from the
|
Ohh, I see, the reason I couldn't find the
echo "#cloud-config" > /tmp/config.yaml
kairos-agent manual-install /tmp/config.yaml 2>&1 | tee /tmp/out.log
The temp file is required or it won't run, but it doesn't really need anything in it. Re: my question above.
Is that expected or a bug? |
The |
Umm, no, I can't understand why the install would auto-start if install.auto is set to false. The cmdline in AuroraBoot is not supposed to start the install either if auto is set to false. Maybe we have a bug around that? |
-i expects a hexadecimal value
-n should be used for the filesystem label
https://man7.org/linux/man-pages/man8/mkfs.vfat.8.html
Needed as part of kairos-io/kairos#2281 so that this config:
```
add_partitions:
  - fsLabel: COS_GRUB
    size: 64
    pLabel: efi
    filesystem: "fat"
```
won't result in this error:
```
Volume ID must be a hexadecimal number
```
Signed-off-by: Dimitris Karakasilis <[email protected]>
(cherry picked from commit 4ebbc75)
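To illustrate the flag mix-up fixed by this commit: with `mkfs.vfat`, `-n` takes the volume label while `-i` takes a hexadecimal volume ID, so passing a label such as `COS_GRUB` to `-i` fails. The helper below is a hypothetical sketch (the name `vfat_format_cmd` and the placeholder device `/dev/sdX1` are not from the Kairos code). It only prints the command it would run, and it rejects labels longer than the 11 characters a FAT volume label can hold.

```shell
# Hypothetical helper: print the mkfs.vfat command that sets a filesystem
# label. It does not format anything itself.
vfat_format_cmd() {
  label=$1
  device=$2
  # FAT volume labels are limited to 11 characters.
  if [ "${#label}" -gt 11 ]; then
    echo "label '$label' is longer than 11 characters" >&2
    return 1
  fi
  # -n sets the volume label; -i would expect a hexadecimal volume ID.
  printf 'mkfs.vfat -n %s %s\n' "$label" "$device"
}

# Example usage:
#   vfat_format_cmd COS_GRUB /dev/sdX1
```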
* Set the vfat label using the correct flag (#141)
  -i expects a hexadecimal value
  -n should be used for the filesystem label
  https://man7.org/linux/man-pages/man8/mkfs.vfat.8.html
  Needed as part of kairos-io/kairos#2281 so that this config:
  ```
  add_partitions:
    - fsLabel: COS_GRUB
      size: 64
      pLabel: efi
      filesystem: "fat"
  ```
  won't result in this error:
  ```
  Volume ID must be a hexadecimal number
  ```
  Signed-off-by: Dimitris Karakasilis <[email protected]>
  (cherry picked from commit 4ebbc75)
* Reuse existing user id if it exists (#145)
  (cherry picked from commit 9484451)
* Handle duplicated names in a stage (mudler#147)
  (cherry picked from commit 9394813)
---------
Co-authored-by: Dimitris Karakasilis <[email protected]>
Co-authored-by: Itxaka <[email protected]>
Co-authored-by: Mauro Morales <[email protected]>
Kairos version:
/kairos/rockylinux:9-core-amd64-generic-v2.4.3
CPU architecture, OS, and Version:
Linux lpul-vault-k8s-agent-2.vault.ad.selinc.com 5.14.0-362.8.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 8 17:36:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Describe the bug
device: auto
Kairos will install, reboot and install again infinitely, i.e. an install loop.
no-format: true
and the boot disk is manually created as per the instructions then the OS installs, reboots and immediately gets stuck at a black screen with a flashing cursor in the top left corner.
device: /dev/sda
then it will work sometimes but this is not stable as /dev/sda can change from boot to boot due to async device assignments. See "Kairos not installing on correct device" #2243.
To Reproduce
Note: We install via netboot from AuroraBoot, not sure if this is a factor but I doubt it.
device: auto
in the cloud_init.yaml
Expected behavior
Larger volumes than the boot volume should be supported.
This may require fixing the device bug as mentioned in #2243.
Logs
Have not been able to attain logs due to failure.
Additional context
kairos/rockylinux:9-core-amd64-generic-v2.4.3
and add several customizations and add-ons but I've also tested from the base kairos/rockylinux:9-core-amd64-generic-v2.4.3 image and have gotten the same results.
kairos/rockylinux:9-core-amd64-generic-v3.0.0-alpha3
with the same results.
The cloud_init.yaml file for a custom formatted disk resulting in a blank screen after install.
Even more stripped down YAML file without custom formatted disk resulting in an install loop.
As stated above, if I set device: /dev/sda it will work with some of the nodes and boot lock on others, which is not acceptable.
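Because /dev/sda can point at a different disk from one boot to the next, it can help to check which kernel name a stable /dev/disk/by-path entry resolves to on the current boot. The sketch below is an illustration only: `resolve_disk` is a hypothetical helper name, and it assumes `readlink -f` is available (GNU coreutils and BusyBox both provide it).

```shell
# Hypothetical helper: print the device node a stable path (for example a
# /dev/disk/by-path entry) currently resolves to.
resolve_disk() {
  path=$1
  if [ ! -e "$path" ]; then
    echo "no such device path: $path" >&2
    return 1
  fi
  # Follow the symlink chain to the real device node.
  readlink -f "$path"
}

# Example usage with the boot disk path used in this issue:
#   resolve_disk "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
```

Running this for each by-path entry on a node shows whether the disk meant for Kairos is sda on that particular boot or has been assigned a different name.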