rpi v2.0.2 custom image boots in emergency shell, immucore can't find cos_state #1342

Closed
Ognian opened this issue Apr 23, 2023 · 24 comments
Labels
bug Something isn't working


@Ognian
Contributor

Ognian commented Apr 23, 2023

Kairos version:

kairos-opensuse-leap-arm-rpi:v2.0.2-k3sv1.26.3-k3s1

CPU architecture, OS, and Version:

Describe the bug

immucore can't find cos_state; see the attached log.

To Reproduce

Dockerfile:

FROM quay.io/kairos/kairos-opensuse-leap-arm-rpi:v2.0.2-k3sv1.26.3-k3s1
WORKDIR /usr/sbin
RUN curl -L https://github.com/Ognian/sdmon/releases/download/latest/sdmon-arm64.tar.gz | tar zxf -

Custom build .sh script:

mkdir -p build
docker build -t kairos-rpi-ogi . 
docker pull quay.io/kairos/osbuilder-tools:latest
docker run -v $PWD:/HERE -v /var/run/docker.sock:/var/run/docker.sock --privileged -i --rm --entrypoint=/build-arm-image.sh quay.io/kairos/osbuilder-tools:latest \
 --model rpi64 \
 --state-partition-size 6200 \
 --recovery-partition-size 4200 \
 --size 15200 \
 --images-size 2000 \
 --local \
 --config /HERE/cloud-config.yaml \
 --efi-dir /HERE/boot \
 --docker-image kairos-rpi-ogi /HERE/build/out.img

Expected behavior

Logs

rdsosreport.txt

Additional context

@Ognian Ognian added the bug Something isn't working label Apr 23, 2023
@jimmykarily jimmykarily moved this from Todo 🖊 to In Progress 🏃 in 🧙Issue tracking board Apr 24, 2023
@jimmykarily jimmykarily moved this from In Progress 🏃 to Todo 🖊 in 🧙Issue tracking board Apr 24, 2023
@Itxaka
Member

Itxaka commented Apr 24, 2023

Looks like immucore is not able to infer the FS of the cos-state partition... immucore v0.0.18 has a patch to avoid this.

@Itxaka
Member

Itxaka commented Apr 26, 2023

Fix released as part of v2.0.3!

Images will be ready shortly: https://github.com/kairos-io/provider-kairos/releases/tag/v2.0.3

@Ognian let us know if this still fails (it should not!)

@Itxaka Itxaka closed this as completed Apr 26, 2023
@github-project-automation github-project-automation bot moved this from Under review 🔍 to Done ✅ in 🧙Issue tracking board Apr 26, 2023
@Ognian
Contributor Author

Ognian commented Apr 27, 2023

@Itxaka unfortunately it still fails :-(
rdsosreport.txt
In my setup the rpi is booting from a USB HD; with previous versions this worked flawlessly...

@Itxaka
Member

Itxaka commented Apr 27, 2023

Umm, weird: empty label for cos_state and empty FS type??

[75.102240] localhost immucore[621]:<nil>ERR error="no such device" options=["ro"] type=  what=/dev/disk/by-label where=/sysroot/run/initramfs/cos-state
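The empty label also explains the odd what=/dev/disk/by-label value: joining an empty label onto the by-label directory just yields the directory itself. A minimal Go sketch, assuming immucore builds the device path by joining the two (this thread does not confirm that); naiveLabelPath, labelPath and the COS_STATE label are hypothetical, illustrative names:

```go
package main

import (
	"errors"
	"path/filepath"
)

// byLabelDir is the udev directory holding one symlink per filesystem label.
const byLabelDir = "/dev/disk/by-label"

// errEmptyLabel is returned instead of silently producing a bogus path.
var errEmptyLabel = errors.New("empty filesystem label")

// naiveLabelPath (hypothetical) joins without any check. filepath.Join
// ignores empty elements, so an empty label yields "/dev/disk/by-label",
// the same what= value seen in the log above.
func naiveLabelPath(label string) string {
	return filepath.Join(byLabelDir, label)
}

// labelPath (hypothetical) refuses to build a device path from an empty label.
func labelPath(label string) (string, error) {
	if label == "" {
		return "", errEmptyLabel
	}
	return filepath.Join(byLabelDir, label), nil
}
```

With a guard like this, an unreadable label fails early with a clear error instead of trying to fsck and mount the directory /dev/disk/by-label.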

@Itxaka
Member

Itxaka commented Apr 27, 2023

In my setup the rpi is booting from an USB HD; with previous versions this worked flawless ...

Interesting. How are you setting up the USB boot? Pivoting from SD to USB, or just a USB disk with Kairos installed?

I might be able to reproduce it if I know how it's set up, @Ognian.

@Itxaka Itxaka reopened this Apr 27, 2023
@github-project-automation github-project-automation bot moved this from Done ✅ to Under review 🔍 in 🧙Issue tracking board Apr 27, 2023
@Ognian
Contributor Author

Ognian commented Apr 27, 2023

No SD card, just a USB3 HD.

@Itxaka
Member

Itxaka commented Apr 27, 2023

Built an image with the steps given: it boots from an SD card but fails from a USB HDD.

Need to dust off my serial-to-USB adapter to track this down :D

@Itxaka
Member

Itxaka commented Apr 28, 2023

Can't get the state label for some reason:

[   25.210201] immucore[561]: <nil> DBG work/internal/utils/common.go:168 > Get state label what=/dev/disk/by-label
[   25.250430] immucore[561]: <nil> DBG work/internal/utils/common.go:168 > Get state label what=/dev/disk/by-label

@Itxaka
Member

Itxaka commented Apr 28, 2023

Looks like a race condition, because the USB disk takes a while to come up:

[   24.530190] immucore[282]: <nil> INF work/pkg/mount/dag_steps.go:373 > Setting sentinel file to=active_mode
[   24.570207] immucore[282]: <nil> ERR work/internal/utils/mounts.go:69 > blkid error="exit status 2"
[   24.610185] immucore[282]: <nil> DBG work/internal/utils/mounts.go:72 > Partition FS type type= what=/dev/disk/by-label
[   24.650184] immucore[282]: <nil> ERR work/internal/utils/mounts.go:69 > blkid error="exit status 2"
[   24.690182] immucore[282]: <nil> DBG work/internal/utils/mounts.go:72 > Partition FS type type= what=tmpfs
[   24.730199] immucore[282]: <nil> DBG work/internal/utils/mounts.go:272 > vgchange out=
[   24.780430] immucore[282]: <nil> DBG work/internal/utils/mounts.go:170 > fsck command cmd="fsck /dev/disk/by-label -a"
[   24.820235] immucore[282]: <nil> INF work/pkg/mount/state.go:177 > mount done options=["rw"] type=tmpfs what=tmpfs where=/tmp
[   24.860201] immucore[282]: <nil> DBG work/internal/utils/mounts.go:173 > fsck error="exit status 8" out="fsck from util-linux 2.37.2\nfsck.ext2: No such file or directory while trying tol
[   24.910195] immucore[282]: <nil> ERR work/pkg/mount/state.go:174 > error="no such device" options=["ro"] type= what=/dev/disk/by-label where=/sysroot/run/initramfs/cos-state
[   24.960213] immucore[282]: <nil> ERR work/internal/utils/mounts.go:69 > blkid error="exit status 2"
[   25.000185] immucore[282]: <nil> DBG work/internal/utils/mounts.go:72 > Partition FS type type= what=/dev/disk/by-label
[   25.047746][   T75] scsi 0:0:0:0: Direct-Access     SanDisk  Ultra USB 3.0    1.00 PQ: 0 ANSI: 6
[   25.057702][   T75] scsi 0:0:0:0: Attached scsi generic sg0 type 0
[   25.079257][   T75] sd 0:0:0:0: [sda] 240254976 512-byte logical blocks: (123 GB/115 GiB)
[   25.088880][   T75] sd 0:0:0:0: [sda] Write Protect is off
[   25.095105][   T75] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   25.191180][   T75]  sda: sda1 sda2 sda3 sda4
[   25.199294][   T75] sd 0:0:0:0: [sda] Attached SCSI removable disk
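Both errors logged before sda shows up fit the race: blkid exits with status 2 when no device can be identified, and an fsck exit status of 8 means an operational error (the log shows fsck.ext2 reporting "No such file or directory"). A small Go sketch that decodes these exit codes as documented in the blkid(8) and fsck(8) man pages; blkidStatus and fsckStatus are hypothetical helpers, not part of immucore:

```go
package main

import "strings"

// blkidStatus explains blkid's documented exit codes (see blkid(8)).
func blkidStatus(code int) string {
	switch code {
	case 0:
		return "device found and information gathered"
	case 2:
		return "token not found or no device could be identified"
	case 4:
		return "usage or other error"
	case 8:
		return "ambivalent low-level probing result"
	default:
		return "unknown exit status"
	}
}

// fsckFlags lists fsck's exit-status bits in ascending order (see fsck(8)).
var fsckFlags = []struct {
	bit  int
	desc string
}{
	{1, "filesystem errors corrected"},
	{2, "system should be rebooted"},
	{4, "filesystem errors left uncorrected"},
	{8, "operational error"},
	{16, "usage or syntax error"},
	{32, "checking canceled by user request"},
	{128, "shared-library error"},
}

// fsckStatus decodes fsck's exit status, a bitwise OR of the flags above.
func fsckStatus(code int) string {
	if code == 0 {
		return "no errors"
	}
	var parts []string
	for _, f := range fsckFlags {
		if code&f.bit != 0 {
			parts = append(parts, f.desc)
		}
	}
	if len(parts) == 0 {
		return "unknown exit status"
	}
	return strings.Join(parts, "; ")
}
```

fsck's status is a bit mask, so a value such as 5 decodes into two conditions at once, while blkid's codes are plain values.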

@Itxaka
Member

Itxaka commented Apr 28, 2023

Going to try to introduce some safety checks around the cos_state partition and a retry function, so it keeps trying to get it for a while.

@Itxaka
Member

Itxaka commented Apr 28, 2023

This will be fixed by the next version of immucore with this patch: kairos-io/immucore#115

Tested it myself and indeed it's due to the USB disk appearing later, so immucore could not find the state label. That patch introduces a retry to get the label, and panics if all attempts fail.

@Ognian
Contributor Author

Ognian commented Apr 28, 2023

@Itxaka thanks for keeping us up to date.

@Itxaka Itxaka mentioned this issue Apr 28, 2023
29 tasks
AnthonyEnr1quez referenced this issue in AnthonyEnr1quez/k3s-gitops May 1, 2023
@jimmykarily
Contributor

I'll close this since it's already fixed and will be included in the next release. @Itxaka @Ognian feel free to re-open if this is not working or I missed something. Thanks.

@github-project-automation github-project-automation bot moved this from Under review 🔍 to Done ✅ in 🧙Issue tracking board May 2, 2023
@Ognian
Contributor Author

Ognian commented May 2, 2023

@jimmykarily is there a way to install the newest immucore in my custom Dockerfile (see my first entry) so that I can test this before v2.1.0? (I don't know how long v2.1.0 will take, but it looks like there are still a lot of open tasks...)

@jimmykarily
Contributor

@Ognian we bumped immucore on kairos@master. You just need to build from a Kairos core image built from latest master; the latest tag from here should do: https://quay.io/repository/kairos/core-opensuse-leap-arm-rpi?tab=tags

@Ognian
Contributor Author

Ognian commented May 3, 2023

@jimmykarily cool, I started my build now, let's see... Is there also a latest-k3sv1.26.3-k3s1 build? If not, could you initiate one?

@Ognian
Contributor Author

Ognian commented May 3, 2023

@Itxaka
Unfortunately it does not boot:
[screenshot]
rdsosreport-3.5.23.txt

@jimmykarily jimmykarily reopened this May 5, 2023
@github-project-automation github-project-automation bot moved this from Done ✅ to Under review 🔍 in 🧙Issue tracking board May 5, 2023
@jimmykarily jimmykarily moved this from Under review 🔍 to Todo 🖊 in 🧙Issue tracking board May 8, 2023
@Itxaka Itxaka moved this from Todo 🖊 to In Progress 🏃 in 🧙Issue tracking board May 8, 2023
@Itxaka
Member

Itxaka commented May 8, 2023

Same error: neither the cos-state label nor the FS could be read:

[   27.403231] localhost immucore[620]: <nil> ERR error="no such device" options=["ro"] type= what=/dev/disk/by-label where=/sysroot/run/initramfs/cos-state

Although I'm a bit worried that this is not the proper immucore version:

[   25.958752] localhost immucore[620]: <nil> INF Immucore commit=none compiled with=go1.20.2 version=v0.0.1

So I cannot pinpoint the error exactly, but I'm not sure that is the immucore version with the fix. The new one panics if it can't get the label: https://github.com/kairos-io/immucore/pull/115/files#diff-310bfebac0279274ed530947884107087ff0aea1845732db964697735a848212R185 and in this case it looks like it can't get the label, so I would have expected it to panic and stop booting.

@Ognian can you provide the same logs with rd.immucore.debug on the grub command line to enable debug mode? I would like to see what's going on under the hood, thanks!
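Debug mode is toggled purely from the kernel command line. A minimal Go sketch of how an initramfs tool could detect rd.immucore.debug and read root=LABEL= from /proc/cmdline content; it is an assumption about the mechanism, not immucore's actual parser, and bootArgs and parseCmdline are hypothetical names:

```go
package main

import "strings"

// bootArgs holds the kernel parameters relevant to this thread.
type bootArgs struct {
	Debug     bool   // rd.immucore.debug present as a bare flag
	RootLabel string // value after root=LABEL=, if any
}

// parseCmdline splits /proc/cmdline content on whitespace. Simplified:
// it ignores quoting, and flags must match exactly, so a misspelled
// parameter is silently ignored.
func parseCmdline(cmdline string) bootArgs {
	var args bootArgs
	for _, field := range strings.Fields(cmdline) {
		switch {
		case field == "rd.immucore.debug":
			args.Debug = true
		case strings.HasPrefix(field, "root=LABEL="):
			args.RootLabel = strings.TrimPrefix(field, "root=LABEL=")
		}
	}
	return args
}
```

For example, the command line root=LABEL=COS_ACTIVE cos-img/filename=test.img rd.immucore.debug yields Debug set to true and RootLabel COS_ACTIVE.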

@Ognian
Contributor Author

Ognian commented May 8, 2023

@Itxaka here it is
new.txt
In the meantime, I'll try to manually pull the latest image before building...

@Ognian
Contributor Author

Ognian commented May 8, 2023

@Itxaka pulled the newest image manually, but the result is the same... the version is also v0.0.1.
Hope the previous log with debug info helps...
1.txt

@Itxaka
Member

Itxaka commented May 8, 2023

Definitely, that version does not contain the fix for the USB HDD issue, as it only tries to get the label once and then continues:

[   25.920990] localhost immucore[621]:  <nil>  DBG  luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:168 Get state label  what=/dev/disk/by-label
[   25.920990] localhost immucore[621]:  <nil>  DBG  luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:168 Get state label  what=/dev/disk/by-label

Instead, you should see something like this:

:/ # cat /proc/cmdline 
root=LABEL=COS_ACTIVE cos-img/filename=test.img rd.immucore.debug
:/ # immucore
<nil> INF luetbuild/go/src/github.com/kairos-io/immucore/main.go:29 > Immucore commit=none compiled with=go1.20.2 version=v0.0.1
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/main.go:32 > cmdline content="root=LABEL=COS_ACTIVE cos-img/filename=test.img rd.immucore.debug\n\n"
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:130 > Target device what=test.img
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:131 > Target label what=/dev/disk/by-label/COS_ACTIVE
<nil> INF luetbuild/go/src/github.com/kairos-io/immucore/main.go:56 > Booting on active/passive/recovery.
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:181 > Cannot get state label, retrying try=0
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:181 > Cannot get state label, retrying try=1
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:181 > Cannot get state label, retrying try=2
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:181 > Cannot get state label, retrying try=3
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:181 > Cannot get state label, retrying try=4
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:181 > Cannot get state label, retrying try=5
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:181 > Cannot get state label, retrying try=6
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:181 > Cannot get state label, retrying try=7
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:181 > Cannot get state label, retrying try=8
<nil> DBG luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:181 > Cannot get state label, retrying try=9
<nil> PNC luetbuild/go/src/github.com/kairos-io/immucore/internal/utils/common.go:185 > Could not get state label error="All attempts fail:\n#1: could not get label\n#2: could not get label\n#3: could not get label\n#4: could not get label\n#5: could not get label\n#6: could not get label\n#7: could not get label\n#8: could not get label\n#9: could not get label\n#10: could not get label"
panic: Could not get state label

This last log comes from the base image at quay.io/kairos/core-opensuse-leap-arm-rpi:latest so... are you building from that base or from a provider-kairos image?

@Itxaka
Member

Itxaka commented May 8, 2023

Ummm, maybe 2.0.3 did not pick up the bumped repositories with the newest immucore. Let me have a look, because with the latest image from quay.io/kairos/kairos-opensuse-leap-arm-rpi:latest I get the same behaviour.

@Itxaka
Member

Itxaka commented May 8, 2023

OMG, I'm an idiot. The fix was released in v0.0.19. No wonder it's not fixed: Kairos v2.0.3 contains immucore v0.0.18, which I thought was the one with the fix.

Sorry about that, @Ognian, I totally screwed up from the start by thinking it was already in v2.0.3 😭 Even worse, I wrote this card into 2.1.0 myself, saying that it was fixed by v0.0.19 and would be released as part of Kairos 2.1.0 🤦

Anyway, for now the only options are to wait for 2.1.0 to be released or to build the image yourself from the provider-kairos repo with something like:

earthly +all_arm --CORE_VERSION=latest --MODEL=rpi64 --K3S_VERSION=v1.26.3-k3s

That should build an opensuse-leap based image with k3s v1.26.3-k3s for the rpi, based on the latest published images (master).
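Since the fix landed in immucore v0.0.19 while Kairos v2.0.3 bundles v0.0.18, the question to ask of any image is whether its immucore is at least v0.0.19. A minimal Go sketch with hypothetical helpers that compare vMAJOR.MINOR.PATCH strings numerically (a plain string comparison would wrongly rank v0.0.9 above v0.0.19). Note that the builds in this thread self-report version=v0.0.1 with commit=none both with and without the fix, so the binary's own version output may not be reliable for this check:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v0.0.19" into [0 0 19]. Pre-release and build
// suffixes are not handled in this sketch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unexpected version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether version v is >= minimum, comparing numerically.
func atLeast(v, minimum string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil // the versions are equal
}
```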

@Itxaka Itxaka moved this from In Progress 🏃 to Under review 🔍 in 🧙Issue tracking board May 12, 2023
@Ognian
Contributor Author

Ognian commented May 15, 2023

@Itxaka tested with quay.io/kairos/core-opensuse-leap-arm-rpi:v2.1.0 and it at least boots now... Now I'm having the problem that the config is not applied (I don't know why yet...), and of course I'm waiting for the "standard" v2.1.0 version including k3s...
BUT at least it boots now.

Projects
Archived in project
Development

No branches or pull requests

4 participants