Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make real installer image, as own package #4248

Merged
merged 11 commits into from
Sep 19, 2024
Merged

Conversation

deitch
Copy link
Contributor

@deitch deitch commented Sep 12, 2024

Complete revamp of installer build.

This is a fairly large PR. Unfortunately, I could not really break it into multiple.

In the past, the installer and verifier were heavily duplicated, and did a lot of "ugly hacking" to start the usual rootfs and inject the right installer/verifier container in before others. It was brittle, and made redoing the installer terribly painful and changing a lot of things. Even changing unrelated things could break them.

This PR completely revamps that.

Now we have two main parts:

  • images/installer.yml.in, which builds a dedicated installer image, just like rootfs.yml.in
  • pkg/verifier, which is the container that does install and verify. This is included in installer.yml.in just like pkg/uefi or pkg/pillar are included in rootfs.yml.in

If you want to change how install works, or verification, or add a GUI or TUI, or options, or whatever, just change pkg/verifier and rebuild.

The only links between pkg/verifier and the outside are the "debug" options. The installer reads /proc/cmdline to determine if a certain setting is enabled, and if so, runs in debug mode and all of its output to the console. This is set in a grub menu option, which is in pkg/eve/installer/grub_installer.cfg.

Also redoes how grub files include each other. It was necessary to make all of the scenarios work, but is much cleaner.
Rather than getting pkg/eve/installer/grub.cfg to be called sometimes, and maybe accidentally overwrite the main grub.cfg, and worry about duplicate functions, etc., the main grub.cfg always is called, but it knows how to include grub_include.cfg. For installer images, that will be present, injected as part of installer.yml.in. We can expand this facility in the future if we need. But the execution chain is much simpler now. Always the same starting point of grub.cfg (whose source is pkg/grub/rootfs.cfg), and it knows to include others if they are found.

Running net installer locally over qemu, via make run-installer-net, used to extract the installer.net into the $(BUILD_DIR)/installer directory. That could (and did) overwrite existing things by accident, and then you worry about why future things won't run. Instead, it now creates a new dir $(BUILD_DIR)/netboot and boots from there.

Because verification image no longer exists, removed all of the references to it in Makefile.

Added docs, notably to pkg/installer/README.md and detailed the boot chain in docs/BOOT-INSTALLER.md. Hopefully no one will have to go through the weeks I went through to figure it all out.

@deitch
Copy link
Contributor Author

deitch commented Sep 12, 2024

I would ask no one to approve this until multiple people have commented. I think eden tests are a waste of resources until then.

@jsfakian has commented that he has tried it, and he thinks the installer might work on raw but not iso. Comment here please.

@deitch
Copy link
Contributor Author

deitch commented Sep 12, 2024

What does this mean in yetus errors?

  ./out has been created
  Modes: GitHubActions Robot InContainer ResetRepo UnitTests
  Processing: GH:4248
  GITHUB PR #4248 is being downloaded from
  https://api.github.com/repos/lf-edge/eve/pulls/4248
    JSON data at Thu Sep 12 11:29:42 AM UTC 2024
    Patch data at Thu Sep 12 11:29:43 AM UTC 2024
  ERROR: Unsure how to process GH:4248. Permissions missing?

@OhmSpectator
Copy link
Member

OhmSpectator commented Sep 12, 2024

@yash-zededa, could you please help understand why Yetus failed to run?

Upd. Just restarting the workflow does not help =(

@yash-zededa
Copy link
Collaborator

@yash-zededa, could you please help understand why Yetus failed to run?

Upd. Just restarting the workflow does not help =(

Just checked. GHA throws an error

ERROR: Unsure how to process GH:4248. Permissions missing?

@OhmSpectator
Copy link
Member

OhmSpectator commented Sep 12, 2024

@yash-zededa, could you please help understand why Yetus failed to run?
Upd. Just restarting the workflow does not help =(

Just checked. GHA throws an error

ERROR: Unsure how to process GH:4248. Permissions missing?

Yep, we saw it... But how come?

@yash-zededa
Copy link
Collaborator

@yash-zededa, could you please help understand why Yetus failed to run?
Upd. Just restarting the workflow does not help =(

Just checked. GHA throws an error

ERROR: Unsure how to process GH:4248. Permissions missing?

Yep, we saw it... But how come?

Permission shouldn't be a problem. I've never encountered this error before. I'm looking into it to get more details.

@OhmSpectator
Copy link
Member

The new docs and README files would be helpful to put into https://github.com/lf-edge/eve/blob/master/docs/mkdocs/mkdocs.yml, as the team members use it for easier documentation exploration.

@deitch
Copy link
Contributor Author

deitch commented Sep 12, 2024

@OhmSpectator done, good point, thank you.

@deitch
Copy link
Contributor Author

deitch commented Sep 12, 2024

Permission shouldn't be a problem. I've never encountered this error before. I'm looking into it to get more details

I think we saw it once before, but don't rely on my memory

@deitch
Copy link
Contributor Author

deitch commented Sep 12, 2024

Of course, now that I split it into 10 distinct commits, to make it easier to review and understand each one, lint commit complains about the short commit messages. If you put it together, like the original single commit, it is quite large and detailed. Now it is 10 small and detailed.

I am ignoring that complaint.

@@ -0,0 +1,1279 @@
// Copyright (c) 2024 Zededa, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file a copy of pillar/cmd/tpmmgr/tpmmgr.go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost. I trimmed it down a lot, just to what we need in installer. And to run standalone.

This all should be replaced soon enough by @jsfakian, I believe.

@OhmSpectator
Copy link
Member

Is it ok? I ran make pkg/installer eve

#17 [raw 1/2] RUN rm /initrd.img /installer.img
#17 0.061 rm: cannot remove '/installer.img': No such file or directory
#17 ERROR: process "/bin/ash -eo pipefail -c rm /initrd.img /installer.img" did not complete successfully: exit code: 1
------
 > [raw 1/2] RUN rm /initrd.img /installer.img:
0.061 rm: cannot remove '/installer.img': No such file or directory
------
Error: error building "lfedge/eve:0.0.0-installer-img-db02f1ea-kvm": error building for arch amd64: failed to solve: process "/bin/ash -eo pipefail -c rm /initrd.img /installer.img" did not complete successfully: exit code: 1
2024/09/12 15:03:16 error during command execution: error building "lfedge/eve:0.0.0-installer-img-db02f1ea-kvm": error building for arch amd64: failed to solve: process "/bin/ash -eo pipefail -c rm /initrd.img /installer.img" did not complete successfully: exit code: 1
make: *** [Makefile:844: eve] Error 1
rm images/out/installer-kvm-generic.yml.in images/out/rootfs-kvm-generic.yml.in

@deitch
Copy link
Contributor Author

deitch commented Sep 12, 2024

Is it ok? I ran make pkg/installer eve

I will check it.

# SPDX-License-Identifier: Apache-2.0

# Dockerfile to build installer img initrd
FROM lfedge/eve-alpine:c114cf1d3ea51534f061f9aa949beb6ac5c12fb3 AS build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use 82df60e43ab9f8c935584b8c7b4d0a4b0271d608 now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will bump.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure?

$ lkt pkg show-tag ./pkg/alpine
lfedge/eve-alpine:2ed33236f1eea746532f612d6c4b7779c5da2193

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I will take yours. If they have to be update, we will do them all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, @rene updated it not that long ago: 72e392c

@deitch
Copy link
Contributor Author

deitch commented Sep 12, 2024

Yup, CI caught the same thing. That is what it is there for 😄

@deitch
Copy link
Contributor Author

deitch commented Sep 12, 2024

Is it ok? I ran make pkg/installer eve

Yeah, good catch. The eve build is fine, but still assumes some older stuff is there. I will update it.

ADD http://sources.buildroot.net/edid-decode/edid-decode-188950472c19492547e298b27f9da0d72cf826df.tar.gz /out/edid-decode-188950472c19492547e298b27f9da0d72cf826df.tar.gz
ADD https://github.com/linuxhw/build-stuff/releases/download/1.6/hw-probe-1.6-AI.tar.gz /out/hw-probe-1.6-AI.tar.gz
# Temporarily removed ubuntu VM image to make the verification image smaller.
#ADD https://cloud-images.ubuntu.com/minimal/releases/jammy/release/ubuntu-22.04-minimal-cloudimg-amd64.img /out/ubuntu-22.04-minimal-cloudimg-amd64.img
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea. 🤷‍♂️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do. These are tools used in the verifier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove 34 and 35 if you want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was talking about the commented lines...

@deitch
Copy link
Contributor Author

deitch commented Sep 12, 2024

OK, this is going to take some work to get working with make eve. Just when you thought you were out of the woods!

It turns out that pkg/eve/runme.sh not only has lots of duplication of stuff from other builds, but assumes a lot about the internal structure.

Probably a day or so to figure this out.

wireless-tools libxrandr eudev-libs libusb sudo fio iperf3 sysstat \
lm-sensors acpi iw libdrm hwinfo dhclient dmidecode smartmontools \
pciutils libgcc pixman glib libvncserver musl-utils dhclient \
qemu-system-x86_64 tpm2-tss-dev tpm2-tss-esys tpm2-tss-fapi tpm2-tss-rc \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need qemu-system-x86_64 in the installer?...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. But I don't want to change too much from how current installer+verifier works or what it has. The goal in this one is to replace the process for getting a working installer image (and clean up all of the complexity on the way, which includes how we build and grub and ipxe etc)

If this was there before, and doesn't break anything, let's leave it. We always can do another PR that just touches pkg/verifier when done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm... It wasn't obvious to me that this came from an old version. I thought it was a new file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right, it wasn't clear. It consolidates everything that was in both the old installer and the old verification.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe describe in the commit message where you introduce a new package which part of it is taken from the old one?

@OhmSpectator
Copy link
Member

OK, this is going to take some work to get working with make eve. Just when you thought you were out of the woods!

It turns out that pkg/eve/runme.sh not only has lots of duplication of stuff from other builds, but assumes a lot about the internal structure.

Probably a day or so to figure this out.

Ok, could we mark the PR as a draft for now? I review it with a lower priority then. Ok?

@deitch
Copy link
Contributor Author

deitch commented Sep 17, 2024

How do I get the eden tests to run again?

@OhmSpectator OhmSpectator self-requested a review September 17, 2024 09:24
@OhmSpectator
Copy link
Member

How do I get the eden tests to run again?

I reapproved to trigger the tests again.

@eriknordmark
Copy link
Contributor

I don't understand this part:

pkg/verifier, which is the container that does install and verify. This is included in installer.yml.in just like pkg/uefi or pkg/pillar are included in rootfs.yml.in

What is the impact on the verifier image? Will that be included in the installer image?
The verifier might include guest VM images, hence can be significantly larger than the EVE installer image.

@deitch
Copy link
Contributor Author

deitch commented Sep 17, 2024

I'm not sure what that means. My understanding is that verifier is basically installer plus a bunch of verification tests. Tons of duplication. I dug into the code and that's what I saw. So I merged them.

What does "verification image plus guest VMs" mean? It didn't fit with anything I saw?

@jsfakian
Copy link
Contributor

I'm not sure what that means. My understanding is that verifier is basically installer plus a bunch of verification tests. Tons of duplication. I dug into the code and that's what I saw. So I merged them.

What does "verification image plus guest VMs" mean? It didn't fit with anything I saw?

@eriknordmark, @deitch, Initially, with the verifier, I wanted to include various tests with guest VMs with different parameters. To achieve that, I included a small Ubuntu image (~250 MB) in the verifier. This caused some problems with the verification process, and I temporarily removed that Ubuntu image. So, Erik asks what we are going to do if we decide to add guest images to the verifier in the future (which is now the same as the installer). Are we okay with having a huge installer image?

My idea was to add a very small Alpine VM (~10MB), which would be okay. What do you think?

@deitch
Copy link
Contributor Author

deitch commented Sep 17, 2024

My idea was to add a very small Alpine VM (~10MB), which would be okay. What do you think?

To my mind, that would be fine. Minimal impact.

We aren't going to start having dozens of different verifier images, nor one with 20 VMs in it, making it many GBs of size. So either it will be static and small with no VMs (or 1-2 really small ones, like @jsfakian said), or dynamic, which will pull things down from somewhere (Internet? attached drive?), which means a whole new design for verifier than what we have today.

Either way, we should be fine with this.

@deitch
Copy link
Contributor Author

deitch commented Sep 17, 2024

@OhmSpectator it doesn't look to me like eden tests ran?

@OhmSpectator
Copy link
Member

@OhmSpectator it doesn't look to me like eden tests ran?

strange... @yash-zededa, can you check, please?

@deitch
Copy link
Contributor Author

deitch commented Sep 17, 2024

I tried rerunning it manually by selecting, "run all jobs" (i.e. those that were skipped), it does another run, but immediately says, "this job was skipped", not clear why, though.

@rene
Copy link
Contributor

rene commented Sep 17, 2024

I tried rerunning it manually by selecting, "run all jobs" (i.e. those that were skipped), it does another run, but immediately says, "this job was skipped", not clear why, though.

I think that's because of Yetus failure....

@rene
Copy link
Contributor

rene commented Sep 17, 2024

@deitch , have you opened the ticket on LF's Jira for the creation of pkg/installer at dockerhub? We need to have this repository in place before merging.....

@deitch
Copy link
Contributor Author

deitch commented Sep 17, 2024

Of course, no one seems to know why yetus is failing (i.e. not even running).

./out has been created
  Modes: GitHubActions Robot InContainer ResetRepo UnitTests
  Processing: GH:4248
  GITHUB PR #4248 is being downloaded from
  https://api.github.com/repos/lf-edge/eve/pulls/4248
    JSON data at Mon Sep 16 05:35:02 PM UTC 2024
    Patch data at Mon Sep 16 05:35:03 PM UTC 2024
  ERROR: Unsure how to process GH:4248. Permissions missing?

Copy link
Contributor

@rene rene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deitch , thanks for this hard work, the installer booting and overall process is much clear now... and I hope it can solve the installation for RPi (didn't test it yet)....

@deitch
Copy link
Contributor Author

deitch commented Sep 17, 2024

installer booting and overall process is much clear now

and documented. Funny, that is what I am most proud of.

and I hope it can solve the installation for RPi (didn't test it yet)

If not, we can focus on fixing it just on a small part.

@deitch
Copy link
Contributor Author

deitch commented Sep 17, 2024

have you opened the ticket on LF's Jira for the creation of pkg/installer at dockerhub?

Yup, all done yesterday. Docker Hub repo in place.

@deitch
Copy link
Contributor Author

deitch commented Sep 17, 2024

Rate pull limits!

@deitch
Copy link
Contributor Author

deitch commented Sep 17, 2024

eden is happy! Can we get request for changes, or approval (written) so we can merge?

@eriknordmark
Copy link
Contributor

@deitch , thanks for this hard work, the installer booting and overall process is much clear now... and I hope it can solve the installation for RPi (didn't test it yet)....

@rene will you test the RPi installation before we merge this?

@rene
Copy link
Contributor

rene commented Sep 18, 2024

@deitch , thanks for this hard work, the installer booting and overall process is much clear now... and I hope it can solve the installation for RPi (didn't test it yet)....

@rene will you test the RPi installation before we merge this?

@eriknordmark , we will test it tomorrow, if you want to wait, I can merge it if everything works....

@deitch
Copy link
Contributor Author

deitch commented Sep 18, 2024

Yeah let's wait the extra day.

@rene
Copy link
Contributor

rene commented Sep 19, 2024

Tested on:

  • Pollux board
  • Raspberry Pi 4

Thanks @christoph-zededa for the help 👍

@rene rene merged commit 6169cf9 into lf-edge:master Sep 19, 2024
81 of 95 checks passed
@deitch deitch deleted the installer-img branch September 19, 2024 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants