Script to upgrade from focal to noble #7406

Open. legoktm wants to merge 2 commits into develop from stg-upgrade-script.

legoktm
Member

@legoktm legoktm commented Jan 7, 2025

Status

Ready for review

Description of Changes

The script is split into various stages where progress is tracked on-disk. The script is able to resume where it was at any point, and needs to, given multiple reboots in the middle.
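
A minimal sketch of the resumable-stage idea, not the actual script. The stage names None, PendingUpdates, Reboot and Done come from the test plan below; the real script may have more stages in between. The helpers `load_state`, `save_state` and `run` are hypothetical, and the state file is assumed to be JSON with a string `finished` field.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Stage names taken from the test plan; the real list may be longer (assumption).
STAGES = ["None", "PendingUpdates", "Reboot", "Done"]


def load_state(state_path: Path) -> dict:
    """Return the saved state, or a fresh one if nothing has run yet."""
    if not state_path.exists():
        return {"finished": "None"}
    return json.loads(state_path.read_text())


def save_state(state_path: Path, state: dict) -> None:
    """Write to a temporary file and rename it over the real one."""
    tmp = state_path.with_name(state_path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(state_path)


def run(state_path: Path, handlers: Dict[str, Callable[[], None]]) -> None:
    """Run every stage after the last finished one, recording each on disk."""
    state = load_state(state_path)
    start = STAGES.index(state["finished"]) + 1
    for stage in STAGES[start:]:
        # If the process dies here, the next invocation repeats this stage,
        # so each handler has to be idempotent.
        handlers[stage]()
        state["finished"] = stage
        save_state(state_path, state)
```

Because progress is only recorded after a stage returns, an interrupted stage is simply repeated on the next invocation.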

The new noble-upgrade.json file shipped in the securedrop-config package is used to control the upgrade process.
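
As a rough illustration of how such a control file could gate the migration: the test plan below sets `enabled` and `bucket` per server and gives each server a bucket from 1 to 5 in its state file. The comparison rule here (a server proceeds once the configured bucket reaches its own) is an assumption, and `should_migrate` is a hypothetical helper, not a function from this PR.

```python
import json


def should_migrate(config: dict, host: str, host_bucket: int) -> bool:
    """Decide whether this host ("app" or "mon") may start migrating.

    Assumed rule: the host section must be enabled and the configured
    bucket must be at least the bucket assigned to this host.
    """
    section = config.get(host, {})
    enabled = bool(section.get("enabled", False))
    return enabled and host_bucket <= int(section.get("bucket", 0))


# Example control data shaped like the fields named in the test plan.
example_config = json.loads(
    '{"app": {"enabled": true, "bucket": 5}, "mon": {"enabled": false, "bucket": 0}}'
)
```

Under this assumed rule, setting the bucket to 5 makes every server eligible, which matches what the test plan does.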

Fixes #7332.

Testing

How should the reviewer test this PR?

Preparation

  • Build focal and noble packages off this PR (UBUNTU_VERSION=focal make build-debs and UBUNTU_VERSION=noble make build-debs).
  • Copy the respective focal and noble folders to an accessible website
  • Create a dists folder alongside focal and noble, run apt-ftparchive packages ./focal > dists/focal/main/binary-amd64/Packages and apt-ftparchive packages ./noble > dists/noble/main/binary-amd64/Packages (you'll need to create the intermediate directories in dists/ manually)
    • As an example, you can look at how I've set up https://legoktm.com/apt/ - but don't use these packages, they're out of date and volatile!
  • Set up a 2.11.1 staging/prod install (n.b. I've only tested this on physical hardware, nothing should stop it from working on a virtualized setup though)
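
The repository layout from the preparation steps can be scripted roughly as follows. This is a sketch, assuming apt-ftparchive is installed and that `repo_root` already contains the focal and noble package folders; `packages_path` and `build_index` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

SUITES = ("focal", "noble")


def packages_path(repo_root: Path, suite: str) -> Path:
    """Location of the Packages index for one suite."""
    return repo_root / "dists" / suite / "main" / "binary-amd64" / "Packages"


def build_index(repo_root: Path, suite: str, run_apt_ftparchive: bool = True) -> Path:
    """Create the intermediate directories and optionally write the index."""
    target = packages_path(repo_root, suite)
    target.parent.mkdir(parents=True, exist_ok=True)
    if run_apt_ftparchive:
        # Same as: apt-ftparchive packages ./<suite> > dists/<suite>/.../Packages
        with target.open("wb") as out:
            subprocess.run(
                ["apt-ftparchive", "packages", f"./{suite}"],
                cwd=repo_root,
                stdout=out,
                check=True,
            )
    return target
```

Calling `build_index(root, suite)` for each entry in `SUITES` reproduces both commands from the list above.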

The next set of steps needs to be applied to both app and mon:

  • Add your custom apt repository to the /etc/apt/sources.list.d/apt_freedom_press.list file with a line like deb [trusted=yes] https://example.org/apt focal main. The [trusted=yes] bypasses PGP signature checking so we don't need to also fiddle with signing the temporary packages and installing the keyring.
  • Run sudo apt-get update && sudo unattended-upgrade to upgrade to the 2.12.0-rc0 packages.
  • Edit /lib/systemd/system/securedrop-noble-migration-upgrade.service to add the line Environment=EXTRA_APT_SOURCE="deb [trusted=yes] https://example.org/apt noble main" (note that this says noble and not focal! Also keep an eye on the quotes)
  • Reboot.
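
Since the quoting and the focal/noble difference are easy to get wrong, here is a small sketch that builds both lines from one URL. The function names are hypothetical and https://example.org/apt is the placeholder from the steps above.

```python
def apt_source_line(url: str, suite: str) -> str:
    """Line for the apt sources file; [trusted=yes] skips signature checks."""
    return f"deb [trusted=yes] {url} {suite} main"


def systemd_environment_line(url: str) -> str:
    """Environment= line for the unit file; note the suite is noble here."""
    return f'Environment=EXTRA_APT_SOURCE="{apt_source_line(url, "noble")}"'
```

The focal line goes into the sources list, while the unit file always gets the noble variant wrapped in double quotes.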

Upgrading

This should be done twice: once for app and then, once that's done, for mon.

  • verify /etc/securedrop-noble-migration-state.json was created with {"finished": None, "bucket": N}, where N is a number from 1 to 5.
  • open a background window that runs journalctl -f - primarily to follow along the progress
  • edit /usr/share/securedrop/noble-migration.json, to set app.enabled = true and app.bucket = 5. (or mon if that's what you're upgrading)
  • Wait for the securedrop-noble-migration-upgrade systemd timer to start (no more than 3 minutes). You can also initiate it manually with sudo systemctl start securedrop-noble-migration-upgrade.
  • The server should reboot (you'll need to restart your journalctl -f window). Once it comes back, /etc/securedrop-noble-migration-state.json should now show finished as PendingUpdates.
  • Wait again for the securedrop-noble-migration-upgrade systemd timer to start. You should see apt's progress in the journalctl window, it'll take a while depending on internet and hardware speed.
  • if this is the app migration, you should be able to verify that the SI/JI are unreachable (apache is masked)
  • Eventually it should reboot again. Once it's back, /etc/securedrop-noble-migration-state.json should be at Reboot.
  • Wait for the securedrop-noble-migration-upgrade systemd timer to start again, once it does it should pretty quickly reach the Done stage.
  • cat /etc/os-release should output noble.
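
The manual checks in this list can be approximated with two small helpers. This is a sketch that assumes the state file is JSON with a string `finished` field (the exact on-disk encoding is not shown in this PR description) and relies on the standard VERSION_CODENAME key in /etc/os-release; the helper names are hypothetical.

```python
import json
from typing import Optional


def finished_stage(state_text: str) -> str:
    """Extract the last finished stage from the state file contents."""
    return json.loads(state_text)["finished"]


def os_codename(os_release_text: str) -> Optional[str]:
    """Return VERSION_CODENAME from os-release contents, if present."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VERSION_CODENAME":
            return value.strip().strip('"')
    return None
```

On a server, pass in the contents of /etc/securedrop-noble-migration-state.json and /etc/os-release respectively; after a completed migration the two calls would be expected to report Done and noble.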

Verification

After the upgrade:

  • Running ./securedrop-admin verify should pass.
  • Basic SI/JI functionality should work
  • OSSEC notifications are coming through as expected

Misc.

Deployment

Any special considerations for deployment? Consider both:

  1. Upgrading existing production instances.
  2. New installs.

Checklist

If you made changes to the server application code:

  • Linting (make lint) and tests (make test) pass in the development container

If you made changes to securedrop-admin:

  • Linting and tests (make -C admin test) pass in the admin development container

If you made changes to the system configuration:

If you added or removed a file deployed with the application:

  • I have updated AppArmor rules to include the change

If you made non-trivial code changes:

  • I have written a test plan and validated it for this PR

Choose one of the following:

  • I have opened a PR in the docs repo for these changes, or will do so later
  • I would appreciate help with the documentation
  • These changes do not require documentation

If you added or updated a reference to a production code dependency:

Production code dependencies are defined in:

  • admin/requirements.in
  • admin/requirements-ansible.in
  • securedrop/requirements/python3/requirements.in
  • securedrop/requirements/python3/translation.in (used in the build
    container)

If you changed another requirements.in file that applies only to development
or testing environments, then no diff review is required, and you can skip
(remove) this section.

Choose one of the following:

  • I have performed a diff review and pasted the contents to the packaging wiki
  • I would like someone else to do the diff review
  • I am silencing an alert related to a production dependency, because (please explain below):

@legoktm legoktm force-pushed the stg-upgrade-script branch 3 times, most recently from 843ac2a to ce32f1e Compare January 8, 2025 20:57

legoktm commented Jan 8, 2025

I think the script is basically complete at this point, but I haven't actually tried it yet. So I need to do that, and then figure out how we're going to do CI on it. I think we should ideally be able to take the focal staging environment, upgrade it, and then re-run testinfra (noble) checks on it.

@legoktm legoktm force-pushed the stg-upgrade-script branch 4 times, most recently from b72bef2 to 6e433aa Compare January 14, 2025 21:28

legoktm commented Jan 14, 2025

Fixed a number of issues found by actual test runs. Currently hitting:

# apt-get upgrade --without-new-pkgs --force-confold --force-confdef
E: Command line option --force-confold is not understood in combination with the other options

Will get to that tomorrow.
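
For context, --force-confold and --force-confdef are dpkg options rather than apt-get options, which is why apt-get rejects them on its command line. They are usually forwarded through apt's Dpkg::Options setting. The sketch below builds such a command; it is not necessarily the fix this PR ended up with, and the helper names are hypothetical.

```python
import os
from typing import Dict, List, Optional


def upgrade_command() -> List[str]:
    """apt-get upgrade that forwards the conffile flags to dpkg."""
    return [
        "apt-get",
        "-o", "Dpkg::Options::=--force-confdef",
        "-o", "Dpkg::Options::=--force-confold",
        "upgrade",
        "--without-new-pkgs",
        "--yes",
    ]


def noninteractive_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with debconf prompts disabled."""
    env = dict(os.environ if base is None else base)
    env["DEBIAN_FRONTEND"] = "noninteractive"
    return env
```

The list can be passed to `subprocess.check_call` together with `env=noninteractive_env()`.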


legoktm commented Jan 15, 2025

Interesting wrinkle: because the systemd unit is installed by the Debian package, the packaging smartly wants to restart the service during the package upgrade. Except that kills the apt-get process that is upgrading the package, which totally breaks everything.

I'm trying to figure out how to stop that restart, but it doesn't seem like dh_systemd_start's --no-restart-after-upgrade is doing what it's documented to do. One option I'm trying is systemd's RefuseManualStart/RefuseManualStop.

Alternatively we could have the script fork in a way that it doesn't get killed when the service stops - similar to what unattended-upgrades does.

@legoktm legoktm force-pushed the stg-upgrade-script branch from 6e433aa to e1396bb Compare January 15, 2025 20:16

legoktm commented Jan 15, 2025

I'm thinking of getting rid of the --without-new-pkgs step and just doing a single full-upgrade step. It'll avoid a lot of the dependency constraint weirdness that keeps manifesting in unexpected ways. For example, I dropped the apparmor-utils dependency, which meant that Python 3.12 got pulled in later, and then app-code got upgraded too early.

@legoktm legoktm force-pushed the stg-upgrade-script branch 3 times, most recently from 19c39ae to 5bf94fe Compare January 15, 2025 22:29

legoktm commented Jan 16, 2025

> Alternatively we could have the script fork in a way that it doesn't get killed when the service stops - similar to what unattended-upgrades does.

I think this is the way to go, but with a slightly different variant. We should just ensure the apt-get/dpkg processes don't get killed. It's okay if the upgrade script dies, as long as apt-get keeps going. The commands are all idempotent, so when the script gets restarted by the timer, it'll re-run the apt-get command, which will do nothing and won't kill the script, and then it'll keep moving on.

Plus by just keeping apt-get alive, we don't need to implement any locking, etc ourselves, because dpkg already takes care of all of that in a battle-tested manner.

@legoktm legoktm force-pushed the stg-upgrade-script branch from 5bf94fe to 087599e Compare January 16, 2025 19:31

legoktm commented Jan 16, 2025

The changes I just pushed introduce a new check_call_nokill that runs commands in a separate process group. Combined with KillMode=process in the systemd unit, the nokill commands will continue to run and live, even when the systemd unit is restarted.

To visualize:

  • systemd starts securedrop-noble-migration-upgrade.service via timer
    • upgrade script runs apt-get full-upgrade (in nokill mode)
      • apt-get updates packages, including securedrop-config
  • securedrop-config's postinst stops the upgrade.service, killing the securedrop-noble-migration-upgrade process, but not apt-get
  • systemd starts securedrop-noble-migration-upgrade.service via timer
    • upgrade script runs apt-get full-upgrade (in nokill mode)
      • if apt has not yet finished from the previous invocation: it'll error on the lock and the upgrade script will die (to be resumed on the next timer invocation)
      • if apt has finished, then it'll be a quick no-op and the upgrade script will proceed.

One gotcha here is that we can no longer reliably capture stdout/stderr because if the parent is killed, it'll go nowhere. So we will need to send it to a file presumably.
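
A minimal sketch of what a helper like check_call_nokill could look like; the actual implementation in this PR may differ. It assumes a POSIX system and uses `start_new_session=True`, which calls setsid() so the child gets its own session and process group, and it sends output to a log file as suggested above. The `log_path` parameter is an assumption.

```python
import subprocess
from pathlib import Path
from typing import List


def check_call_nokill(cmd: List[str], log_path: Path) -> None:
    """Run cmd detached from our process group, appending output to a file.

    Raises subprocess.CalledProcessError on a non-zero exit, like
    subprocess.check_call does.
    """
    with log_path.open("ab") as log:
        subprocess.check_call(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,                # output survives even if we are killed
            stderr=subprocess.STDOUT,
            start_new_session=True,    # setsid(): new session and process group
        )
```

With KillMode=process, systemd only kills the unit's main process on stop, so a child started this way keeps running and keeps writing to the log file.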

@legoktm legoktm force-pushed the stg-upgrade-script branch from 087599e to 14ebc22 Compare January 17, 2025 15:31

legoktm commented Jan 17, 2025

I mostly got through a full automated migration; something is going wrong during the installation of iptables-persistent/nftables-persistent and /etc/iptables/rules.{v4,v6} are blank, so there's no firewall up, causing the integrity check to fail. Once I bypassed that, it reached the done stage. \o/
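
A quick way to spot the blank-rules symptom is to look for rule files that are missing or empty. This sketch is not the integrity check mentioned above, and `empty_rule_files` is a hypothetical helper; only the two default paths come from this comment.

```python
from pathlib import Path
from typing import Iterable, List

DEFAULT_RULE_FILES = (Path("/etc/iptables/rules.v4"), Path("/etc/iptables/rules.v6"))


def empty_rule_files(paths: Iterable[Path] = DEFAULT_RULE_FILES) -> List[Path]:
    """Return the rule files that are missing or contain only whitespace."""
    return [p for p in paths if not p.exists() or not p.read_text().strip()]
```

An empty return value means both files have some content; it says nothing about whether the rules are correct or loaded.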

@legoktm legoktm force-pushed the stg-upgrade-script branch 4 times, most recently from 239e0fd to bf03fa6 Compare January 17, 2025 22:43
@legoktm legoktm changed the title WIP: Script to upgrade from focal to noble Script to upgrade from focal to noble Jan 17, 2025
@legoktm legoktm marked this pull request as ready for review January 17, 2025 22:45
@legoktm legoktm requested a review from a team as a code owner January 17, 2025 22:45

legoktm commented Jan 17, 2025

I successfully completed a fully-automated app migration today, so I'm marking this as ready for review. I still need to write up a more comprehensive test plan and stuff but at least the code can begin to be looked at.


legoktm commented Jan 21, 2025

I've written up the full test plan now.


legoktm commented Jan 21, 2025

Also as far as the code review goes, I'll try to split this up into more commits to simplify review. I also want to write a brief architecture document that explains how it's all supposed to work.

@legoktm legoktm added this to the SecureDrop 2.12.0 milestone Jan 22, 2025
@cfm cfm self-requested a review January 22, 2025 19:29

legoktm commented Jan 22, 2025

> I also want to write a brief architecture document that explains how it's all supposed to work.

https://github.com/freedomofpress/securedrop/wiki/noble-upgrade-architecture

@legoktm legoktm force-pushed the stg-upgrade-script branch from bf03fa6 to 6753f8a Compare January 22, 2025 22:18

legoktm commented Jan 22, 2025

> Also as far as the code review goes, I'll try to split this up into more commits to simplify review.

I thought it was going to be more, but in the end it's two commits: 1) move some logic out of the check.rs file into a new Rust lib.rs and 2) the migration script and everything else.

As part of the upgrade script, we want to run the check one last time to
ensure that everything is ready to go. Instead of shelling out to it,
move the logic into a Rust library that can be shared by both binaries.
@legoktm legoktm force-pushed the stg-upgrade-script branch from 6753f8a to d9ccbd2 Compare January 23, 2025 21:35

legoktm commented Jan 23, 2025

(Rebased on top of the Rust upgrade)

@legoktm legoktm force-pushed the stg-upgrade-script branch from d9ccbd2 to faf3129 Compare January 24, 2025 19:54
The script is split into various stages where progress is tracked
on-disk. The script is able to resume where it was at any point, and
needs to, given multiple reboots in the middle.

The new noble-upgrade.json file shipped in the securedrop-config package
is used to control the upgrade process.

Further details of the script are explained inline and at
<https://github.com/freedomofpress/securedrop/wiki/noble-upgrade-architecture>.

Fixes #7332.
@legoktm legoktm force-pushed the stg-upgrade-script branch from faf3129 to 87d6e1a Compare January 24, 2025 21:44
Labels
None yet
Projects
Status: Ready For Review
Development

Successfully merging this pull request may close these issues.

Create focal -> noble upgrade script
2 participants