no secrets on boot #45

Closed
nrdxp opened this issue May 24, 2021 · 18 comments

@nrdxp

nrdxp commented May 24, 2021

There seems to be an issue after upgrading to latest nixos-unstable where the secrets don't get created properly on boot. Not sure what the root cause is yet, but I can tell you a simple nixos-rebuild switch does successfully make the secrets so it's only a boot time issue.

Strangely, the files don't seem to exist on boot, as the boot log is full of these messages:

May 24 17:39:59 serval-ws stage-2-init: decrypting /nix/store/mxhq2zldxmbqabzhsayqjmnv7hm3r6mj-source/secrets/nrd.age to /run/secrets/nrd...
May 24 17:39:59 serval-ws stage-2-init: Error: No such file or directory (os error 2)
May 24 17:39:59 serval-ws stage-2-init: [ Did rage not do what you expected? Could an error be more useful? ]
May 24 17:39:59 serval-ws stage-2-init: [ Tell us: https://str4d.xyz/rage/report                            ]
May 24 17:39:59 serval-ws stage-2-init: chmod: cannot access '/run/secrets/nrd.tmp': No such file or directory
May 24 17:39:59 serval-ws stage-2-init: chown: cannot access '/run/secrets/nrd.tmp': No such file or directory
May 24 17:39:59 serval-ws stage-2-init: mv: cannot stat '/run/secrets/nrd.tmp': No such file or directory
May 24 17:39:59 serval-ws stage-2-init: decrypting /nix/store/mxhq2zldxmbqabzhsayqjmnv7hm3r6mj-source/secrets/root.age to /run/secrets/root...
May 24 17:39:59 serval-ws stage-2-init: Error: No such file or directory (os error 2)
May 24 17:39:59 serval-ws stage-2-init: [ Did rage not do what you expected? Could an error be more useful? ]
May 24 17:39:59 serval-ws stage-2-init: [ Tell us: https://str4d.xyz/rage/report                            ]
May 24 17:39:59 serval-ws stage-2-init: chmod: cannot access '/run/secrets/root.tmp': No such file or directory
May 24 17:39:59 serval-ws stage-2-init: chown: cannot access '/run/secrets/root.tmp': No such file or directory
May 24 17:39:59 serval-ws stage-2-init: mv: cannot stat '/run/secrets/root.tmp': No such file or directory
May 24 17:39:59 serval-ws stage-2-init: decrypting /nix/store/mxhq2zldxmbqabzhsayqjmnv7hm3r6mj-source/secrets/wireguard.age to /run/secrets/wireguard...
May 24 17:39:59 serval-ws stage-2-init: Error: No such file or directory (os error 2)
May 24 17:39:59 serval-ws stage-2-init: [ Did rage not do what you expected? Could an error be more useful? ]
May 24 17:39:59 serval-ws stage-2-init: [ Tell us: https://str4d.xyz/rage/report                            ]
May 24 17:39:59 serval-ws stage-2-init: chmod: cannot access '/run/secrets/wireguard.tmp': No such file or directory
May 24 17:39:59 serval-ws stage-2-init: chown: cannot access '/run/secrets/wireguard.tmp': No such file or directory
May 24 17:39:59 serval-ws stage-2-init: mv: cannot stat '/run/secrets/wireguard.tmp': No such file or directory
May 24 17:39:59 serval-ws stage-2-init: Activation script snippet 'agenixRoot' failed (1)
May 24 17:39:59 serval-ws stage-2-init: warning: password file ‘/run/secrets/nrd’ does not exist
May 24 17:39:59 serval-ws stage-2-init: warning: password file ‘/run/secrets/root’ does not exist
May 24 17:39:59 serval-ws stage-2-init: [agenix] decrypting non-root secrets...
May 24 17:39:59 serval-ws stage-2-init: decrypting /nix/store/mxhq2zldxmbqabzhsayqjmnv7hm3r6mj-source/secrets/aws.age to /run/secrets/aws...
May 24 17:39:59 serval-ws stage-2-init: Error: No such file or directory (os error 2)
May 24 17:39:59 serval-ws stage-2-init: [ Did rage not do what you expected? Could an error be more useful? ]
...

These files do in fact exist in the nix store and, as I said, all my secrets are properly created with a simple nixos-rebuild switch. My only guess so far is that stage-2-init somehow doesn't have proper access to /nix/store, but I'm not sure why.

@ryantm
Owner

ryantm commented May 25, 2021

Darn, I'm sorry that's happening to you!

Don't jump to the conclusion that the missing file is the ciphertext; it could also be the decryption key, or something could be wrong with the /run/secrets directory.

Are you able to boot with your previous generation?

One thing we should do in the future is make a module option that allows debugging these kinds of failures easily, but for now you could try using this module instead, which adds some debugging lines to the decryption activation scripts:

https://gist.githubusercontent.com/ryantm/385279275fa740f9fa7dec2973d5d90f/raw/5f583b2f220b36c810c9b11aaccccd56cae845a4/age.nix

Could you also share what options you are setting in the age module? And what version of the agenix repo you are using?

@nrdxp
Author

nrdxp commented May 25, 2021

So I've isolated the problem at least, but it's not really agenix's fault after all, sorry about that.

Before I posted the issue, I had first assumed it might be a problem with the impermanence module that I'm using to maintain some state on a clean root directory after each boot. The first thing I tried was commenting out the statement that resets the zfs root directory to an empty state at boot. When that didn't work I figured the problem must be elsewhere, but after looking more closely at the logs, it looks like my /etc/ssh is still getting mounted after the secrets are generated, so the decryption doesn't have access to my host key.

Don't know how to fix yet, but I noticed that my persistent /var/log directory is the only one of my persistent directories to be mounted before secrets (during stage-1). I'm trying to figure out what's so special about it and why all the other persistent dirs only mount in stage-2. I'll close this since it's not really an issue with agenix, but I'll report back when I find a fix.

@nrdxp
Author

nrdxp commented May 25, 2021

This PR fixed it by allowing me to set proper mount dependencies on early boot:
NixOS/nixpkgs#86967 (review)

@ryantm
Owner

ryantm commented May 25, 2021

Great, I'm glad you figured it out!

@andrevmatos

I'm experiencing this issue too. It seems that the agenix activation scripts run before /etc is set up. On a normal system this would not be a problem, but with impermanence and the SSH keys linked as individual files, environment.etc is responsible for putting the symlinks in place, and by the time agenix runs, /etc isn't ready yet. Right now, the workaround is to set age.sshKeyPaths explicitly to the persistent directory (/nix/state in my case), which gets mounted in stage 1. Ideally, agenixRoot and agenix should run only after activationScripts."etc".
For reference, the script order I'm seeing on my system (unstable) is stdio, specialfs, agenixRoot, users, groups, agenix, binfmt, binsh, createDirsIn--nix-state, domain, etc, hostname, ...
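
The workaround described above might look like the following minimal sketch. The key file name and the etc/ssh subdirectory under /nix/state are assumptions; only the /nix/state directory itself comes from the comment.

```nix
{
  # Point agenix at the host key inside the persistent directory, which is
  # mounted in stage 1, instead of relying on the default /etc/ssh path.
  # The subpath and key file name below are hypothetical.
  age.sshKeyPaths = [ "/nix/state/etc/ssh/ssh_host_ed25519_key" ];
}
```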

@ryantm
Owner

ryantm commented Aug 17, 2021

I'm not sure it is obvious that the agenix activation scripts should run after "etc", because "etc" depends on users/groups existing, and sometimes users and groups depend on secrets existing to be set up correctly. Maybe we need a more flexible dependency graph.

@ryantm
Owner

ryantm commented Aug 17, 2021

Maybe each secret should be its own activation script, then you could use the normal activation script ordering features to get the order you want.
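
For context, the ordering features mentioned here are the dependency lists of system.activationScripts entries. The following is only a sketch of that mechanism, not something agenix provides: the script name checkHostKey is hypothetical, and the default ed25519 host key path is assumed.

```nix
{ lib, ... }:
{
  # Hypothetical diagnostic script. lib.stringAfter attaches a dependency
  # list, so this snippet is ordered after the "etc" activation script.
  system.activationScripts.checkHostKey = lib.stringAfter [ "etc" ] ''
    if [ ! -e /etc/ssh/ssh_host_ed25519_key ]; then
      echo "warning: host key not present after etc activation"
    fi
  '';
}
```

If each secret were its own entry like this, its position relative to "etc", "users", and "groups" could be chosen per secret.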

@nrdxp
Author

nrdxp commented Aug 17, 2021

@andrevmatos, did you try the solution that worked for me, the PR mentioned above? It is available in nixpkgs-unstable, and it would probably even qualify for a backport.

@andrevmatos

andrevmatos commented Aug 18, 2021

@nrdxp Thank you for your response. I did check it, but it seems to be useful only for mountpoints. In my case, the persisted volume did get mounted in time, but my /etc/ssh isn't [bind]-mounted directly. Instead, I use impermanence's files option to link individual files in /etc, so /etc needs to be set up before agenix can do its thing. In my activation script, however, agenixRoot ends up firing before /etc has been set up, so it fails. The workaround for now is to use the age.sshKeyPaths option to specify the ssh host keys directly in the persisted directory, which is already mounted, so the keys are available when agenix needs them. The default key paths didn't work because they only become available after the /etc links are set up.
Thanks, @ryantm as well, I understand the issue with user/group setup/ordering, so I think I'll stick with my workaround for now.

@Philipp-M

I tried pretty much everything with a zfs fully encrypted pool with something like this:

{
  # tried nixos-unstable with depends
  fileSystems."/nix".depends = [ "/run" "/srv" "/tmp" ];
  boot.initrd.postDeviceCommands = lib.mkAfter ''
        zfs rollback -r rpool/root@blank
  '';
  boot.tmpOnTmpfs = true;
  age.secrets.nextcloud-env = {
    file = ./nextcloud/.env;
    # my preferred path:
    path = "/run/keys/secrets/nextcloud-env";
    # or (commented out, since an attribute cannot be defined twice):
    # path = "/tmp/secrets/nextcloud-env";
  };
  # even something like this, where the ssh key lies on the unencrypted boot (vfat):
  age.sshKeyPaths = [ "/boot/ssh_host_ed25519_key" ];
}

but the error is always the same:

Oct 30 15:17:20 yasnix stage-1-init: [Sat Oct 30 13:15:30 UTC 2021] importing root ZFS pool "rpool"...
Oct 30 15:17:20 yasnix stage-1-init: [Sat Oct 30 13:17:18 UTC 2021] Enter passphrase for 'rpool':
Oct 30 15:17:20 yasnix stage-1-init: [Sat Oct 30 13:17:18 UTC 2021] 1 / 1 key(s) successfully loaded
Oct 30 15:17:20 yasnix stage-1-init: [Sat Oct 30 13:17:18 UTC 2021] mounting rpool/root on /...
Oct 30 15:17:20 yasnix stage-1-init: [Sat Oct 30 13:17:18 UTC 2021] mounting rpool/root/nix on /nix...
Oct 30 15:17:20 yasnix unknown: booting system configuration /nix/store/42gc5hpy4s8ch483pr2fngqpd8akmnnz-nixos-system-yasnix-21.05.20211029.6c0c301
Oct 30 15:17:20 yasnix stage-2-init: running activation script...
Oct 30 15:17:20 yasnix stage-2-init: [agenix] decrypting root secrets...
Oct 30 15:17:20 yasnix stage-2-init: decrypting /nix/store/mk73kkjzc0466a8qfv4b81ls56wixl0x-.env to /run/keys/secrets/nextcloud-env...
Oct 30 15:17:20 yasnix stage-2-init: Error: No such file or directory (os error 2)
Oct 30 15:17:20 yasnix stage-2-init: [ Did rage not do what you expected? Could an error be more useful? ]
Oct 30 15:17:20 yasnix stage-2-init: [ Tell us: https://str4d.xyz/rage/report                            ]
Oct 30 15:17:20 yasnix stage-2-init: chmod: cannot access '/run/keys/secrets/nextcloud-env.tmp': No such file or directory
Oct 30 15:17:20 yasnix stage-2-init: chown: cannot access '/run/keys/secrets/nextcloud-env.tmp': No such file or directory
Oct 30 15:17:20 yasnix stage-2-init: mv: cannot stat '/run/keys/secrets/nextcloud-env.tmp': No such file or directory
Oct 30 15:17:20 yasnix stage-2-init: Activation script snippet 'agenixRoot' failed (1)
Oct 30 15:17:20 yasnix stage-2-init: [agenix] decrypting non-root secrets...
Oct 30 15:17:20 yasnix stage-2-init: setting up /etc...

I'm still not sure if it fails to find the ssh key or to write the decrypted file...

@nrdxp
Author

nrdxp commented Nov 2, 2021

@Philipp-M are you also using impermanence? If not, you may not need the depends statement at all. FWIW I am also using a fully encrypted zfs setup with the following layout:

rpool
rpool/local
rpool/local/games
rpool/local/nix
rpool/local/reserved
rpool/local/root
rpool/safe
rpool/safe/home
rpool/safe/persist

The only depends I have is:

{
  fileSystems."/etc/ssh" = {
    depends = [ "/persist" ];
    neededForBoot = true;
  };
}

due to impermanence.

@Philipp-M

It was a simple

{
  fileSystems."/srv".neededForBoot = true;
}

(where the ssh keys are)

darn how could I have missed this 🤦‍♂️ Thanks!
Yes I don't need depends for that since I don't use impermanence

@alarsyo

alarsyo commented Jan 18, 2022

Just lost a few hours debugging a similar problem on my machine. Turns out my home wasn't mounted yet, and some keys were in it. rage seems to throw the "No such file or directory" error in that case, even though there was another private key in /etc/ssh available...

neededForBoot did the trick for me as well.
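
A minimal sketch of that fix, assuming the keys live under /home and that /home is already declared as its own file system elsewhere in the configuration:

```nix
{
  # Mount /home in stage 1 so that identities stored there exist before the
  # stage 2 activation scripts (including agenix) run.
  fileSystems."/home".neededForBoot = true;
}
```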

@ghost

ghost commented Jul 3, 2022

Quoting @ryantm: "Maybe each secret should be its own activation script, then you could use the normal activation script ordering features to get the order you want."

I wouldn't consider this solved: if I use a flake that gets the keys from a private repo, the approach quoted above would be the cleanest solution. I can create some workarounds, but they'll be ugly.

@n8henrie
Collaborator

n8henrie commented Feb 4, 2023

Also just lost a day or two on this, as my config was working fine after nixos-rebuild switch but upon reboot I was locked out of my system due to a combination of:

  • home is on a separate btrfs subvolume that didn't have neededForBoot
  • my ssh key was on that subvolume for my user's passwordfile
  • mutableUsers = false and I've disallowed root logins over SSH (so I could ssh in as my user but not escalate privileges for a nixos-rebuild)

Thankfully was able to recover, but could have saved a decent chunk of time with a warning about this in the README.
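
For a setup like the one described, the missing piece would be roughly the following. This is a sketch only: the device label and subvolume name are hypothetical and not taken from the comment.

```nix
{
  fileSystems."/home" = {
    device = "/dev/disk/by-label/nixos"; # hypothetical device
    fsType = "btrfs";
    options = [ "subvol=home" ];         # hypothetical subvolume name
    # Mount in stage 1 so the key used for the user's password file exists
    # before agenix decrypts secrets during activation.
    neededForBoot = true;
  };
}
```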

@klarkc

klarkc commented Sep 12, 2023

neededForBoot did not work for me, as I'm using virtualisation.sharedDirectories rather than a fileSystems entry to mount the host directory in the guest virtual machine.

@klarkc

klarkc commented Sep 13, 2023

age.sshKeyPaths is now age.identityPaths, and setting it worked as a workaround for me as well.

@FlafyDev

  environment.persistence = {
    "/persist" = {
      # ...
      users.${config.users.main} = {
        directories = [
          ".ssh"
        ];
      };
    };
  };

In my case, /persist/home/user/.ssh/agenix wasn't mounted to /home/user/.ssh/agenix before agenix ran in stage 2. As a workaround, I changed agenix's identity path (age.identityPaths) to /persist/home/user/.ssh/agenix, and it worked.
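
Expressed as configuration, that workaround would be something like this sketch, which uses the path from the comment and the age.identityPaths option name mentioned earlier in the thread:

```nix
{
  # Read the identity straight from the persistent volume rather than from
  # the /home location, which is not yet populated when agenix runs.
  age.identityPaths = [ "/persist/home/user/.ssh/agenix" ];
}
```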
