
Hard power cycle of CentOS 7 ostree image typically causes boot to fail with error: failed to load SELinux policy #1963

Closed
ghost opened this issue Jan 16, 2020 · 11 comments



ghost commented Jan 16, 2020

Custom CentOS 7 Atomic install.

rpm-ostree:
 Version: 2018.5
 Git: 4a4d4fb373d9c0c276d78149f68fff9ab90ab3a0
 Features:
libostree:
 Version: '2019.1'
 Git: eea64d15f33af6e3846b9d12502a16a10c3242c6
 Features:
  - libcurl
  - libsoup
  - gpgme
  - libarchive
  - selinux
  - openssl
  - libmount
  - release
  - p2p
Deployments:
* ostree://ostree-local:centos/7/x86_64/standard/XXXXXXXX
                 Timestamp: 2020-01-15 15:27:56
                    Commit: XX
              GPGSignature: Valid signature by XXX

Expected vs actual behavior

Expected: after a hard power cycle, the PC boots normally. Actual: the boot often fails because the SELinux policy cannot be loaded.

Steps to reproduce it

  1. Install the customised CentOS 7 image from USB.
  2. Wait for the login prompt.
  3. Some time later, hard power cycle the PC.
  4. The next boot fails at switch root with an error saying it is unable to load the SELinux policy.

Further notes

Soft reboot works OK. Power cycling often fails as above, but not always.

From a freshly installed ostree deployment, I can soft reboot several times, then hard power cycle, and the subsequent boot will often fail in the same way.

I have tried running semodule -B in a freshly installed deployment, soft rebooting, and then hard power cycling. This often leads to the same failure.

After connecting the disk from a PC that failed to boot to another PC, many of the SELinux files in the deployment's etc/selinux/targeted/active folder differ from those in usr/etc/selinux/targeted/active/. In fact, they seem to be empty!

For example, a sample of the diff between the two folders:

Files ./targeted/active/modules/100/zosremote/hll and ../../usr/etc/selinux/targeted/active/modules/100/zosremote/hll differ
Files ./targeted/active/modules/100/zosremote/lang_ext and ../../usr/etc/selinux/targeted/active/modules/100/zosremote/lang_ext differ
Files ./targeted/active/modules/200/container/cil and ../../usr/etc/selinux/targeted/active/modules/200/container/cil differ
Files ./targeted/active/modules/200/container/hll and ../../usr/etc/selinux/targeted/active/modules/200/container/hll differ
Files ./targeted/active/modules/200/container/lang_ext and ../../usr/etc/selinux/targeted/active/modules/200/container/lang_ext differ
Files ./targeted/active/modules/200/nginx/cil and ../../usr/etc/selinux/targeted/active/modules/200/nginx/cil differ
Files ./targeted/active/modules/200/nginx/hll and ../../usr/etc/selinux/targeted/active/modules/200/nginx/hll differ
Files ./targeted/active/modules/200/nginx/lang_ext and ../../usr/etc/selinux/targeted/active/modules/200/nginx/lang_ext differ
Files ./targeted/active/policy.kern and ../../usr/etc/selinux/targeted/active/policy.kern differ
Files ./targeted/active/policy.linked and ../../usr/etc/selinux/targeted/active/policy.linked differ
Files ./targeted/active/ports.local and ../../usr/etc/selinux/targeted/active/ports.local differ
Files ./targeted/active/seusers and ../../usr/etc/selinux/targeted/active/seusers differ
Files ./targeted/active/seusers.linked and ../../usr/etc/selinux/targeted/active/seusers.linked differ
Files ./targeted/active/users_extra and ../../usr/etc/selinux/targeted/active/users_extra differ
Files ./targeted/active/users_extra.linked and ../../usr/etc/selinux/targeted/active/users_extra.linked differ
Files ./targeted/contexts/files/file_contexts and ../../usr/etc/selinux/targeted/contexts/files/file_contexts differ
Files ./targeted/contexts/files/file_contexts.bin and ../../usr/etc/selinux/targeted/contexts/files/file_contexts.bin differ
Files ./targeted/contexts/files/file_contexts.homedirs and ../../usr/etc/selinux/targeted/contexts/files/file_contexts.homedirs differ
Files ./targeted/contexts/files/file_contexts.homedirs.bin and ../../usr/etc/selinux/targeted/contexts/files/file_contexts.homedirs.bin differ
Files ./targeted/contexts/files/file_contexts.local and ../../usr/etc/selinux/targeted/contexts/files/file_contexts.local differ
Files ./targeted/contexts/files/file_contexts.local.bin and ../../usr/etc/selinux/targeted/contexts/files/file_contexts.local.bin differ
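A quick way to confirm the "empty files" observation on the mounted disk, rather than reading diff output by eye, is to walk the pristine usr/etc/selinux tree and check each counterpart under etc/selinux. The Python sketch below assumes you pass it the root directory of the affected deployment as mounted on the second PC; `find_damaged` is a hypothetical helper, not part of ostree or rpm-ostree.

```python
import filecmp
import os
import sys


def find_damaged(etc_dir: str, usr_etc_dir: str) -> list:
    """Return sorted (relative_path, reason) pairs for suspicious files.

    Walks the pristine tree (usr/etc/selinux) and checks the matching file
    in the live tree (etc/selinux). Reasons: "missing", "empty" (zero bytes
    although the pristine copy is not), or "differs" (contents differ).
    """
    problems = []
    for root, _dirs, files in os.walk(usr_etc_dir):
        for name in files:
            pristine = os.path.join(root, name)
            if os.path.islink(pristine):
                continue  # skip symlinks; compare regular files only
            rel = os.path.relpath(pristine, usr_etc_dir)
            live = os.path.join(etc_dir, rel)
            if not os.path.isfile(live):
                problems.append((rel, "missing"))
            elif os.path.getsize(live) == 0 and os.path.getsize(pristine) > 0:
                problems.append((rel, "empty"))
            elif not filecmp.cmp(live, pristine, shallow=False):
                problems.append((rel, "differs"))
    return sorted(problems)


if __name__ == "__main__":
    # Usage: python3 find_damaged.py /path/to/mounted/deployment-root
    deploy_root = sys.argv[1]
    report = find_damaged(os.path.join(deploy_root, "etc", "selinux"),
                          os.path.join(deploy_root, "usr", "etc", "selinux"))
    for rel, reason in report:
        print(f"{reason:8} {rel}")
```

A "differs" result on its own is not proof of damage, since files such as ports.local can legitimately carry local changes; zero-length files are the telling signal. The sketch only walks the pristine tree, so files that exist only under etc/selinux are not reported.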

In addition, if I copy usr/etc/selinux over etc/selinux and then reconnect the disk to the PC, the PC boots successfully; however, subsequent hard power cycles often lead to the same failure.

Something seems to be wiping each file in the selinux folder.

jlebon (Member) commented Jan 16, 2020

Hmm, this sounds like it might be related to the staging API. In v2018.5, it was still pretty new, and looking through the git logs, the only way we supported staged deployments was via AutomaticUpdatePolicy=ex-stage. Are you using that?

ghost (Author) commented Jan 16, 2020

We've not explicitly set that, so probably not. A text search in our repository shows an object file containing #AutomaticUpdatePolicy=none, i.e. commented out.
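To confirm which policy is actually in effect on a running system, rather than searching the repository, the daemon's configuration can be parsed directly. This is a minimal sketch assuming the config lives at /etc/rpm-ostreed.conf under a [Daemon] section and that an unset or commented-out key means the default of none; `effective_update_policy` is a hypothetical helper.

```python
import configparser

# Assumed location of the rpm-ostree daemon config; adjust if your image differs.
CONF_PATH = "/etc/rpm-ostreed.conf"


def effective_update_policy(text: str) -> str:
    """Return the AutomaticUpdatePolicy in effect for the given config text.

    configparser ignores lines starting with '#', so a commented-out entry
    such as '#AutomaticUpdatePolicy=none' falls back to the assumed default.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # fallback also covers a missing [Daemon] section
    return parser.get("Daemon", "AutomaticUpdatePolicy", fallback="none")


if __name__ == "__main__":
    try:
        with open(CONF_PATH, encoding="utf-8") as f:
            print(effective_update_policy(f.read()))
    except FileNotFoundError:
        print("none (no config file found)")
```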

ghost (Author) commented Jan 16, 2020

I've been doing a few more tests: power on the PC, wait for at least the login screen to appear, and then hard power off. Mounting the disk on another PC and looking at the selinux folder shows that the files have already been cleared before the next boot starts. This suggests to me that a process is regularly running and doing something to these files. Is that possible?

jlebon (Member) commented Jan 16, 2020

Hmm, I'd check the journal for any hints. A misconfigured systemd-tmpfiles drop-in, perhaps? I'm not aware offhand of anything that would nuke SELinux files like this.
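One way to follow up on the tmpfiles idea is to list every tmpfiles.d entry whose path falls under /etc/selinux. The sketch below scans the usual tmpfiles.d directories; it is a plain text scan that ignores override precedence between directories, quoted paths, and specifiers such as %h, so treat any hit as a lead to inspect by hand. `entries_touching` and `scan` are hypothetical helpers.

```python
import glob
import os

# Standard tmpfiles.d search directories (see tmpfiles.d(5)).
TMPFILES_DIRS = ["/etc/tmpfiles.d", "/run/tmpfiles.d", "/usr/lib/tmpfiles.d"]


def entries_touching(prefix, lines, source="<input>"):
    """Yield (source, line) for tmpfiles entries whose path is under prefix."""
    base = prefix.rstrip("/")
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2:
            path = fields[1]  # field order: Type Path Mode User Group Age Argument
            if path == base or path.startswith(base + "/"):
                yield source, line


def scan(prefix="/etc/selinux"):
    """Scan every *.conf file in the tmpfiles.d directories."""
    for d in TMPFILES_DIRS:
        for conf in sorted(glob.glob(os.path.join(d, "*.conf"))):
            with open(conf, encoding="utf-8", errors="replace") as f:
                yield from entries_touching(prefix, f, conf)


if __name__ == "__main__":
    for source, line in scan():
        print(f"{source}: {line}")
```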

ghost (Author) commented Jan 16, 2020

I'll take a look.

ghost (Author) commented Jan 20, 2020

It's a bit difficult to debug, but my best guess so far is that SELinux is rebuilding the policy after first boot. If this process is interrupted, e.g. by a power off, the policy files in /etc/selinux/targeted/active are left empty, which essentially bricks the PC (SELinux is unable to load the policy at the next boot). I still need to confirm that this is actually what is happening, but I do get the same behaviour if I run semodule -B and then immediately hard power off. In any case, I really think semodule shouldn't be this sensitive; it should be more atomic, e.g. build the policy in a temporary/secondary location and then switch over when complete.
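The "build elsewhere, then switch" idea is the classic write-to-temporary-then-rename pattern. One plausible reason files end up empty after a power cut is that they were rewritten in place and the new data had not reached disk yet; writing a temporary file, fsyncing it, and atomically renaming it over the original avoids that window. Below is a minimal single-file sketch for Linux; it is not how libsemanage or semodule is implemented, and `atomic_write` is a hypothetical helper.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace path with data so a crash leaves either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # data must be durable before the rename
        try:
            # mkstemp creates the file as 0600; keep the old file's permission bits.
            os.chmod(tmp_path, os.stat(path).st_mode & 0o7777)
        except FileNotFoundError:
            pass
        os.replace(tmp_path, path)  # atomic swap: readers see old or new, never half
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a power cut.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This only covers one file; a policy store made of many files would need the same idea applied to a whole directory (build a new tree, then swap it in). The sketch also does not preserve ownership or SELinux labels.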

ghost (Author) commented Jan 21, 2020

The source of the problem was a script running setsebool on startup. As above, this causes the SELinux policy files to be rebuilt, and if the PC is power cycled during that process, it will typically fail to boot. This also happens on non-ostree PCs. I will close this ticket and raise an issue with SELinux about this; I think it would be better if it were more resilient to this.

ghost closed this as completed Jan 21, 2020

jlebon (Member) commented Jan 21, 2020

Hmm, is it a service actually shipped with SELinux though? Likely the script is using -P when it shouldn't. See also #27 (comment).
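The point about -P is the crux: without -P, setsebool only changes the boolean in the running policy, while -P also writes it to the on-disk policy store, which is the rebuild that was being interrupted. If a unit must still set the boolean at startup, one option is to check first and only call setsebool when the value actually differs, so a persistent change happens once rather than on every boot. A minimal Python sketch, assuming the getsebool/setsebool CLIs from policycoreutils are installed and that getsebool prints `name --> on` or `name --> off`; `parse_getsebool` and `ensure_boolean` are hypothetical helpers.

```python
import subprocess


def parse_getsebool(output: str) -> bool:
    """Parse 'name --> on|off' as printed by getsebool for a single boolean."""
    _name, _sep, value = output.strip().rpartition("-->")
    value = value.strip()
    if value not in ("on", "off"):
        raise ValueError(f"unexpected getsebool output: {output!r}")
    return value == "on"


def ensure_boolean(name: str, desired: bool, persist: bool) -> bool:
    """Set an SELinux boolean only if needed; return True if a change was made.

    With persist=False this maps to plain setsebool (running policy only).
    With persist=True it adds -P, which rewrites the on-disk policy store,
    so it should not run unconditionally on every boot.
    """
    result = subprocess.run(["getsebool", name], check=True,
                            capture_output=True, text=True)
    if parse_getsebool(result.stdout) == desired:
        return False  # already in the desired state: no rebuild triggered
    cmd = ["setsebool"]
    if persist:
        cmd.append("-P")
    cmd += [name, "on" if desired else "off"]
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    changed = ensure_boolean("httpd_can_network_connect", True, persist=False)
    print("changed" if changed else "already set")
```

getsebool reports the runtime value, so this check assumes the runtime and persistent values agree, which holds after a normal boot that loaded the persisted policy. Setting the value at compose time, as discussed later in this thread, avoids the runtime step entirely.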

ghost (Author) commented Jan 21, 2020

No, setsebool was in the ExecStartPre section of a custom service (setsebool -P httpd_can_network_connect on), and yes, it had a -P. I think this should be in the rpm, however this doesn't work as selinux isn't running when composing ostree. Any idea what the 'correct' way of setting this from inside an RPM spec is?

jlebon (Member) commented Jan 24, 2020

however this doesn't work as selinux isn't running when composing ostree.

Hmm, you mean disabled completely or in permissive mode? rpm-ostree today does need SELinux on (even if permissive) on the compose side. setsebool -P -N from %post should work.

ghost (Author) commented Feb 6, 2020

I'm pretty convinced setsebool isn't working in our %post scripts. Could this be because of #1634? Building on CentOS 7, we have policycoreutils-2.5-33.el7.x86_64, which seems to predate the fix introduced in policycoreutils-2.8-15.fc29.

This issue was closed.