Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PSAP-1428: read only root filesystem #1099

Conversation

jmencak
Copy link
Contributor

@jmencak jmencak commented Jul 1, 2024

In line with the "Principle of least privilege", add
readOnlyRootFilesystem to the NTO operand's container securityContext.

Key changes:

  • NTO's operand daemonset sets the readOnlyRootFilesystem container
    securityContext.
  • Use memory-backed emptyDir for /etc/tuned.
  • /tmp uses memory-backed emptyDir now to allow the operand's TuneD
    daemon writing temporary files when using profiles such as the
    cpu-partitioning profile.
  • TuneD container's home directory is now /run/ocp-tuned with a link to
    host's ocp-tuned persistent directory:
    persist -> /host/var/lib/ocp-tuned
  • Make /var/lib/tuned directory persistent on the host by:
    /var/lib/tuned -> /host/var/lib/tuned
    The persistent directory is populated by files such as ksm-masked coming
    from cpu-partitioning profile.
  • Change the ocp-tuned-one-shot systemd service to mount the hosts's
    persistent host /var/lib/{ocp-,}tuned directories to
    /host/var/lib/{ocp-,}tuned to simplify the operand code.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 1, 2024

@jmencak: This pull request references PSAP-1428 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.17.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 1, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 1, 2024
@openshift-ci openshift-ci bot requested review from MarSik and Tal-or July 1, 2024 16:25
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 1, 2024
@jmencak
Copy link
Contributor Author

jmencak commented Jul 2, 2024

This looks like CI issues
/retest

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 9, 2024
@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from e6b28ee to 6893a25 Compare July 10, 2024 09:47
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 10, 2024

@jmencak: This pull request references PSAP-1428 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.17.0" version, but no target version was set.

In response to this:

In line with the "Principle of least privilege" add readOnlyRootFilesystem to the NTO operand's container securityContext.

Other changes:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 10, 2024

@jmencak: This pull request references PSAP-1428 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.17.0" version, but no target version was set.

In response to this:

In line with the "Principle of least privilege" add readOnlyRootFilesystem to the NTO operand's container securityContext.

Other changes:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from 6893a25 to 080d1e2 Compare July 22, 2024 14:13
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 22, 2024
@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from 080d1e2 to a72f8f8 Compare July 22, 2024 14:29
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 22, 2024
@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch 3 times, most recently from eec1555 to f11e3eb Compare July 24, 2024 12:27
@jmencak
Copy link
Contributor Author

jmencak commented Jul 24, 2024

Finally figured out why the PAO tests are failing! The ocp-tuned-one-shot.service is failing with this PR. Fix to follow hopefully soon.

@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from f11e3eb to 4c56c41 Compare July 25, 2024 10:48
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 25, 2024
@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from 4c56c41 to 256e012 Compare July 27, 2024 11:31
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 27, 2024
@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from 256e012 to 67b1be5 Compare July 29, 2024 14:42
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 29, 2024

@jmencak: This pull request references PSAP-1428 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.17.0" version, but no target version was set.

In response to this:

In line with the "Principle of least privilege", add
readOnlyRootFilesystem to the NTO operand's container securityContext.

NTO's operator deployment does not set readOnlyRootFilesystem here. To
set the securityContext to true for the operator, a complete rewrite of
the metrics server would be needed.

Key changes:

  • NTO's operand daemonset sets the readOnlyRootFilesystem container
    securityContext.
  • /tmp is symlinked to /run/ocp-tuned and the directory ownership is set
    to the operator ID. This allows:
    • the operator metrics server writing the root certificate bundle
      which is then used to verify client certificates and
    • the operand's TuneD daemon writing temporary files when using profiles
      such as the cpu-partitioning profile.
  • Make /var/lib/tuned directory persistent on the host.
  • Change the ocp-tuned-one-shot systemd service to mount the hosts's
    persistent host /var/lib/{ocp-,}tuned directories to
    /host/var/lib/{ocp-,}tuned to simplify the operand code.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@MarSik
Copy link
Contributor

MarSik commented Aug 1, 2024

I think this will break https://github.com/openshift/must-gather/blob/master/collection-scripts/gather_ppc#L56 and we need to find a path to keep it working.

@MarSik
Copy link
Contributor

MarSik commented Aug 1, 2024

On the other hand.. maybe not, we use custom container spec with volume mounts in must gather.

@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from 67b1be5 to 76c3005 Compare October 10, 2024 19:46
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 10, 2024

@jmencak: This pull request references PSAP-1428 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.18.0" version, but no target version was set.

In response to this:

Read-only container root filesystem

In line with the "Principle of least privilege", add
readOnlyRootFilesystem to the NTO operand's container securityContext.

Key changes:

  • NTO's operand daemonset sets the readOnlyRootFilesystem container
    securityContext.
  • /tmp is symlinked to /run/ocp-tuned and the directory ownership is set
    to the operator ID. This allows:
    • the operand's TuneD daemon writing temporary files when using profiles
      such as the cpu-partitioning profile.
  • Make /var/lib/tuned directory persistent on the host.
  • Change the ocp-tuned-one-shot systemd service to mount the hosts's
    persistent host /var/lib/{ocp-,}tuned directories to
    /host/var/lib/{ocp-,}tuned to simplify the operand code.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jmencak
Copy link
Contributor Author

jmencak commented Oct 17, 2024

/retest

@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from 76c3005 to 76f0f9c Compare October 25, 2024 17:33
@jmencak
Copy link
Contributor Author

jmencak commented Oct 28, 2024

Seems like infra issues to me
/retest

@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from 76f0f9c to 4d04a58 Compare October 29, 2024 06:24
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 29, 2024

@jmencak: This pull request references PSAP-1428 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.18.0" version, but no target version was set.

In response to this:

In line with the "Principle of least privilege", add
readOnlyRootFilesystem to the NTO operand's container securityContext.

Key changes:

  • NTO's operand daemonset sets the readOnlyRootFilesystem container
    securityContext.
  • /tmp uses memory-backed emptyDir now to allow the operand's TuneD
    daemon writing temporary files when using profiles such as the
    cpu-partitioning profile.
  • TuneD container's home directory is now /run/ocp-tuned with a link to
    host's ocp-tuned persistent directory:
    persist -> /host/var/lib/ocp-tuned
  • Make /var/lib/tuned directory persistent on the host by:
    /var/lib/tuned -> /host/var/lib/tuned
    The persistent directory is populated by files such as ksm-masked coming
    from cpu-partitioning profile.
  • Change the ocp-tuned-one-shot systemd service to mount the hosts's
    persistent host /var/lib/{ocp-,}tuned directories to
    /host/var/lib/{ocp-,}tuned to simplify the operand code.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jmencak
Copy link
Contributor Author

jmencak commented Nov 1, 2024

Keeping this still WiP as we need to evaluate the impact of this on the installer rendering manifests.

@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from 4d04a58 to 6af14fd Compare November 1, 2024 19:40
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 1, 2024

@jmencak: This pull request references PSAP-1428 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.18.0" version, but no target version was set.

In response to this:

In line with the "Principle of least privilege", add
readOnlyRootFilesystem to the NTO operand's container securityContext.

Key changes:

  • NTO's operand daemonset sets the readOnlyRootFilesystem container
    securityContext.
  • Use memory-backed emptyDir for /etc/tuned.
  • /tmp uses memory-backed emptyDir now to allow the operand's TuneD
    daemon writing temporary files when using profiles such as the
    cpu-partitioning profile.
  • TuneD container's home directory is now /run/ocp-tuned with a link to
    host's ocp-tuned persistent directory:
    persist -> /host/var/lib/ocp-tuned
  • Make /var/lib/tuned directory persistent on the host by:
    /var/lib/tuned -> /host/var/lib/tuned
    The persistent directory is populated by files such as ksm-masked coming
    from cpu-partitioning profile.
  • Change the ocp-tuned-one-shot systemd service to mount the hosts's
    persistent host /var/lib/{ocp-,}tuned directories to
    /host/var/lib/{ocp-,}tuned to simplify the operand code.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

In line with the "Principle of least privilege", add
readOnlyRootFilesystem to the NTO operand's container securityContext.

Key changes:
  * NTO's operand daemonset sets the readOnlyRootFilesystem container
    securityContext.
  * Use memory-backed emptyDir for /etc/tuned.
  * /tmp uses memory-backed emptyDir now to allow the operand's TuneD
    daemon writing temporary files when using profiles such as the
    cpu-partitioning profile.
  * TuneD container's home directory is now /run/ocp-tuned with a link to
    host's ocp-tuned persistent directory:
    persist -> /host/var/lib/ocp-tuned
  * Make /var/lib/tuned directory persistent on the host by:
    /var/lib/tuned -> /host/var/lib/tuned
    The persistent directory is populated by files such as ksm-masked coming
    from cpu-partitioning profile.
  * Change the ocp-tuned-one-shot systemd service to mount the hosts's
    persistent host /var/lib/{ocp-,}tuned directories to
    /host/var/lib/{ocp-,}tuned to simplify the operand code.
@jmencak jmencak force-pushed the 4.17-PSAP-1428-readOnlyRootFilesystem branch from 6af14fd to 224fb45 Compare November 4, 2024 18:40
@jmencak jmencak changed the title WiP: PSAP-1428: read only root filesystem PSAP-1428: read only root filesystem Nov 5, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 5, 2024
@jmencak
Copy link
Contributor Author

jmencak commented Nov 5, 2024

Removing WiP. I've successfully tested that this functionality no longer breaks installer rendering manifests. I've verified that both by locally testing the render functionality by running podman and by installing SNO with a custom openshift/release payload and adding a PerformanceProfile as a manifest. The manifest was rendered successfully, MachineConfig was created and /proc/cmdline successfully updated.

This PR is ready for review.

@jmencak
Copy link
Contributor Author

jmencak commented Nov 7, 2024

I think this will break https://github.com/openshift/must-gather/blob/master/collection-scripts/gather_ppc#L56 and we need to find a path to keep it working.

Forgot about this one. Let me test this.
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 7, 2024
@jmencak
Copy link
Contributor Author

jmencak commented Nov 7, 2024

I think this will break https://github.com/openshift/must-gather/blob/master/collection-scripts/gather_ppc#L56 and we need to find a path to keep it working.

Forgot about this one. Let me test this. /hold

I believe this still works @MarSik .

/hold cancel

Collected data from a SNO install

total 1596
drwxr-xr-x. 2 mencak mencak    300 Nov  7 11:28 .
drwxr-xr-x. 3 mencak mencak     80 Nov  7 11:28 ..
-rw-r--r--. 1 mencak mencak 266600 Nov  7 11:27 cpu_affinities.json
-rw-r--r--. 1 mencak mencak 444117 Nov  7 11:27 dmesg
-rw-r--r--. 1 mencak mencak    154 Nov  7 11:27 ethtool_channels
-rw-r--r--. 1 mencak mencak   1794 Nov  7 11:27 ethtool_features
-rw-r--r--. 1 mencak mencak   2489 Nov  7 11:27 irq_affinities.json
-rw-r--r--. 1 mencak mencak    378 Nov  7 11:27 lscpu
-rw-r--r--. 1 mencak mencak  17833 Nov  7 11:27 lspci
-rw-r--r--. 1 mencak mencak  15892 Nov  7 11:27 podresources.json
-rw-r--r--. 1 mencak mencak  32873 Nov  7 11:27 pods_info.json
-rw-r--r--. 1 mencak mencak    739 Nov  7 11:27 proc_cmdline
-rw-r--r--. 1 mencak mencak 409069 Nov  7 11:27 sno_logs_kubelet.gz
-rw-r--r--. 1 mencak mencak 384273 Nov  7 11:27 sysinfo.log
-rw-r--r--. 1 mencak mencak  26550 Nov  7 11:27 sysinfo.tgz

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 7, 2024
Copy link
Contributor

@MarSik MarSik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2024
Copy link
Contributor

openshift-ci bot commented Nov 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmencak, MarSik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 7bca466 and 2 for PR HEAD 224fb45 in total

@jmencak
Copy link
Contributor Author

jmencak commented Nov 7, 2024

/retest

Copy link
Contributor

openshift-ci bot commented Nov 7, 2024

@jmencak: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 34a7dc6 into openshift:master Nov 7, 2024
16 checks passed
@jmencak jmencak deleted the 4.17-PSAP-1428-readOnlyRootFilesystem branch November 7, 2024 22:00
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator
This PR has been included in build cluster-node-tuning-operator-container-v4.18.0-202411080007.p0.g34a7dc6.assembly.stream.el9.
All builds following this will include this PR.

jmencak added a commit to jmencak/cluster-node-tuning-operator that referenced this pull request Nov 20, 2024
PR openshift#1099 broke upstream and OKD builds as it added a dependency on
rsync.  quay.io/centos/centos:stream9 image does not ship rsync
by default.

Once we can use the new golang's recursive copy (CopyFS) functionality
in go 1.23 (golang/go#62484), use it and
remove the dependency on rsync.
openshift-merge-bot bot pushed a commit that referenced this pull request Nov 21, 2024
PR #1099 broke upstream and OKD builds as it added a dependency on
rsync.  quay.io/centos/centos:stream9 image does not ship rsync
by default.

Once we can use the new golang's recursive copy (CopyFS) functionality
in go 1.23 (golang/go#62484), use it and
remove the dependency on rsync.

Co-authored-by: Jiri Mencak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants