Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add podman system check #22733

Merged
merged 2 commits into from
Jun 13, 2024
Merged

Conversation

nalind
Copy link
Member

@nalind nalind commented May 16, 2024

Does this PR introduce a user-facing change?

Added `podman system check`.

@openshift-ci openshift-ci bot added release-note do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 16, 2024
@github-actions github-actions bot added the kind/api-change Change to remote API; merits scrutiny label May 16, 2024
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

Copy link

Cockpit tests failed for commit 701dff4c6439193118bd1fddd2915d9daf2951ee. @martinpitt, @jelly, @mvollmer please check.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan at all of actually having commands in podman and the API to corrupt the local storage, it bloats podman, makes it a bit slower due extra argument setup and most importantly should never ever be used by any user outside of testing.

If we need these extra commands for testing then I would move them into their own command, i.e. cmd/podman-testing or something like this.

"github.com/containers/podman/v5/pkg/domain/entities/types"
)

func CreateStorageLayer(ctx context.Context, options *CreateStorageLayerOptions) (*types.CreateStorageLayerReport, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is technically part of a public API as we support pkg/bindings for consumers which is very bad

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to an internal package.

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll stay out of the broader conversation for now. Some specific comments inline about system tests.

}

function make_layer_blob() {
local tmpdir=$(mktemp -d --tmpdir=${BATS_TMPDIR:-/tmp} podman_bats.XXXXXX)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already have $PODMAN_TMPDIR for exactly this purpose

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to using that as a parent directory.

@@ -1178,5 +1178,57 @@ function wait_for_command_output() {
die "Timed out waiting for '$cmd' to return '$want'"
}

function make_random_file() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new functions are unlikely to be used anywhere outside the new test file; they probably belong there instead of the global helpers file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved two of the three, left make_random_file() where it was.


function testing_make_image_metadata_for_layer_blobs() {
local tmpdir=$(mktemp -d --tmpdir=${BATS_TMPDIR:-/tmp} podman_bats.XXXXXX)
local imageID=$1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: imageID=${1?Missing IMAGEID argument} to catch caller errors early

#

load helpers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module urgently needs a teardown(): otherwise, any individual test failure will leave the system in an undefined and probably unrecoverable state. I don't know what said teardown() will look like; it's possible that it will involve system reset -f.

Or, perhaps better, run all these tests with temporary storage; see https://github.com/containers/podman/blob/main/test/system/330-corrupt-images.bats

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that the current testing framework makes it possible to do any of that for a subset of the tests, when they are being run as part of a remote testing suite.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True: there is no way to use --root et al with podman-remote. Is there any actual reason to run these tests under podman-remote?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected that we would be keenly interested in having this function available for Podman Desktop and podman-machine cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you have written a lot of badh test code already but maybe doing this in e2e would be better?
All tests have a custom --roor/--runroot by default including remote. Also I assume these checks are expensive and slow? In this case it would be another reason for e2e as they run parallel so slow tests are not that bad there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poring over actual data, they can be time-intensive. The ones we set up for testing are small enough that the new tests finish in about 15 seconds on my dev machine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear I don't want you to force you to write it all to e2e if you don't want this.
15s is fine I guess, I am working on strategies to make the system tests faster so I will keep this in mind but I know there are worse offenders so not a reason to block this one

@nalind nalind force-pushed the system-check branch 5 times, most recently from 407aba9 to 7edf6b6 Compare May 22, 2024 21:54
Makefile Outdated
@@ -678,7 +688,7 @@ remotesystem:
if timeout -v 1 true; then \
SOCK_FILE=$(shell mktemp --dry-run --tmpdir podman_tmp_XXXX);\
export PODMAN_SOCKET=unix://$$SOCK_FILE; \
./bin/podman system service --timeout=0 $$PODMAN_SOCKET > $(if $(PODMAN_SERVER_LOG),$(PODMAN_SERVER_LOG),/dev/null) 2>&1 & \
./bin/podman system service --timeout=0 --features=testing $$PODMAN_SOCKET > $(if $(PODMAN_SERVER_LOG),$(PODMAN_SERVER_LOG),/dev/null) 2>&1 & \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I Still see no point in running the system service for this? You could just execute the podman-testing binary now and there really should not need to exists a remote API to corrupt the storage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we're doing remote testing, the client has access to the storage that the server is using? I thought that we'd taken steps to make that harder in order to avoid cases where we unintentionally used the non-remote code paths.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in both e2e and system tests, it is not as straight forward but in both tests suite there are some tests that manually overwrite the default command

# $PODMAN may be a space-separated string, e.g. if we include a --url.

So I don't think there is any problem in just executing the local test binary to corrupt the storage of the service.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, then, giving that a try.

@nalind nalind force-pushed the system-check branch 4 times, most recently from 4d9a87e to a837188 Compare May 28, 2024 21:42
@nalind
Copy link
Member Author

nalind commented May 28, 2024

/retitle Add podman system check

@openshift-ci openshift-ci bot changed the title WIP: add podman system check Add podman system check May 28, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 28, 2024
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@lsm5
Copy link
Member

lsm5 commented May 29, 2024

ignore rawhide

@nalind nalind force-pushed the system-check branch 2 times, most recently from 95b0563 to c172e1e Compare May 29, 2024 15:10
@@ -104,6 +106,62 @@ func PrintNetworkPruneResults(networkPruneReport []*entities.NetworkPruneReport,
return errs.PrintErrors()
}

func PrintSystemCheckResults(report *entities.SystemCheckReport) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think? there is precedent for output related functions to live in cmd/podman/foo alongside the command itself? I think we have a bunch of PrintJSON and so forths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, for the ones that aren't called from multiple packages, that does appear to be the pattern. Moved it.

nalind added 2 commits June 4, 2024 10:00
Add a `podman system check` that performs consistency checks on local
storage, optionally removing damaged items so that they can be
recreated.

Signed-off-by: Nalin Dahyabhai <[email protected]>
Testing `podman system check` requires that we have a way to
intentionally introduce storage corruptions.  Add a hidden `podman
testing` command that provides the necessary internal logic in
subcommands.  Stub out the tunnel implementation for now.

Signed-off-by: Nalin Dahyabhai <[email protected]>
@rhatdan
Copy link
Member

rhatdan commented Jun 11, 2024

LGTM
@Luap99 @baude @mheon PTAL

@mheon
Copy link
Member

mheon commented Jun 13, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 13, 2024
@mheon
Copy link
Member

mheon commented Jun 13, 2024

/approve

Copy link
Contributor

openshift-ci bot commented Jun 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon, nalind

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 00bcd9a into containers:main Jun 13, 2024
75 checks passed
@nalind nalind deleted the system-check branch June 13, 2024 13:34
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 12, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Sep 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/api-change Change to remote API; merits scrutiny lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants