-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add podman system check
#22733
Add podman system check
#22733
Conversation
Ephemeral COPR build failed. @containers/packit-build please check. |
Cockpit tests failed for commit 701dff4c6439193118bd1fddd2915d9daf2951ee. @martinpitt, @jelly, @mvollmer please check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan at all of actually having commands in podman and the API to corrupt the local storage, it bloats podman, makes it a bit slower due extra argument setup and most importantly should never ever be used by any user outside of testing.
If we need these extra commands for testing then I would move them into their own command, i.e. cmd/podman-testing or something like this.
pkg/bindings/testing/testing.go
Outdated
"github.com/containers/podman/v5/pkg/domain/entities/types" | ||
) | ||
|
||
func CreateStorageLayer(ctx context.Context, options *CreateStorageLayerOptions) (*types.CreateStorageLayerReport, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is technically part of a public API as we support pkg/bindings for consumers which is very bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to an internal package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll stay out of the broader conversation for now. Some specific comments inline about system tests.
test/system/helpers.bash
Outdated
} | ||
|
||
function make_layer_blob() { | ||
local tmpdir=$(mktemp -d --tmpdir=${BATS_TMPDIR:-/tmp} podman_bats.XXXXXX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already have $PODMAN_TMPDIR
for exactly this purpose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to using that as a parent directory.
@@ -1178,5 +1178,57 @@ function wait_for_command_output() { | |||
die "Timed out waiting for '$cmd' to return '$want'" | |||
} | |||
|
|||
function make_random_file() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new functions are unlikely to be used anywhere outside the new test file; they probably belong there instead of the global helpers file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved two of the three, left make_random_file()
where it was.
test/system/helpers.bash
Outdated
|
||
function testing_make_image_metadata_for_layer_blobs() { | ||
local tmpdir=$(mktemp -d --tmpdir=${BATS_TMPDIR:-/tmp} podman_bats.XXXXXX) | ||
local imageID=$1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: imageID=${1?Missing IMAGEID argument}
to catch caller errors early
# | ||
|
||
load helpers | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module urgently needs a teardown()
: otherwise, any individual test failure will leave the system in an undefined and probably unrecoverable state. I don't know what said teardown()
will look like; it's possible that it will involve system reset -f
.
Or, perhaps better, run all these tests with temporary storage; see https://github.com/containers/podman/blob/main/test/system/330-corrupt-images.bats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that the current testing framework makes it possible to do any of that for a subset of the tests, when they are being run as part of a remote testing suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True: there is no way to use --root
et al with podman-remote
. Is there any actual reason to run these tests under podman-remote
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expected that we would be keenly interested in having this function available for Podman Desktop and podman-machine cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you have written a lot of badh test code already but maybe doing this in e2e would be better?
All tests have a custom --roor/--runroot by default including remote. Also I assume these checks are expensive and slow? In this case it would be another reason for e2e as they run parallel so slow tests are not that bad there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Poring over actual data, they can be time-intensive. The ones we set up for testing are small enough that the new tests finish in about 15 seconds on my dev machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear I don't want you to force you to write it all to e2e if you don't want this.
15s is fine I guess, I am working on strategies to make the system tests faster so I will keep this in mind but I know there are worse offenders so not a reason to block this one
407aba9
to
7edf6b6
Compare
Makefile
Outdated
@@ -678,7 +688,7 @@ remotesystem: | |||
if timeout -v 1 true; then \ | |||
SOCK_FILE=$(shell mktemp --dry-run --tmpdir podman_tmp_XXXX);\ | |||
export PODMAN_SOCKET=unix://$$SOCK_FILE; \ | |||
./bin/podman system service --timeout=0 $$PODMAN_SOCKET > $(if $(PODMAN_SERVER_LOG),$(PODMAN_SERVER_LOG),/dev/null) 2>&1 & \ | |||
./bin/podman system service --timeout=0 --features=testing $$PODMAN_SOCKET > $(if $(PODMAN_SERVER_LOG),$(PODMAN_SERVER_LOG),/dev/null) 2>&1 & \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I Still see no point in running the system service for this? You could just execute the podman-testing binary now and there really should not need to exists a remote API to corrupt the storage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we're doing remote testing, the client has access to the storage that the server is using? I thought that we'd taken steps to make that harder in order to avoid cases where we unintentionally used the non-remote code paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes in both e2e and system tests, it is not as straight forward but in both tests suite there are some tests that manually overwrite the default command
podman/test/system/001-basic.bats
Line 114 in e53b96c
# $PODMAN may be a space-separated string, e.g. if we include a --url. |
So I don't think there is any problem in just executing the local test binary to corrupt the storage of the service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, then, giving that a try.
4d9a87e
to
a837188
Compare
/retitle Add |
podman system check
podman system check
Ephemeral COPR build failed. @containers/packit-build please check. |
ignore rawhide |
95b0563
to
c172e1e
Compare
cmd/podman/utils/utils.go
Outdated
@@ -104,6 +106,62 @@ func PrintNetworkPruneResults(networkPruneReport []*entities.NetworkPruneReport, | |||
return errs.PrintErrors() | |||
} | |||
|
|||
func PrintSystemCheckResults(report *entities.SystemCheckReport) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think? there is precedent for output related functions to live in cmd/podman/foo alongside the command itself? I think we have a bunch of PrintJSON and so forths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, for the ones that aren't called from multiple packages, that does appear to be the pattern. Moved it.
Add a `podman system check` that performs consistency checks on local storage, optionally removing damaged items so that they can be recreated. Signed-off-by: Nalin Dahyabhai <[email protected]>
Testing `podman system check` requires that we have a way to intentionally introduce storage corruptions. Add a hidden `podman testing` command that provides the necessary internal logic in subcommands. Stub out the tunnel implementation for now. Signed-off-by: Nalin Dahyabhai <[email protected]>
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mheon, nalind The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Does this PR introduce a user-facing change?