Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update to support docker metadata object #213

Closed
wants to merge 31 commits into from
Closed

update to support docker metadata object #213

wants to merge 31 commits into from

Conversation

edalily
Copy link
Collaborator

@edalily edalily commented Jun 6, 2024

Addresses #195.

@edalily edalily requested a review from nightlark June 6, 2024 02:58
Comment on lines +49 to +55
expected_members = portable_path_list(
"index.json",
"manifest.json",
"oci-layout",
"repositories",
"blobs/sha256",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is combining the set of files that are present in the two different container specs (Docker and OCI), which in my limited testing are mutually exclusive so neither type of container will match a check for all of the files being present.

The PR adding Docker Scout support (#193) got merged, which added a check to the file magic filetypeid code to see if a TAR file matches the Docker spec (e.g. has a manifest.json file within it). I think for the Docker config support you're adding we should use the file type id happening there, which will only run once on a tar file (instead of once per plugin).

That said -- I think a similar is_oci_archive function should be added to the info extractor that detects if oci-layout and index.json files, and a blobs directory (the spec says it may be empty) are present, and then the id_magic filetypeid code can return a type of OCI_TAR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my knowledge, Docker was based on the OCI specs and OCI they are not mutually exclusive; rather, they are designed to be interoperable, with Docker being one of the primary implementations of OCI-compliant containers.

If you have any containers that shows them to be mutually exclusive, may I have them for testing please?

Copy link
Collaborator

@nightlark nightlark Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutually exclusive might not be the right term, since there could be a polyglot container image that matches both specs; the set of files present for an archive following the OCI specification (Docker spec v2?) don't also need to be present alongside the files for the Docker (v1.x?) specification.

As I was saving images in podman and Docker for testing, I found that with podman there is no overlap in the files output for the two archive types, but Docker includes manifest.json and repositories files from the old v1.x spec in its OCI archives for backwards compatibility with older Docker versions.

Note: all of these archives are gzipped so that they could be uploaded to GitHub.

podman save --format=oci-archive -o alpine-podman-oci-archive.tar alpine:latest

podman save --format=docker-archive -o alpine-podman-docker-archive.tar alpine:latest

sudo docker save -o alpine-docker-save.tar alpine:latest (in Docker version <=24.x this matches docker-archive format from podman)

sudo docker save -o alpine-docker-save.tar alpine:latest (in Docker version >=25.x this is similar to oci-archive format from podman, except with manifest.json and repositories files for backward compatibility)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on handling the overlap? The 3 diff cases of Docker && OCI, only Docker and only OCI ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For file type identification, I think OCI can take priority if it is both a valid (older) Docker image and OCI image. Same for information extraction, most of the information should be the same.

I don't have any good ideas for how to handle polyglot files in general -- right now our file type identification plugins assume a file is only one type. To support polyglots we'd probably need to make the file type recognizers start returning lists of types matched.

Comment on lines 75 to 85
root_key = "dockerImageConfigs"
image_info: Dict[str, List[Dict[str, Any]]] = {root_key: []}
with tarfile.open(filename) as tarball:
# we know the manifest file is present or we wouldn't be this far
assert (manifest_file := optics.tarball.manifest_file(tarball))
manifest = json.load(manifest_file)
for config_path in optics.manifest.config_path(manifest):
assert (config_file := optics.tarball.config_file(tarball, config_path))
config = json.load(config_file)
image_info[root_key].append(config)
return image_info
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully this isn't too crazy, but I'm thinking let's move the code for extracting this config info into the same infoextractor file that runs docker scout (https://github.com/LLNL/Surfactant/blob/main/surfactant/infoextractors/docker_image.py).

The tricky bit I see will be restructuring the logic in that file some so that this config extraction will still happen even if docker scout isn't installed, and then making sure the results from both this and docker scout are included in the output.

For OCI container files, we could rename this file or add a separate infoextractor for handling files identified as OCI_TAR (the files to read and JSON keys to walk for that is different enough from Docker that I don't think combining them makes sense).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here could be dropped once the docker config info extraction is merged into the file currently used for docker scout (or if docker_tarball_file gets renamed and updated to handle oci files the name registered here could be updated).

pre-commit-ci bot and others added 22 commits August 20, 2024 09:34
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Williams, Tyler J <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kendall Harter <[email protected]>
Co-authored-by: Kendall Harter <[email protected]>
Co-authored-by: Ryan Mast <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Shayna Kapadia <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
dependabot bot and others added 4 commits August 20, 2024 09:36
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@edalily edalily closed this by deleting the head repository Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants