-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update to support docker metadata object #213
Conversation
Support Python 3.8
for more information, see https://pre-commit.ci
expected_members = portable_path_list( | ||
"index.json", | ||
"manifest.json", | ||
"oci-layout", | ||
"repositories", | ||
"blobs/sha256", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is combining the set of files that are present in the two different container specs (Docker and OCI), which in my limited testing are mutually exclusive so neither type of container will match a check for all of the files being present.
The PR adding Docker Scout support (#193) got merged, which added a check to the file magic filetypeid code to see if a TAR file matches the Docker spec (e.g. has a manifest.json file within it). I think for the Docker config support you're adding we should use the file type id happening there, which will only run once on a tar file (instead of once per plugin).
That said -- I think a similar is_oci_archive
function should be added to the info extractor that detects if oci-layout
and index.json
files, and a blobs
directory (the spec says it may be empty) are present, and then the id_magic filetypeid code can return a type of OCI_TAR
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my knowledge, Docker was based on the OCI specs and OCI they are not mutually exclusive; rather, they are designed to be interoperable, with Docker being one of the primary implementations of OCI-compliant containers.
If you have any containers that shows them to be mutually exclusive, may I have them for testing please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutually exclusive might not be the right term, since there could be a polyglot container image that matches both specs; the set of files present for an archive following the OCI specification (Docker spec v2?) don't also need to be present alongside the files for the Docker (v1.x?) specification.
As I was saving images in podman and Docker for testing, I found that with podman there is no overlap in the files output for the two archive types, but Docker includes manifest.json and repositories files from the old v1.x spec in its OCI archives for backwards compatibility with older Docker versions.
Note: all of these archives are gzipped so that they could be uploaded to GitHub.
podman save --format=oci-archive -o alpine-podman-oci-archive.tar alpine:latest
podman save --format=docker-archive -o alpine-podman-docker-archive.tar alpine:latest
sudo docker save -o alpine-docker-save.tar alpine:latest
(in Docker version <=24.x this matches docker-archive format from podman)
sudo docker save -o alpine-docker-save.tar alpine:latest
(in Docker version >=25.x this is similar to oci-archive format from podman, except with manifest.json and repositories files for backward compatibility)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts on handling the overlap? The 3 diff cases of Docker && OCI, only Docker and only OCI ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For file type identification, I think OCI can take priority if it is both a valid (older) Docker image and OCI image. Same for information extraction, most of the information should be the same.
I don't have any good ideas for how to handle polyglot files in general -- right now our file type identification plugins assume a file is only one type. To support polyglots we'd probably need to make the file type recognizers start returning lists of types matched.
root_key = "dockerImageConfigs" | ||
image_info: Dict[str, List[Dict[str, Any]]] = {root_key: []} | ||
with tarfile.open(filename) as tarball: | ||
# we know the manifest file is present or we wouldn't be this far | ||
assert (manifest_file := optics.tarball.manifest_file(tarball)) | ||
manifest = json.load(manifest_file) | ||
for config_path in optics.manifest.config_path(manifest): | ||
assert (config_file := optics.tarball.config_file(tarball, config_path)) | ||
config = json.load(config_file) | ||
image_info[root_key].append(config) | ||
return image_info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this isn't too crazy, but I'm thinking let's move the code for extracting this config info into the same infoextractor file that runs docker scout (https://github.com/LLNL/Surfactant/blob/main/surfactant/infoextractors/docker_image.py).
The tricky bit I see will be restructuring the logic in that file some so that this config extraction will still happen even if docker scout isn't installed, and then making sure the results from both this and docker scout are included in the output.
For OCI container files, we could rename this file or add a separate infoextractor for handling files identified as OCI_TAR
(the files to read and JSON keys to walk for that is different enough from Docker that I don't think combining them makes sense).
surfactant/plugin/manager.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes here could be dropped once the docker config info extraction is merged into the file currently used for docker scout (or if docker_tarball_file gets renamed and updated to handle oci files the name registered here could be updated).
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ins (#217) Co-authored-by: Michael Cutshaw <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Williams, Tyler J <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kendall Harter <[email protected]> Co-authored-by: Kendall Harter <[email protected]> Co-authored-by: Ryan Mast <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Shayna Kapadia <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Addresses #195.