-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update to support docker metadata object #213
Changes from 5 commits
6d122a3
84bd26d
ca5e470
d24ed9d
79882b1
e977238
3570a85
f124282
629934f
ba586bc
387d6fc
8716591
8d2b08a
872ba3f
c08f507
6989d3c
bf34a3a
ddd4fd4
209e23d
1a408ba
1e15466
a702473
14de70d
c2b1d06
7fd7120
3dd0262
9d8f9bf
9099673
d86a277
34676aa
074edb3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# Copyright 2024 Lawrence Livermore National Security, LLC | ||
# see: ${repository}/LICENSE | ||
# | ||
# SPDX-License-Identifier: MIT | ||
|
||
import json | ||
import tarfile | ||
from pathlib import PurePosixPath | ||
from typing import IO, Any, Dict, List, Union | ||
|
||
import surfactant.plugin | ||
from surfactant.sbomtypes import SBOM, Software | ||
|
||
|
||
class optics: | ||
class tarball: | ||
@staticmethod | ||
def manifest_file(tarball: tarfile.TarFile) -> Union[IO[bytes], None]: | ||
return tarball.extractfile( | ||
{tarinfo.name: tarinfo for tarinfo in tarball.getmembers()}["manifest.json"] | ||
) | ||
|
||
@staticmethod | ||
def config_file(tarball: tarfile.TarFile, path: str) -> Union[IO[bytes], None]: | ||
return tarball.extractfile( | ||
{tarinfo.name: tarinfo for tarinfo in tarball.getmembers()}[path] | ||
) | ||
|
||
class manifest: | ||
@staticmethod | ||
def config_path(manifest: List[Dict[str, Any]]) -> List[str]: | ||
path = "Config" | ||
return [entry[path] for entry in manifest] | ||
|
||
@staticmethod | ||
def repo_tags(manifest: List[Dict[str, Any]]) -> List[str]: | ||
path = "RepoTags" | ||
return [entry[path] for entry in manifest] | ||
|
||
|
||
def portable_path_list(*paths: str): | ||
"""Convert paths to a portable format acknowledged by""" | ||
return tuple([str(PurePosixPath(path_str)) for path_str in paths]) | ||
|
||
|
||
def supports_file(filename: str, filetype: str) -> bool: | ||
EXPECTED_FILETYPE = "TAR" | ||
|
||
expected_members = portable_path_list( | ||
"index.json", | ||
"manifest.json", | ||
"oci-layout", | ||
"repositories", | ||
"blobs/sha256", | ||
) | ||
|
||
if filetype != EXPECTED_FILETYPE: | ||
return False | ||
|
||
with tarfile.open(filename) as this_tarfile: | ||
found_members = portable_path_list(*[member.name for member in this_tarfile.getmembers()]) | ||
|
||
return all([expected_member in found_members for expected_member in expected_members]) | ||
|
||
|
||
@surfactant.plugin.hookimpl | ||
def extract_file_info(sbom: SBOM, software: Software, filename: str, filetype: str) -> object: | ||
if not supports_file(filename, filetype): | ||
return None | ||
return extract_image_info(filename) | ||
|
||
|
||
def extract_image_info(filename: str): | ||
"""Return image configuration objects mapped by their paths.""" | ||
root_key = "dockerImageConfigs" | ||
image_info: Dict[str, List[Dict[str, Any]]] = {root_key: []} | ||
with tarfile.open(filename) as tarball: | ||
# we know the manifest file is present or we wouldn't be this far | ||
assert (manifest_file := optics.tarball.manifest_file(tarball)) | ||
manifest = json.load(manifest_file) | ||
for config_path in optics.manifest.config_path(manifest): | ||
assert (config_file := optics.tarball.config_file(tarball, config_path)) | ||
config = json.load(config_file) | ||
image_info[root_key].append(config) | ||
return image_info | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hopefully this isn't too crazy, but I'm thinking let's move the code for extracting this config info into the same infoextractor file that runs docker scout (https://github.com/LLNL/Surfactant/blob/main/surfactant/infoextractors/docker_image.py). The tricky bit I see will be restructuring the logic in that file some so that this config extraction will still happen even if docker scout isn't installed, and then making sure the results from both this and docker scout are included in the output. For OCI container files, we could rename this file or add a separate infoextractor for handling files identified as |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The changes here could be dropped once the docker config info extraction is merged into the file currently used for docker scout (or if docker_tarball_file gets renamed and updated to handle oci files the name registered here could be updated). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is combining the set of files that are present in the two different container specs (Docker and OCI), which in my limited testing are mutually exclusive so neither type of container will match a check for all of the files being present.
The PR adding Docker Scout support (#193) got merged, which added a check to the file magic filetypeid code to see if a TAR file matches the Docker spec (e.g. has a manifest.json file within it). I think for the Docker config support you're adding we should use the file type id happening there, which will only run once on a tar file (instead of once per plugin).
That said -- I think a similar
is_oci_archive
function should be added to the info extractor that detects ifoci-layout
andindex.json
files, and ablobs
directory (the spec says it may be empty) are present, and then the id_magic filetypeid code can return a type ofOCI_TAR
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my knowledge, Docker was based on the OCI specs and OCI they are not mutually exclusive; rather, they are designed to be interoperable, with Docker being one of the primary implementations of OCI-compliant containers.
If you have any containers that shows them to be mutually exclusive, may I have them for testing please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutually exclusive might not be the right term, since there could be a polyglot container image that matches both specs; the set of files present for an archive following the OCI specification (Docker spec v2?) don't also need to be present alongside the files for the Docker (v1.x?) specification.
As I was saving images in podman and Docker for testing, I found that with podman there is no overlap in the files output for the two archive types, but Docker includes manifest.json and repositories files from the old v1.x spec in its OCI archives for backwards compatibility with older Docker versions.
Note: all of these archives are gzipped so that they could be uploaded to GitHub.
podman save --format=oci-archive -o alpine-podman-oci-archive.tar alpine:latest
podman save --format=docker-archive -o alpine-podman-docker-archive.tar alpine:latest
sudo docker save -o alpine-docker-save.tar alpine:latest
(in Docker version <=24.x this matches docker-archive format from podman)sudo docker save -o alpine-docker-save.tar alpine:latest
(in Docker version >=25.x this is similar to oci-archive format from podman, except with manifest.json and repositories files for backward compatibility)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts on handling the overlap? The 3 diff cases of Docker && OCI, only Docker and only OCI ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For file type identification, I think OCI can take priority if it is both a valid (older) Docker image and OCI image. Same for information extraction, most of the information should be the same.
I don't have any good ideas for how to handle polyglot files in general -- right now our file type identification plugins assume a file is only one type. To support polyglots we'd probably need to make the file type recognizers start returning lists of types matched.