update to support docker metadata object #213

Closed
wants to merge 31 commits into from
Changes from 5 commits
6d122a3
update to support docker metadata object
edalily Jun 6, 2024
84bd26d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2024
ca5e470
Update docker_tarball_file.py
edalily Jun 6, 2024
d24ed9d
support python 3.8
edalily Jun 6, 2024
79882b1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2024
e977238
[pre-commit.ci] pre-commit autoupdate (#214)
pre-commit-ci[bot] Jun 10, 2024
3570a85
[pre-commit.ci] pre-commit autoupdate (#219)
pre-commit-ci[bot] Jun 17, 2024
f124282
Bump cyclonedx-python-lib from 7.4.0 to 7.4.1 (#218)
dependabot[bot] Jun 17, 2024
629934f
resolve conflict
edalily Aug 20, 2024
ba586bc
Recommend pipx for user installation and show how to pip install plug…
nightlark Jun 20, 2024
387d6fc
[pre-commit.ci] pre-commit autoupdate (#223)
pre-commit-ci[bot] Jun 24, 2024
8716591
[pre-commit.ci] pre-commit autoupdate (#224)
pre-commit-ci[bot] Jul 1, 2024
8d2b08a
Cvebin2vex plugin (#178)
theStache Jul 12, 2024
872ba3f
[pre-commit.ci] pre-commit autoupdate (#226)
pre-commit-ci[bot] Jul 12, 2024
c08f507
Bump dnfile from 0.14.1 to 0.15.0 (#207)
dependabot[bot] Jul 12, 2024
6989d3c
Bump cyclonedx-python-lib from 7.4.1 to 7.5.1 (#225)
dependabot[bot] Jul 15, 2024
bf34a3a
[pre-commit.ci] pre-commit autoupdate (#228)
pre-commit-ci[bot] Jul 15, 2024
ddd4fd4
[pre-commit.ci] pre-commit autoupdate (#230)
pre-commit-ci[bot] Jul 22, 2024
209e23d
Pin dependency versions (#229)
KendallHarterAtWork Jul 22, 2024
1a408ba
Add Grype plugin (#227)
KendallHarterAtWork Jul 23, 2024
1e15466
Implement CLI add subcommand (#209)
shaynakapadia Jul 24, 2024
a702473
[pre-commit.ci] pre-commit autoupdate (#234)
pre-commit-ci[bot] Jul 29, 2024
14de70d
Fix: plugins running multiple times on the same file due to symlinks …
nightlark Jul 30, 2024
c2b1d06
Fix hashlib.md5 error with FIPS-compliant OpenSSL builds (#222)
nightlark Jun 18, 2024
7fd7120
[pre-commit.ci] pre-commit autoupdate (#237)
pre-commit-ci[bot] Aug 5, 2024
3dd0262
Add support for user config file for plugin options (#231)
nightlark Aug 7, 2024
9d8f9bf
[pre-commit.ci] pre-commit autoupdate (#240)
pre-commit-ci[bot] Aug 13, 2024
9099673
Bump cyclonedx-python-lib from 7.5.1 to 7.6.0 (#241)
dependabot[bot] Aug 19, 2024
d86a277
[pre-commit.ci] pre-commit autoupdate (#242)
pre-commit-ci[bot] Aug 19, 2024
34676aa
update to support docker metadata object
edalily Jun 6, 2024
074edb3
conformed to pylint rules
edalily Aug 20, 2024
85 changes: 85 additions & 0 deletions surfactant/infoextractors/docker_tarball_file.py
@@ -0,0 +1,85 @@
# Copyright 2024 Lawrence Livermore National Security, LLC
# see: ${repository}/LICENSE
#
# SPDX-License-Identifier: MIT

import json
import tarfile
from pathlib import PurePosixPath
from typing import IO, Any, Dict, List, Union

import surfactant.plugin
from surfactant.sbomtypes import SBOM, Software


class optics:
    class tarball:
        @staticmethod
        def manifest_file(tarball: tarfile.TarFile) -> Union[IO[bytes], None]:
            return tarball.extractfile(
                {tarinfo.name: tarinfo for tarinfo in tarball.getmembers()}["manifest.json"]
            )

        @staticmethod
        def config_file(tarball: tarfile.TarFile, path: str) -> Union[IO[bytes], None]:
            return tarball.extractfile(
                {tarinfo.name: tarinfo for tarinfo in tarball.getmembers()}[path]
            )

    class manifest:
        @staticmethod
        def config_path(manifest: List[Dict[str, Any]]) -> List[str]:
            path = "Config"
            return [entry[path] for entry in manifest]

        @staticmethod
        def repo_tags(manifest: List[Dict[str, Any]]) -> List[str]:
            path = "RepoTags"
            return [entry[path] for entry in manifest]


def portable_path_list(*paths: str):
    """Convert paths to a portable format acknowledged by"""
    return tuple([str(PurePosixPath(path_str)) for path_str in paths])


def supports_file(filename: str, filetype: str) -> bool:
    EXPECTED_FILETYPE = "TAR"

    expected_members = portable_path_list(
        "index.json",
        "manifest.json",
        "oci-layout",
        "repositories",
        "blobs/sha256",
    )
Comment on lines +47 to +53
Collaborator

I think this is combining the set of files that are present in the two different container specs (Docker and OCI), which in my limited testing are mutually exclusive so neither type of container will match a check for all of the files being present.

The PR adding Docker Scout support (#193) got merged, which added a check to the file magic filetypeid code to see if a TAR file matches the Docker spec (e.g. has a manifest.json file within it). I think for the Docker config support you're adding we should use the file type id happening there, which will only run once on a tar file (instead of once per plugin).

That said, I think a similar is_oci_archive function should be added to the info extractor to detect whether the oci-layout and index.json files and a blobs directory (the spec says it may be empty) are present; the id_magic filetypeid code can then return a type of OCI_TAR.
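
A minimal sketch of what such a check could look like, assuming the OCI layout files sit at the root of the archive. The name `is_oci_archive` comes from the suggestion above; the `_normalized_names` helper and the exact matching rules are assumptions for illustration, not existing Surfactant code.

```python
import tarfile
from typing import Set


def _normalized_names(tar: tarfile.TarFile) -> Set[str]:
    """Return member names with any leading "./" and trailing "/" removed."""
    names = set()
    for name in tar.getnames():
        if name.startswith("./"):
            name = name[2:]
        names.add(name.rstrip("/"))
    return names


def is_oci_archive(tar: tarfile.TarFile) -> bool:
    """Hypothetical check for an OCI image layout inside an open tar file."""
    names = _normalized_names(tar)
    has_layout_files = {"oci-layout", "index.json"} <= names
    # The blobs directory may be empty, so accept either the directory entry
    # itself or any member stored underneath it.
    has_blobs = any(n == "blobs" or n.startswith("blobs/") for n in names)
    return has_layout_files and has_blobs
```

Taking an already opened `TarFile` would let the file type identification code reuse one open archive for both the Docker and OCI checks.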

Collaborator Author

To my knowledge, Docker was based on the OCI specs, and they are not mutually exclusive; rather, they are designed to be interoperable, with Docker being one of the primary implementations of OCI-compliant containers.

If you have any containers that show them to be mutually exclusive, may I have them for testing please?

Collaborator

@nightlark nightlark Aug 20, 2024

Mutually exclusive might not be the right term, since there could be a polyglot container image that matches both specs; the point is that the files present in an archive following the OCI specification (Docker spec v2?) don't need to be accompanied by the files for the Docker (v1.x?) specification.

As I was saving images in podman and Docker for testing, I found that with podman there is no overlap in the files output for the two archive types, but Docker includes manifest.json and repositories files from the old v1.x spec in its OCI archives for backwards compatibility with older Docker versions.

Note: all of these archives are gzipped so that they could be uploaded to GitHub.

podman save --format=oci-archive -o alpine-podman-oci-archive.tar alpine:latest

podman save --format=docker-archive -o alpine-podman-docker-archive.tar alpine:latest

sudo docker save -o alpine-docker-save.tar alpine:latest (in Docker version <=24.x this matches docker-archive format from podman)

sudo docker save -o alpine-docker-save.tar alpine:latest (in Docker version >=25.x this is similar to oci-archive format from podman, except with manifest.json and repositories files for backward compatibility)
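
To compare the layouts these commands produce, the top-level entries of each archive can be listed with the standard library. This is a small sketch; the function name `top_level_entries` is made up for illustration.

```python
import tarfile
from typing import List


def top_level_entries(tar_path: str) -> List[str]:
    """Return the sorted, de-duplicated first path component of every member."""
    entries = set()
    # The default read mode detects compression, so gzipped archives work too.
    with tarfile.open(tar_path) as tar:
        for name in tar.getnames():
            if name.startswith("./"):
                name = name[2:]
            first = name.split("/", 1)[0]
            if first:
                entries.add(first)
    return sorted(entries)
```

Running it on each of the saved archives, for example `top_level_entries("alpine-podman-oci-archive.tar")`, gives lists that are easy to compare side by side.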

Collaborator Author

Any thoughts on handling the overlap, i.e. the three different cases: both Docker and OCI, only Docker, and only OCI?

Collaborator

For file type identification, I think OCI can take priority if the file is both a valid (older) Docker image and a valid OCI image. The same goes for information extraction, since most of the information should be the same.

I don't have any good ideas for how to handle polyglot files in general -- right now our file type identification plugins assume a file is only one type. To support polyglots we'd probably need to make the file type recognizers start returning lists of types matched.
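
One way to picture both options, as a sketch only: a single-type recognizer where OCI takes priority, and a list-returning variant that would allow polyglots. `OCI_TAR` is the type name proposed earlier in this thread; `DOCKER_TAR`, the function names, and the marker sets are assumptions for illustration and may not match Surfactant's actual identifiers.

```python
from typing import Iterable, List, Optional

# Assumed marker files at the archive root for each format.
OCI_MARKERS = {"oci-layout", "index.json"}
DOCKER_MARKERS = {"manifest.json"}


def matching_types(member_names: Iterable[str]) -> List[str]:
    """Return every container type the archive matches, OCI first."""
    names = set(member_names)
    matches = []
    if OCI_MARKERS <= names:
        matches.append("OCI_TAR")
    if DOCKER_MARKERS <= names:
        matches.append("DOCKER_TAR")
    return matches


def identify_type(member_names: Iterable[str]) -> Optional[str]:
    """Single-type view: the highest-priority match, or None."""
    matches = matching_types(member_names)
    return matches[0] if matches else None
```

With this shape, an archive from Docker >=25.x that carries both sets of files would be reported as `OCI_TAR` by `identify_type`, while `matching_types` would still expose both matches.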


    if filetype != EXPECTED_FILETYPE:
        return False

    with tarfile.open(filename) as this_tarfile:
        found_members = portable_path_list(*[member.name for member in this_tarfile.getmembers()])

    return all([expected_member in found_members for expected_member in expected_members])


@surfactant.plugin.hookimpl
def extract_file_info(sbom: SBOM, software: Software, filename: str, filetype: str) -> object:
    if not supports_file(filename, filetype):
        return None
    return extract_image_info(filename)


def extract_image_info(filename: str):
    """Return image configuration objects mapped by their paths."""
    root_key = "dockerImageConfigs"
    image_info: Dict[str, List[Dict[str, Any]]] = {root_key: []}
    with tarfile.open(filename) as tarball:
        # we know the manifest file is present or we wouldn't be this far
        assert (manifest_file := optics.tarball.manifest_file(tarball))
        manifest = json.load(manifest_file)
        for config_path in optics.manifest.config_path(manifest):
            assert (config_file := optics.tarball.config_file(tarball, config_path))
            config = json.load(config_file)
            image_info[root_key].append(config)
    return image_info
Collaborator

Hopefully this isn't too crazy, but I'm thinking let's move the code for extracting this config info into the same infoextractor file that runs docker scout (https://github.com/LLNL/Surfactant/blob/main/surfactant/infoextractors/docker_image.py).

The tricky bit I see will be restructuring the logic in that file some so that this config extraction will still happen even if docker scout isn't installed, and then making sure the results from both this and docker scout are included in the output.

For OCI container files, we could rename this file or add a separate infoextractor for handling files identified as OCI_TAR (the files to read and JSON keys to walk for that is different enough from Docker that I don't think combining them makes sense).
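
A rough sketch of the restructuring described above, under stated assumptions: config extraction always runs, and the scanner step is optional. The function names, the `scanner` callable, and the `dockerScoutOutput` key are hypothetical and are not taken from docker_image.py; only the `dockerImageConfigs` key and the manifest.json `Config` field come from this PR.

```python
import json
import tarfile
from typing import Any, Callable, Dict, List, Optional


def read_image_configs(filename: str) -> List[Dict[str, Any]]:
    """Load every image config JSON referenced by manifest.json."""
    configs: List[Dict[str, Any]] = []
    with tarfile.open(filename) as tar:
        manifest_file = tar.extractfile("manifest.json")
        if manifest_file is None:
            return configs
        for entry in json.load(manifest_file):
            config_file = tar.extractfile(entry["Config"])
            if config_file is not None:
                configs.append(json.load(config_file))
    return configs


def extract_docker_info(
    filename: str, scanner: Optional[Callable[[str], Any]] = None
) -> Dict[str, Any]:
    """Always return the configs; add scanner output only when one is given."""
    info: Dict[str, Any] = {"dockerImageConfigs": read_image_configs(filename)}
    if scanner is not None:
        # Hypothetical key for whatever the external scanner returns.
        info["dockerScoutOutput"] = scanner(filename)
    return info
```

The caller would decide whether a scanner is available (for instance by checking `shutil.which("docker")`) and pass `None` otherwise, so a missing tool never blocks the config extraction.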

2 changes: 2 additions & 0 deletions surfactant/plugin/manager.py
Collaborator

The changes here could be dropped once the docker config info extraction is merged into the file currently used for docker scout (or if docker_tarball_file gets renamed and updated to handle oci files the name registered here could be updated).

@@ -17,6 +17,7 @@ def _register_plugins(pm: pluggy.PluginManager) -> None:
    from surfactant.infoextractors import (
        a_out_file,
        coff_file,
        docker_tarball_file,
        elf_file,
        java_file,
        js_file,
@@ -43,6 +44,7 @@ def _register_plugins(pm: pluggy.PluginManager) -> None:
id_extension,
a_out_file,
coff_file,
docker_tarball_file,
elf_file,
java_file,
js_file,