Name and Version empty for Java package when scanning provided image #2132

Closed
spiffcs opened this issue Sep 14, 2023 · 2 comments · Fixed by #3257
Labels: enhancement (New feature or request)


spiffcs (Contributor) commented Sep 14, 2023

What happened:
When using syft from the tip of main for image caphill4/syft-manifest-bug:latest the following behavior was experienced:

  • Name field was blank for multiple discovered packages
  • Version field was blank for multiple discovered packages
  • Java metadata provided a manifest version, but no other manifest details were available
syft -o json caphill4/syft-manifest-bug:latest

Example package below (there were multiple of these):

  {
   "id": "40358cd756d70d11",
   "name": "",
   "version": "",
   "type": "java-archive",
   "foundBy": "java-cataloger",
   "locations": [
    {
     "path": "/opt/asserts/api-server/enterprise-server.jar",
     "layerID": "sha256:4d8a814cf85fcbdaa2cff3f001c705392dcc05e1bf659fcaac718b84e9dfc662",
     "annotations": {
      "evidence": "primary"
     }
    }
   ],
   "licenses": [],
   "language": "java",
   "cpes": [],
   "purl": "pkg:maven/",
   "metadataType": "JavaMetadata",
   "metadata": {
    "virtualPath": "/opt/asserts/api-server/enterprise-server.jar:BOOT-INF/lib/1-555680818.jar",
    "manifest": {
     "main": {
      "Manifest-Version": "1.0"
     }
    },
    "digest": [
     {
      "algorithm": "sha1",
      "value": "4c1415ccb35494ea281446ce12463ff40263c910"
     }
    ]
   }
  },

What you expected to happen:
Syft should have an option or config to prune packages after cataloging when they lack enough identifying information.

Alternatively, the file cataloger could be enhanced to show nested jar information, so that this information is moved from package entries to file entries rather than lost.

Example of the proposed option:

syft -o json --prune caphill4/syft-manifest-bug:latest

The offending jars also produced warnings, which seem to be related to regex matching. Packages are still created for these jars, but because they carry almost no identifying information, each package is blank apart from the path and virtualPath fields that show its location.

[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/57-30595491.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/67-1304492339.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/67-1304492339.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/32-409283951.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/32-409283951.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/6-710714459.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/6-710714459.jar'
[0003]  WARN unexpectedly empty matches for archive 'BOOT-INF/lib/17-2051466981.jar'
......
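To illustrate why a nested jar named like `1-555680818.jar` ends up with no name or version, here is a minimal Go sketch of filename-based name/version guessing. The pattern and the `guessFromFilename` helper are hypothetical and simplified; they are not syft's actual regex or code, and `commons-lang3-3.12.0.jar` is a made-up, well-named jar used only for contrast.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
)

// nameVersionPattern is a simplified, hypothetical pattern for file names of
// the form "<artifact>-<version>.jar". It is NOT the regex syft uses.
var nameVersionPattern = regexp.MustCompile(`^(?P<name>[A-Za-z][\w.\-]*?)-(?P<version>\d[\w.\-]*)\.jar$`)

// guessFromFilename returns the name and version implied by a jar's base file
// name, or two empty strings when the pattern does not match.
func guessFromFilename(p string) (name, version string) {
	m := nameVersionPattern.FindStringSubmatch(path.Base(p))
	if m == nil {
		return "", ""
	}
	return m[nameVersionPattern.SubexpIndex("name")], m[nameVersionPattern.SubexpIndex("version")]
}

func main() {
	for _, p := range []string{
		"BOOT-INF/lib/commons-lang3-3.12.0.jar", // hypothetical, well-named jar
		"BOOT-INF/lib/1-555680818.jar",          // nested jar from this issue
	} {
		name, version := guessFromFilename(p)
		fmt.Printf("%s -> name=%q version=%q\n", p, name, version)
	}
}
```

Under this sketch the second path yields empty strings: its base name starts with a digit, so nothing can be read as an artifact name.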

Steps to reproduce the request:

syft -o json caphill4/syft-manifest-bug:latest

Inspect the output for the above characteristics

Anything else we need to know?:
Built from syft main as of - a46d122

Environment:

  • Output of syft version: a46d122
  • OS (e.g: cat /etc/os-release or similar): OSX
spiffcs added the bug label Sep 14, 2023
spiffcs moved this to In Progress in OSS Sep 14, 2023
spiffcs self-assigned this Sep 14, 2023
spiffcs added the enhancement label and removed the bug label Sep 14, 2023
spiffcs (Contributor, Author) commented Sep 19, 2023

Alright, I’ve finished investigating this. Here is an update:

Currently the behavior is correct: syft identifies the main parent jar, enterprise-server. A package for that main jar does exist in the SBOM, along with its manifest information.
The impression that it might be missing comes from conflating the path and virtualPath fields. This can lead a user to believe, incorrectly, that blank information is being inserted for path=/opt/enterprise-server.jar. The virtualPath shows that these blank entries actually come from nested jars with little identifying information, e.g. virtualPath=/opt/enterprise-server.jar:BOOT-INF/lib/1-555680818.jar.
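The difference between the two fields can be sketched in Go. This assumes, as the example output suggests, that virtualPath joins the outer archive and each nested entry with `:`; the `splitVirtualPath` helper is hypothetical and not part of syft's API.

```go
package main

import (
	"fmt"
	"strings"
)

// splitVirtualPath separates a virtual path into the archive that exists on
// disk (the value shown in "path") and the chain of entries nested inside it.
// Assumption: ':' appears only as the nesting separator.
func splitVirtualPath(vp string) (outer string, nested []string) {
	parts := strings.Split(vp, ":")
	return parts[0], parts[1:]
}

func main() {
	outer, nested := splitVirtualPath("/opt/enterprise-server.jar:BOOT-INF/lib/1-555680818.jar")
	fmt.Println("on disk:", outer)
	fmt.Println("nested:", nested)
	// A package whose virtualPath has nested entries describes an inner jar,
	// not the outer file named in "path".
	fmt.Println("describes a nested jar:", len(nested) > 0)
}
```

Read this way, two packages can share the same path while describing different jars; only the nested entries tell them apart.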

Potential solutions:

  1. A prune option that, in a post-processing step, eliminates packages lacking either a name or a version. This presents a challenge: by default the file cataloger does not account for nested jar paths, so this option would remove any detection or representation of these nested jars, leading to an incomplete SBOM.

  2. The catalogers are logically detached at the moment, but I would favor the pruning option above if I knew the pruned results showed up elsewhere in the SBOM. The files field could be enhanced via the fileCataloger to show something like the following, with an option to also create a relationship to the parent package:

  {
   "id": "e82d211f6cc65681",
   "location": {
      "path": "/opt/enterprise-server.jar:BOOT-INF/lib/1-555680818.jar",
      "layerID": "sha256:4693057ce2364720d39e57e85a5b8e0bd9ac3573716237736d6470ec5b7b7230"
     }
  },
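The two options can be combined: prune incomplete packages, but keep each pruned nested jar visible as a file entry linked to its parent. A minimal Go sketch under assumptions: `Package` and `FileRecord` are hypothetical, trimmed-down stand-ins rather than syft's real types, and `ParentPath` stands in for the proposed relationship to the parent package.

```go
package main

import "fmt"

// Package is a hypothetical, trimmed-down stand-in for an SBOM package entry.
type Package struct {
	Name        string
	Version     string
	Path        string // file on disk (the outer archive for nested jars)
	VirtualPath string // location inside nested archives, if any
}

// FileRecord is a hypothetical file entry that keeps a pruned jar visible.
type FileRecord struct {
	Path       string // nested location, e.g. "outer.jar:BOOT-INF/lib/x.jar"
	ParentPath string // outer archive, standing in for a parent relationship
}

// prune keeps packages that have both a name and a version, and converts the
// rest into file records instead of silently dropping them.
func prune(pkgs []Package) (kept []Package, files []FileRecord) {
	for _, p := range pkgs {
		if p.Name != "" && p.Version != "" {
			kept = append(kept, p)
			continue
		}
		rec := FileRecord{Path: p.Path}
		if p.VirtualPath != "" && p.VirtualPath != p.Path {
			// Nested jar: record where it lives and which archive contains it.
			rec = FileRecord{Path: p.VirtualPath, ParentPath: p.Path}
		}
		files = append(files, rec)
	}
	return kept, files
}

func main() {
	pkgs := []Package{
		// Hypothetical version for the parent jar; the issue does not state it.
		{Name: "enterprise-server", Version: "1.0.0", Path: "/opt/enterprise-server.jar"},
		{Path: "/opt/enterprise-server.jar", VirtualPath: "/opt/enterprise-server.jar:BOOT-INF/lib/1-555680818.jar"},
	}
	kept, files := prune(pkgs)
	fmt.Println("packages kept:", len(kept))
	for _, f := range files {
		fmt.Printf("file %s (parent %s)\n", f.Path, f.ParentPath)
	}
}
```

The design choice here is that nothing detected is discarded: an incomplete package only changes which section of the SBOM it appears in.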

Let me know your comments or thoughts in this thread 😃. I've put this into our backlog for future discussion during the community/team sync.

spiffcs moved this from In Progress to Backlog in OSS Sep 19, 2023
spiffcs removed their assignment Sep 19, 2023
zhill (Member) commented Jun 12, 2024

@spiffcs I think this has come up with a couple of other users as well, so it's probably worth restarting the discussion on what the correct answer is for Syft here.

My (current) 2c is that a jar with no metadata or other identifiers marking it as a Java artifact is really no different from a tar or zip file. In terms of artifact relationships, it should be treated as an archive that the cataloger recurses into to find other artifacts, rather than as an identified package. The file cataloger can gather digests etc., but the Java cataloger should probably skip a jar if it cannot identify the actual Java software in it.
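A minimal Go sketch of that rule, under assumptions: a jar becomes a package only if it carries some identifying metadata, and is otherwise treated as a plain archive to recurse into. The attribute list uses standard JAR, OSGi, and module manifest keys, but which keys syft actually consults is an assumption, as is the `hasPomProperties` flag standing in for a `pom.properties` lookup; `classifyJar` is hypothetical.

```go
package main

import "fmt"

// identifyingKeys are standard JAR (and OSGi / module) manifest attributes
// that can name or version an artifact. The exact set is an assumption.
var identifyingKeys = []string{
	"Implementation-Title",
	"Implementation-Version",
	"Specification-Title",
	"Specification-Version",
	"Bundle-SymbolicName",
	"Bundle-Version",
	"Automatic-Module-Name",
}

// classifyJar decides how a cataloger might treat a jar: "package" when it
// has identifying metadata, otherwise "archive" (recurse into it for other
// artifacts, but emit no package for the jar itself).
func classifyJar(mainSection map[string]string, hasPomProperties bool) string {
	if hasPomProperties {
		return "package"
	}
	for _, k := range identifyingKeys {
		if mainSection[k] != "" {
			return "package"
		}
	}
	return "archive"
}

func main() {
	// The manifest from this issue: only Manifest-Version is present.
	bare := map[string]string{"Manifest-Version": "1.0"}
	fmt.Println(classifyJar(bare, false)) // archive

	labeled := map[string]string{"Manifest-Version": "1.0", "Implementation-Title": "example"}
	fmt.Println(classifyJar(labeled, false)) // package
}
```

A jar classified as "archive" here would still be opened and searched; it simply would not produce a blank package entry of its own.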

This would also be a good candidate for any "known-unknown" classification logic, to tell the SBOM reader that there is content that is likely part of an application or artifact but cannot be identified.

wagoodman self-assigned this Sep 6, 2024
wagoodman moved this from Backlog to In Progress in OSS Sep 6, 2024
wagoodman moved this from In Progress to In Review in OSS Sep 19, 2024
github-project-automation bot moved this from In Review to Done in OSS Sep 20, 2024