Package duplicated by different cataloger #931

Closed
WhyJee opened this issue Mar 31, 2022 · 22 comments · Fixed by #1948
Labels
enhancement New feature or request filtering Related to selecting or filtering results

Comments

@WhyJee

WhyJee commented Mar 31, 2022

What happened:

I scanned the almalinux:latest image with various tools to compare the generated SBOMs. At first sight Syft looked better, as its total number of identified components was greater. But a deeper analysis showed that Syft had identified the same packages from both rpm and Python.

Example Rpm cataloger finding:

   "name": "libcomps",
   "version": "0.1.16-2.el8",
   "type": "rpm",
   "foundBy": "rpmdb-cataloger",
   "locations": [
    {
     "path": "/var/lib/rpm/Packages",
    }
   ],
   "licenses": [],
   "language": "",
   "cpes": [
    "cpe:2.3:a:almalinux:libcomps:0.1.16-2.el8:*:*:*:*:*:*:*",
    "cpe:2.3:a:libcomps:libcomps:0.1.16-2.el8:*:*:*:*:*:*:*"
   ],
   "purl": "pkg:rpm/almalinux/libcomps@0.1.16-2.el8?arch=x86_64&upstream=libcomps-0.1.16-2.el8.src.rpm&distro=almalinux-8.5",

Example Python Cataloger finding:

   "name": "libcomps",
   "version": "0.1.16",
   "type": "python",
   "foundBy": "python-package-cataloger",
   "locations": [
    {
     "path": "/usr/lib64/python3.6/site-packages/libcomps-0.1.16-py3.6.egg-info",
    }
   ],
   "licenses": [
    "GPLv2+"
   ],
   "language": "python",
   "cpes": [
    "cpe:2.3:a:rpm_software_management:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_software_management:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_software_management:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python-libcomps:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python-libcomps:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python_libcomps:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python_libcomps:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm-ecosystem:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm-ecosystem:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_ecosystem:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_ecosystem:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:libcomps:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:libcomps:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python-libcomps:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python_libcomps:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm-ecosystem:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_ecosystem:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:libcomps:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python:libcomps:0.1.16:*:*:*:*:*:*:*"
   ],
   "purl": "pkg:pypi/libcomps@0.1.16",
    "license": "GPLv2+",

In fact, this Python package is delivered by the rpm above, so both entries should point to the same component.

What you expected to happen:

In fact, I am not sure whether it is good or bad to have duplicates for the same component. Note that the purl and CPEs are different.

Searching on NVD https://nvd.nist.gov/products/cpe/search/results?namingFormat=2.3&keyword=libcomps
the CPEs are only rpm based cpe:2.3:a:rpm:libcomps:.... Thus we may think that only rpm is necessary. Maybe a way to reduce the findings when one also belongs to another packager could be provided.

How to reproduce it (as minimally and precisely as possible):

docker run \
            --rm \
            -it \
            -v /var/run/docker.sock:/var/run/docker.sock \
            -v $PWD:/tmp/workdir \
            anchore/syft:latest \
            -v \
            packages \
            -s Squashed \
            -o json \
            --file /tmp/workdir/bom.json \
            docker:almalinux:latest

Anything else we need to know?:

Environment:

  • Output of syft version: 0.38.0 (and same result with 0.42.4)
  • OS (e.g: cat /etc/os-release or similar): N/A
@WhyJee WhyJee added the bug Something isn't working label Mar 31, 2022
@WhyJee WhyJee changed the title Package deplicated by different cataloger Package duplicated by different cataloger Mar 31, 2022
@spiffcs spiffcs added this to OSS Apr 27, 2022
@luhring
Contributor

luhring commented May 2, 2022

Hi @WhyJee, good eye! The short answer is: this is known behavior. And you're right about rpm superseding here.

Syft's philosophy is to surface as much data as it's aware of. And in this case it found evidence of a Python package and also of an RPM package.

But Syft also is aware of the relationship between these two packages. For this "libcomps" example, you should see an item in the "artifactRelationships" array (in Syft's JSON output) that looks kind of like this:

{
  "parent": "f3a95e529e656bd7",
  "child": "56567855bd1c8c05",
  "type": "ownership-by-file-overlap",
  "metadata": {
    "files": [
      "/usr/lib64/python3.6/site-packages/libcomps-0.1.16-py3.6.egg-info"
    ]
  }
}

This way, if a consumer of this data wanted to, they could intentionally and explicitly filter out child packages that are part of ownership-by-file-overlap relationships (e.g. the Python package in your example).
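A minimal sketch of that consumer-side filter, assuming Syft's JSON output has been loaded into a dict with the top-level "artifacts" and "artifactRelationships" arrays. The function name and the tiny inline document are illustrative only, and which of the two ids belongs to the rpm package is an assumption made for the example.

```python
def drop_overlap_children(sbom: dict) -> list:
    """Return the artifacts that are not the child of an ownership-by-file-overlap relationship."""
    owned_ids = {
        rel["child"]
        for rel in sbom.get("artifactRelationships", [])
        if rel.get("type") == "ownership-by-file-overlap"
    }
    return [pkg for pkg in sbom.get("artifacts", []) if pkg.get("id") not in owned_ids]


# Hand-built stand-in for a Syft JSON document (assumed id-to-package mapping).
sbom = {
    "artifacts": [
        {"id": "f3a95e529e656bd7", "name": "libcomps", "type": "rpm"},
        {"id": "56567855bd1c8c05", "name": "libcomps", "type": "python"},
    ],
    "artifactRelationships": [
        {
            "parent": "f3a95e529e656bd7",
            "child": "56567855bd1c8c05",
            "type": "ownership-by-file-overlap",
        }
    ],
}

kept = drop_overlap_children(sbom)
print([(pkg["name"], pkg["type"]) for pkg in kept])
```

For a real scan the dict would come from something like json.load() on the file written by the -o json output; the filter itself stays the same.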

Maybe a way to reduce the findings when one also belongs to another packager could be provided.

I think this is a great idea for a new feature. As we saw above, Syft already has the data needed to do this. It would just need to expose the filtering functionality via a CLI flag or something.

How does that sound?

@luhring luhring self-assigned this May 2, 2022
@luhring luhring moved this to Triage (Comments or Progress Made) in OSS May 2, 2022
@WhyJee
Author

WhyJee commented May 11, 2022

@luhring so it seems that the feature (or part of it) is already there. It is more a matter of SBOM processing in order to "minimize" it.

Now we may say we do have two use-cases:

  1. opensource packages an opensource component (case above) : libcomps rpm delivers libcomps python module
  2. I package some opensource components that are dependencies of my product (to ease customer deployment for instance)

In case 1, you may want to eliminate one of the two (which one is another story, but normally you should pick the enclosing package).

In case 2, I may be interested in eliminating my package and keeping only what it delivers. In that case I need to find the ID of my package in order to remove it. But if I have a standard cleanup mechanism, I will need a way to indicate which children must not be removed (X, Y, Z, ...).

But in any case, we may say that the ball is in the court of the SBOM consumer (and not the producer, here Syft).

Note that the picture is probably more complex, as there are more relationships that we (I) may want to see in an SBOM.
Let's take an example:

  • for some reason, I have forked an opensource component
  • I package it also

So we may have in the end something as (not complete picture as I may have forgotten other relationship types and using spdx relation names):

graph TD
    R1[pkg:rpm/myRpm-1.2.3] --> |contains| P1
    R1 --> |hasPrerequisites| R2[pkg:rpm/python-3.y.z]
    P1[pkg:pypi/myMod-2.3.4] --> |descendantOf| P2
    P2[pkg:pypi/aMod-4.5.6] --> |dependsOn| P3
    P3[pkg:pypi/otherMod-7.8.9]

There is a simpler case where you did not rename the module but just override the version such as aMod-4.5.6-mybranchonmyfork.

This implies that :

  • in myMod I have put necessary data so that the scanner is able to generate the whole graph
  • a vulnerability scanner is able to read the graph and gives me warning (or indication) of potential vulnerabilities in the module I derived from. myMod being eventually a private module it will surely not be known by vulnerability DB so this processing is interesting to have.
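As a rough illustration of the second point, a scanner could follow the descendantOf edges of the graph above to find the public modules a private module derives from, and look those up in a vulnerability database instead. The sketch below is hypothetical: the edge list is copied from the graph, and the function name and data layout are assumptions, not an existing Syft or Grype API.

```python
from collections import defaultdict

# Edges copied from the graph above: (source, relationship, target).
EDGES = [
    ("pkg:rpm/myRpm-1.2.3", "contains", "pkg:pypi/myMod-2.3.4"),
    ("pkg:rpm/myRpm-1.2.3", "hasPrerequisites", "pkg:rpm/python-3.y.z"),
    ("pkg:pypi/myMod-2.3.4", "descendantOf", "pkg:pypi/aMod-4.5.6"),
    ("pkg:pypi/aMod-4.5.6", "dependsOn", "pkg:pypi/otherMod-7.8.9"),
]


def ancestors(package, edges):
    """Follow descendantOf edges transitively and return the upstream packages."""
    upstream = defaultdict(list)
    for source, relationship, target in edges:
        if relationship == "descendantOf":
            upstream[source].append(target)

    found, stack = [], [package]
    while stack:
        for parent in upstream[stack.pop()]:
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


# Packages whose known vulnerabilities may also apply to the private fork.
print(ancestors("pkg:pypi/myMod-2.3.4", EDGES))
```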

@luhring
Contributor

luhring commented May 21, 2022

Thanks for the thoughtful depiction here! I think I'm following.

So it sounds like the actionable piece of this is that there are two potential feature enhancements we could add to Grype (the vulnerability scanner companion to Syft).

  1. Provide a way to tell Grype to ignore "contained" (using the nomenclature from your graph above; in Syft, we call this ownership-by-file-overlap) dependencies (but probably not their dependencies?) while finding vulnerability matches.

  2. Provide a way to tell Grype to ignore specific "containing" (e.g. pkg:rpm/myRpm-1.2.3 from your graph) packages, and to instead consider their child nodes to be roots in the graph (as if they had never been repackaged by you in the first place).

Does this align with your thinking?

@spiffcs
Contributor

spiffcs commented May 25, 2022

Also just quick notes: @WhyJee would you be ok with me updating the issue name to reflect the new enhancement here?

@WhyJee
Author

WhyJee commented Jun 23, 2022

@spiffcs sorry for the delay. I was on a trip and then got too involved in day-to-day activities.
So yes, I agree to updating the issue.

@WhyJee
Author

WhyJee commented Jun 23, 2022

@luhring I believe there could be some enhancements on the Syft side as well.

I would see its logic in a multistage pattern:

  • Stage 1 : use OS package manager (here rpm) to identify a bunch of stuff
    • Stage 1.1 : remove from the file list every file owned by a package, except if indicated modified by the package manager (typically what rpm verify indicates)
  • Stage 2 : execute all other known package managers (an option to skip may be provided, though that would probably be overkill as I don't see a real use case)
    • Stage 2.1 : same removal rule as 1.1
  • Stage 3 : based on some "file" DB, try to infer where they come from...

With this type of logic you can also assess the score of the findings:

  • Stage 1 or 2 and package clean = highest
  • Stage 1 or 2 and package dirty = high with warning
  • Stage 3 = from low to moderate
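The staged logic and scoring above can be sketched roughly as follows. This is a hypothetical illustration, not how Syft works: the function name, the stage labels, and the input layout (each stage maps a package name to the set of files it owns) are all assumptions. Files left over at the end are what stage 3 would have to infer from a file database.

```python
def catalog_in_stages(all_files, stages, modified=frozenset()):
    """Run stages in order; each package claims its files so later stages cannot re-report them.

    stages: list of (stage_name, {package_name: set_of_owned_files}).
    modified: files the package manager reports as changed (e.g. via rpm verify).
    """
    remaining = set(all_files)
    findings = []
    for stage_name, packages in stages:
        for name, owned in packages.items():
            if not owned & remaining:
                continue  # every file was already claimed by an earlier package
            dirty = bool(owned & modified)
            score = "high with warning" if dirty else "highest"
            findings.append((name, stage_name, score))
            # Stages 1.1 / 2.1: remove owned files, except those flagged as modified.
            remaining -= owned - modified
    return findings, remaining


EGG_INFO = "/usr/lib64/python3.6/site-packages/libcomps-0.1.16-py3.6.egg-info"
files = {EGG_INFO, "/opt/app/vendored.py"}  # second path is a made-up leftover file
stages = [
    ("os", {"libcomps": {EGG_INFO}}),
    ("language", {"libcomps (python)": {EGG_INFO}}),
]

findings, remaining = catalog_in_stages(files, stages)
print(findings)   # only the rpm-level finding survives
print(remaining)  # candidates for stage 3 inference
```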

Regarding the Grype enhancements, both sound sensible. For 1., people using such an option should be aware that if the vulnerability database uses a consistent package identification (the contained or the container), the final VEX may miss some vulnerabilities. But the problem lies more with vulnerability identification than with the SBOM.

@tgerla tgerla added enhancement New feature or request and removed bug Something isn't working labels Aug 4, 2022
@tgerla tgerla assigned tgerla and unassigned luhring Aug 10, 2022
@WhyJee
Author

WhyJee commented Aug 22, 2022

An example of the consequence of this behavior in another context. An image is scanned with Syft and the generated SBOM is pushed to a vulnerability tool.
The image contains the rpm package python3-rpm which leads in the SBOM to 2 entries:

  • rpm
    • Name : python3-rpm
    • Version: 4.14.3-23.el8
  • python
    • Name: rpm
    • Version: 4.14.3

CVE-2021-3421 is reported against this product for the 2nd item (Python) whereas it is fixed as per RHSA-2021:2574 - Security Advisory since version 4.14.3-14.
The consequence is more "manual" validation for developers to assess these false positive vulnerabilities.
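To make the false positive concrete: the rpm entry carries a release number that can be compared against the fixed release from the advisory, while the Python entry only has the upstream version and therefore cannot be decided. The sketch below is a deliberately simplified comparison written for this example; it is not RPM's real version algorithm (no epochs, no non-numeric segments), and the function names are hypothetical.

```python
import re


def split_rpm_version(version):
    """Split '4.14.3-23.el8' into ((4, 14, 3), 23); the release is None when absent."""
    upstream, _, release = version.partition("-")
    numbers = tuple(int(part) for part in upstream.split("."))
    match = re.match(r"\d+", release)
    return numbers, (int(match.group()) if match else None)


def is_fixed(installed, fixed_in):
    """Return True/False when decidable, or None when release information is missing."""
    installed_upstream, installed_release = split_rpm_version(installed)
    fixed_upstream, fixed_release = split_rpm_version(fixed_in)
    if installed_upstream != fixed_upstream:
        return installed_upstream > fixed_upstream
    if installed_release is None or fixed_release is None:
        return None  # same upstream version, but no release to compare
    return installed_release >= fixed_release


print(is_fixed("4.14.3-23.el8", "4.14.3-14"))  # rpm entry
print(is_fixed("4.14.3", "4.14.3-14"))         # python entry
```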

@kzantow kzantow removed the status in OSS Dec 21, 2022
@wagoodman wagoodman added the filtering Related to selecting or filtering results label Dec 22, 2022
@kzantow kzantow moved this to Parking Lot (Comments or Progress) in OSS Jan 5, 2023
@matthyx

matthyx commented May 11, 2023

@luhring I have the same issue, but with different catalogers (binary and sbom), and in this case there is no direct relationship between them...

syft packages -o json docker.io/bitnami/redis@sha256:d06075921a96f3ccc3d2567230f6d22575ff65c1a8e8c0ce55717aaa8719b41f

This detects "redis" and "Redis" with the binary-cataloger and the sbom-cataloger, respectively:

   "id": "daf59709b31d2efc",
   "name": "redis",
   "version": "7.0.11",
   "type": "binary",
   "foundBy": "binary-cataloger",
   "locations": [
    {
     "path": "/opt/bitnami/redis/bin/redis-server",
     "layerID": "sha256:6b9f25067a7386027c0290ec0845fe0f246fd5782d3cb1852d43d3bdf687a7c5"
    }
   "id": "75918b40c8383df",
   "name": "Redis",
   "version": "7.0.11",
   "type": "",
   "foundBy": "sbom-cataloger",
   "locations": [
    {
     "path": "/opt/bitnami/redis/.spdx-redis.spdx",
     "layerID": "sha256:6b9f25067a7386027c0290ec0845fe0f246fd5782d3cb1852d43d3bdf687a7c5"
    }

However there is no relationship between them... how should I filter these?
Thanks for your support.

@luhring
Contributor

luhring commented May 19, 2023

cc: @wagoodman

@wagoodman
Contributor

@matthyx I'm not certain in the specific case you brought up that there is enough information from syft's perspective to know that these are two different redis packages. That is, name and version might not be enough information in all cases; to create a relationship we need one package to claim ownership of a location that another package was defined by. For instance, a package found in an RPM DB lists out all files owned by a package... if that package is a python wheel, then the python cataloger will also pick up on it. The key to knowing if the packages describe the same thing is if the RPM file ownership locations overlap with the python package evidence locations, in this case a python wheel metadata file; then we'd be able to determine there is some overlap in ownership.

Even though you provided a small snippet, the binary and sbom catalogers today do not list out owned files from packages they raise up, so it isn't possible yet to create ownership overlap relationships for these two packages.
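The overlap check described above can be sketched as a set intersection between the files one package claims to own and the locations another package was found at. This is an illustrative model only: the "owned_files" and "locations" keys, the ids, and the function name are assumptions for the example, not Syft's internal data model.

```python
EGG_INFO = "/usr/lib64/python3.6/site-packages/libcomps-0.1.16-py3.6.egg-info"

# Hypothetical, simplified package records.
packages = [
    {"id": "rpm-libcomps", "owned_files": {EGG_INFO}, "locations": {"/var/lib/rpm/Packages"}},
    {"id": "python-libcomps", "owned_files": set(), "locations": {EGG_INFO}},
    # The binary and sbom catalogers list no owned files, so nothing can overlap.
    {"id": "binary-redis", "owned_files": set(), "locations": {"/opt/bitnami/redis/bin/redis-server"}},
    {"id": "sbom-Redis", "owned_files": set(), "locations": {"/opt/bitnami/redis/.spdx-redis.spdx"}},
]


def ownership_by_file_overlap(packages):
    """Emit a relationship when a package owns a file that another package was found at."""
    relationships = []
    for parent in packages:
        for child in packages:
            if parent["id"] == child["id"]:
                continue
            overlap = parent["owned_files"] & child["locations"]
            if overlap:
                relationships.append(
                    {
                        "parent": parent["id"],
                        "child": child["id"],
                        "type": "ownership-by-file-overlap",
                        "metadata": {"files": sorted(overlap)},
                    }
                )
    return relationships


relationships = ownership_by_file_overlap(packages)
print(relationships)
```

With this input only the rpm/python pair is related; the two redis entries stay unrelated because neither lists owned files.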

@matthyx

matthyx commented Jun 16, 2023

@wagoodman alright... so there is nothing we could do in that particular case (and other similar ones).
Thanks for your investigation and for the great product 👍

@wagoodman
Contributor

wagoodman commented Jun 16, 2023

Popping off the stack some, ideally syft does persist all information found for packages discovered. However, that goal might not be mutually exclusive to merging packages together (slightly different than deduplicating, which may be lossy to the full set of data on a package object). I feel that such an approach probably has tradeoffs.

Today the syft package data model allows for a package to be found by a single cataloger and for tailored metadata to be persisted in the .metadata field:

{
  "id": "83df403875b8c91",
  "name": "curl",
  "version": "8.2.7",
  "type": "binary",
  "foundBy": "binary-cataloger",
  "purl": "pkg:github/curl/[email protected]",
  "metadata": {
    "matches": [
      {
        "classifier": "curl-binary",
        "location": "..."
      }
    ]
  }
}

(omitting a few things...)

But if we wanted to merge this definition with the dpkg one:

{
  "id": "413d8f5b0378c98",
  "name": "curl",
  "version": "8.2.7",
  "type": "dpkg",
  "foundBy": "dpkg-cataloger",
  "purl": "pkg:deb/[email protected]",
  "metadata": {
    "architecture": "x86-86",
    "installedSize": 8392,
    "maintainer": "[email protected]"
  }
}

... we'd be out of luck, since there are several singular fields that have different values, thus you'd need to drop one. But if the package object were modified to allow for merging logical packages found in different ways, then it would be possible:

{
  "name": "curl",
  "version": "8.2.7",
  "evidence": [
    {
      "id": "83df403875b8c91",
      "type": "binary",
      "foundBy": "binary-cataloger",
      "purl": "pkg:github/curl/[email protected]",
      "metadata": {
        "matches": [
          {
            "classifier": "curl-binary",
            "location": "..."
          }
        ]
      }
    },
    {
      "id": "413d8f5b0378c98",
      "type": "dpkg",
      "foundBy": "dpkg-cataloger",
      "purl": "pkg:deb/[email protected]",
      "metadata": {
        "architecture": "x86-86",
        "installedSize": 8392,
        "maintainer": "[email protected]"
      }
    }
  ]
}

There are some important things to note with this hypothetical object. Today nodes in the graph are package objects, which is relatively simple. With this new approach the evidence array elements would be nodes in the graph, not the logical package. Why? We don't know if there are multiple dependency graphs in the greater SBOM being explained, where each node would be a part of a different dependency graph... making the logical package the node represented in the graph means that we'd be effectively merging currently separate dependency graphs. It's ok if the two separate evidence nodes are related across the two dependency graphs, as long as that relationship is not expressed as a dependency.

This new hypothetical object also makes it a little harder for consumers to use the data. Instead of a small jq command to answer simple questions (what is the type of package name==x? jq '.artifacts[] | select(.name == "x")') you'd need to start writing larger scripts to reconcile fields with multiple values (find the logical package name x, iterate over .evidence, find all .type fields and return the array). Where today you get 0 to 1 values, with this new approach you get 0 to many values.
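A small sketch of the hypothetical merged model and of the consumer-side query it would require. Grouping by name and version is an assumption made for the example (the thread does not define a merge key), and the function names are illustrative.

```python
from collections import defaultdict

# Fields that have a single value per cataloger finding and so move into "evidence".
SINGULAR_FIELDS = ("id", "type", "foundBy", "metadata")


def merge_logical(packages):
    """Group findings by (name, version) and keep each finding as an evidence entry."""
    grouped = defaultdict(list)
    for pkg in packages:
        evidence = {key: pkg[key] for key in SINGULAR_FIELDS if key in pkg}
        grouped[(pkg["name"], pkg["version"])].append(evidence)
    return [
        {"name": name, "version": version, "evidence": entries}
        for (name, version), entries in grouped.items()
    ]


def types_of(logical_packages, name):
    """Answer 'what is the type of package x?', which now yields 0 to many values."""
    return [
        entry["type"]
        for pkg in logical_packages
        if pkg["name"] == name
        for entry in pkg["evidence"]
    ]


found = [
    {"id": "83df403875b8c91", "name": "curl", "version": "8.2.7",
     "type": "binary", "foundBy": "binary-cataloger"},
    {"id": "413d8f5b0378c98", "name": "curl", "version": "8.2.7",
     "type": "dpkg", "foundBy": "dpkg-cataloger"},
]

logical = merge_logical(found)
print(types_of(logical, "curl"))
```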

Popping out of the hypothetical package object example... I still think the default behavior of syft should be to raise as much of the raw information as possible, but I think there is room for allowing opt-in filtering or deduplication logic. (needs more thought though...)

@matthyx

matthyx commented Jun 16, 2023

cc: @slashben

@spiffcs
Contributor

spiffcs commented Jun 26, 2023

We've had some internal discussion regarding the correctness of overlapping packages in an SBOM so I wanted to get the ball rolling on this thread for what this kind of opt in filtering enhancement would look like if we were to add it to the syft tool.

Just to set a baseline, we are NOT talking about how to enhance the current SBOM for downstream tooling like grype. We're discussing opt-in behavior for syft (non default flag or config) where it filters and decides a winner between packages that overlap via the ownership-by-file-overlap relationship, but contain conflicting information.

Feature

Before generating the SBOM, if syft detects a difference in the information of two packages that are related via the ownership-by-file-overlap relationship, it will use some context or rule set provided by the user to eliminate the "incorrect" package.

Current Default State

The current philosophy for package overlap is as follows (chime in @anchore/tools if any of this seems wrong):

syft stays out of making filtering decisions by default
it instead raises enough information so that downstream tooling 
can try and make the right decision between packages for their use case

The above is in place because it's currently not clear 100% of the time that some other cataloger (SBOM, binary, ecosystem, etc) will be wrong relative to an OS-package cataloger. There is no one size fits all cataloger hierarchy.

Said another way, Syft currently has no mechanism for assessing correctness between two catalogers' output if a conflict in package information arises while an ownership-by-file-overlap relationship is present.

Consider the following case of two packages:

{
  "id": "83df403875b8c91",
  "name": "curl",
  "version": "8.2.7",
  "type": "binary",
  "foundBy": "binary-cataloger",
  "purl": "pkg:github/curl/[email protected]",
  "metadata": {
    "matches": [
      {
        "classifier": "curl-binary",
        "location": "..."
      }
    ]
  }
}
{
  "id": "413d8f5b0378c98",
  "name": "curl",
  "version": "8.2.7-rc1",
  "type": "dpkg",
  "foundBy": "dpkg-cataloger",
  "purl": "pkg:deb/[email protected]",
  "metadata": {
    "architecture": "x86-86",
    "installedSize": 8392,
    "maintainer": "[email protected]"
  }
}

These packages have the following relationship:

{
  "parent": "413d8f5b0378c98",
  "child": "83df403875b8c91",
  "type": "ownership-by-file-overlap",
  "metadata": {
    "files": [
      "/usr/bin/local/curl"
    ]
  }
}

In the default SBOM, syft would surface both packages given the conflicting information. One of these packages would turn out to have incorrect version information upon investigation, but there is no current rule to say always take OS packages over binary given a conflict, or vice versa.
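Until such a rule exists, a consumer can at least list the conflicts for manual review. The sketch below walks the relationships and reports overlapping pairs whose versions differ; it assumes the two packages and the relationship above sit in a document shaped like Syft's JSON output, and the function name is illustrative.

```python
def version_conflicts(sbom):
    """List overlapping package pairs that disagree on the version."""
    by_id = {pkg["id"]: pkg for pkg in sbom["artifacts"]}
    conflicts = []
    for rel in sbom["artifactRelationships"]:
        if rel["type"] != "ownership-by-file-overlap":
            continue
        parent, child = by_id[rel["parent"]], by_id[rel["child"]]
        if parent["version"] != child["version"]:
            conflicts.append(
                (parent["foundBy"], parent["version"], child["foundBy"], child["version"])
            )
    return conflicts


sbom = {
    "artifacts": [
        {"id": "83df403875b8c91", "name": "curl", "version": "8.2.7",
         "type": "binary", "foundBy": "binary-cataloger"},
        {"id": "413d8f5b0378c98", "name": "curl", "version": "8.2.7-rc1",
         "type": "dpkg", "foundBy": "dpkg-cataloger"},
    ],
    "artifactRelationships": [
        {"parent": "413d8f5b0378c98", "child": "83df403875b8c91",
         "type": "ownership-by-file-overlap"},
    ],
}

conflicts = version_conflicts(sbom)
print(conflicts)
```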

Way forward

The thread is open for discussion on design/implementation on how we can best build this more advanced context into syft's current mechanisms so that users have more agency over filtering the SBOM they create.

I'll follow up with my own proposal separate to this framing comment so that we can keep problem/solution discussion separate.

@christinahaig

As this is related to https://support.anchore.com/hc/en-us/requests/4315, I was hoping you could tell me when this issue is planned to be worked on. If the ownership-by-file-overlap tag is already present, it seems like a solution is already at hand: let users decide which order to prioritize package cataloger discoveries (which takes precedence).

I can say from my experience, binary detections should always be at the bottom - they have the least accurate information. Merging/de-duping/otherwise removing the binary component is not a 'lossy' action, it's just a bad detection - the version is purely wrong. It may be the upstream version, but that doesn't make it correct for the package, as distros frequently apply patches and change the release string, which changes the component version. The package manager manifest has the correct information for that component, and where you know there is a relationship between the two components (shared files list for example from the package manager metadata, e.g. /var/lib/dpkg/info/<package_name>.list), you should delete the bad detection in favor of the correct information from the package manager.

@spiffcs
Contributor

spiffcs commented Jul 19, 2023

Hey @christinahaig! Thanks for following up here -

The solution of:

Let users decide which order to prioritize package cataloger discoveries (which takes precedence).

Is a great suggestion.

My thoughts here are adding a few things to the syft configuration that help accomplish this in a two pass approach.

The first pass would just be:

remove-packages-by-overlap: true

This would allow syft to prune the generated SBOM before it's output, much in the same way that https://github.com/anchore/grype already filters packages using the ownership-by-file-overlap relationship.

A follow up to that would be what you suggested - where a cataloger precedence construct is added that the user can configure. This one needs a bit more design work as the catalogers are hierarchical by convention only right now. They would need some kind of additions or alterations that allow a user configuration to hook into / identify cataloger "types" (binary, distro, package manager etc) that could be assigned a precedence.

As to when this is being worked on: I can take a look at getting the above config option added this afternoon, with a PR for review by the rest of @anchore/tools.

@wagoodman
Contributor

I'm starting to see the case for why a binary package in particular should probably not be included if there is an owning package... since the "binary packages" were entirely synthesized by syft. That may call for excluding them by default.

Let users decide which order to prioritize package cataloger discoveries (which takes precedence).

This resonates with me too. Here's an option of what that might look like in syft configuration:

drop-packages-with-ownership-overlap:
  - parent-type: class:os
    type: binary

Where class:os is shorthand for ["apk", "alpm", "rpm", "dpkg", "portage"], so the full expression would be:

drop-packages-with-ownership-overlap:
  - type: binary
    parent-type:
    - "apk"
    - "alpm"
    - "rpm"
    - "dpkg"
    - "portage"

This could be the default configuration. However, we could allow for simple expressions like:

drop-packages-with-ownership-overlap:
  # drop any python package that is owned by an RPM package
  - parent-type: rpm
    type: python

Alternatively we could allow for something as agnostic as dropping packages based off of more generic criteria:

drop-packages:
  - relationship-type: ownership-by-file-overlap 
    parent-type: class:os
    type: binary

But this is really starting to get into something like #31 ... but I'd like to avoid this since #31 is really about how to apply hints for a specific image, and this issue is really about ignoring a class of packages based on structural elements, regardless of the specifics of an image.
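To show how rules of the drop-packages-with-ownership-overlap shape could be evaluated, here is a minimal sketch. The rules are written as plain Python dicts mirroring the proposed YAML; the configuration keys come from the proposal above, while the function names and the evaluation logic are assumptions for illustration, not syft's behavior.

```python
# Expansion of the class:os shorthand, as listed in the proposal above.
OS_TYPES = ["apk", "alpm", "rpm", "dpkg", "portage"]


def expand(value):
    """Accept a single type or a list of types and expand the class:os shorthand."""
    values = [value] if isinstance(value, str) else list(value)
    expanded = []
    for item in values:
        expanded.extend(OS_TYPES if item == "class:os" else [item])
    return expanded


def should_drop(child_type, parent_type, rules):
    """True when any rule matches the child type and the owning parent type."""
    return any(
        child_type in expand(rule["type"]) and parent_type in expand(rule["parent-type"])
        for rule in rules
    )


rules = [
    {"parent-type": "class:os", "type": "binary"},
    # drop any python package that is owned by an RPM package
    {"parent-type": "rpm", "type": "python"},
]

print(should_drop("binary", "dpkg", rules))
print(should_drop("python", "rpm", rules))
print(should_drop("python", "dpkg", rules))
```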

@wagoodman
Contributor

One thing that I feel is unanswered is should we be looking at exclusively the relationships and package types? Or should there be more to match on in order to drop a package? For instance, what if an OS package contains multiple binaries, should we suppress the binary packages then? Or what if a binary contained within an OS package does not logically represent the same package name as the OS package name (e.g. an RPM for web-app contains the php binary)... should we be trying to fully or partially ensure that the binary found logically represents the OS package?

@spiffcs
Contributor

spiffcs commented Jul 20, 2023

 For instance, what if an OS package contains multiple binaries, 
should we suppress the binary packages then?

This is a great point! Here is a list of other fields we can match on to make this more exact. The current implementation on #1948 is very bare bones in that a match will be excluded based on the relationship existing and the types being of the correct orientation (From: OS ---> To: Binary).

Name
Version
FoundBy
Locations
Licenses
Language
Type
CPE
PURL

There was a suggestion in another issue that PURL could be a candidate to consider here. I do know that Version is sometimes unreliable here: since syft is constructing the binary packages, the Version might not always match what the OS cataloger reports:

Example:

busybox                 1.36.1       binary
busybox                 1.36.1-r0    apk

Other consideration:

Or what if a binary contained within an OS package does 
not logically represent the same package name as the OS package name 

^ My Opinion: A binary that is linked to an OS package with the ownership-by-file-overlap relationship should not be removed if they have different names

Edit: Weston makes a good point below that more exact matching on Name would still keep some of the frustration persisting here

@westonsteimel
Contributor

westonsteimel commented Jul 20, 2023

I suspect that's not going to work particularly well because the package manager names will often not match the syft constructed names (for instance python vs python3, python3-min, etc).

@wagoodman
Contributor

wagoodman commented Jul 20, 2023

agreed that a specific approach would be needed (we can look for partial matches or similarity).

The higher level question is should we be trying to determine if the binary package is being represented by the OS package? or not try and detect this? I feel that not accounting for this will filter out packages that should remain in the SBOM.
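One possible shape for such a partial match, purely as a thought experiment: reduce the OS package name to a stem and compare it with the binary package name. The heuristic and both function names are hypothetical and intentionally naive; it handles the python vs python3 / python3-min case mentioned above, but would miss names such as python3-rpm that wrap a differently named binary.

```python
import re


def base_name(os_package_name):
    """Reduce an OS package name such as 'python3-min' to a comparable stem ('python')."""
    first_token = os_package_name.split("-", 1)[0]
    return re.sub(r"[\d.]+$", "", first_token)


def probably_same_software(binary_name, os_package_name):
    """Naive check: does the binary name equal the stem of the OS package name?"""
    return binary_name.lower() == base_name(os_package_name).lower()


print(probably_same_software("python", "python3"))
print(probably_same_software("python", "python3-min"))
print(probably_same_software("php", "web-app"))
```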

@spiffcs spiffcs moved this from Awaiting Response to In Review in OSS Aug 1, 2023
@spiffcs spiffcs assigned spiffcs and unassigned tgerla Aug 1, 2023
spiffcs added a commit that referenced this issue Aug 8, 2023
)

Fixes #931

PR #1948 introduces a new implicit exclusion for binary packages that overlap by file ownership and have certain characteristics:

1) the relationship between packages is OwnershipByFileOverlap
2) the parent package is an "os" package - see changelog for included catalogers
3) the child is a synthetic package generated by the binary cataloger - see changelog for included catalogers
4) the package names are identical

---------

Signed-off-by: Christopher Phillips <[email protected]>
@github-project-automation github-project-automation bot moved this from In Review to Done in OSS Aug 8, 2023
@spiffcs
Contributor

spiffcs commented Aug 8, 2023

@christinahaig - we just merged #1948 and it should go into the next syft release - feedback is always welcome and we hope that this new default configuration reduces the noise you were seeing when synthetic packages were incorrectly constructed and had a valid OS overlap =)
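The four conditions listed in the commit message for #1948 can be paraphrased as a single predicate. This is a reading of the commit message, not the merged Go code: the set of OS package types is borrowed from the class:os shorthand earlier in the thread (the commit itself defers to the changelog), and the dict layout and function name are assumptions.

```python
# Assumed from the class:os shorthand earlier in the thread; the changelog is authoritative.
OS_PACKAGE_TYPES = {"apk", "alpm", "rpm", "dpkg", "portage"}


def excluded_by_overlap(relationship_type, parent, child):
    """Apply the four conditions from the commit message to one relationship."""
    return (
        relationship_type == "ownership-by-file-overlap"  # 1) relationship type
        and parent["type"] in OS_PACKAGE_TYPES            # 2) parent is an OS package
        and child["type"] == "binary"                     # 3) child is a synthetic binary package
        and parent["name"] == child["name"]               # 4) names are identical
    )


apk_busybox = {"name": "busybox", "version": "1.36.1-r0", "type": "apk"}
bin_busybox = {"name": "busybox", "version": "1.36.1", "type": "binary"}
rpm_web_app = {"name": "web-app", "type": "rpm"}
bin_php = {"name": "php", "type": "binary"}

print(excluded_by_overlap("ownership-by-file-overlap", apk_busybox, bin_busybox))
print(excluded_by_overlap("ownership-by-file-overlap", rpm_web_app, bin_php))
```

Under this reading the busybox binary package is dropped in favor of the apk entry, while the php binary inside a web-app RPM is kept because the names differ.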

GijsCalis pushed a commit to GijsCalis/syft that referenced this issue Feb 19, 2024