Map shared lib / executable dependencies #661

Open
wagoodman opened this issue Dec 9, 2021 · 7 comments

Comments

@wagoodman
Contributor

What would you like to be added:
The ability to list the specific shared lib dependencies for a binary. For example:

$ readelf -d ./partx

Dynamic section at offset 0x1c908 contains 29 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libblkid.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libsmartcols.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x4000
 0x000000000000000d (FINI)               0x15304
 0x0000000000000019 (INIT_ARRAY)         0x1d530
...

We could specifically note that there are shared lib dependencies: libblkid.so.1, libsmartcols.so.1, and libc.so.6

These files could be cross-correlated with other packages that provide them, to discover relationships between packages and files (or between files and files if there is no representative packaging metadata available). We can additionally discover other shared libs and do the same analysis, in which case we can build a tree of dependencies for executables.

The missing part of this is being able to reconcile runtime attributes that may change the structure of the tree (such as LD_LIBRARY_PATH). However, as long as the files are present the superset of dependencies can be created without issue (no need to consider these runtime constraints).

There is a lot more thought needed here. Does this imply a separate binary cataloger? Are those findings considered packages? If they are not (and are thus considered only files), how do we keep extra binary format info around? Or do we only focus on creating relationships between files? Does this overlap with the golang bin cataloger, or is it a separate enough domain?

@wagoodman wagoodman added the enhancement New feature or request label Dec 9, 2021
@jonasagx
Contributor

jonasagx commented Mar 1, 2022

Possibly helpful: https://github.com/sad0p/go-readelf

@wagoodman
Contributor Author

The go stdlib already has the capability of listing out shared libs from all formats we'd be interested in supporting (including elf)

@jonasagx
Contributor

From OSS meeting:

We should consider when to catalog these based on the source being scanned (maybe images and dir only? Maybe not individual files?)

@mythi

mythi commented Jul 20, 2023

The ability to list the specific shared lib dependencies for a binary. For example:

I can see this issue is a bit old, but the feature would be greatly useful for my use-case, so +1 for the idea.

@wagoodman wagoodman added this to OSS Feb 7, 2024
@wagoodman wagoodman added this to the Elevate binary artifacts milestone Feb 7, 2024
@wagoodman wagoodman moved this to Backlog in OSS Feb 7, 2024
@wagoodman
Contributor Author

wagoodman commented Feb 12, 2024

I feel this can manifest a few different ways, but I want to put forth my take on how this could be expressed.

The odd thing about this kind of feature is that we are relating things that are essentially files and not necessarily packages. That is, it might be that a binary we find with shared lib deps is part of a higher-level RPM, or maybe not. The same can be said of the shared libs it's using. If we go the direction of blindly adding relationships for all file nodes in the SBOM that represent executables to other file nodes (other binary files), then we will logically be duplicating any similar package-to-package representations if both executables already happen to be packaged as RPMs (and the cross-package dependency is already captured as a relationship). I think this is a little conflicting since:

  • though we'd be trying to describe as accurately as possible that this-file-depends-on-that-file... (a good thing)
  • ...we'd be duplicating a relationship that is logically already there (adding noise to the SBOM, which is not a good thing)

As mentioned earlier, we don't technically know what the loader will do at runtime since we don't have all of the information that the loader would have (such as LD_LIBRARY_PATH). I also don't think we should try to replicate the loader's behavior even if we had enough information to do so. This somewhat devalues the file-to-file relationships as a way to convey shared libs. Does it invalidate them? I don't think so; it just de-emphasizes the need to exhaustively enumerate binary-to-binary relationships.

I feel the right user-facing perspective is to try and convey any additional dependency information that is not already present in the existing package-to-package relationships. In that spirit, here are more specific thoughts:

  • Capture all shared lib dependencies in two places: a simple list on the file node (as exactly found in the binary) and conditionally as an additional relationship. The list on the file node is more accurate in terms of the claims found from the binary and we can raise up the need for a lib even if it's not present in the artifact anywhere (thus a relationship could not be created)
  • For the new shared lib dependencies, there is an order of preference: first package-to-package, then package-to-file/file-to-package, and file-to-file last. "Conditionally capture additional relationships" for shared libs means that right before crafting a binary-to-binary relationship, we look for a package that already claims ownership of each binary. If a package is found to own the binary in question, the package node should be used over the file node. The same should be done for both file nodes in the relationship. If a dependency relationship already exists between the nodes that get resolved, skip it; otherwise create the new relationship.

The output of this is a graph where you could traverse runtime dependencies in one connection, instead of needing to traverse in a package-sense first then a file-sense second after looking at attributes of the package/file nodes and determining the need to traverse to another node that doesn't have an edge. I think this would make understanding dependencies more transparent and easier for end users over other approaches.

The downside with this approach is that end users who are doing graph traversal will need to understand that a dependency can be either a package or a file, which might be surprising. We do have precedent for this in other relationship contexts in the graph already (e.g. this package owns these files).

@wagoodman
Contributor Author

@mythi we have implemented some of this in a couple of ways:

The first: #2626 which added enumerations of binary imports, exports, and an indication if there is an entrypoint:

$ syft alpine:latest -o json | jq '.files[] | select(.executable != null)'
{
  "id": "ff9969c3449b1e27",
  "location": {
    "path": "/sbin/apk",
    "layerID": "sha256:d4fc045c9e3a848011de66f34b81f052d4f2c15a17bb196d637e526349601820"
  },
  "metadata": {
    "mode": 755,
    "type": "RegularFile",
    "userID": 0,
    "groupID": 0,
    "mimeType": "application/x-sharedlib",
    "size": 69648
  },
  "digests": [ ... ],
  "executable": {
    "format": "elf",
    "hasExports": true,
    "hasEntrypoint": true,
    "importedLibraries": [
      "libcrypto.so.3",
      "libz.so.1",
      "libapk.so.2.14.0",
      "libc.musl-x86_64.so.1"
    ],
    "elfSecurityFeatures": { ...  }
  }
}

This doesn't create any relationships between binaries at all, or raise them to the level of packages (they are under only the "files" section), but it is something.

The second enhancement is around #2396 and #2715, which look for ELF notes embedded in the binary that indicate package information. This elevates individual binaries or groups of binaries to packages and additionally creates relationships between those new ELF packages and other existing packages and files based on binary imports and exports.

@mythi I'm curious, based on your +1, does this fit your needs? Or are you looking for additional / different information?

@mythi

mythi commented May 23, 2024

@wagoodman thanks for the follow-up! I need to find the time to give it a try. At a quick glance it looks like exactly what I was thinking of, but I'd have to test it out.

My use-case is rather special but syft is a great match for it (thanks!): I have a custom template which spits out Gramine-SGX trusted files TOML tables for individual files in a container image. With this, I should be able to enhance the template to skip unnecessary image files and only add table entries for the main app executable and its library deps.
