Extract vm images #16 #20

Merged
merged 50 commits into from
Jun 1, 2021

Conversation

pombredanne
Member

@pombredanne pombredanne commented Apr 6, 2021

This PR:

  • adds support for extracting filesystems, as files, from VirtualBox, VMware and QEMU VM images using libguestfs (for Extract VM images #16)
  • provides a new way to configure third-party binaries (using environment variables, plugins or PATH)

steven-esser and others added 9 commits January 18, 2021 20:01
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Steven Esser <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
With this, all supported archive formats will be tried.

Signed-off-by: Philippe Ombredanne <[email protected]>
This is a two-step extraction: libguestfs first dumps a FS to a tarball,
which extractcode then extracts normally (hence dealing with links, device
files and other permission oddities as a side effect).

We support VDI (VirtualBox), VMDK (VMware) and QCOW2 (QEMU).

Signed-off-by: Philippe Ombredanne <[email protected]>
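
The two-step flow described in the commit above could look roughly like the sketch below. This is not the code from the PR: the function names are hypothetical, and the guestfish invocation (read-only mode, the inspector, and the `tar-out` command) is an assumption about one reasonable way to call the CLI. The second step skips links and device files, in line with the later commit note that symlinks are not extracted.

```python
import os
import shutil
import tarfile


def build_guestfish_command(image_path, tarball_path):
    """Return an argv list asking guestfish to dump the guest root FS to a tarball.

    Hypothetical helper: the exact options used by the PR may differ.
    """
    return [
        "guestfish",
        "--ro",                 # never write to the disk image
        "--add", image_path,    # VDI, VMDK or QCOW2 image
        "--inspector",          # detect and mount the guest filesystems
        "tar-out", "/", tarball_path,
    ]


def extract_regular_members(tarball_path, target_dir):
    """Extract only directories and regular files; skip links and device files."""
    extracted = []
    target_root = os.path.realpath(target_dir)
    with tarfile.open(tarball_path) as archive:
        for member in archive.getmembers():
            if not (member.isreg() or member.isdir()):
                continue  # symlinks, hardlinks, devices and FIFOs are ignored
            destination = os.path.realpath(os.path.join(target_root, member.name))
            if os.path.commonpath([target_root, destination]) != target_root:
                continue  # refuse paths that would escape the target directory
            if member.isdir():
                os.makedirs(destination, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(destination), exist_ok=True)
            with archive.extractfile(member) as source, open(destination, "wb") as out:
                shutil.copyfileobj(source, out)
            extracted.append(member.name)
    return extracted
```

In use, the first step would be something like `subprocess.run(build_guestfish_command("disk.qcow2", "rootfs.tar"), check=True)` on a machine with libguestfs installed, followed by `extract_regular_members("rootfs.tar", "out")`.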
@pombredanne pombredanne changed the title [WIP] Extract vm images #16 Extract vm images #16 Apr 22, 2021
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Use virtualenv-embedded libraries
Signed-off-by: Philippe Ombredanne <[email protected]>
This will work even from a git archive or when git is not installed.

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
This way they do not end up in the template CHANGELOG.rst

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
We now load native libraries and executables from:
1. an environment variable path
2. OR a location provider plugin
3. OR the PATH
or we fail

Signed-off-by: Philippe Ombredanne <[email protected]>
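
The lookup order in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `locate_binary`, the environment variable name passed to it, and the idea of modelling plugins as plain callables are all hypothetical. It also assumes that a configured path that does not exist is skipped rather than treated as an error.

```python
import os
import shutil


def locate_binary(name, env_var, providers=(), environ=None):
    """Return the path to ``name`` using the first lookup step that succeeds."""
    environ = os.environ if environ is None else environ

    # 1. A path given explicitly in an environment variable.
    configured = environ.get(env_var)
    if configured and os.path.exists(configured):
        return configured

    # 2. A location provider plugin (modelled here as plain callables).
    for provider in providers:
        location = provider(name)
        if location and os.path.exists(location):
            return location

    # 3. The PATH.
    found = shutil.which(name, path=environ.get("PATH"))
    if found:
        return found

    # Nothing matched: fail loudly rather than guess.
    raise FileNotFoundError(f"{name} not found via ${env_var}, plugins or PATH")
```

A caller might use `locate_binary("guestfish", "EXAMPLE_GUESTFISH_PATH")`, where the variable name is made up for this example.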
Reuse variables.

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
We are not extracting symlinks, though it could be useful in the
future for some cases.

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
From aboutcode-org/typecode#20

Reported-by: Pierre Tardy <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Also support Python 3.9

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
And remove v prefix from fallback version

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
Apply formatting and minor refactoring.
Refine and clarify documentation.

Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne
Member Author

All clear. Now merging for release.

@pombredanne pombredanne merged commit fabf1f2 into main Jun 1, 2021
@pombredanne pombredanne deleted the 16-vm-images branch June 1, 2021 10:28
@rwmjones

rwmjones commented Jun 1, 2021

A few comments about the patch in no particular order:

Instead of using guestfish it may or may not be worth considering using the API directly. We have a solid Python binding that covers the entire API. Example: https://libguestfs.org/guestfs-python.3.html#example-2:-inspect-a-virtual-machine-disk-image

For extracting things like partitions from disk images (ie. without processing the filesystem or individual files), we have the lighter weight tooling of libnbd / nbdkit (https://gitlab.com/nbdkit) / qemu-nbd. For example to copy out a partition from a raw disk image (to stdout), you might use:

nbdcopy -- [ nbdkit --filter=partition file disk.img partition=1 ] -

https://libguestfs.org/nbdkit-partition-filter.1.html#EXAMPLE
https://libguestfs.org/nbdcopy.1.html

To copy out a partition from a qcow2 file (again to stdout):

nbdcopy -- [ \
    nbdkit --filter=partition nbd \
        command=qemu-nbd arg=-f arg=qcow2 arg=/path/to/image.qcow2 \
        partition=1 ] -

https://libguestfs.org/nbdkit-nbd-plugin.1.html#Add-nbdkit-partition-filter-to-qemu-nbd
https://libguestfs.org/nbdcopy.1.html

@pombredanne
Member Author

@rwmjones thank you ++ for chiming in!

You wrote:

We have a solid Python binding that covers the entire API. Example: https://libguestfs.org/guestfs-python.3.html#example-2:-inspect-a-virtual-machine-disk-image

I considered it at first, but the uneven packaging and support for various Python versions across Linux distros made me switch back to a CLI interface for now.

For extracting things like partitions from disk images (ie. without processing the filesystem or individual files), we have the lighter weight tooling of libnbd / nbdkit (https://gitlab.com/nbdkit) / qemu-nbd.

I looked at it in detail too. Partitions alone are not enough: I would still need to mount things afterwards.
Yet I really want the filesystems out so I can process a whole VM image analysis pipeline from a virtual disk in https://github.com/nexB/scancode.io/blob/main/scanpipe/pipelines/root_filesystems.py the same way we can already process whole Docker container images in https://github.com/nexB/scancode.io/blob/main/scanpipe/pipelines/docker.py

On a side note, I have been hit by https://bugs.launchpad.net/ubuntu/+source/libguestfs/+bug/1813662 and I posted there https://bugs.launchpad.net/ubuntu/+source/libguestfs/+bug/1813662/comments/25 ;)

3 participants