Automatically surface artifacts produced by a build #215
This makes me wonder: why does the push have to happen in an explicit step at all? Thinking out loud, presumably the main issue with that would be that we'd need to specify the path and disk layout of the image inside the shared workspace (or another shared volume), so that an implicit final init container would know how to grab the image and push it.
That's a good question. I think when we talk about "outputs" we're really talking about (at least) two types of things it's useful to know your build produced, and they're each handled separately.

For container images, this is hard in Knative Build today because we expect steps themselves to push them, and we don't get much visibility into whether or where they pushed. If the build wrote an image to the Docker daemon, we could push those images at the end of the build and have a solid record that the images were pushed, because we were responsible for it. (This is what GCB does: build configs have an `images` field listing the images to push when the steps complete.)

For other artifacts (jars, tars, zips, debs, logs, etc.), we could do something like this. GCB also supports non-container artifacts, where the build specifies a pattern of files in the workspace to upload after the steps complete.

Basically, the essential difference is that for unprivileged image builders, container images are constructed and pushed entirely during a build step's execution, and don't necessarily write any persistent data to the shared workspace; other artifacts are just files on disk.

Does that distinction help clarify the problem, and my thinking on possible solutions?
Yeah, thanks for the response! I think that basically confirms my guess: the basic problem is how the builder can arrange for an image to be in a format where we're able to upload it.

I definitely don't think you'd want to expose the Docker socket to a step or have a privileged builder, but once you accept that you don't want privileged builders on a cluster, the problem might actually simplify a bit. Given an unprivileged builder, you almost by definition aren't getting any benefit from layered filesystems or any magic like that (because you'd need privileges to use any of that). That means your final 'image' is just files on disk, which means (it seems, potentially) you could just build onto a volume.

As a blurry straw-man: we already have a shared /workspace volume mounted into each step. If we had an /output volume, the unprivileged builders could, instead of building and uploading an image directly, build and save the image into /output in OCI format (or list the path to the OCI image somewhere the final push step can find it).
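To make the straw-man concrete, here's a rough sketch (all file contents, digests, and paths here are made up for illustration, and there are no real layers) of the kind of OCI image layout an unprivileged builder might write into /output, and how a final push step could discover what to upload from index.json:

```python
import hashlib
import json
import os
import tempfile

def write_blob(layout: str, data: bytes) -> str:
    """Write a content-addressed blob under blobs/sha256; return its digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(layout, "blobs", "sha256")
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, digest), "wb") as f:
        f.write(data)
    return "sha256:" + digest

# Hypothetical: an unprivileged builder writes its result here (stand-in
# for a shared /output volume).
layout = tempfile.mkdtemp()

config = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
config_digest = write_blob(layout, config)

manifest = json.dumps({
    "schemaVersion": 2,
    "config": {"mediaType": "application/vnd.oci.image.config.v1+json",
               "digest": config_digest, "size": len(config)},
    "layers": [],  # a real builder would list layer blobs here
}).encode()
manifest_digest = write_blob(layout, manifest)

with open(os.path.join(layout, "oci-layout"), "w") as f:
    json.dump({"imageLayoutVersion": "1.0.0"}, f)
with open(os.path.join(layout, "index.json"), "w") as f:
    json.dump({"schemaVersion": 2, "manifests": [
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "digest": manifest_digest, "size": len(manifest)}]}, f)

# A final push step only needs index.json to find what to upload.
with open(os.path.join(layout, "index.json")) as f:
    index = json.load(f)
print(index["manifests"][0]["digest"] == manifest_digest)
```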
I like the idea of builders writing images to volumes for us to push ourselves later. I'm just not sure why they would if they can already push directly themselves today, but let's assume we can motivate them, since most of them are...us 😄.

If we go that route, I think we'd have to find some way to refactor the pushing logic so that the build itself, rather than each builder, is responsible for it.
Just thinking out loud: how much do we trust the builders? If we need to change them to write to disk, we might as well allow them to just self-report what they pushed. E.g., we could scrape step logs for lines of the form `digest: sha256:... size: ...`. Docker and ggcr already output this, so it would be easy to do, but pretty brittle :/ I don't think there's enough information in what steps normally report to do much better.

/shrug, needs more 🤔
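For illustration, the log-scraping approach might look something like this minimal sketch; the regex and the sample log line are assumptions modeled on docker's push output, not anything Knative Build actually does:

```python
import re

# Hypothetical scraper for push confirmations in step logs. docker push
# prints something like: "latest: digest: sha256:<64 hex> size: 528".
PUSH_LINE = re.compile(r"digest:\s*(sha256:[0-9a-f]{64})\s+size:\s*(\d+)")

log_line = "latest: digest: sha256:" + "ab" * 32 + " size: 528"
m = PUSH_LINE.search(log_line)
if m:
    # The brittleness: any step can print a line like this, truthful or not.
    print("claimed push:", m.group(1), "size", m.group(2))
```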
I don't think we should trust the builders to report what they pushed. An alternative discussed earlier was to scrape step status text and just report that, but I think the danger is too high that some system would explicitly trust that output, and that a malicious user could either omit reporting that it pushed a bad image, or report that it pushed a good image when it pushed a bad one. By having builders write images locally and having the build do the push itself at the end, we can at least verify that the image we claim to have pushed was pushed, since we did it.

The egress proxy approach is useful, I think, because it wouldn't be possible to trick it into reporting false pushes, though it might be possible to confuse it into missing a push done by the builder.

Agreed, needs more 🤔
What if we have the builders do all the pushing, since they have all the context required to do it efficiently (without trying to serialize that context to disk), but have them also write out the registry manifest they (would have) pushed? If we then do a final (perhaps additional) PUT of that manifest, we can guarantee that it exists and that we have write access to push it.

Re: the egress proxy, I doubt we can completely avoid missing some pushes, but SGTM if you think it would work and if I don't have to maintain it 😅
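One property that makes the manifest re-PUT idea workable: the digest a registry reports for an image is just the SHA-256 of the exact manifest bytes, so a final step holding the manifest the builder wrote out can compute the digest independently before (re-)PUTting it. A minimal sketch, with inline placeholder manifest bytes standing in for a file the builder would have written:

```python
import hashlib

# Assumed: the builder wrote out the exact manifest bytes it pushed (or
# would have pushed); these inline bytes are a stand-in for that file.
manifest_bytes = b'{"schemaVersion":2,"layers":[]}'

# The registry's digest for this image is sha256 over these exact bytes.
digest = "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()

# A final step could now PUT manifest_bytes to
#   https://<registry>/v2/<repo>/manifests/<tag>
# and expect the registry to echo this digest back, proving the image
# exists and that the build identity has write access to it.
print(digest)
```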
Let's consider three options for provenance:

- Builder X puts an image into the docker daemon, and a post-step pushes from the daemon.
- Builder Y produces a docker save tarball, and a post-step pushes the tarball.
- Builder Z publishes the image and reports what it published.

I'd argue that the only thing X and Y give you that Z does not is confirmation that the image was written by the identity the build is running as (this can often also be checked if outputs are declared, and if all credentials and protocols are known by the system).

Let's suppose, hypothetically, that I don't trust these:

- Builder X maliciously pulls an image and tags it instead of building; you push the bad image.
- Builder Y does the same, but saves it to a tarball; you push the bad image.
- Builder Z writes out that it published a bad image.

Network jails make this (and everything) harder, but not impossible. I still need to trust that the builder doesn't have an image tarball embedded in its filesystem.

I think that trust in the Builder (at some level) is required for provenance to work. I don't think that necessarily means I need to trust all Builders, but I think that provenance should sign the outputs *in the context of the builder SHAs*, and that policy can judge provenance assertions based on builder reputation.

I'm surprised @vbatts isn't here :)
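As a sketch of signing outputs "in the context of the builder SHAs", a provenance statement might bind the output digest to the builder image digest before signing, so policy can weigh the assertion by builder reputation. All digests and names here are made up, and stdlib HMAC stands in for whatever real signing scheme would be used:

```python
import hashlib
import hmac
import json

# Hypothetical provenance statement: the signature covers the builder
# identity alongside the output, so "who built it" can't be separated
# from "what was built".
statement = json.dumps({
    "builder": "sha256:" + "11" * 32,   # digest of the builder image (made up)
    "output":  "sha256:" + "22" * 32,   # digest of the pushed image (made up)
    "source":  "git-commit-abc123",     # hypothetical source identifier
}, sort_keys=True).encode()

key = b"build-system-signing-key"       # placeholder for a real key
signature = hmac.new(key, statement, hashlib.sha256).hexdigest()

# A verifier recomputes the MAC over the same statement bytes.
print(hmac.compare_digest(
    signature, hmac.new(key, statement, hashlib.sha256).hexdigest()))
```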
A secondary item in favor of storing metadata in a local output volume is that you can standardize much of the post-processing (e.g. if you want to also submit the image to a scanning service prior to upload, you can do that in a post-step without needing to teach each builder how to do so). The drawback is that you end up with 0-2 extra sets of disk I/O for the produced image; depending on the work done this could be large or tiny, though I'd expect that the large ones (e.g. kaniko) will have done a lot of other I/O work anyway.
As @evankanderson says, the nice thing about using a local OCI image is that you can share the upload steps between multiple builds and standardise post-processing. It also seems a bit more declarative for tooling to be able to see in the spec what will be uploaded by the build in a standard way (rather than implicitly assuming a particular step is the upload step).

The other maybe-nice advantage of using local OCI images as the step outputs is that steps could interact with the image produced by previous steps, for example running vulnerability checks on the image from a previous step, or adding extra layers (I guess this is really just a special case of the above, though).

@vbatts would be way, way better than me to say this for sure, but fwiw I think the avoiding-double-IO problem with having the step build a local image is pretty soluble with the OCI format. The normal path (with the potential double disk IO problem) is to have the step produce an image in OCI format with all the needed layer blobs referenced by the manifest present in the blobs directory, which is then easy to push to a registry. To avoid an extra copy of layers that are already in the registry, I think the step can just produce an OCI image whose manifest references blobs that it doesn't bother to put in the blobs folder: as long as those are already in the registry, the uploader can skip uploading them and never notice they're not there. (@vbatts feel very free to say if this is crazy! :))
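The sparse-layout idea boils down to: upload the blobs that are present in the layout, and assume any digest the manifest references but the layout omits is already in the registry. A hypothetical illustration (digests and file contents are made up):

```python
import os
import tempfile

# Sketch of a "sparse" OCI layout: the manifest references two layer
# digests, but only one blob is materialized on disk; the other is
# assumed to already exist in the registry.
layout = tempfile.mkdtemp()
blob_dir = os.path.join(layout, "blobs", "sha256")
os.makedirs(blob_dir)

layers = ["aa" * 32, "bb" * 32]          # digests the manifest references
with open(os.path.join(blob_dir, layers[0]), "wb") as f:
    f.write(b"new layer contents")       # only the first blob is present

to_upload = [d for d in layers if os.path.exists(os.path.join(blob_dir, d))]
assume_remote = [d for d in layers if d not in to_upload]

# The uploader pushes to_upload and skips assume_remote entirely; if
# those blobs really are in the registry, it never notices the gap.
print(len(to_upload), len(assume_remote))
```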
I'm not sure this is the most productive forum for this; perhaps we should add it to next week's Build WG agenda?
@imjasonh @julz I think having an /output volume in a standard on-disk layout could work well; there ought to be enough there. The original and primary use-case for skopeo is exactly this: to copy from an OCI (or docker save) layout up to a remote registry, etc.

To @imjasonh's point above, I was wondering the other day about boilerplate plumbing that could enable builders (buildah, img, buildkit, etc.) to stash information in image annotations/LABELs, like the git commit of the source built from, the digest of the BUILDER_IMAGE, etc. This info ought not be quite so ephemeral, in that it is stashed in the signable artifact.

@mattmoor yea, these GitHub issues make for sloppy design conversations. lol, as I'm writing replies here and continuing to read down the issue, I'm seeing others have the same feedback 👍

Also also wik, good to see you again @julz ;-)
Wouldn't this require the proxy to MITM the HTTPS connection to the registry? If so, that makes me profoundly uncomfortable: it would be a giant shining bullseye for attackers looking to perform supply chain attacks. I can imagine having private repos act as write-through proxies, which will suit a lot of enterprise folks (many of whom already do this). But it won't fly with folks who want to use Docker Hub, GCR, etc. directly, because you will not (I hope) be able to present valid certificates in their place.
@vbatts ack, the OCI image layout would work perfectly for a sparsely populated tarball 👍
I'm curious about how this plays with remote builders. For example, imagine the Google builder were implemented via a container (any reason why it shouldn't be decoupled?) that just talks to the remote APIs after it has uploaded some source code. Since the image isn't built locally, there wouldn't be anything to inspect (unless it's downloaded, but that's expensive). I can see several possible approaches, though.
When a Build produces a Docker image in a registry (or any artifact, like a JAR in object storage, etc.), that information is not surfaced anywhere in the Build itself. This information can be useful in chain-of-custody scenarios, when you need to determine the provenance of an image: how exactly it was built, and from what source. Since this ties into security, we need to make sure it's hard to forge or modify the build provenance of an image.
It would be useful to have this information automatically collected by the Build system, possibly using an egress proxy that inspects network traffic leaving the Build's Pod. This could watch for traffic that looks like an image push, and update the Build resource with the image reference and digest that was pushed. It could also push this provenance information to a service like Grafeas, which was built for exactly this kind of audit information.
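For a sense of what "traffic that looks like an image push" means: in the registry HTTP API, a push completes with a manifest PUT, so a hypothetical egress proxy could key off requests of that shape. This is only a sketch of the matching logic, not an actual proxy:

```python
import re

# In the registry HTTP API, a completed push ends with:
#   PUT /v2/<name>/manifests/<tag-or-digest>
# An egress proxy could flag requests matching this shape and record
# the repository and reference in the Build's status.
MANIFEST_PUT = re.compile(r"^PUT /v2/(?P<name>.+)/manifests/(?P<ref>[^/]+)$")

request_line = "PUT /v2/my-project/app/manifests/v1.2.3"  # made-up example
m = MANIFEST_PUT.match(request_line)
if m:
    print("observed push:", m.group("name"), "@", m.group("ref"))
```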