OCI layout context improvements #3118

jedevc · 2022-09-16T15:44:37Z

Follow up to #2827 with:

Fix for consistently occurring 5-15s timeouts (depending on caching) when resolving OCI layout content (split out into source: avoid hang if no session id for oci-layout #3122)
Allow using tags instead of only digests (WIP)
Refactoring for reference parsing to rely on containerd's and docker distribution's reference.Parse functions, instead of manual parsing wherever possible
- One of these changes breaks API compatibility between the front-end and buildkit! (by prefixing the storeID hostname on the frontend, and storing it in LLB, instead of splitting that logic across the backend and the frontend). However, since we haven't released OCI sources in buildkit yet, I think this should be fine, but I've split it out into a separate commit so we can easily drop it if necessary. I don't this should affect the buildx release we've recently made.

jedevc · 2022-09-16T15:51:42Z

cmd/buildctl/build/ocilayout.go

+type indexedStore struct {
+	content.Store
+	index ocispecs.Index
+}


This is the bit I'm not so sure about. We need some way on the server to work out what tag is present - we could potentially just use the .Walk available to us, and then fetch all the content, and unpacking it all to look for the right annotations in the manifest. However, with even small content stores, that's unnecessary amounts of content to walk over and scan.

This store overrides the local store, which performs a special filtering operation to provide for when an indexed piece of content appears - probably not ideal, but this works entirely inline with the content store API.

The only other alternative that comes to mind is to create the store as a local directory as well, to be able to read /index.json from buildkit, but that feels frustrating, given we only want to read index.json, none of the other content.

Unfortunately, what works for the local cache exporter doesn't work here (which is to modify the --cache-to parameters using the index.json before sending them to buildkit), since we don't know exactly which tags the frontend will request, they don't necessarily need to be sent over in the client request (and if they were, different frontends might take different formats).

Walk() is really linearly scanning the index.json to find all of the tags, eventually stopping when the fn matches? In the single use case we have, it just looks for a matching tag?

However, with even small content stores, that's unnecessary amounts of content to walk over and scan.

And yet, there really is no other way. There is no b-tree format or similar efficient store for index.json. It has to be scanner linearly.

The one concern I would have is with using tag. I have seen cases where the annotation is used for just the tag, others where it is used for the entire name. And then when you consider canonicalization of the name, it gets messy.

So if my ref is alpine:3.16, which of the following should it be looking for?

3.16

alpine:3.16

docker.io/library/alpine:3.16

Again, I have seen all of the above used. Where I use oci-layout setups, I am strict about full canonicalization before writing and upon reading, is whether you wrote an image as alpine:3.16 or library/alpine:3.16 or docker.io/library/alpine:3.16, you got docker.io/library/alpine:3.16. Similarly, when reading.

But that is hardly universally agreed.

Walk() is really linearly scanning the index.json to find all of the tags, eventually stopping when the fn matches? In the single use case we have, it just looks for a matching tag?

...yup, pretty much 😄 The code isn't quite completed, it doesn't play well with any of the other filters, so I need to tidy that up. However, overriding Walk() on the build client to only scan index.json instead of needing to traverse the entire content store is an improvement 😄

I have seen cases where the annotation is used for just the tag, others where it is used for the entire name. And then when you consider canonicalization of the name, it gets messy.

I think the best move here is to conform with how we output OCI images today, using containerd. In this case, org.opencontainers.image.ref.name is used to store only the label+digest, while io.containerd.image.name is used to store the full name.

It is worth making sure we're happy with the strategy here though: if we decide to support a syntax like oci-layout://<oci-store>/<image-in-oci-store>:<tag> instead of just today's oci-layout://<oci-store>:<tag>, then we definitely need the long-form name to determine that.

Unfortunately, I can't think of anything reasonable that would let us distinguish between 3.16, alpine:3.16 and docker.io/library/alpine:3.16.

Here's a suggestion:

We prefer io.containerd.image.name over org.opencontainers.image.ref.name if both are present

We attempt to parse the tag name as a full reference (with host) i.e. docker.io/library/alpine:3.16 using containerd's reference.Parse

Otherwise we assume it's a plain tag, i.e. 3.16 (with a special empty name, so we can detect it later)

(we ignore the possibility that the tag is formatted as alpine:3.16 in the oci-store, since then we'd also want to allow parsing alpine and there's no way of distinguishing the alpine tag with the alpine image name)

With this, in this PR we'd only compare tag values in index.json with the tag specified in the oci-store (so if the OCI store contains x:123 and y:123, specifying oci-layout://<oci-store>:123 could select either.

In a future PR, we should add support for ci-layout://<oci-store>/<image-in-oci-store>:<tag>, where we can compare a full name, but I'd rather do that as a follow-up instead of here.

The other alternative would be to check the specified name and tag in the oci-layout:// url against every possible name annotation: so specifying /alpine would give you the alpine image, etc - however, you'd also have to allow oci-layout://<oci-store>/latest which was originally meant to have the tag :latest

We could start with the strictest use case, and look at relaxing it over time, i.e. we only support the full canonical name docker.io/library/alpine:3.16 in <image-in-oci-store>. So you must do oci-layout://docker.io/library/alpine:3.16. We only would ever support the entire canonical name in the store's index.json. It does make it easier for us to parse.

One step more flexible, which is almost as easy to handle, is to support anything valid in the oci-layout:// URI (it sort of is a URI), but canonical names only in the store, i.e. we relax it in oci-layout://. So whatever we get in <image-in-oci-store> is canonicalized, but it must be valid for that parsing. Just like docker run can handle any of these:

alpine (canonicalized to docker.io/library/alpine:latest)

alpine:3.16 (canonicalized to docker.io/library/alpine:3.16)

library/alpine:3.16 (canonicalized to docker.io/library/alpine:3.16)

docker.io/library/alpine:3.16 (already canonicalized)

But just a tag would not be accepted: latest, as we cannot tell what that is.

That is the way I have done most things that read/write from an OCI layout, under the assumption that the layout would include multiple images, not just multiple tags for a single image, forcing me to do that.

Thoughts?

deitch · 2022-09-18T09:11:00Z

client/llb/source.go

@@ -455,7 +455,7 @@ func Differ(t DiffType, required bool) LocalOption {
 	})
 }

-func OCILayout(contentStoreID string, dig digest.Digest, opts ...OCILayoutOption) State {
+func OCILayout(ref string, opts ...OCILayoutOption) State {


I don't fully get what we gain from going to a ref string vs handling the two separately. Aren't we better off having them separate, as we can require a hash? We never really added support for tag-based parsing of OCI layout source, mainly because there is no agreed standard for doing it. I have used the tags org.opencontainers.image.ref.name and io.containerd.image.name, as have many others, but there is no official agreement quite yet.

Oh, OK, now I see it. You are adopting precisely that. No argument, very much in favour of it.

deitch · 2022-09-18T09:20:29Z

Fix for consistently occurring 5-15s timeouts (depending on caching) when resolving OCI layout content

What was the source of that?

deitch · 2022-09-18T09:20:58Z

since we haven't released OCI sources in buildkit yet, I think this should be fine, but I've split it out into a separate commit so we can easily drop it if necessary

It is in buildx, so we would need to update that too.

deitch · 2022-09-20T09:31:23Z

May I make a suggestion? Split this into two PRs. The first to fix the timeout/delay issue and simplify the reference parsing, then a second for the tags. That way you can get the first one in quickly, and we can spend time on the tags discussion.

deitch · 2022-09-20T09:31:44Z

FWIW, I think we very much should support full tags. Anything that is halfway sane is good.

deitch · 2022-09-20T12:44:35Z

Nice job on the split; probably should remove the stuff in #3122 from here, just to make it easier to review (although once that merges in and you rebase, it all should go away).

Instead of using custom parsing mechansisms for references in oci-layout, we use containerd's reference.Parse or docker distribution's reference.Parse. These operations for us, with hopefully more consistent error messages, and better handling of labels (for when these are introduced). Signed-off-by: Justin Chadwell <[email protected]>

Previously, the hostname (repesenting the storeID) was explicitly prefixed in both the dockerfile frontend and the llb source code, across both sides of the API boundary. Signed-off-by: Justin Chadwell <[email protected]>

This patch allows ommiting the digest in oci-layout named contexts, and using a tag instead. Unfortunately, client changes are required to expose the contents of index.json - to do this, we wrap the containerd local store, and override the .Walk method to process queries that ask for only the indexed contents, instead of all the blobs. We allow for two schemes: - A full tag, e.g. oci-layout://<store>/alpine, which will look for the canonical docker.io/library/alpine:latest in index.json - A partial tag, e.g. oci-layout://<store>:latest, which will search for an image in index.json that has a :latest tag. Interally, we support this by translating this to an image <store>/*:latest, so that we can transport it through the OCI image identifier structure. Signed-off-by: Justin Chadwell <[email protected]>

jedevc · 2022-09-22T15:03:19Z

Have pushed an update to handle full tags, e.g. of the form oci-layout://<store>/<image>[:tag][@digest] as well as the existing oci-layout://<store>[:tag][@digest]. All images with this full form are normalized before looking them up in index.json - so an OCI index containing an image library/alpine won't be recognized, it would need to be docker.io/library/alpine:latest.

Handling this as well as the previous :tag case is fiddly, so going to add some more tests to check this. Importantly, we should support all the different OCI outputs that docker itself produces, though I've relaxed it a little bit to try and find a "best case" if those fail - that said, I don't think we should spend much effort on covering anything outside of what we produce.

Signed-off-by: Justin Chadwell <[email protected]>

jedevc requested a review from tonistiigi September 16, 2022 15:44

jedevc marked this pull request as draft September 16, 2022 15:44

jedevc commented Sep 16, 2022

View reviewed changes

jedevc force-pushed the oci-layout-context branch from 9b1b9b0 to 8cca091 Compare September 16, 2022 16:15

deitch reviewed Sep 18, 2022

View reviewed changes

jedevc mentioned this pull request Sep 20, 2022

source: avoid hang if no session id for oci-layout #3122

Merged

jedevc force-pushed the oci-layout-context branch from 8cca091 to 6be47da Compare September 21, 2022 08:42

source: pass oci-layout source hostname over llb

a1d52f8

Previously, the hostname (repesenting the storeID) was explicitly prefixed in both the dockerfile frontend and the llb source code, across both sides of the API boundary. Signed-off-by: Justin Chadwell <[email protected]>

jedevc force-pushed the oci-layout-context branch from 6be47da to 0296780 Compare September 21, 2022 09:10

jedevc requested a review from AkihiroSuda September 21, 2022 11:16

jedevc force-pushed the oci-layout-context branch from 0296780 to 9411667 Compare September 22, 2022 14:57

jedevc force-pushed the oci-layout-context branch from 9411667 to c73e695 Compare September 22, 2022 14:57

dockerfile: add tests for tagged oci-layout contexts

5697c9a

Signed-off-by: Justin Chadwell <[email protected]>

jedevc mentioned this pull request Oct 18, 2022

oci exporter: support exporting directories #3162

Merged

jedevc mentioned this pull request Nov 22, 2022

Refactor reference parsing for oci-layout #3300

Merged

jedevc closed this Dec 6, 2022

jedevc mentioned this pull request Dec 7, 2022

Split out OCI store #3371

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OCI layout context improvements #3118

OCI layout context improvements #3118

jedevc commented Sep 16, 2022 •

edited

Loading

jedevc Sep 16, 2022

deitch Sep 18, 2022

jedevc Sep 20, 2022

jedevc Sep 20, 2022 •

edited

Loading

deitch Sep 20, 2022

deitch Sep 18, 2022

deitch Sep 18, 2022

deitch commented Sep 18, 2022

deitch commented Sep 18, 2022

deitch commented Sep 20, 2022

deitch commented Sep 20, 2022

deitch commented Sep 20, 2022

jedevc commented Sep 22, 2022

OCI layout context improvements #3118

OCI layout context improvements #3118

Conversation

jedevc commented Sep 16, 2022 • edited Loading

jedevc Sep 16, 2022

Choose a reason for hiding this comment

deitch Sep 18, 2022

Choose a reason for hiding this comment

jedevc Sep 20, 2022

Choose a reason for hiding this comment

jedevc Sep 20, 2022 • edited Loading

Choose a reason for hiding this comment

deitch Sep 20, 2022

Choose a reason for hiding this comment

deitch Sep 18, 2022

Choose a reason for hiding this comment

deitch Sep 18, 2022

Choose a reason for hiding this comment

deitch commented Sep 18, 2022

deitch commented Sep 18, 2022

deitch commented Sep 20, 2022

deitch commented Sep 20, 2022

deitch commented Sep 20, 2022

jedevc commented Sep 22, 2022

jedevc commented Sep 16, 2022 •

edited

Loading

jedevc Sep 20, 2022 •

edited

Loading