-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCI layout context improvements #3118
Conversation
cmd/buildctl/build/ocilayout.go
Outdated
type indexedStore struct { | ||
content.Store | ||
index ocispecs.Index | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bit I'm not so sure about. We need some way on the server to work out what tag is present - we could potentially just use the .Walk
available to us, and then fetch all the content, and unpacking it all to look for the right annotations in the manifest. However, with even small content stores, that's unnecessary amounts of content to walk over and scan.
This store overrides the local store, which performs a special filtering operation to provide for when an indexed piece of content appears - probably not ideal, but this works entirely inline with the content store API.
The only other alternative that comes to mind is to create the store as a local directory as well, to be able to read /index.json
from buildkit, but that feels frustrating, given we only want to read index.json
, none of the other content.
Unfortunately, what works for the local cache exporter doesn't work here (which is to modify the --cache-to
parameters using the index.json
before sending them to buildkit), since we don't know exactly which tags the frontend will request, they don't necessarily need to be sent over in the client request (and if they were, different frontends might take different formats).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Walk()
is really linearly scanning the index.json
to find all of the tags, eventually stopping when the fn
matches? In the single use case we have, it just looks for a matching tag?
However, with even small content stores, that's unnecessary amounts of content to walk over and scan.
And yet, there really is no other way. There is no b-tree format or similar efficient store for index.json
. It has to be scanner linearly.
The one concern I would have is with using tag
. I have seen cases where the annotation is used for just the tag, others where it is used for the entire name. And then when you consider canonicalization of the name, it gets messy.
So if my ref
is alpine:3.16
, which of the following should it be looking for?
3.16
alpine:3.16
docker.io/library/alpine:3.16
Again, I have seen all of the above used. Where I use oci-layout setups, I am strict about full canonicalization before writing and upon reading, is whether you wrote an image as alpine:3.16
or library/alpine:3.16
or docker.io/library/alpine:3.16
, you got docker.io/library/alpine:3.16
. Similarly, when reading.
But that is hardly universally agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Walk()
is really linearly scanning theindex.json
to find all of the tags, eventually stopping when thefn
matches? In the single use case we have, it just looks for a matching tag?
...yup, pretty much 😄 The code isn't quite completed, it doesn't play well with any of the other filters, so I need to tidy that up. However, overriding Walk()
on the build client to only scan index.json
instead of needing to traverse the entire content store is an improvement 😄
I have seen cases where the annotation is used for just the tag, others where it is used for the entire name. And then when you consider canonicalization of the name, it gets messy.
I think the best move here is to conform with how we output OCI images today, using containerd. In this case, org.opencontainers.image.ref.name
is used to store only the label+digest, while io.containerd.image.name
is used to store the full name.
It is worth making sure we're happy with the strategy here though: if we decide to support a syntax like oci-layout://<oci-store>/<image-in-oci-store>:<tag>
instead of just today's oci-layout://<oci-store>:<tag>
, then we definitely need the long-form name to determine that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I can't think of anything reasonable that would let us distinguish between 3.16
, alpine:3.16
and docker.io/library/alpine:3.16
.
Here's a suggestion:
- We prefer
io.containerd.image.name
overorg.opencontainers.image.ref.name
if both are present - We attempt to parse the tag name as a full reference (with host) i.e.
docker.io/library/alpine:3.16
using containerd'sreference.Parse
- Otherwise we assume it's a plain tag, i.e.
3.16
(with a special empty name, so we can detect it later)
(we ignore the possibility that the tag is formatted as alpine:3.16
in the oci-store, since then we'd also want to allow parsing alpine
and there's no way of distinguishing the alpine
tag with the alpine
image name)
With this, in this PR we'd only compare tag values in index.json
with the tag specified in the oci-store (so if the OCI store contains x:123
and y:123
, specifying oci-layout://<oci-store>:123
could select either.
In a future PR, we should add support for ci-layout://<oci-store>/<image-in-oci-store>:<tag>
, where we can compare a full name, but I'd rather do that as a follow-up instead of here.
The other alternative would be to check the specified name and tag in the oci-layout://
url against every possible name annotation: so specifying /alpine
would give you the alpine
image, etc - however, you'd also have to allow oci-layout://<oci-store>/latest
which was originally meant to have the tag :latest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could start with the strictest use case, and look at relaxing it over time, i.e. we only support the full canonical name docker.io/library/alpine:3.16
in <image-in-oci-store>
. So you must do oci-layout://docker.io/library/alpine:3.16
. We only would ever support the entire canonical name in the store's index.json
. It does make it easier for us to parse.
One step more flexible, which is almost as easy to handle, is to support anything valid in the oci-layout://
URI (it sort of is a URI), but canonical names only in the store, i.e. we relax it in oci-layout://
. So whatever we get in <image-in-oci-store>
is canonicalized, but it must be valid for that parsing. Just like docker run
can handle any of these:
alpine
(canonicalized todocker.io/library/alpine:latest
)alpine:3.16
(canonicalized todocker.io/library/alpine:3.16
)library/alpine:3.16
(canonicalized todocker.io/library/alpine:3.16
)docker.io/library/alpine:3.16
(already canonicalized)
But just a tag would not be accepted: latest
, as we cannot tell what that is.
That is the way I have done most things that read/write from an OCI layout, under the assumption that the layout would include multiple images, not just multiple tags for a single image, forcing me to do that.
Thoughts?
9b1b9b0
to
8cca091
Compare
@@ -455,7 +455,7 @@ func Differ(t DiffType, required bool) LocalOption { | |||
}) | |||
} | |||
|
|||
func OCILayout(contentStoreID string, dig digest.Digest, opts ...OCILayoutOption) State { | |||
func OCILayout(ref string, opts ...OCILayoutOption) State { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully get what we gain from going to a ref string
vs handling the two separately. Aren't we better off having them separate, as we can require a hash? We never really added support for tag-based parsing of OCI layout source, mainly because there is no agreed standard for doing it. I have used the tags org.opencontainers.image.ref.name
and io.containerd.image.name
, as have many others, but there is no official agreement quite yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, OK, now I see it. You are adopting precisely that. No argument, very much in favour of it.
What was the source of that? |
It is in buildx, so we would need to update that too. |
May I make a suggestion? Split this into two PRs. The first to fix the timeout/delay issue and simplify the reference parsing, then a second for the tags. That way you can get the first one in quickly, and we can spend time on the tags discussion. |
FWIW, I think we very much should support full tags. Anything that is halfway sane is good. |
Nice job on the split; probably should remove the stuff in #3122 from here, just to make it easier to review (although once that merges in and you rebase, it all should go away). |
Instead of using custom parsing mechansisms for references in oci-layout, we use containerd's reference.Parse or docker distribution's reference.Parse. These operations for us, with hopefully more consistent error messages, and better handling of labels (for when these are introduced). Signed-off-by: Justin Chadwell <[email protected]>
8cca091
to
6be47da
Compare
Previously, the hostname (repesenting the storeID) was explicitly prefixed in both the dockerfile frontend and the llb source code, across both sides of the API boundary. Signed-off-by: Justin Chadwell <[email protected]>
6be47da
to
0296780
Compare
0296780
to
9411667
Compare
This patch allows ommiting the digest in oci-layout named contexts, and using a tag instead. Unfortunately, client changes are required to expose the contents of index.json - to do this, we wrap the containerd local store, and override the .Walk method to process queries that ask for only the indexed contents, instead of all the blobs. We allow for two schemes: - A full tag, e.g. oci-layout://<store>/alpine, which will look for the canonical docker.io/library/alpine:latest in index.json - A partial tag, e.g. oci-layout://<store>:latest, which will search for an image in index.json that has a :latest tag. Interally, we support this by translating this to an image <store>/*:latest, so that we can transport it through the OCI image identifier structure. Signed-off-by: Justin Chadwell <[email protected]>
9411667
to
c73e695
Compare
Have pushed an update to handle full tags, e.g. of the form Handling this as well as the previous |
Signed-off-by: Justin Chadwell <[email protected]>
Follow up to #2827 with:
reference.Parse
functions, instead of manual parsing wherever possibleCC @deitch