media-types: Define '+gzip' structured syntax suffix #332
@@ -32,7 +32,8 @@ This specification uses the following terms:
<dd>
A layer DiffID is a SHA256 digest over the layer's uncompressed tar archive and serialized in the descriptor digest format, e.g., <code>sha256:a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9</code>.
Layers must be packed and unpacked reproducibly to avoid changing the layer ID, for example by using tar-split to save the tar headers.
NOTE: the DiffID is different than the digest in the manifest list because the manifest digest is taken over the gzipped layer for <code>application/vnd.oci.image.layer.tar+gzip</code> types.
The DiffID is different than the layer digest in the <a href="manifest.md#image-manifest-property-descriptions">manifest's <code>layers</code></a> because the layer digest is taken over the blob regardless of compression, while the DiffID is taken after removing any compression.
I wonder if this could be phrased as "The DiffID can be different than the layer digest...". Can it be the case that the manifest refers to uncompressed blobs? Or is the spec recommending against this practice elsewhere?
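As an illustration of the question above, here is a minimal Python sketch of how a layer digest and a DiffID relate for compressed and uncompressed blobs. The helper names (`build_layer_tar`, `descriptor_digest`, `diff_id`) and the file `etc/motd` are hypothetical and chosen only for this example; nothing here is taken from the image-spec code.

```python
import gzip
import hashlib
import io
import tarfile


def build_layer_tar() -> bytes:
    """Build a tiny, deterministic uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("etc/motd")  # hypothetical layer content
        info.size = len(data)
        info.mtime = 0  # fixed timestamp keeps the archive reproducible
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def descriptor_digest(blob: bytes) -> str:
    """Digest over the blob exactly as stored, compressed or not."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def diff_id(blob: bytes, compressed: bool) -> str:
    """Digest over the tar archive after removing any gzip compression."""
    raw = gzip.decompress(blob) if compressed else blob
    return descriptor_digest(raw)


layer_tar = build_layer_tar()
gzipped_blob = gzip.compress(layer_tar, mtime=0)

# Gzipped blob: the layer digest and the DiffID are taken over different bytes.
print(descriptor_digest(gzipped_blob) == diff_id(gzipped_blob, compressed=True))
# Uncompressed blob: both digests are taken over the same bytes.
print(descriptor_digest(layer_tar) == diff_id(layer_tar, compressed=False))
```

Under these assumptions the two values differ for a gzipped blob and coincide for an uncompressed one, which is why "can be different" may be the more accurate wording if uncompressed layers are allowed.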
@wking I wonder how binary diffing can work with the tar format. Are there any tools you know of that can make sense of the binary diff between two tar files?
It's not quite a binary diff of two tar files, though creating a
honestly, the wording for the suffix isn't so bad, but you've removed
all the places that use the suffix? That seems confusing.
On Tue, Sep 20, 2016 at 08:44:19AM -0700, Vincent Batts wrote:
In most places where we talk about these types, we're talking about So I've left the +gzip on in the examples, but I don't think we need
On Tue, Sep 20, 2016 at 08:37:33AM -0700, Vincent Batts wrote:
Assuming you have the same file order, regular old diff is sufficient
  $ wget https://archive.org/download/alicesadventures19033gut/19033.txt
   This eBook is for the use of anyone anywhere at no cost and with
   Title: Alice in Wonderland
  -Author: Lewis Carroll
   Illustrator: Gordon Robinson
  @@ -1338,7 +1338,7 @@
  -End of the Project Gutenberg EBook of Alice in Wonderland, by Lewis Carroll
   *** END OF THIS PROJECT GUTENBERG EBOOK ALICE IN WONDERLAND ***
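The point about file order can be sketched in Python as follows. This is only an illustration under stated assumptions: the file names and contents are made up, `make_tar`, `differing_bytes` and `common_prefix_len` are hypothetical helpers, and the changed file keeps the same length so that the two archives line up byte for byte.

```python
import gzip
import io
import tarfile


def make_tar(files):
    """Pack (name, bytes) pairs into an uncompressed tar, in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp so headers are reproducible
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the shared leading run of bytes."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Same file order in both archives; one byte of one file changes.
old_tar = make_tar([("book.txt", b"version 1\n"), ("notes.txt", b"unchanged\n")])
new_tar = make_tar([("book.txt", b"version 2\n"), ("notes.txt", b"unchanged\n")])
old_gz = gzip.compress(old_tar, mtime=0)
new_gz = gzip.compress(new_tar, mtime=0)

print("uncompressed bytes that differ:", differing_bytes(old_tar, new_tar))
print("compressed sizes:", len(old_gz), len(new_gz))
print("compressed common prefix:", common_prefix_len(old_gz, new_gz))
```

With the same file order the uncompressed archives differ only where the content changed, so a generic diff tool has little to encode. The compressed streams are printed for comparison without any claim about how far they diverge, since that depends on the data and the compressor.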
On Tue, Sep 20, 2016 at 04:43:49AM -0700, George Lestaris wrote:
The resulting digest might be the same, but the logic for getting
I think using uncompressed layers in the manifest should be fine, if
On Tue, Sep 20, 2016 at 09:44:55AM -0700, W. Trevor King wrote:
However, it is relevant when we equate our layer type with
On 20/09/16 09:52 -0700, W. Trevor King wrote:
You can choose this for yourself, but it's not an all inclusive
On Tue, Sep 20, 2016 at 04:50:29PM -0700, Vincent Batts wrote:
Agreed. I'm sure folks who want to put in more effort can do better.
There are a few problems with this PR. The first is that we don't want to introduce CAS concepts at the config level. The The main problem is that we introduce structure to what should just be string constants. The media types should just be string matching.
On Wed, Sep 21, 2016 at 04:52:37PM -0700, Stephen Day wrote:
The media-type reference for diff_ids is an informative example, since
They're called “structured syntax suffixes” for a reason ;). I agree But “scales better” only matters if we actually do grow the number of And if the idea of a structured syntax suffix makes you jumpy, I'm
@wking 👎 I'm not wasting any more time explaining why these changes are a bad idea.
On Thu, Sep 22, 2016 at 02:57:56PM -0700, Stephen Day wrote:
I'm proposing [1] ways to address both of your concerns [2]. If you Or are you against allowing uncompressed layers at all? In that case
@wking Dismissing is not the same as addressing. In practice, with CAS, allowing this variation at all causes a host of problems around hash stability and re-use of the existing body of content. For now, let's err on having just compressed layers. Structure can always be added but once you let this out of the bag, it cannot be returned.
On Thu, Sep 22, 2016 at 05:56:42PM -0700, Stephen Day wrote:
That's certainly true, but I don't think I'm dismissing your concerns.
Do you have further concerns which I have dismissed without addressing?
“Things work fairly well as they stand; let's not rock the boat on
@wking I'm not strictly opposed to the possibility of non-gzipped tar archives, but I feel the way you've made it optional here goes too far. In particular, seeing the truncated form (without +gzip) seems to imply that this is the default. For Docker, this would break some notions of the cache that the registries hold.
On Thu, Oct 06, 2016 at 10:42:40AM -0700, Vincent Batts wrote:
There is no “default”; both are valid. But I've pushed 96f3a67 → I've also added some language requiring OCI implementations to support
I'm missing something here. How does registry cache come in? The
That sounds like the bit under (2) in [1]. I'll file that alternative
Rebased onto master with 3cc036e → 2829f04, resolving some minor conflicts and adding support for unpacking both gzipped and uncompressed layers. I expect
I'd prefer defining this as a structured syntax suffix following RFC 6839, and have filed a pull request to that effect [1]. However, the current maintainer consensus seems to be to define the compressed and uncompressed types directly without declaring a structured syntax suffix pattern [2]. I'm not clear on the reason for avoiding the structured syntax suffix, but that's the route I've taken in this commit.

Now that you can choose either compressed or uncompressed media types, it is easy to clarify DiffIDs by comparing the media types with and without the +gzip compression. It also allows you to create image-layout instances where the layers are stored uncompressed, which may be useful for cases such as:

* Binary diffing between layer blobs for cheaper updates of large layers [3].
* Compressing an image-layout tarball for a smaller overall tarball (by avoiding the unnecessary fragmentation of compressing the individual blob entries).

Also update unpackLayer to handle both compressed and uncompressed layers. I expect unpackLayer will end up in image-tools, so I haven't invested a lot of time polishing this implementation. But without *some* sort of change the manifest tests fail.

[1]: opencontainers#332
[2]: opencontainers#332 (comment)
[3]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-08-16.log.html#t2016-08-16T23:35:43

Signed-off-by: W. Trevor King <[email protected]>
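For readers unfamiliar with what handling both layer types involves, a rough Python sketch of the dispatch is shown here. It is not the unpackLayer implementation from this commit. `open_layer` and `demo_tar` are hypothetical names, and the uncompressed media type string is assumed to be the gzipped one with the `+gzip` suffix removed.

```python
import gzip
import io
import tarfile

MEDIA_TYPE_LAYER_GZIP = "application/vnd.oci.image.layer.tar+gzip"
# Assumption: the uncompressed type is the same string without "+gzip".
MEDIA_TYPE_LAYER = "application/vnd.oci.image.layer.tar"


def open_layer(blob: bytes, media_type: str) -> tarfile.TarFile:
    """Return a tar reader for a layer blob, chosen by exact media type match."""
    if media_type == MEDIA_TYPE_LAYER_GZIP:
        stream = gzip.GzipFile(fileobj=io.BytesIO(blob))
    elif media_type == MEDIA_TYPE_LAYER:
        stream = io.BytesIO(blob)
    else:
        raise ValueError(f"unsupported layer media type: {media_type}")
    # "r|" reads the archive as a plain stream without guessing compression.
    return tarfile.open(fileobj=stream, mode="r|")


def demo_tar() -> bytes:
    """Hypothetical one-file layer used only for the demonstration below."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hi\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


raw = demo_tar()
for blob, media_type in [
    (raw, MEDIA_TYPE_LAYER),
    (gzip.compress(raw), MEDIA_TYPE_LAYER_GZIP),
]:
    with open_layer(blob, media_type) as tar:
        print(media_type, tar.getnames())
```

The sketch compares whole media type strings and does not parse a suffix, which matches the route described in the commit message above.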
On Thu, Oct 06, 2016 at 03:57:45PM -0700, W. Trevor King wrote:
Filed as #388.
Leaning heavily on the existing entries in RFC 6839. The suffix makes it easy to clarify DiffIDs without requiring a particular layer media type. It also allows you to create image-layout instances where the layers are stored uncompressed, which may be useful for cases such as:

* Binary diffing between layer blobs for cheaper updates of large layers [1].
* Compressing an image-layout tarball for a smaller overall tarball (by avoiding the unnecessary fragmentation of compressing the individual blob entries).

[1]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-08-16.log.html#t2016-08-16T23:35:43

Signed-off-by: W. Trevor King <[email protected]>
Generated with:

  $ make img/media-types.png

and Graphviz version 2.38.0.

Signed-off-by: W. Trevor King <[email protected]>
I'd prefer defining this as a structured syntax suffix following RFC 6839, and have filed a pull request to that effect [1]. However, the current maintainer consensus seems to be to define the compressed and uncompressed types directly without declaring a structured syntax suffix pattern [2]. I'm not clear on the reason for avoiding the structured syntax suffix, but that's the route I've taken in this commit.

Now that you can choose either compressed or uncompressed media types, it is easy to clarify DiffIDs by comparing the media types with and without the +gzip compression. It also allows you to create image-layout instances where the layers are stored uncompressed, which may be useful for cases such as:

* Binary diffing between layer blobs for cheaper updates of large layers [3].
* Compressing an image-layout tarball for a smaller overall tarball (by avoiding the unnecessary fragmentation of compressing the individual blob entries).

[1]: opencontainers#332
[2]: opencontainers#332 (comment)
[3]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-08-16.log.html#t2016-08-16T23:35:43

Signed-off-by: W. Trevor King <[email protected]>
I like the notion of defining the structured syntax, but the mechanic of leaving it up to implementations to detect and try different suffixes seems to sprawl that clarity rather than refine it.
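The two approaches debated in this thread can be put side by side in a short Python sketch. It is illustrative only: `is_known_layer_type` and `split_structured_suffix` are hypothetical helpers, and the uncompressed media type string is assumed to be the gzipped one without `+gzip`.

```python
from typing import Optional, Tuple

# Approach 1: media types are opaque string constants.
KNOWN_LAYER_TYPES = {
    "application/vnd.oci.image.layer.tar+gzip",
    "application/vnd.oci.image.layer.tar",  # assumed uncompressed name
}


def is_known_layer_type(media_type: str) -> bool:
    """Exact string matching against a fixed set of constants."""
    return media_type in KNOWN_LAYER_TYPES


# Approach 2: treat the part after the last "+" as a structured syntax suffix.
def split_structured_suffix(media_type: str) -> Tuple[str, Optional[str]]:
    """Split 'base+suffix' into (base, suffix); suffix is None if absent."""
    base, sep, suffix = media_type.rpartition("+")
    if not sep:
        return media_type, None
    return base, suffix


print(is_known_layer_type("application/vnd.oci.image.layer.tar+gzip"))
print(split_structured_suffix("application/vnd.oci.image.layer.tar+gzip"))
print(split_structured_suffix("application/vnd.oci.image.layer.tar"))
```

With the first approach an implementation only compares strings. With the second it must decide what to do with each suffix it encounters, which is the detection burden this comment objects to.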
I'd prefer defining this as a structured syntax suffix following RFC 6839, and have filed a pull request to that effect [1]. However, the current maintainer consensus seems to be to define the compressed and uncompressed types directly without declaring a structured syntax suffix pattern [2]. I'm not clear on the reason for avoiding the structured syntax suffix, but that's the route I've taken in this commit.

Now that you can choose either compressed or uncompressed media types, it is easy to clarify DiffIDs by comparing the media types with and without the +gzip compression. It also allows you to create image-layout instances where the layers are stored uncompressed, which may be useful for cases such as:

* Binary diffing between layer blobs for cheaper updates of large layers [3].
* Compressing an image-layout tarball for a smaller overall tarball (by avoiding the unnecessary fragmentation of compressing the individual blob entries).

[1]: opencontainers/image-spec#332
[2]: opencontainers/image-spec#332 (comment)
[3]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-08-16.log.html#t2016-08-16T23:35:43

Signed-off-by: W. Trevor King <[email protected]>
Leaning heavily on the existing entries in RFC 6839. The suffix makes it easy to clarify DiffIDs without requiring a particular layer media type. It also allows you to create image-layout instances where the layers are stored uncompressed, which may be useful for cases such as:
The suffix has been discussed briefly in #316 and it would make #328 and #330 easier to address.