-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
canonicalization: Add recommendations for canonicalization #339
Conversation
On Sat, Sep 24, 2016 at 10:04:53AM -0700, Jonathan Boulle wrote:
Canonicalization is a way to avoid redundant blobs for a particular |
OK, that makes sense. How about moving this to a more generic "implementers guide"-type document (as a loose example of this see appc) |
On Mon, Sep 26, 2016 at 07:10:31AM -0700, Jonathan Boulle wrote:
I'm ok with that, but I prefer a normative SHOULD to informative |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, fair point, I think I'm just not quite on board with the wording yet. Added some suggestions.
@@ -0,0 +1,19 @@ | |||
# Canonicalization | |||
|
|||
One benefit of content-addressable storage is easy deduplication. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly more intro context - something like "One of the goals of the OCI Image Specification is to leverage content-addressable storage (CAS), which provides benefits like easy deduplication"
|
||
## JSON | ||
|
||
[JSON][] content SHOULD be serialized as [canonical JSON][canonical-json]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON content (for example, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON content (for example all of the bla bla media types or bla bla manifests) SHOULD be serialized as
|
||
One benefit of content-addressable storage is easy deduplication. | ||
Many images might depend on a particular [layer](layer.md), the there will only be one blob in the [store](image-layout.md). | ||
However, that same semantic layer serialized slightly differently would have a different hash, and if both versions of the layer are referenced there will be two blobs with the same semantic content. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that same layer, if serialized in a different way, would have an entirely different hash, and hence if both versions of the layer, there will be two underlying blobs in the store with the same semantic comment.
0b0c8e6
to
3eb6af7
Compare
On Tue, Oct 04, 2016 at 06:02:13AM -0700, Jonathan Boulle wrote:
I don't think content-addressability is a goal in itself. It's just a
This line is a normative SHOULD, and if the PR lands I will file The following sections contain canonicalization recommendations for
“hence if both versions of the layer” seems to be missing a word or |
@wking if someone does an inline review, please reply inline, otherwise it's impossible to follow |
# Canonicalization | ||
|
||
OCI Images [are](descriptor.md) [content-addressable](image-layout.md). | ||
One benefit of content-addressable storage is easy deduplication. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already explicitly called out as a goal @ https://github.com/opencontainers/image-spec/blame/e74c6c703315a8d2baaea691b527194377c33a8d/manifest.md#L4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I stand corrected. I still think the current wording covers that though. Would you still prefer your initial suggestion?
|
||
OCI Images [are](descriptor.md) [content-addressable](image-layout.md). | ||
One benefit of content-addressable storage is easy deduplication. | ||
Many images might depend on a particular [layer](layer.md), the there will only be one blob in the [store](image-layout.md). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the there" needs fixing
@@ -47,3 +47,5 @@ The following figure shows how the above media types reference each other: | |||
|
|||
[Descriptors](descriptor.md) are used for all references. | |||
The manifest list being a "fat manifest" references one or more image manifests per target platform. An image manifest references exactly one target configuration and possibly many layers. | |||
|
|||
[JSON]: http://json.org/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reference-style link label defining JSON. In the in-flight opencontainers/runtime-spec#584 I change a similar runtime-spec link to point at RFC 7159, and I can do that here too if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, fine, I asked the wrong question :/. Let me try again: why is it here? (I don't see any JSON references in this document)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a separate file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, so it is. Thanks for banging my head into that until I understood :p.
Looks okay overall after a few more fixes |
On Wed, Oct 05, 2016 at 05:09:38AM -0700, Jonathan Boulle wrote:
I'll try, but GitHub no longer has a way to do that via email. When I |
3eb6af7
to
4729086
Compare
I've pushed 3eb6af7 → 4729086 which:
|
|
||
* [Go][]: [github.com/docker/go][], which claims to implement [canonical JSON][canonical-json] except for Unicode normalization. | ||
|
||
Of the [OCI Image Format Specification media types](media-types.md), all the types ending in `+json` contain JSON content. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this above the "Implementations" imo.
4729086
to
b695fb3
Compare
Spun off from [1], now that we seem to have reached a consensus for "SHOULD canonical JSON" there. I've set this up so we have space to add canonicalization recommendations for other formats, although the only other basic type discussed in this repository is a gzipped tarball and that's more than I want to bite off at the moment ;). [1]: opencontainers#259 Subject: manifest json fields order Signed-off-by: W. Trevor King <[email protected]>
b695fb3
to
1685bae
Compare
I've pushed b695fb3 → 1685bae dropping the JSON link label from |
while i'm not in disagreement here, i fell like this is describing something without describing anything. |
On Thu, Oct 06, 2016 at 09:44:02AM -0700, Vincent Batts wrote:
How would symlinks deduplicate blobs? Maybe if you had a cluster of |
I know we've already merged this, but how can Canonical JSON be considered a viable specification when the only reference is to a wiki page that could be changed at any time? I brought this up at the time when we started using canonical json in docker. Docker Distribution attempts to take this further, but canonical json is insufficient for this purpose. I am really confused as to why we consider Canonical JSON a viable specification. |
On Wed, Oct 12, 2016 at 02:21:25PM -0700, Stephen Day wrote:
I agree that a more formal spec would be nice 1. If you have one, And the last change to the spec itself was in 2008 2, so it's fairly
The goal is to reduce the number of blobs carrying the same semantic |
Spun off from #259, now that we seem to have reached a consensus for “SHOULD canonical JSON” there. I've set this up so we have space to add canonicalization recommendations for other formats, although the only other basic type discussed in this repository is a gzipped tarball and that's more than I want to bite off at the moment ;).