-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config: Use an array of strings in the ChainID definition #593
config: Use an array of strings in the ChainID definition #593
Conversation
There's no need to couple this to DiffIDs and layer application, since the implementation in identity/chainid.go is just operating on an array of strings. In the chainid.go implementation, those strings happen to be digests, but nothing digest-specific is needed. $ cat a.go package main import ( _ "crypto/sha256" "fmt" "github.com/opencontainers/go-digest" "github.com/opencontainers/image-spec/identity" ) func main() { for _, chainID := range identity.ChainIDs([]digest.Digest{"a", "b", "c"}) { fmt.Println(chainID) } } $ go run a.go a sha256:c8687a08aa5d6ed2044328fa6a697ab8e96dc34291e8c2034ae8c38e6fcc6d65 sha256:fd4283a6ddaa46181b8c3bd3f2497dd8f801e80b395b78bd6331197ba8db877d which matches the example I'm adding with this commit. Signed-off-by: W. Trevor King <[email protected]>
And, as I have tried to make you understand, this premise is completely false. The algorithm described in this section is expressed in terms of abstract layers, not strings. These changes, subsequently, are incorrect, in that it inappropriately narrows the definition. The representation of the filesystem changeset is not important for this definition. If you had listened to my response in #586, you would have already understood this. If you would like to describe the implementation, you are welcome to put the information up on a blog, but for the specification, this is sufficient. |
On Wed, Mar 08, 2017 at 11:49:59AM -0800, Stephen Day wrote:
> There's no need to couple this to DiffIDs and layer application,
> since the implementation in identity/chainid.go is just operating
> on an array of strings.
And, as I have tried to make you understand, this premise is
completely false.
The code snippet I give in [1] shows that it is true, with
ChainIDs([]digest.Digest{"a", "b", "c"}) happily returning values
despite the fact that "a", "b", and "c" are clearly *not* DiffIDs or
layers or whatever.
The algorithm described in this section is expressed in terms of
abstract layers, not strings.
I agree that the current description in master is involving abstract
layers. I think that needlessly complicates the definition and think
that this PR gives a simpler definition which still covers the actual
implementation.
These changes, subsequently, are incorrect, in that it
inappropriately narrows the definition.
This is also not true. You can, as I keep pointing out [1], use:
ChainID([DiffID(L₀), …, DiffID(Lₙ)])
to recover abstract-layer-based semantics.
If you want to make a clear argument for my definition here being
false, please provide a ChainID invocation and results which are not
covered by my definition here. Ideally something I can ‘go run …’
(like I did in [1]), because then there is no ambiguity about what
we're saying.
[1]: #593 (comment)
|
We can look at this as
This argument has been made in #586, the implementation of ChainID in image-spec, the demonstration of ChainID in containerd, the production implementation and usage of ChainID in docker and the several years of experience put into the review of this information by multiple members of our team before I posted it in the #482. We did not just pull this out of thin air. At this point, your position has not changed, bordering on blatant intellectual dishonesty. I have wasted hours pushing through what should have been a very simple clarification trying to explain these details to you and you continue to badger me about it. In my view, all of this is time and energy that could have been put towards a working 1.0 specification. While it is unfair to me, it is incredibly unfair to the community to be dragging the conversation down your selfish path when we should be stabilizing and implementing. |
On Wed, Mar 08, 2017 at 12:29:50PM -0800, Stephen Day wrote:
We can look at this as `[DiffID(L₀), …, DiffID(Lₙ)] != L₀|…Lₙ` is
not always the case.
I agree, that the filesystem resulting from applying a series of
layers (L₀|…|Lₙ) is different from an array of strings ([DiffID(L₀),
…, DiffID(Lₙ)]). But the current master implementation is taking an
array of strings [1] (which it types as Digests, but those are just
strings [2]). And it it not making any attempt to resolve those into
layers, or apply them to a filesystem, or anything like that [3]. It
is operating on them purely as opaque strings, as my ‘go run a.go’
example [4] makes clear.
You *could* define ChainID to operate on the resulting filesystem
(that's 4.a and 4.d.i in [5]). But that's not what we have
implemented in chainid.go. If you want to convince me otherwise,
consider the following situation.
1. I have an array of layers.
2. I apply those layers sequentially to produce the resulting
filesystem.
3. I forget my array of layers, but retain the resulting filesystem.
4. I try to call ChainID on the resulting filesytem (hint, there's no
way to do that).
Or consider the following situation:
1. I have an empty tarball:
$ dd if=/dev/zero bs=512 count=2 of=empty.tar
$ sha256sum empty.tar
5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef empty.tar
2. I have a manifest which lists that tarball once:
"layers": [
{
"mediaType": "application/vnd.oci.image.layer.v1.tar",
"size": 1024,
"digest": "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"
}
]
and another which lists it twice:
"layers": [
{
"mediaType": "application/vnd.oci.image.layer.v1.tar",
"size": 1024,
"digest": "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"
},
{
"mediaType": "application/vnd.oci.image.layer.v1.tar",
"size": 1024,
"digest": "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"
}
]
3. I calculate the ChainID of each set of layers:
$ cat a.go
package main
import (
_ "crypto/sha256"
"fmt"
"github.com/opencontainers/go-digest"
"github.com/opencontainers/image-spec/identity"
)
func main() {
fmt.Println("one entry")
fmt.Println(identity.ChainID([]digest.Digest{"sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"}))
fmt.Println("two entries")
fmt.Println(identity.ChainID([]digest.Digest{"sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef", "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"}))
}
$ go run a.go
one entry
sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
two entries
sha256:170b376f64fb30995c140276be3d71dfb256b308d86183ca3b22aa93a79ad548
4. I compare the results, but
sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
!=
sha256:170b376f64fb30995c140276be3d71dfb256b308d86183ca3b22aa93a79ad548,
despite the fact that the resulting filesystem is the same in both
cases (just the empty directory).
While we may represent that application with the array, it is not
necessary to calculate this from the point of the array.
That is not true. Without an array of layers, how would you recurse
on the array of layers? This is my “I try to call ChainID on the
resulting filesytem” above, and if you have a way for me to calculate
the ChainID given only the resulting filesystem, please share.
In fact, the expansion of `DiffID(Lₙ)` is part of the definition of
the `ChainID`, allowing us to abstract the algorithm in the
definitions of the function `Digest` and `DiffID`.
And ChainID([DiffID(L₀), …, DiffID(Lₙ)]) clearly allows you to use
whatever abstract DiffID you'd like. I haven't filed a PR addressing
the underspecified Digest yet (that's next after we resolve this PR).
Your definition requires the user of this algorithm to create an
array, which may not exist while the current definition allows that
calculation to be made on demand.
Again, you *have* to have an array to call the current ChainID
implementation [6]. If you feel that you do not, please provide some
Go showing you calling ChainID without an array.
This argument has been made in #586, the implementation of ChainID
in image-spec, the demonstration of ChainID in containerd, the
production implementation and usage of ChainID in docker and the
several years of experience put into the review of this information
by multiple members of our team before I posted it in the #482. We
did not just pull this out of thin air.
I'm not arguing with your *implementation* (which is good enough).
I'm arguing that your Markdown definition is overly complicated when a
string-array-based definition covers the current implementation
without any confusion about layer application or “abstract DiffID
functions”.
At this point, your position has not changed, bordering on blatant
intellectual dishonesty.
Clearly we are having trouble seeing eye-to-eye on this. I'm not sure
I'm wrong though. Can we have some neutral third parties weigh in?
[1]: https://github.com/opencontainers/image-spec/blob/05a9f2be75900ea431a7a11d7e84dbe3b165317f/identity/chainid.go#L30
[2]: https://github.com/opencontainers/go-digest/blob/v1.0.0-rc0/digest.go#L36
[3]: https://github.com/opencontainers/image-spec/blob/05a9f2be75900ea431a7a11d7e84dbe3b165317f/identity/chainid.go#L56-L67
[4]: #593 (comment)
[5]: #586 (comment)
[6]: https://github.com/opencontainers/image-spec/blob/05a9f2be75900ea431a7a11d7e84dbe3b165317f/identity/chainid.go#L30
|
@wking You have now been blocked. |
There's no need to couple this to DiffIDs and layer application, since the implementation in
identity/chainid.go
is just operating on an array of strings. In thechainid.go
implementation, those strings happen to be digests, but nothing digest-specific is needed.which matches the example I'm adding with this commit.
In #586, there were some claims that the
ChainID
was operating on the resulting filesystem and not on an array of digest strings (e.g. here and here). But that's clearly not how thechainid.go
or currently-defined recursion algorithm works. In order to have aChainID
based on the resulting filesystem and supportChainID(C) = ChainID(∅|C)
, you'd need to define it recursively on the resulting filesystem instead of on the array stack. Something like:But as I pointed out here, hashing a filesystem is a lot more complicated than hashing a byte-string.
Folks interested in using whatever
DiffID
implementation they like can easily recover the old coupled behavior by using: