Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DiffOp #2434

Merged
merged 4 commits into from
Jan 6, 2022
Merged

DiffOp #2434

merged 4 commits into from
Jan 6, 2022

Conversation

sipsma
Copy link
Collaborator

@sipsma sipsma commented Oct 29, 2021

Support for DiffOp. Broken down into a set of fairly digestible commits, see the individual messages for more details, especially cache: add support for Diff refs. which has details on some of the design decisions. There are a lot of tricky corner cases here but the test coverage is pretty decent; let me know if I'm missing any other important cases though.

There are a few known follow-ups that are shared with those needed for merge-op, namely:

  1. Inline cache support (min registry cache works though and is tested)
  2. User docs (will just write a combined doc for merge+diff op since they are highly related)
  3. Better progress output that accounts for the lazy nature of merge+diff

@sipsma sipsma marked this pull request as ready for review November 29, 2021 03:20
@sipsma sipsma requested a review from tonistiigi November 29, 2021 03:20
@sipsma
Copy link
Collaborator Author

sipsma commented Dec 7, 2021

Updated with a rebase. I also tested this against moby now by running the merge+diff tests against dockerd built using this branch: https://github.com/sipsma/moby/commits/mobydiff

No extra changes were needed for diffop relative to those needed for mergeop.

@sipsma
Copy link
Collaborator Author

sipsma commented Dec 7, 2021

BTW, tests in CI are failing intermittently with 500 errors from dockerhub even after a few retries, so just gonna wait a bit before retrying again. Across different runs all the tests have all passed though, so shouldn't be any issues with diffop itself.

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to create separate PRs for the first commits.

For cachemanager.Diff() implementation I need to review again (possibly after changes). Rest mostly sgtm.

@@ -33,7 +33,7 @@ import (
)

func TestClientGatewayIntegration(t *testing.T) {
integration.Run(t, []integration.Test{
integration.Run(t, integration.TestFuncs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something off with the title of this commit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in this commit is to change the previous definition of type Test func(*testing.T, Sandbox) to an interface:

type Test interface {
	Name() string
	Run(t *testing.T, sb Sandbox)
}

Updating the commit message title to be Change integration.Test from a func to a interface, hopefully that's more clear.

@@ -137,7 +137,7 @@ func computeBlobChain(ctx context.Context, sr *immutableRef, createIfNeeded bool
case "overlayfs", "stargz":
// overlayfs-based snapshotters should support overlay diff. so print warn log on failure.
logWarnOnErr = true
case "fuse-overlayfs":
case "fuse-overlayfs", "native":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change just means that if we are using the native snapshotter it's not worth even trying to use the overlay differ since it won't work, and instead we can just skip ahead to using the double-walking differ rather than failing to use the overlay differ first.

Added more detail to the commit message body to clarify.

}

func applierFor(dest Mountable, tryCrossSnapshotLink bool) (_ *applier, rerr error) {
app := &applier{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: app isn't a good name for this as it carries a different meaning

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to just use a, which is also the receiver for applier.

type change struct {
kind fs.ChangeKind
subpath string
srcpath string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use consistent casing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

upperView, err := upperMnter.Mount()
if err != nil {
return err
type applier struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this+differ don't really need to be in the snapshotter package, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package is the only place they are used right now, so I didn't see a need to put them somewhere else. But if you'd prefer them in a util package or a subpackage or snapshotter let me know and I'll make that update, it's not a big deal either way to me.

cache/manager.go Outdated
@@ -740,6 +761,122 @@ func (cm *cacheManager) Merge(ctx context.Context, inputParents []ImmutableRef,
return rec.ref(true, dhs), nil
}

func (cm *cacheManager) Diff(ctx context.Context, lower, upper ImmutableRef, opts ...RefOption) (ir ImmutableRef, rerr error) {
if lower == nil && upper == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to think these checks are better at ops level? In here they just look like invalid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, not a big deal either way to me, updated to check them on the solver op level.

for _, o := range opts {
o.SetConstraintsOption(&c)
}
addCap(&c, pb.CapDiffOp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be inside DiffOp

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Selector digest.Digest
ComputeDigestFunc solver.ResultBasedCacheFunc
PreprocessFunc solver.PreprocessFunc
}, 2),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 -> should come from proto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to set this to the number of non-empty inputs (i.e. at most 2 if both lower and upper are non-empty). Let me know if that's what you were thinking.


var lowerRef cache.ImmutableRef
var lowerRefID string
if d.op.Lower.Input != pb.Empty {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if lower == -1 then return upper?

Same could probably be directly in client as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in my other comment, there is currently a desirable difference between Diff(Scratch(), A)) and A (the diff with scratch can't include any deletions present in A since there's nothing to delete from scratch), so will leave this as is for now pending the outcome of that discussion.

Copy link
Collaborator Author

@sipsma sipsma Dec 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with this now since we are switching to supporting only unmerge behavior instead of both for now. It's implemented below these lines (checking whether the lower ref is nil).

@@ -372,3 +373,16 @@ message MergeInput {
message MergeOp {
repeated MergeInput inputs = 1;
}

message LowerDiffInput {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the types?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just on the off-chance we wanted to add parameters to them in the future for some reason. I can't think of any use cases but figured it didn't hurt to put them in structs now. Not a big deal though, can just put them direct in DiffOp if preferred, let me know and I'll update if so.

Copy link
Collaborator

@aaronlehmann aaronlehmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar enough with the internals to review the implementation, but I've been involved in the high-level design and I've tested some earlier versions of this PR, including under load. From my side, LGTM.

cache/refs.go Outdated
@@ -177,8 +218,8 @@ func (cr *cacheRecord) isDead() bool {

var errSkipWalk = errors.New("skip")

// walkAncestors calls the provided func on cr and each of its ancestors, counting both layer
// and merge parents. It starts at cr and does a depth-first walk to parents. It will visit
// walkAncestors calls the provided func on cr and each of its ancestors, counting both layer,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/both//

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@sipsma
Copy link
Collaborator Author

sipsma commented Dec 20, 2021

ping @tonistiigi

@aaronlehmann
Copy link
Collaborator

@tonistiigi: Any more comments on this?

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seemed to work quite well in my testing.

One issue seems to be with the progress output. It looks like atm with any parameters diff will take 0.0s always, making progress quite useless. This is even when diffing different refs that does trigger the slow on-disk differ an mounts everything. We should try to capture the slow parts of actual differ running under the correct progress vertex, even if it is lazy. Another thing to consider is that for the cases where mounting is inevitable, there isn't any point to lazy semantics anyway though and it may even be confusing to the user. This does not need to be part of this PR.


func (cm *cacheManager) createDiffRef(ctx context.Context, parents parentRefs, dhs DescHandlers, opts ...RefOption) (ir *immutableRef, rerr error) {
dps := parents.diffParents
if err := dps.lower.Finalize(ctx); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would make sense to try to avoid finalize calls for refs from local sources in a follow-up. I think it should be safe to hardlink there as llb.Local should never modify existing files, only replace them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah definitely worth considering for a follow-up. My only hesitation is that it might rule out future possible optimizations of local refs where you just modify an existing file instead of replacing it (i.e. 1 bit changed in a 1GB file so you just flip that bit instead of copying the whole file). But we can figure that out later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think we could do this for all mutable. Possibly local source would add some metadata that the files will be not modified in this ref. Although I'm not sure if atm there is any other case where a non-finalized ref can end up in here other than from local source.

}
}

id := identity.NewID()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about deterministic IDs somewhere (maybe for mergeop) to detect the duplicate work at least. You still planning to do that in follow-up?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I will tackle that along-side tackling it for merge-op in a follow-up. The implementation should be similar and possibly overlap entirely depending on how we approach it.

cache/refs.go Outdated
@@ -177,8 +218,8 @@ func (cr *cacheRecord) isDead() bool {

var errSkipWalk = errors.New("skip")

// walkAncestors calls the provided func on cr and each of its ancestors, counting both layer
// and merge parents. It starts at cr and does a depth-first walk to parents. It will visit
// walkAncestors calls the provided func on cr and each of its ancestors, counting both layer,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File(llb.Mkfile("/foo", 0755, []byte("A"))),
llb.Diff(llb.Scratch(), llb.Scratch().
File(llb.Mkfile("/foo", 0644, []byte("B"))).
File(llb.Rm("/foo")).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused how does this create a whiteout? It is correct logically if FileOp would capture the diff but atm this should just be a comparison of no file vs removed file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each FileOp creates a new layer on top of the previous FileOp, so the layers that get created are:

  1. /foo
  2. whiteout of /foo
  3. /bar

Because we decided to always use the "unmerge" behavior as discussed before, the diff will consist of those 3 layers, which includes the intermediate whiteout layer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Missed that this wasn't File(llb.Mkfile("/foo", 0644, []byte("B")), llb.Rm("/foo"))

if lowerRef == nil {
if upperRef == nil {
// The diff of nothing and nothing is nothing. Just return an empty ref.
return []solver.Result{worker.NewWorkerRefResult(nil, d.worker)}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these "nil" results correct? Much more logical would be to just return actual nil. I thought this is what we did in some cases. But if it breaks something then leave it to another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning actual nil isn't handled in the solver and seemed to require changes that felt too risky to bring in scope for this PR:

  1. It's assumed res is non-nil here:

    buildkit/solver/jobs.go

    Lines 863 to 864 in 15fb114

    r := res.(*execRes)
    return unwrapShared(r.execRes), r.execExporters, nil
  2. Even if we update the above code to handle nil, more changes will be needed here: https://github.com/moby/buildkit/blob/master/solver/edge.go#L905-L909

sipsma added 4 commits January 6, 2022 11:05
This allows you to create refs that are single layers representing the
diff between any two arbitrary refs. The primary use case for this is
to allows users to extract the changes created by ops like Exec and
rebase them elsewhere through MergeOp. However, there is no restriction
on the inputs to DiffOp and the resulting ref's layer is simply the
layer created by running the differ on the two inputs refs
(specifically, the same differ used during exports).

A Diff ref can be mounted by itself, in which case it is defined as the
result of applying the diff to Scratch. Most use cases though will use
Diff refs as the input to a MergeOp, in which case the diff is just
applied on top of the lower merge inputs, as was the case before.

In cases like Diff(A, A->B->C) (i.e. cases where the diff is between two
refs where the lower is an ancestor of upper), the diff will be defined
as the layers separating the two refs. In other cases, the diff is just
a single layer, not re-used from the inputs, representing the diff
between the two refs (which can be defined as the layer "Diff(A,B)" that
satisfies "Merge(A, Diff(A,B)) == B").

Note that there is technically a meaningful difference between the
"unmerge" behavior of extracting the layers separating diffs and the
"simple diff" of just running the differ on the two refs. Namely, in the
case where there are "intermediate deletes" (i.e. deletes that only
exist in layers between A and B but not between A and B by themselves),
then the simple diff and unmerge can create different results when
plugged into a MergeOp. This is due to the fact that intermediate
deletes will apply to the merge when using the unmerge behavior, but not
when using the simple diff. This is on top of the fact that the simple
diff inherently has a "flattening" behavior where multiple layers are
squashed into a single one.

So, in the case where lower is an ancestor of upper, we choose to follow
the unmerge behavior, but it's possible users may prefer the simple diff
behavior. As of right now, they won't be able to do so, but if needed we
can add the ability to choose which behavior is followed in the future.
This could be done through a flag provided to DiffOp or possibly by
adapting llb.Copy to support this type of behavior with the same
efficiency as DiffOp.

Signed-off-by: Erik Sipsma <[email protected]>
Before this, you could return worker ref results from ops that have nil
refs but once they were attempted to be used, various nil exceptions
would get hit. Now, those cases should be handled.

Signed-off-by: Erik Sipsma <[email protected]>
Signed-off-by: Erik Sipsma <[email protected]>
@sipsma
Copy link
Collaborator Author

sipsma commented Jan 6, 2022

Seemed to work quite well in my testing.

Thanks, glad to hear it.

One issue seems to be with the progress output.

Yeah, my idea was to fix this alongside fixing progress output for merge-op since I thought they would be basically the same issue. However, it seems like merge-op may end up actually being a somewhat distinct problem based on your comments on my draft PR, but we can figure that out there. Either way, I'll fix the progress for diff-op in one follow-up or another.

Another thing to consider is that for the cases where mounting is inevitable, there isn't any point to lazy semantics anyway though and it may even be confusing to the user

For sure, I guess we'd need a way of seeing that the diff is eventually needed by some descendent vertex in a solve and then somehow propagate that to the ancestor diff-op vertex. Same idea can apply to merge-op. I'll add that to the list of follow-ups (I will go ahead and open an issue to track all the follow-ups for merge+diff now that the base implementations are mostly settled).

@tonistiigi tonistiigi merged commit a2528b9 into moby:master Jan 6, 2022
@sipsma sipsma mentioned this pull request Jan 6, 2022
10 tasks
@crazy-max crazy-max added this to the v0.10.0 milestone Feb 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants