-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Moved to #4057] Apply SOURCE_DATE_EPOCH to the files and the whiteouts inside OCI tar layers #3560
Conversation
7a4ab1e
to
ac6f97b
Compare
cache/blobs.go
Outdated
@@ -62,6 +63,8 @@ func (sr *immutableRef) computeBlobChain(ctx context.Context, createIfNeeded boo | |||
} | |||
|
|||
func computeBlobChain(ctx context.Context, sr *immutableRef, createIfNeeded bool, comp compression.Config, s session.Group, filter map[string]struct{}) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this is safe. If the blob already exists, then just the blob is used without any timestamp check. If the first build uses one epoch and the second uses another or no epoch at all then the second build gets epoch from the first build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now validated in cache/blobs.go:blobExistsWithSourceDateEpoch()
Is this and #3263 still blocked? |
Yes, will try to update next week, sorry |
6155189
to
45acbcd
Compare
Rebased |
75276f4
to
7121a39
Compare
Rebased again |
if cdAddress := sb.ContainerdAddress(); cdAddress != "" { | ||
// https://github.com/containerd/containerd/commit/9c9f564a35df7990870a53a125afbf88ac412753 | ||
integration.CheckContainerdVersion(t, cdAddress, ">= 1.7.0-beta.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good for containerd worker but I'm not sure what we should do for dockerd with containerd snapshotter enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe having a factory in the integration sandbox to return the containerd version depending on the worker if that makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it can be revisited in a separate PR?
The current code is passing the current CI.
7121a39
to
a274214
Compare
Rebased |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iiuc correctly, this also converts the blobs of base images. I don't think that is correct, at least as the default behavior. I think it may be possible to use the DiffIDs
in the image config that the frontend passes to the exporter to figure out which layers are from the base image and which are not.
Not for this PR, but we are also assuming that archive/tar
and compress/gzip
are deterministic. We at least need to add test coverage that would reveal possible changes in Go updates and give us option to vendor old library if needed.
}) | ||
} | ||
|
||
if _, ok := filter[sr.ID()]; ok { | ||
eg.Go(func() error { | ||
_, err := g.Do(ctx, fmt.Sprintf("%s-%t", sr.ID(), createIfNeeded), func(ctx context.Context) (interface{}, error) { | ||
if sr.getBlob() != "" { | ||
if blobExistsWithSourceDateEpoch(sr, sourceDateEpoch) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that if there is no blob for epoch but there is one for another one, we will run the differ again? Is it a good idea? I think it would be better to do something similar to compressions there we convert a blob to another one.
@@ -305,6 +320,9 @@ func (sr *immutableRef) setBlob(ctx context.Context, desc ocispecs.Descriptor) ( | |||
sr.queueMediaType(desc.MediaType) | |||
sr.queueBlobSize(desc.Size) | |||
sr.appendURLs(desc.URLs) | |||
if sourceDateEpoch != nil { | |||
sr.queueSourceDateEpoch(*sourceDateEpoch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand how this works. Doesn't it just fix a ref to a specific epoch forever? Instead, it should track multiple variants of blobs that are all associated with a ref (similar to compressions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should track multiple variants of blobs that are all associated with a ref (similar to compressions).
Does that require putting the epoch value into an OCI descriptor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean for the remote build-cache or for something else? In build-cache we already have "created time" but not exactly the same(although both would be epoch value). We might need a separate field for that.
If you think for the case of avoiding regenerating blobs for base image layers if they were already generated with the same epoch, then yes, I think saving epoch would be needed. Otoh, I don't think any user would want us to regenerate the layers for the base image, just to change the timestamps in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sr.queueSourceDateEpoch()
is only called for new diffs, so it seems safe for now
Maybe we can introcue an envvar to experimentally allow this behavour? Possible another way is managing epoch variants as compression variants by making |
300f800
to
415a7ae
Compare
@tonistiigi Could you take a look? |
a7bc2f0
to
ced39e5
Compare
Rebased |
Is there anything I can do to get this merged? 🙏 |
I have yet to fully comprehend the changes in this PR, but just a quick comment; IIUC, this PR changes
Maybe that's ok (in which case: ignore me), but perhaps not; in that case, should there be a BuildKit-specific build-arg instead? (e.g. |
Not true since v0.11:
https://github.com/moby/buildkit/blob/v0.11.6/docs/build-repro.md#source_date_epoch (And we didn't apply the env var to file timestamps in v0.11, because containerd v1.7 was not GA at that point) |
ced39e5
to
b04f204
Compare
Rebased |
… layers Propagate the `build-arg:SOURCE_DATE_EPOCH` opt value to the differ, to limit the upper bound of the file timestamps and set the whiteout timestamps. With this commit, the following workarounds mentioned in `docs/build-repro.md` are no longer needed for reproducible builds: > ```dockerfile > # Limit the timestamp upper bound to SOURCE_DATE_EPOCH. > # Workaround for moby#3180 > ARG SOURCE_DATE_EPOCH > RUN find $( ls / | grep -E -v "^(dev|mnt|proc|sys)$" ) -newermt "@${SOURCE_DATE_EPOCH}" -writable -xdev | xargs touch --date="@${SOURCE_DATE_EPOCH}" --no-dereference > ``` > ```dockerfile > # Squash the entire stage for resetting the whiteout timestamps. > # Workaround for moby#3168 > FROM scratch > COPY --from=0 / / > ``` Limitations: * containerd 1.7 is needed for the containerd worker mode. Signed-off-by: Akihiro Suda <[email protected]>
b04f204
to
497fd3a
Compare
The opt-out is not setting |
Right. Also, setting |
The scenario I was thinking of was that (say) ARG SOURCE_DATE_EPOCH=<foo>
RUN build-things.sh Would perhaps change some files to use To be clear; I'm most definitely NOT against making this all more reproducible, but thought I'd list some data-points that could be relevant in deciding, and if we all agree that;
Then 👍 👍 |
if enableOverlay { | ||
computed, ok, err := sr.tryComputeOverlayBlob(ctx, lower, upper, mediaType, sr.ID(), compressorFunc) | ||
computed, ok, err := sr.tryComputeOverlayBlob(ctx, lower, upper, mediaType, sr.ID(), compressorFunc, sourceDateEpoch) | ||
if !ok || err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- FWIW, this file is also updated in Support for building overlaybd images #3867 - does that PR need any changes for this to work with
overlaybd
? /cc @HileQAQ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably yes, but we can defer that, as overlaybd
will stay optional.
I don't think this change breaks somebody's Image. |
if sourceDateEpoch != nil { | ||
opts = append(opts, | ||
archive.WithModTimeUpperBound(*sourceDateEpoch), | ||
archive.WithWhiteoutTime(*sourceDateEpoch), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make the whiteout time configurable at all? These files never end up in a container, so the timestamp value is never visible. Lets just always make it zero.
If you agree, then this part can be separated out as no blob tracking etc is needed for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not possible with the current version of containerd (and I don't want this PR to be blocked until containerd v2.0), as https://pkg.go.dev/github.com/containerd/[email protected]/diff#Opt lacks a knob to set whiteout time without setting SOURCE_DATE_EPOCH
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand. If containerd doesn't support it, why can't we just call archive.WithWhiteoutTime(zeroTime)
in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that we can't specify WithWhiteoutTime
for Differ.Compare()
, although we can pass WithSourceDateEpoch
to it.
Lines 191 to 214 in 2ff0d2a
if desc.Digest == "" && !isTypeWindows(sr) && comp.Type.NeedsComputeDiffBySelf() { | |
// These compression types aren't supported by containerd differ. So try to compute diff on buildkit side. | |
// This case can be happen on containerd worker + non-overlayfs snapshotter (e.g. native). | |
// See also: https://github.com/containerd/containerd/issues/4263 | |
desc, err = walking.NewWalkingDiff(sr.cm.ContentStore).Compare(ctx, lower, upper, | |
diff.WithMediaType(mediaType), | |
diff.WithReference(sr.ID()), | |
diff.WithCompressor(compressorFunc), | |
) | |
if err != nil { | |
bklog.G(ctx).WithError(err).Warnf("failed to compute blob by buildkit differ") | |
} | |
} | |
if desc.Digest == "" { | |
desc, err = sr.cm.Differ.Compare(ctx, lower, upper, | |
diff.WithMediaType(mediaType), | |
diff.WithReference(sr.ID()), | |
diff.WithCompressor(compressorFunc), | |
) | |
if err != nil { | |
return nil, err | |
} | |
} |
https://pkg.go.dev/github.com/containerd/[email protected]/diff#Comparer
https://pkg.go.dev/github.com/containerd/[email protected]/diff#WithSourceDateEpoch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first one is the complicated one that has many considerations to the UX, cache system, compression variants etc. It needs more validation/review, and I'm not sure if we can get it into the upcoming release as is.
Would it be mergable if we add a new build arg like BUILDKIT_SOURCE_DATE_EPOCH_MODE=(conservative|aggressive)
? (in the labs channel?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be mergable if we add a new build arg like BUILDKIT_SOURCE_DATE_EPOCH_MODE=(conservative|aggressive) ? (in the labs channel?)
This also makes changes to the blobs reference counting and core function signatures in cache pkg, so isn't ideal for an experimental/unsafe feature because it is not something that lives on its own and can be easily removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we just make this an exporter option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
↑ @tonistiigi WDYT?
The idea is to introduce an image exporter option like rewrite-epoch
.
The implementation will be similar to compression options like compression-level
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closing. |
EDIT: Moved to #4057
Propagate the
build-arg:SOURCE_DATE_EPOCH
opt value to the differ, to limit the upper bound of the file timestamps and set the whiteout timestamps.With this commit, the following workarounds mentioned in
docs/build-repro.md
are no longer needed for reproducible builds:Limitations:
Backup
ac6f97b (withdrew due to dependency on ctx)