Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Moved to #4057] Apply SOURCE_DATE_EPOCH to the files and the whiteouts inside OCI tar layers #3560

Closed
wants to merge 1 commit into from

Conversation

AkihiroSuda
Copy link
Member

@AkihiroSuda AkihiroSuda commented Jan 31, 2023

EDIT: Moved to #4057


Propagate the build-arg:SOURCE_DATE_EPOCH opt value to the differ, to limit the upper bound of the file timestamps and set the whiteout timestamps.

With this commit, the following workarounds mentioned in docs/build-repro.md are no longer needed for reproducible builds:

# Limit the timestamp upper bound to SOURCE_DATE_EPOCH.
# Workaround for https://github.com/moby/buildkit/issues/3180
ARG SOURCE_DATE_EPOCH
RUN find $( ls / | grep -E -v "^(dev|mnt|proc|sys)$" ) -newermt "@${SOURCE_DATE_EPOCH}" -writable -xdev | xargs touch --date="@${SOURCE_DATE_EPOCH}" --no-dereference
# Squash the entire stage for resetting the whiteout timestamps.
# Workaround for https://github.com/moby/buildkit/issues/3168
FROM scratch
COPY --from=0 / /

Limitations:

Backup

ac6f97b (withdrew due to dependency on ctx)

@AkihiroSuda AkihiroSuda force-pushed the source-date-epoch-1.7 branch from 7a4ab1e to ac6f97b Compare January 31, 2023 15:52
@AkihiroSuda AkihiroSuda marked this pull request as ready for review January 31, 2023 15:55
cache/blobs.go Outdated
@@ -62,6 +63,8 @@ func (sr *immutableRef) computeBlobChain(ctx context.Context, createIfNeeded boo
}

func computeBlobChain(ctx context.Context, sr *immutableRef, createIfNeeded bool, comp compression.Config, s session.Group, filter map[string]struct{}) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this is safe. If the blob already exists, then just the blob is used without any timestamp check. If the first build uses one epoch and the second uses another or no epoch at all then the second build gets epoch from the first build.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now validated in cache/blobs.go:blobExistsWithSourceDateEpoch()

@AkihiroSuda AkihiroSuda marked this pull request as draft February 8, 2023 05:52
@ReillyBrogan
Copy link

Is this and #3263 still blocked?

@AkihiroSuda
Copy link
Member Author

Is this and #3263 still blocked?

Yes, will try to update next week, sorry

@AkihiroSuda AkihiroSuda force-pushed the source-date-epoch-1.7 branch 3 times, most recently from 6155189 to 45acbcd Compare March 7, 2023 10:54
@AkihiroSuda AkihiroSuda marked this pull request as ready for review March 7, 2023 10:54
@AkihiroSuda AkihiroSuda requested a review from tonistiigi March 7, 2023 11:26
@AkihiroSuda
Copy link
Member Author

Rebased

@AkihiroSuda AkihiroSuda force-pushed the source-date-epoch-1.7 branch 2 times, most recently from 75276f4 to 7121a39 Compare March 14, 2023 05:15
@AkihiroSuda
Copy link
Member Author

Rebased again

Comment on lines +6529 to +6643
if cdAddress := sb.ContainerdAddress(); cdAddress != "" {
// https://github.com/containerd/containerd/commit/9c9f564a35df7990870a53a125afbf88ac412753
integration.CheckContainerdVersion(t, cdAddress, ">= 1.7.0-beta.1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good for containerd worker but I'm not sure what we should do for dockerd with containerd snapshotter enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe having a factory in the integration sandbox to return the containerd version depending on the worker if that makes sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it can be revisited in a separate PR?
The current code is passing the current CI.

@AkihiroSuda
Copy link
Member Author

Rebased

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iiuc correctly, this also converts the blobs of base images. I don't think that is correct, at least as the default behavior. I think it may be possible to use the DiffIDs in the image config that the frontend passes to the exporter to figure out which layers are from the base image and which are not.

Not for this PR, but we are also assuming that archive/tar and compress/gzip are deterministic. We at least need to add test coverage that would reveal possible changes in Go updates and give us option to vendor old library if needed.

})
}

if _, ok := filter[sr.ID()]; ok {
eg.Go(func() error {
_, err := g.Do(ctx, fmt.Sprintf("%s-%t", sr.ID(), createIfNeeded), func(ctx context.Context) (interface{}, error) {
if sr.getBlob() != "" {
if blobExistsWithSourceDateEpoch(sr, sourceDateEpoch) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that if there is no blob for epoch but there is one for another one, we will run the differ again? Is it a good idea? I think it would be better to do something similar to compressions there we convert a blob to another one.

@@ -305,6 +320,9 @@ func (sr *immutableRef) setBlob(ctx context.Context, desc ocispecs.Descriptor) (
sr.queueMediaType(desc.MediaType)
sr.queueBlobSize(desc.Size)
sr.appendURLs(desc.URLs)
if sourceDateEpoch != nil {
sr.queueSourceDateEpoch(*sourceDateEpoch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand how this works. Doesn't it just fix a ref to a specific epoch forever? Instead, it should track multiple variants of blobs that are all associated with a ref (similar to compressions).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should track multiple variants of blobs that are all associated with a ref (similar to compressions).

Does that require putting the epoch value into an OCI descriptor?

Copy link
Member

@tonistiigi tonistiigi Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean for the remote build-cache or for something else? In build-cache we already have "created time" but not exactly the same(although both would be epoch value). We might need a separate field for that.

If you think for the case of avoiding regenerating blobs for base image layers if they were already generated with the same epoch, then yes, I think saving epoch would be needed. Otoh, I don't think any user would want us to regenerate the layers for the base image, just to change the timestamps in it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sr.queueSourceDateEpoch() is only called for new diffs, so it seems safe for now

@tonistiigi tonistiigi requested a review from ktock March 23, 2023 04:39
@ktock
Copy link
Collaborator

ktock commented Mar 23, 2023

Iiuc correctly, this also converts the blobs of base images. I don't think that is correct, at least as the default behavior.

Maybe we can introcue an envvar to experimentally allow this behavour?

Possible another way is managing epoch variants as compression variants by making compression.Type aware of the source date epoch.

@AkihiroSuda AkihiroSuda marked this pull request as draft March 28, 2023 23:01
@AkihiroSuda AkihiroSuda force-pushed the source-date-epoch-1.7 branch 3 times, most recently from 300f800 to 415a7ae Compare March 30, 2023 11:32
@AkihiroSuda AkihiroSuda requested a review from tonistiigi April 13, 2023 02:04
@AkihiroSuda
Copy link
Member Author

@tonistiigi Could you take a look?

@AkihiroSuda
Copy link
Member Author

Rebased

@AkihiroSuda AkihiroSuda added this to the v0.12.0 milestone May 24, 2023
@AkihiroSuda
Copy link
Member Author

Is there anything I can do to get this merged? 🙏

@thaJeztah
Copy link
Member

I have yet to fully comprehend the changes in this PR, but just a quick comment; IIUC, this PR changes SOURCE_DATE_EPOCH during build to become something that was previously ignored by BuildKit itself to become "active".

  • 👍 pro: SOURCE_DATE_EPOCH (like, e.g. HTTP_PROXY): it's an env-var that's already known, so provides a "polished" experience
  • 👎 con: change in behavior (previously, this would not directly affect building, but now it does)
  • ☝️ before this, RUN instructions in a build COULD take this env-var into account (and do "what they do with it")
  • ☝️ but now, BuildKit itself also takes this into account (so: a change in behavior)
  • ☝️ which means that there's potentially a difference between builds (depending on what version of BuildKit / front-end is used), and there's no option to opt-in/opt-out of this behavior.

Maybe that's ok (in which case: ignore me), but perhaps not; in that case, should there be a BuildKit-specific build-arg instead? (e.g. BUILDKIT_SOURCE_DATE_EPOCH, or BUILD_SOURCE_DATE_EPOCH (or some other var; haven't thought of the name))

@AkihiroSuda
Copy link
Member Author

IIUC, this PR changes SOURCE_DATE_EPOCH during build to become something that was previously ignored by BuildKit

Not true since v0.11:

The build arg value is used for:

  • the created timestamp in the OCI Image Config
  • the created timestamp in the history objects in the OCI Image Config
  • the org.opencontainers.image.created annotation in the OCI Image Index
  • the timestamp of the files exported with the local exporter
  • the timestamp of the files exported with the tar exporter

https://github.com/moby/buildkit/blob/v0.11.6/docs/build-repro.md#source_date_epoch

(And we didn't apply the env var to file timestamps in v0.11, because containerd v1.7 was not GA at that point)

@AkihiroSuda AkihiroSuda force-pushed the source-date-epoch-1.7 branch from ced39e5 to b04f204 Compare June 6, 2023 12:35
@AkihiroSuda
Copy link
Member Author

Rebased

… layers

Propagate the `build-arg:SOURCE_DATE_EPOCH` opt value to the differ,
to limit the upper bound of the file timestamps and set the whiteout timestamps.

With this commit, the following workarounds mentioned in `docs/build-repro.md`
are no longer needed for reproducible builds:

> ```dockerfile
> # Limit the timestamp upper bound to SOURCE_DATE_EPOCH.
> # Workaround for moby#3180
> ARG SOURCE_DATE_EPOCH
> RUN find $( ls / | grep -E -v "^(dev|mnt|proc|sys)$" ) -newermt "@${SOURCE_DATE_EPOCH}" -writable -xdev | xargs touch --date="@${SOURCE_DATE_EPOCH}" --no-dereference
> ```

> ```dockerfile
> # Squash the entire stage for resetting the whiteout timestamps.
> # Workaround for moby#3168
> FROM scratch
> COPY --from=0 / /
> ```

Limitations:
* containerd 1.7 is needed for the containerd worker mode.

Signed-off-by: Akihiro Suda <[email protected]>
@AkihiroSuda AkihiroSuda force-pushed the source-date-epoch-1.7 branch from b04f204 to 497fd3a Compare June 8, 2023 09:07
@tianon
Copy link
Member

tianon commented Jun 9, 2023

The opt-out is not setting SOURCE_DATE_EPOCH, right? (surely anyone setting that is expecting to have it help them achieve reproducible builds, and this makes that strictly more possible than before 😅)

@AkihiroSuda
Copy link
Member Author

The opt-out is not setting SOURCE_DATE_EPOCH, right? (surely anyone setting that is expecting to have it help them achieve reproducible builds, and this makes that strictly more possible than before 😅)

Right. Also, setting export SOURCE_DATE_EPOCH=... inside RUN is not affected by this PR.

@thaJeztah
Copy link
Member

The opt-out is not setting SOURCE_DATE_EPOCH, right? (surely anyone setting that is expecting to have it help them achieve reproducible builds, and this makes that strictly more possible than before 😅)

Right. Also, setting export SOURCE_DATE_EPOCH=... inside RUN is not affected by this PR.

The scenario I was thinking of was that (say)

ARG SOURCE_DATE_EPOCH=<foo>
RUN build-things.sh

Would perhaps change some files to use SOURCE_DATE_EPOCH, bot not others, and now SOURCE_DATE_EPOCH would be used for every file written to disk by the RUN command.

To be clear; I'm most definitely NOT against making this all more reproducible, but thought I'd list some data-points that could be relevant in deciding, and if we all agree that;

surely anyone setting that is expecting to have it help them achieve reproducible builds, and this makes that strictly more possible than before

Then 👍 👍

Comment on lines 195 to 197
if enableOverlay {
computed, ok, err := sr.tryComputeOverlayBlob(ctx, lower, upper, mediaType, sr.ID(), compressorFunc)
computed, ok, err := sr.tryComputeOverlayBlob(ctx, lower, upper, mediaType, sr.ID(), compressorFunc, sourceDateEpoch)
if !ok || err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes, but we can defer that, as overlaybd will stay optional.

@AkihiroSuda
Copy link
Member Author

The scenario I was thinking of was that (say)

ARG SOURCE_DATE_EPOCH=<foo>
RUN build-things.sh

Would perhaps change some files to use SOURCE_DATE_EPOCH, bot not others, and now SOURCE_DATE_EPOCH would be used for every file written to disk by the RUN command.

I don't think this change breaks somebody's Image.
Users can still use RUN SOURCE_DATE_EPOCH=<foo> build-things.sh so that BuildKit itself does not care about SOURCE_DAET_EPOCH.

if sourceDateEpoch != nil {
opts = append(opts,
archive.WithModTimeUpperBound(*sourceDateEpoch),
archive.WithWhiteoutTime(*sourceDateEpoch),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make the whiteout time configurable at all? These files never end up in a container, so the timestamp value is never visible. Lets just always make it zero.

If you agree, then this part can be separated out as no blob tracking etc is needed for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not possible with the current version of containerd (and I don't want this PR to be blocked until containerd v2.0), as https://pkg.go.dev/github.com/containerd/[email protected]/diff#Opt lacks a knob to set whiteout time without setting SOURCE_DATE_EPOCH.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. If containerd doesn't support it, why can't we just call archive.WithWhiteoutTime(zeroTime) in here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we can't specify WithWhiteoutTime for Differ.Compare(), although we can pass WithSourceDateEpoch to it.

buildkit/cache/blobs.go

Lines 191 to 214 in 2ff0d2a

if desc.Digest == "" && !isTypeWindows(sr) && comp.Type.NeedsComputeDiffBySelf() {
// These compression types aren't supported by containerd differ. So try to compute diff on buildkit side.
// This case can be happen on containerd worker + non-overlayfs snapshotter (e.g. native).
// See also: https://github.com/containerd/containerd/issues/4263
desc, err = walking.NewWalkingDiff(sr.cm.ContentStore).Compare(ctx, lower, upper,
diff.WithMediaType(mediaType),
diff.WithReference(sr.ID()),
diff.WithCompressor(compressorFunc),
)
if err != nil {
bklog.G(ctx).WithError(err).Warnf("failed to compute blob by buildkit differ")
}
}
if desc.Digest == "" {
desc, err = sr.cm.Differ.Compare(ctx, lower, upper,
diff.WithMediaType(mediaType),
diff.WithReference(sr.ID()),
diff.WithCompressor(compressorFunc),
)
if err != nil {
return nil, err
}
}

https://pkg.go.dev/github.com/containerd/[email protected]/diff#Comparer
https://pkg.go.dev/github.com/containerd/[email protected]/diff#WithSourceDateEpoch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first one is the complicated one that has many considerations to the UX, cache system, compression variants etc. It needs more validation/review, and I'm not sure if we can get it into the upcoming release as is.

Would it be mergable if we add a new build arg like BUILDKIT_SOURCE_DATE_EPOCH_MODE=(conservative|aggressive) ? (in the labs channel?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be mergable if we add a new build arg like BUILDKIT_SOURCE_DATE_EPOCH_MODE=(conservative|aggressive) ? (in the labs channel?)

This also makes changes to the blobs reference counting and core function signatures in cache pkg, so isn't ideal for an experimental/unsafe feature because it is not something that lives on its own and can be easily removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we just make this an exporter option?

Copy link
Member Author

@AkihiroSuda AkihiroSuda Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonistiigi WDYT?
The idea is to introduce an image exporter option like rewrite-epoch.
The implementation will be similar to compression options like compression-level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Aug 7, 2023

Closing.
Hope #4057 will get merged instead.

@AkihiroSuda AkihiroSuda closed this Aug 7, 2023
@AkihiroSuda AkihiroSuda removed this from the v0.13 milestone Aug 7, 2023
@AkihiroSuda AkihiroSuda changed the title Apply SOURCE_DATE_EPOCH to the files and the whiteouts inside OCI tar layers [Moved to #4057] Apply SOURCE_DATE_EPOCH to the files and the whiteouts inside OCI tar layers Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants