
Issue with sharing cache across jobs #156

Closed
aloysbaillet opened this issue Sep 30, 2019 · 23 comments

@aloysbaillet

Hi,

I am trying to reuse a "local" cache across multiple CI jobs without using a registry. The main reason to avoid the registry is to nicely associate each CI run with its generated artifact, and to avoid pushing more images to the registry when the CI system already supports artifacts tied to builds.

I made an example repo with 2 images: imageA builds Python from source using ccache (https://github.com/aloysbaillet/buildx-testing/blob/master/imageA/Dockerfile) and the second image just uses the first image. The build commands are in https://github.com/aloysbaillet/buildx-testing/blob/master/.github/workflows/dockerimage.yml, but here's a summary:

build A (uses a previous local cache and saves a new cache for future runs)

docker buildx build \
          . \
          -f imageA/Dockerfile \
          -t aloysbaillet/buildx-testing-image-a:0 \
          --cache-from=type=local,src=docker-cache \
          --cache-to=type=local,mode=max,dest=docker-cache-a \
          --load

build B (trying to use cache A to avoid pulling the non-existing image aloysbaillet/buildx-testing-image-a:0)

docker buildx build \
          . \
          -f imageB/Dockerfile \
          -t aloysbaillet/buildx-testing-image-b:0 \
          --cache-from=type=local,src=docker-cache-a \
          --cache-to=type=local,mode=max,dest=docker-cache-b \
          --load

Here is the sequence of builds:
https://github.com/aloysbaillet/buildx-testing/runs/241659655

The main issue I'm facing is how to make job B believe that aloysbaillet/buildx-testing-image-a:0 is a valid docker image. Running docker load is ignored by buildx when using the docker-container driver, which is necessary for --cache-from and --cache-to to function.
Is there a way to populate the buildkit image store from docker? I thought using the cache from job A would have been enough, as image A is tagged and present in cache A...
N.B. it would seem obvious to use a multi-stage build for images A and B; unfortunately, in my real-world scenario image A takes around 2 hours to build and one of my image Bs takes around 7 hours, which times out my free CI system... Hence the need to split the jobs and use ccache.

Thanks in advance for any help!

Cheers,

Aloys

@FernandoMiguel
Contributor

@tonistiigi ^

@tonistiigi
Member

tonistiigi commented Sep 30, 2019

N.B. it would seem obvious to use a multi-stage build for images A and B; unfortunately, in my real-world scenario image A takes around 2 hours to build and one of my image Bs takes around 7 hours, which times out my free CI system... Hence the need to split the jobs and use ccache.

You can still use a multi-stage Dockerfile with two separate runs: the first run builds the a target and the second builds the b target (which depends on a). Instead of doing FROM a, you just depend on the same parts of the Dockerfile that you built before, so you get cache hits for them and they don't need to be rebuilt.

E.g.

FROM ubuntu AS stage-a
...

FROM stage-a
...

build --target=stage-a --cache-to=type=local,mode=max,dest=docker-cache-a
build --cache-from=type=local,src=docker-cache-a

For example, we use the same pattern for the buildkit CI parallelization. The first task builds the integration-tests-base stage and exports the cache for it: https://github.com/moby/buildkit/blob/master/hack/build_ci_first_pass#L35 . The other tasks, which then run in parallel, build the integration-tests stage that sits on top of the base stage, importing that cache: https://github.com/moby/buildkit/blob/master/hack/dockerfiles/test.buildkit.Dockerfile#L231
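
For reference, a fuller sketch of the two-pass invocation above against a single shared Dockerfile (the stage names and cache paths reuse the ones already in this thread and are only illustrative):

# first pass: build only stage-a and export its instruction cache
docker buildx build . \
  --target=stage-a \
  --cache-to=type=local,mode=max,dest=docker-cache-a

# second pass: build the final stage, importing the cache exported above;
# the shared early stages are cache hits and are not rebuilt
docker buildx build . \
  -t aloysbaillet/buildx-testing-image-b:0 \
  --cache-from=type=local,src=docker-cache-a \
  --load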

@aloysbaillet
Author

Thank you very much Tonis!
This is a great idea indeed, and I will try it as soon as possible. One thing to note is that I will end up with a single very large Dockerfile (I have 7 at around 50-100 lines each, so roughly 500 lines). It also means that I have to share bits of cache between stages that are not relevant to them, such as the ccache content of upstream builds.
I also noticed that the content of the mounted cache doesn't seem to be included in --cache-to in local mode; is that a known issue?
Cheers,

Aloys

@tonistiigi
Member

Cache mounts are a different concept from the instruction cache. They are local persistent directories that don't have a deterministic state. Theoretically, you can use the build itself to read files into or out of these directories, but I don't think it would give you a performance increase. The point of the instruction cache is to determine that the sources are still valid for a build and skip over the instructions entirely, while cache mounts provide an incremental boost while the RUN command is running.
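
To illustrate the distinction with a minimal sketch (not from the original thread; the stage is hypothetical): the whole RUN instruction below is skipped when the instruction cache matches, and the cache mount only matters when the instruction does re-run:

# syntax=docker/dockerfile:experimental
FROM alpine AS example
# if the instruction cache matches, this entire step is skipped;
# if it re-runs, /tmp/ccache still contains whatever previous runs
# on the same builder instance left behind
RUN --mount=type=cache,target=/tmp/ccache \
    echo "incremental state lives in /tmp/ccache"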

For the big Dockerfiles, there have been discussions about supporting an "include" to split files apart, but nobody has made a prototype yet.

@aloysbaillet
Author

Thanks Tonis.
Is there any way to persist the mounted cache folders across multiple CI jobs? The extra boost means that builds where the instruction cache is not matched exactly would go from 3-7 hours down to 10 minutes, so it is still very, very valuable. I was looking at copying the volume used by the buildx docker container as a way to persist the whole cache, but that seems excessive and I'm unsure whether buildx would let me do it.

@tonistiigi
Member

Is there any way to persist the mounted cache folders across multiple CI jobs?

The only way to do this atm is with the build request itself. E.g. you can build a stage that loads files into a cache mount, or a stage that just returns the files in the cache mount. I don't recommend doing this, though, unless you can clearly measure that it improves your performance.
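
A rough sketch of both directions described above (the stage names, the ccache-seed context directory, and the exact copy commands are all hypothetical):

# syntax=docker/dockerfile:experimental
# stage that seeds the cache mount from a directory in the build context
FROM alpine AS cache-load
RUN --mount=type=cache,target=/tmp/ccache \
    --mount=type=bind,source=ccache-seed,target=/seed \
    cp -a /seed/. /tmp/ccache/ || true

# stage that copies the current content of the cache mount into its filesystem,
# so it can be exported with e.g. `docker buildx build --target=cache-dump -o .`
FROM alpine AS cache-dump
RUN --mount=type=cache,target=/tmp/ccache \
    cp -a /tmp/ccache /ccache-out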

@aloysbaillet
Author

Thanks again Tonis!
I was assuming that multiple RUN --mount=type=cache,target=/tmp/ccache commands in a given Dockerfile would not share the same actual /tmp/ccache folder, but your answer seems to indicate that the target path is the only identifier used to reuse this cache mount? I've been wondering about the exact behaviour of the cache mount...
I'd be happy to create a PR adding documentation on how the cache mount gets reused/invalidated if I manage to find out :-)

@FernandoMiguel
Contributor

@aloysbaillet for the same named target, the cache mount will be shared between jobs on the same builder.
You can use locks to prevent concurrent runs if that causes issues, but it will impact your performance, of course.

For the instruction cache, you can push the intermediate layers to a network-close registry and access that from multiple builders.
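
For illustration, a sketch of the registry-based instruction cache mentioned above (the registry and image names are hypothetical); the cache mount identity can also be made explicit with --mount=type=cache,id=ccache,... instead of relying on the target path:

docker buildx build . \
  -t myorg/myimage:latest \
  --cache-to=type=registry,ref=registry.example.com/myimage:buildcache,mode=max \
  --cache-from=type=registry,ref=registry.example.com/myimage:buildcache \
  --push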

@aloysbaillet
Author

Thanks Fernando! That's great info, and I updated my test repo with all the useful knowledge gathered here.
With all this I believe I have enough to really benefit from buildx.
That said, I would still really like to know if there is any way to do the equivalent of docker load into a buildx container; it would be a nice way to close this issue :-)

@aloysbaillet
Author

So I went back to my test project and tried to use both the instruction cache and the mounted cache as a fallback, but I can't find a way to use both at the same time!

I'm stuck between the instruction cache (really fast when not a single character of the RUN command has changed, but a complete rebuild when anything does) and the fallback of the mounted cache, which works really well to speed up builds when the instruction cache doesn't match (in my case a 3-hour build becomes a 10-minute build).

The main problem with trying to use both caches is that I need to somehow inject the content of the mounted cache into the build without invalidating the instruction cache (and ideally without requiring an rsync server, as explained here: http://erouault.blogspot.com/2019/05/incremental-docker-builds-using-ccache.html ).

Using a bind mount to expose the previous ccache content cannot work: the first build has an empty cache to mount, and that is recorded in the instruction cache; the second build uses the first build's cache but finds a non-empty ccache, which invalidates the instruction cache from the first build.

See this file for an example: https://github.com/aloysbaillet/buildx-testing/blob/master/Dockerfile#L15

FROM n0madic/alpine-gcc:8.3.0 as buildx-testing-image-a-builder

RUN --mount=type=cache,target=/tmp/ccache \
    --mount=type=cache,target=/tmp/downloads \
    --mount=type=bind,source=ccache,target=/tmp/ccache_from \
    export CCACHE_DIR=/tmp/ccache && \
    export DOWNLOADS_DIR=/tmp/downloads && \
    if [ -f /tmp/ccache_from/ccache.tar.gz ] ; then cd /tmp/ccache && tar xf /tmp/ccache_from/ccache.tar.gz && cd - ; fi && \
    if [ ! -f $DOWNLOADS_DIR/Python-3.7.3.tgz ] ; then curl --location https://www.python.org/ftp/python/3.7.3/Python-3.7.3.tgz -o $DOWNLOADS_DIR/Python-3.7.3.tgz ; fi && \
    tar xf $DOWNLOADS_DIR/Python-3.7.3.tgz && \
    cd Python-3.7.3 && \
    ./configure \
        --prefix=/usr/local \
        --enable-shared && \
    make -j4 && \
    make install && \
    ccache --show-stats && \
    tar cfz /tmp/ccache.tar.gz /tmp/ccache

FROM scratch as buildx-testing-image-a-ccache

COPY --from=buildx-testing-image-a-builder /tmp/ccache.tar.gz /ccache/ccache.tar.gz

which is used here: https://github.com/aloysbaillet/buildx-testing/blob/master/.github/workflows/dockerimage.yml#L30

tar xf ccache/ccache.tar.gz

# buildx-testing-image-a
docker buildx build \
  . \
  -t aloysbaillet/buildx-testing-image-a:0 \
  --target=buildx-testing-image-a \
  --cache-from=type=local,src=docker-cache \
  --cache-to=type=local,mode=max,dest=docker-cache-a \
  --load

# buildx-testing-image-a-ccache
docker buildx build \
  . \
  --target=buildx-testing-image-a-ccache \
  --platform=local \
  -o .

It really feels like, to make this work, we would need a new source flag on the cache mount, used like this: --mount=type=cache,target=/tmp/downloads,source=ccache
which would load the content of source as the cache's initial content.

What do you think?

@FernandoMiguel
Contributor

Do keep in mind that GitHub runners currently don't support any form of cache and each job runs on a fresh node, so any host cache is lost between runs and jobs.

@aloysbaillet
Author

Indeed. I'm actually maintaining a set of docker images that get built on Azure Pipelines (which has caching available in preview; I believe caching is coming to GitHub Actions very soon...), so I'm emulating this feature by curling the previous build's artifact.

@aloysbaillet
Author

After some introspection into the buildx container I found this hack that properly moves the whole buildkit cache between nodes:

backup:

        docker buildx create --name cacheable --use
        docker buildx build ... # no --cache-to or --cache-from
        docker run --rm \
          --volumes-from buildx_buildkit_cacheable0 \
          -v $(pwd)/buildkit-cache-a:/backup \
          alpine /bin/sh -c "cd / && tar cf /backup/backup.tar.gz /var/lib/buildkit"

restore:

        docker buildx create --name cacheable --use
        docker buildx inspect --bootstrap
        docker buildx stop
        docker run --rm \
          --volumes-from buildx_buildkit_cacheable0 \
          -v $(pwd)/buildkit-cache-a:/backup \
          alpine /bin/sh -c "cd / && tar xf /backup/backup.tar.gz"

Obviously this is a bit brittle as it assumes the naming convention of the docker container created by buildx create and the location of the buildkit data in the volume...
But it properly restores the cache as if the next job was happening on the same machine as the first one!

@FernandoMiguel
Contributor

You can export cache images to disk if you don't want to use a registry.

@aloysbaillet
Author

Unfortunately --cache-to does not save the mounted cache to disk; that's the main reason I opened this issue...

@FernandoMiguel
Contributor

You mean the host cache?

@tonistiigi
Member

It really feels like, to make this work, we would need a new source flag on the cache mount, used like this: --mount=type=cache,target=/tmp/downloads,source=ccache
which would load the content of source as the cache's initial content.

Cache mounts support from=basestage. Is this what you are looking for? https://github.com/moby/buildkit/blob/v0.6.2/frontend/dockerfile/docs/experimental.md#run---mounttypecache

@aloysbaillet
Author

Thanks Tonis, I don't think from=basestage helps, as I can't find any official way to inject this cache data into a build stage without invalidating the instruction cache. The only way I can think of is to somehow download the content of the cache from a locally running file server, which is cumbersome.

"mounted cache": the content of the cache defined by --mount=type=cache, keyed by id (which defaults to target):

  • content will be updated by many runs of the build command on a single builder instance.
  • cannot be pre-populated from outside the build

"instruction cache": the buildkit image cache, keyed by each RUN line:

  • content is immutable
  • can be pre-populated using --cache-from

@tonistiigi
Member

FROM scratch AS cachebase
ADD /mycachebase.tar /

FROM ...
RUN --mount=type=cache,target=/cache,from=cachebase ...

And you only update mycachebase.tar when you are on a fresh node with no local cache. If you have a local cache, then you don't update it and just use whatever is already in your local cache mount.
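
A rough sketch of how that decision could look in a CI script (the marker file, paths and fetch mechanism are all hypothetical and CI-specific; this only shows the shape of the logic):

# refresh the seed tarball only on a fresh node with no prior builder state
if [ ! -f /var/tmp/.buildx-cache-mount-present ]; then
  # fetch the tarball that the cachebase stage ADDs (mechanism is CI-specific)
  curl -fsSL "$PREVIOUS_CCACHE_ARTIFACT_URL" -o mycachebase.tar
  touch /var/tmp/.buildx-cache-mount-present
fi
docker buildx build . --target=builder --load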

@aloysbaillet
Author

But this means that one has to choose in advance between the "mounted cache" and the "instruction cache", and it is impossible to know in advance which one will be valid.

I need both caches to be available at all times; here's a timeline of build events:

Build 1:

create an empty mycachebase.tar file
buildx --cache-to=instructioncache :

FROM scratch AS cachebase
ADD /mycachebase.tar /

FROM ... as builder
RUN --mount=type=cache,target=/cache,from=cachebase ...

FROM ...
COPY --from=builder /cache /mountedcache

-> everything builds

Build 2:

get the previous mycachebase.tar file
buildx --cache-from=instructioncache :

FROM scratch AS cachebase
ADD /mycachebase.tar / # -> this is different from build 1 and invalidates the instruction cache

FROM ... as builder
RUN --mount=type=cache,target=/cache,from=cachebase ...

FROM ...
COPY --from=builder /cache /mountedcache

-> everything builds again!

If --cache-to and --cache-from also handled the mounted cache, that would be a solution to this problem.

@Lupus

Lupus commented Oct 17, 2019

I'm also struggling with this. I just installed buildx in my GitLab docker:dind pipeline in the hope that my mount caches would be exported along with the layers, but a job retry did not show any sign of the cache being exported. Then I stumbled across this issue.

It would be really great if we could control the inclusion of the mount cache in the cache export!

alpeb added a commit to linkerd/linkerd2 that referenced this issue Jul 17, 2020
## Build containers in parallel
The `docker_build` job used in the `kind_integration.yml`, `cloud_integration.yml` and `release.yml` workflows relied on running `bin/docker-build`, which builds all the containers in sequence. Now each container is built in parallel using `strategy.matrix`.

## New caching strategy
CI now uses `docker buildx` for building the container images, which allows using an external cache source for builds, a location in the filesystem in this case. That location gets cached using actions/cache, using the key `{{ runner.os }}-buildx-${{ matrix.target }}-${{ env.TAG }}` and the restore key `${{ runner.os }}-buildx-${{ matrix.target }}-`.

For example, when building the `web` container, its image and all the intermediary layers get cached under the key `Linux-buildx-web-git-abc0123`. When that has been cached in the `main` branch, that cache will be available to all the child branches, including forks. If a new branch in a fork asks for a key like `Linux-buildx-web-git-def456`, the key won't be found during the first CI run, but the system falls back to the key `Linux-buildx-web-git-abc0123` from `main`, so the build will start with a warm cache (more info about how keys are matched in the [actions/cache docs](https://docs.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows#matching-a-cache-key)).

## Packet host no longer needed
To benefit from the warm caches both in non-forks and forks as just explained, we're ditching the Packet builds, and now everything runs in the GitHub runner VMs. The build performance for non-forks remains similar when using warm caches, in part due to the new parallel strategy. E.g. before, the docker builds (all sequential) took a total of around 2 mins in Packet, and now the longest parallel build (`cni-plugin`) takes around the same time.
This also means the workflow yamls were vastly simplified, no longer needing separate logic for non-forks and forks.

## Local builds
You are still able to run `bin/docker-build` or any of the `docker-build.*` scripts. To make use of buildx, run those same scripts after setting the env var `DOCKER_BUILDKIT=1`. Using buildx assumes you have installed it, as instructed [here](https://github.com/docker/buildx).

## Other
- A new script `bin/docker-cache-prune` is used to remove unused images from the cache. Without that the cache grows constantly and we can rapidly hit the 5GB limit (when the limit is attained the oldest entries get evicted).
- The `go-deps` dockerfile base image was changed from `golang:1.14.2` (ubuntu based) to `golang:1.14.2-alpine`, also to conserve cache space.

## Known issues
- Most dockerfiles rely on the `go-deps` base image at a hard-coded tag that they retrieve from the gcr registry. Whenever that base image changes, it gets rebuilt prior to building the other images. Now we're using the docker-container driver for buildx, and it can't use the local cache like that (see docker/buildx#156). So changes to `go-deps` will break the build. This will be addressed in a separate PR.
alpeb added a commit to linkerd/linkerd2 that referenced this issue Jul 17, 2020
## Motivation
- Improve build times in forks. Specially when rerunning builds because of some flaky test.
- Start using `docker buildx` to pave the way for multiplatform builds.

## Performance improvements
These timings were taken for the `kind_integration.yml` workflow when we merged and rerun the lodash bump PR (#4762)

Before these improvements:
- when merging: `24:18`
- when rerunning after merge (docker cache warm): `19:00`
- when running the same changes in a fork (no docker cache): `32:15`

After these improvements:
- when merging: `25:38`
- when rerunning after merge (docker cache warm): `19:25`
- when running the same changes in a fork (docker cache warm): `19:25`

As explained below, non-forks and forks now use the same cache, so the important take is that forks will always start with a warm cache and we'll no longer see long build times like the `32:15` above.
The downside is a slight increase in the build times for non-forks (up to a little more than a minute, depending on the case).

## Build containers in parallel
The `docker_build` job in the `kind_integration.yml`, `cloud_integration.yml` and `release.yml` workflows relied on running `bin/docker-build` which builds all the containers in sequence. Now each container is built in parallel using a matrix strategy.

## New caching strategy
CI now uses `docker buildx` for building the container images, which allows using an external cache source for builds, a location in the filesystem in this case. That location gets cached using actions/cache, using the key `{{ runner.os }}-buildx-${{ matrix.target }}-${{ env.TAG }}` and the restore key `${{ runner.os }}-buildx-${{ matrix.target }}-`.

For example when building the `web` container, its image and all the intermediary layers get cached under the key `Linux-buildx-web-git-abc0123`. When that has been cached in the `main` branch, that cache will be available to all the child branches, including forks. If a new branch in a fork asks for a key like `Linux-buildx-web-git-def456`, the key won't be found during the first CI run, but the system falls back to the key `Linux-buildx-web-git-abc0123` from `main` and so the build will start with a warm cache (more info about how keys are matched in the [actions/cache docs](https://docs.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows#matching-a-cache-key)).

## Packet host no longer needed
To benefit from the warm caches both in non-forks and forks as just explained, we're required to ditch the Packet builds, and now everything runs in the GitHub runner VMs.
As a result there's no longer separate logic for non-forks and forks in the workflow files; `kind_integration.yml` was greatly simplified but `cloud_integration.yml` and `release.yml` got a little bigger in order to use the actions artifacts as a repository for the images built. This bloat will be fixed when support for [composite actions](https://github.com/actions/runner/blob/users/ethanchewy/compositeADR/docs/adrs/0549-composite-run-steps.md) lands in github.

## Local builds
You are still able to run `bin/docker-build` or any of the `docker-build.*` scripts. To make use of buildx, run those same scripts after setting the env var `DOCKER_BUILDKIT=1`. Using buildx assumes you have installed it, as instructed [here](https://github.com/docker/buildx).

## Other
- A new script `bin/docker-cache-prune` is used to remove unused images from the cache. Without that the cache grows constantly and we can rapidly hit the 5GB limit (when the limit is attained the oldest entries get evicted).
- The `go-deps` dockerfile base image was changed from `golang:1.14.2` (ubuntu based) to `golang:1.14.2-alpine`, also to conserve cache space.

## Known issues to be addressed in a followup PR
- Most dockerfiles rely on the `go-deps` base image at a hard-coded tag that they retrieve from the gcr registry. Whenever that base image changes, it gets rebuilt prior to building the other images. Now that we're using the docker-container driver for buildx, it can't use the local cache for retrieving the `go-deps` image just built (see docker/buildx#156). So changes to `go-deps` will break the build.
@dbackeus

dbackeus commented Nov 2, 2022

It's been 3 years since the last activity on this issue. Can anyone confirm whether the state of remote sharing of the mount cache remains the same, or whether there have been any new developments/workarounds in this area?

@tonistiigi
Member

Going to close this. The initial report is about reusing a build result as an image in another build, which has been resolved with https://www.docker.com/blog/dockerfiles-now-support-multiple-build-contexts/
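
For reference, a rough sketch of what the named-build-context approach could look like for the original report (the --build-context flag needs buildx 0.8+, the oci-layout:// source needs a newer buildx/BuildKit, and the paths here are illustrative, so treat this as a sketch rather than the exact resolution):

# build A: export an OCI layout instead of loading into the docker daemon
docker buildx build . -f imageA/Dockerfile \
  --output type=oci,dest=image-a.tar
mkdir -p image-a && tar -xf image-a.tar -C image-a

# build B: substitute the image reference used in imageB/Dockerfile
# with the locally exported result, so nothing has to be pulled
docker buildx build . -f imageB/Dockerfile \
  -t aloysbaillet/buildx-testing-image-b:0 \
  --build-context aloysbaillet/buildx-testing-image-a:0=oci-layout://./image-a \
  --load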

But then it moves on to various unrelated cache-mount topics, and eventually to how to transfer /var/lib/buildkit between machines, etc.

For mount cache persistence, follow moby/buildkit#1512 or experiment with copying cache mounts into the instruction cache as in #156 (comment)
