Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add s3 remote cache #2824

Merged
merged 1 commit into from
May 20, 2022
Merged

Add s3 remote cache #2824

merged 1 commit into from
May 20, 2022

Conversation

bpaquet
Copy link
Contributor

@bpaquet bpaquet commented Apr 25, 2022

Largely inspired from #2323

@bpaquet
Copy link
Contributor Author

bpaquet commented Apr 25, 2022

I'm currently testing this under load @doctolib

@bpaquet bpaquet changed the title Add s3 remot cache Add s3 remote cache Apr 25, 2022
@tonistiigi
Copy link
Member

Same questions as #2323 .

  • How do we test this? We don't necessarily need to run existing tests with it but we should have something that verifies the basics on the CI so we don't accidentally break it. Especially because none of the current popular downstream would have many users on it(at least initially). A separate workflow would be ok. Or if that is hard then minimally external service that tests master branch and notifies if things go wrong.
  • Are you committing to maintaining this?

Why is this 40k lines bigger than the previous PR?

Take a look at the CI results as well.

cc @sethpollack

@bpaquet
Copy link
Contributor Author

bpaquet commented Apr 26, 2022

Same questions as #2323 .

  • How do we test this?

I'm currenlty testing this on my CI.

We don't necessarily need to run existing tests with it but we should have something that verifies the basics on the CI so we don't accidentally break it. Especially because none of the current popular downstream would have many users on it(at least initially). A separate workflow would be ok. Or if that is hard then minimally external service that tests master branch and notifies if things go wrong.

What do you mean exactly? A dedicated workflow in github which test this? An integration test? Do you have any base I can start from?

  • Are you committing to maintaining this?

Yes.

Why is this 40k lines bigger than the previous PR?

New version of the aws s3 sdk :(

Take a look at the CI results as well.

cc @sethpollack

@bpaquet bpaquet force-pushed the s3_cache_master branch 2 times, most recently from fe68519 to d856c7d Compare April 26, 2022 12:46
@tonistiigi
Copy link
Member

What do you mean exactly? A dedicated workflow in github which test this? An integration test? Do you have any base I can start from?

I assume testing something like this requires either token credentials or some kind of mock service. Our regular integration tests for things like this are in client_test.go or dockerfile_test.go. But as this would likely be something that can't just be run in a container with no external dependencies it can be a special github actions workflow. Eg. we currently have special workflow for windows testing. In that workflow/test you should run the basic loop of building, exporting cache, prune local cache, rebuild with cache and verify it matched. If you want you can use external tools in that workflow, eg. you could use buildx create for simplicity. @crazy-max

@bpaquet bpaquet force-pushed the s3_cache_master branch 3 times, most recently from 4a5b507 to d77e1d7 Compare May 5, 2022 12:26
@bpaquet
Copy link
Contributor Author

bpaquet commented May 5, 2022

@tonistiigi Can you review?

  • Tests are in hack/s3_test. I can move them where you want. Any feedback is welcome. Not sure how to taunch them in GHA, especially with the --privileged.
  • I duplicated your readerat.go from the go-actions-cache. Should we move it in buildkit repo?
  • Test under load still in progress.

@tonistiigi tonistiigi self-requested a review May 5, 2022 18:28
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the ./hack/s3_test/run_test.sh invocation to https://github.com/moby/buildkit/blob/master/.github/workflows/build.yml with needs: [base] .

I added a commit that fixes up the Dockerfile and includes the correct buildkit binaries.

It doesn't work for me locally though

Documentation: https://docs.min.io
Finished loading IAM sub-system (took 0.0s of 0.0s to load data).
+ mc alias set myminio http://172.17.0.2:9000 minioadmin minioadmin
mc: Configuration written to `/root/.mc/config.json`. Please update your access credentials.
mc: Successfully created `/root/.mc/share`.
mc: Initialized share uploads `/root/.mc/share/uploads.json` file.
mc: Initialized share downloads `/root/.mc/share/downloads.json` file.
mc: <ERROR> Unable to initialize new alias from the provided credentials. Get "http://172.17.0.2:9000/probe-bucket-sign-tszmiipmum4w/?location=": dial tcp 172.17.0.2:9000: connect: connection refused.

Assuming this works it doesn't actually look as complicated as I would have assumed. You can also consider adding the minio to the main Dockerfile integration-tests stage and adding a regular Go integration test.

FYI @crazy-max

FROM debian:bullseye-slim

RUN apt-get update \
&& apt-get install -y --no-install-recommends wget ca-certificates containerd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is containerd needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buildkit does not start without it

WARN[2022-05-06T08:27:58Z] skipping oci worker, as runc does not exist
WARN[2022-05-06T08:27:58Z] skipping containerd worker, as "/run/containerd/containerd.sock" does not exist
buildkitd: no worker found, rebuild the buildkit daemon?

May be you have a magical option to add to buildkitd -debugaddr 0.0.0.0:8060 :)

@@ -0,0 +1,4 @@
FROM public.ecr.aws/debian/debian:bullseye-slim

RUN echo first
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These commands should write a constant plus a random value. See https://github.com/moby/buildkit/blob/master/client/client_test.go#L1410 . Then when you build them export the result to a local directory and check that the unique part has not change. Therefore cache matched.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@tonistiigi
Copy link
Member

Not sure how to taunch them in GHA, especially with the --privileged.

--privileged is not a problem in GH actions.

I duplicated your readerat.go from the go-actions-cache. Should we move it in buildkit repo?

Need to think about this later. I don't want an import cycle there.

@bpaquet bpaquet force-pushed the s3_cache_master branch from d9ff1ab to 3570e94 Compare May 6, 2022 08:59
@crazy-max
Copy link
Member

You can also consider adding the minio to the main Dockerfile integration-tests stage and adding a regular Go integration test.

Added a commit to add minio to integration tests stage and also use it through linked context in the bake definition for s3 test.

I will add another commit to create a new job in our GHA workflow.

@bpaquet bpaquet force-pushed the s3_cache_master branch from 954c3fd to c4f992c Compare May 6, 2022 09:13
@bpaquet
Copy link
Contributor Author

bpaquet commented May 6, 2022

Add the ./hack/s3_test/run_test.sh invocation to https://github.com/moby/buildkit/blob/master/.github/workflows/build.yml with needs: [base] .

I added a commit that fixes up the Dockerfile and includes the correct buildkit binaries.

It doesn't work for me locally though

Documentation: https://docs.min.io
Finished loading IAM sub-system (took 0.0s of 0.0s to load data).
+ mc alias set myminio http://172.17.0.2:9000 minioadmin minioadmin
mc: Configuration written to `/root/.mc/config.json`. Please update your access credentials.
mc: Successfully created `/root/.mc/share`.
mc: Initialized share uploads `/root/.mc/share/uploads.json` file.
mc: Initialized share downloads `/root/.mc/share/downloads.json` file.
mc: <ERROR> Unable to initialize new alias from the provided credentials. Get "http://172.17.0.2:9000/probe-bucket-sign-tszmiipmum4w/?location=": dial tcp 172.17.0.2:9000: connect: connection refused.

Fixed

Assuming this works it doesn't actually look as complicated as I would have assumed. You can also consider adding the minio to the main Dockerfile integration-tests stage and adding a regular Go integration test.

I prefer to keep it like this if you dont mind.

FYI @crazy-max

I

@bpaquet
Copy link
Contributor Author

bpaquet commented May 6, 2022

You can also consider adding the minio to the main Dockerfile integration-tests stage and adding a regular Go integration test.

Added a commit to add minio to integration tests stage and also use it through linked context in the bake definition for s3 test.

I will add another commit to create a new job in our GHA workflow.

@crazy-max just force pushed, sorry may be I drop some stuff. I added something in the build workflow, please feel free to refact.

@crazy-max
Copy link
Member

@crazy-max just force pushed, sorry may be I drop some stuff. I added something in the build workflow, please feel free to refact.

Yes just saw that, that's fine. I bring it back.

I prefer to keep it like this if you dont mind.

In the future I think we don't want to rely on scripting but Go integration tests like we do for containerd and dockerd. See https://github.com/moby/buildkit/tree/master/util/testutil/integration

@crazy-max
Copy link
Member

Not linked to this PR but I noticed that when we use linked context in bake a warning from buildx appears: https://github.com/moby/buildkit/runs/6320810388?check_suite_focus=true#step:5:622

WARNING: No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load

Looks linked to docker/buildx#968.

There is another one less obvious that seems to appear when using COPY --link and bake linked context. Maybe caching issue but as it's an error it should fail but build continues instead: https://github.com/moby/buildkit/runs/6320810388?check_suite_focus=true#step:5:590

#56 extracting sha256:5979a4d386a9a20337edbeed4404a714b811c40e4bf73aa97fa73ca0bcdc3b9e 0.3s done
#56 extracting sha256:7964a4d2bbf168b770bc9ee09ce82d556bb466804b07f06b9c749cf928b37fa2 0.4s done
#56 extracting sha256:7964a4d2bbf168b770bc9ee09ce82d556bb466804b07f06b9c749cf928b37fa2 0.4s done
#56 merging done
#56 ERROR: maximum timeout reached

cc @tonistiigi @sipsma

@crazy-max crazy-max force-pushed the s3_cache_master branch 2 times, most recently from 178546a to 34e8962 Compare May 6, 2022 16:32
@bpaquet bpaquet requested a review from tonistiigi May 9, 2022 10:07
}

func (e *exporter) Config() remotecache.Config {
return remotecache.Config{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should return Compression: default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return config.Prefix + config.BlobsPrefix + dgst.String()
}

type cache struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this cache seems kind of confusing. It's more like s3Client

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

func (c *cache) getManifest(key string, config *v1.CacheConfig) (bool, error) {
fmt.Printf("Start downloading s3://%s/%s\n", c.config.Bucket, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing to stdout is not allowed in here. If you want debug logs then you need to write them to the context logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed all debug

}

func isNotFound(err error) bool {
awsErr, ok := err.(awserr.Error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use errors.As

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

FROM public.ecr.aws/debian/debian:bullseye-slim

RUN cat /dev/urandom | head -c 100 | sha256sum > unique_first
RUN cat /dev/urandom | head -c 100 | sha256sum > unique_second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a another stage in here

FROM scratch
COPY --link --from=build /unique_first /
COPY --link --from=build /unique_second /

And in the test add --output destdir . After cache import the contents of the files need to remain the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@bpaquet bpaquet force-pushed the s3_cache_master branch 3 times, most recently from a971027 to ede8d29 Compare May 10, 2022 11:28
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but try to squash the commits a bit. It doesn't need to be a single commit but we should avoid commits that just say "fixes" or "code review". Maybe squash these 2 with the first commit. The others can remain as they are.

func news3Client(config Config) (*s3Client, error) {
s, err := sess.NewSession(&aws.Config{Region: &config.Region, Endpoint: &config.EndpointURL, S3ForcePathStyle: &config.S3ForcePathStyle})
if err != nil {
return nil, fmt.Errorf("failed to create session: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use errors.Errorf always for stacktraces

@bpaquet bpaquet force-pushed the s3_cache_master branch 3 times, most recently from 3bb72f7 to 86c98b4 Compare May 12, 2022 10:05
@bpaquet
Copy link
Contributor Author

bpaquet commented May 12, 2022

@tonistiigi I squashed everything and moved to aws-sdk-go-v2. Ready for review :)

@bpaquet bpaquet force-pushed the s3_cache_master branch from 86c98b4 to cc4af04 Compare May 12, 2022 12:39
@@ -0,0 +1,7 @@
#!/bin/sh -ex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file looks unused now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using it to launch the test locally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a helper script we should call it in the CI. Otherwise nobody notices if it breaks or starts to behave differently than CI invocations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used it in the CI

@bpaquet bpaquet force-pushed the s3_cache_master branch 2 times, most recently from 349c24c to e0b54a4 Compare May 13, 2022 09:34
go.mod Outdated
@@ -76,7 +80,22 @@ require (
google.golang.org/grpc v1.45.0
)

require github.com/aws/aws-sdk-go-v2/config v1.15.5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be moved in the previous require block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I should do something wrong when I update mod / vendor :(

Signed-off-by: Bertrand Paquet <[email protected]>
@bpaquet bpaquet force-pushed the s3_cache_master branch from e0b54a4 to 09c5a7c Compare May 13, 2022 11:17
@sethpollack sethpollack mentioned this pull request May 13, 2022
@bpaquet bpaquet requested review from crazy-max and tonistiigi May 14, 2022 07:35
Copy link
Member

@crazy-max crazy-max left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

in a follow-up we should think about proper integration tests in testutil but might need to split first as suggested by @tonistiigi in #2734 (comment)

also README starts to be quite huge. wonder if we should not move some sections in dedicated pages under ./docs like the cache exporters.

@tonistiigi tonistiigi merged commit 5d2ca98 into moby:master May 20, 2022
@zchee
Copy link
Contributor

zchee commented Jun 17, 2022

@tonistiigi Can I send support GCS as well?

@bpaquet
Copy link
Contributor Author

bpaquet commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants