Switch to ghcup as the installation method #44

Closed

AlistairB
Contributor

@AlistairB AlistairB commented Aug 22, 2021

Closes #45
#40
#43

The current installation method depends on the Debian ghc and cabal packaging, which is often slow to be updated. Additionally, ghcup is becoming a standard way to install GHC (e.g. in the GitHub Actions), so we probably don't want to reinvent the wheel here.

I believe it also provides an easier path to supporting Windows and ARM-based images.

Multi stage file?

I believe it is best that we don't include ghcup in the final image (this point will need to be discussed further). In that case, a multi-stage file is a nice way to ensure the final image doesn't include any cache used during installation, or packages that are required only by ghcup and not by ghc / cabal / stack users.

Size difference

Old 9.0.1 - 1.5GB
New 9.0.1 - 1.67GB

There may be more that can be stripped out: things that ghcup installs but the Debian packages do not. The only other difference I noticed is that the .so files are a bit bigger in the ghcup version, e.g. base-4.14.1.0/libHSbase-4.14.1.0-ghc8.10.4.so:

ghcup - 12.8MB
debian package - 10.2MB

Still, overall the difference is small enough I'm inclined to ignore it.

Boot library haddocks

Related to size, I am stripping out /opt/ghc/share entirely from the images to save 286MB. The Debian ghc package does not remove this entirely; it leaves in the Haddock files for the boot libraries, e.g. /opt/ghc/8.10.4/share/doc/ghc-8.10.4/html/libraries/text-1.2.4.1/text.haddock. I don't think these are that useful? Or I could leave them in, it would just be a bit more wrangling.
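
If the boot library Haddock files were to be kept, the extra wrangling might look roughly like the sketch below. It assumes GNU find (as shipped in the Debian base images); keep_only_haddock is a hypothetical helper name, and nothing here is taken from the actual Dockerfile in this PR.

```shell
# Hypothetical helper: delete everything under a share directory except
# the *.haddock interface files, then prune directories left empty.
keep_only_haddock() {
    share_dir="$1"
    # Nothing to do if the directory does not exist.
    [ -d "$share_dir" ] || return 0
    # Remove regular files and symlinks that are not Haddock interface files.
    find "$share_dir" \( -type f -o -type l \) ! -name '*.haddock' -delete
    # -delete implies depth-first traversal, so nested empty dirs go too.
    find "$share_dir" -type d -empty -delete
}

# Possible usage inside a RUN step (path as quoted above; it changes
# with the GHC version):
#   keep_only_haddock /opt/ghc/8.10.4/share
```

How much of the 286MB this would give back was not measured here.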

Other changes

I have also taken this as an opportunity to revise the list of packages we include in the final image. The list is:

  • the dependencies currently listed by stack (new are gcc, libc6-dev, libffi-dev)
  • ca-certificates, which is needed to load Stackage URLs.
  • libtinfo-dev, which is needed to build haskeline (and others?), a very common Haskell package.

Removed

  • libsqlite3-dev
  • openssh-client
  • dirmngr

I'm not sure if these are old stack or other dependencies or if they are just needed to build the current images.

8.10/ghcup/Dockerfile Outdated Show resolved Hide resolved
@hasufell
Member


RUN ./ghcup install stack $STACK_VERSION && \
./ghcup set stack $STACK_VERSION
COPY --from=builder /usr/local/bin /usr/local/bin

ghcup binary should also be copied

@AlistairB AlistairB changed the title Playing with ghcup as an install method Switch to ghcup as the installation method Aug 28, 2021
@AlistairB AlistairB force-pushed the ghcup branch 4 times, most recently from 6e99d30 to 1cc9ad3 Compare August 28, 2021 01:28
@AlistairB AlistairB marked this pull request as ready for review August 28, 2021 21:23
@psftw
Contributor

psftw commented Sep 10, 2021

Sorry for the delayed response! I'm all on board with this and it's exciting to see.

We do not want to use multistage builds for technocratic reasons: https://github.com/docker-library/faq#multi-stage-builds

We definitely want to keep openssh-client which supports downloading dependencies from private git repositories, as well as libsqlite3-dev since it is a common "batteries-included" dependency. dirmngr is a little more nuanced: I think we can drop it, but that depends on the calls to gnupg we need to add back:

For the initial download of ghcup, we need to follow official image guidelines and validate the GPG signatures that are provided by upstream w/ releases (i.e. here).

My understanding (which could be totally wrong!) is that ghcup will download a metadata file at runtime which contains binary locations and corresponding sha256 to validate. Since the Official Image project folks can rebuild us at any time, we now depend on this to work reliably given specific versions we are trying to support (currently ~"2 most recent GHC branches x 2 most recent Debian releases").
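
To make the checksum step concrete, here is a minimal shell sketch of validating a downloaded file against an expected sha256. It is not ghcup's actual implementation; verify_sha256 is a hypothetical helper, and the expected hash would come from whatever metadata is trusted.

```shell
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex sha256).
verify_sha256() {
    expected="$1"
    file="$2"
    # sha256sum -c reads "HASH  FILENAME" lines; "-" means stdin.
    printf '%s  %s\n' "$expected" "$file" | sha256sum -c - >/dev/null 2>&1
}

# Possible usage (hash and file name are placeholders):
#   verify_sha256 "$EXPECTED_SHA256" ghc.tar.xz || exit 1
```

A check like this only shows that the download matches the metadata; it says nothing about whether the metadata itself is authentic, which is where the GPG question comes in.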

Putting on my contrarian hat for a moment -- why not just do what ghcup does directly? We could technically retain better security/transparency by checking GPG signatures at the expense of more effort to maintain the Dockerfile.

I will want to look closer at the image contents between install methods to build more confidence in what these changes are doing, but overall I think this is a good direction to go.

RE: including ghcup in the final image -- I don't think we should include it, but I go back and forth on this one. There is precedent for including it w/ rustup, but then that tool does so much more post-install that you may want.

@hasufell
Member

We do not want to use multistage builds for technocratic reasons: https://github.com/docker-library/faq#multi-stage-builds

I read the comment and can't see what's wrong with multi-stage builds. If you clean up your cache, then yes, you might lose those. That's expected?

@AlistairB
Contributor Author

I read the comment and can't see what's wrong with multi-stage builds. If you clean up your cache, then yes, you might lose those. That's expected?

It's a bit confusing I agree. The way I am reading the FAQ is that the docker-library build process cannot currently cache intermediate stages of multi-stage builds. So if you modify only the final image, it still has to rebuild the previous stages which is not the case when just using docker build.

I believe we match case 2 in the FAQ, so it sounds like it might be acceptable?

I think the motivation for a multi-stage build here is only if we don't want to provide ghcup in the final image. It is a convenient way to not include it, dependent packages and any cache used as part of the ghc / cabal / stack installation process.

We definitely want to keep openssh-client which supports downloading dependencies from private git repositories, as well as libsqlite3-dev since it is a common "batteries-included" dependency.

No worries, I will add them back.

For the initial download of ghcup, we need to follow official image guidelines and validate the GPG signatures that are provided by upstream w/ releases (i.e. here).

Good point. I will add this.

Putting on my contrarian hat for a moment -- why not just do what ghcup does directly? We could technically retain better security/transparency by checking GPG signatures at the expense of more effort to maintain the Dockerfile.

This is a good question. I don't know enough about GPG and I'm going to read up about it. But can ghcup do GPG verification in a similar way? If we can make the same improvement in ghcup, that has a broader impact and we get the same benefit. I also don't know enough about what else ghcup does as part of installation of ghc that we might need.

At least in the Windows case (+ perhaps ARM?) I think ghcup would be doing a lot more wrangling for us, so for that case it is probably the way to go.

Having said that, if we can relatively easily install ghc + cabal from the released artefacts I think that may be the way to go. It would be easy to avoid a multi-stage build and keep the layers clean. I'll do some more investigation + thinking about this.

The current installation method is dependent on debian ghc
and cabal packaging, which often is slow to be updated.
Additionally, ghcup is becoming a standard installation technique
so we probably don't want to reinvent the wheel here.

It also provides an easier path to support windows and arm based
images.
@AlistairB
Contributor Author

AlistairB commented Sep 11, 2021

Re gpg keys:

I've been trying to get gpg verification working with ghcup and failing. ghcup includes SHA256SUMS and SHA256SUMS.sig produced as documented.

Trying something like the following doesn't quite work.

# tried with each of the 3 keys that match the email: https://keyserver.ubuntu.com/pks/lookup?search=hasufell%40posteo.de&fingerprint=on&op=index
$ gpg --keyserver keyserver.ubuntu.com --recv-keys 511B62C09D50CD28
$ gpg --batch --trusted-key 511B62C09D50CD28 --verify SHA256SUMS.sig SHA256SUMS
gpg: Signature made Thu 12 Aug 2021 03:34:23 AEST
gpg:                using RSA key 7784930957807690A66EBDBE3786C5262ECB4A3F
gpg:                issuer "[email protected]"
gpg: Can't check signature: No public key

I think I must be doing something wrong. I am mostly just copying what we do for stack gpg verification but perhaps because this is .sig it is different. (I similarly fail for ghc / cabal verification, but I'm not certain which key to use for that.)
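
One thing the output above does show: the signature names key 7784930957807690A66EBDBE3786C5262ECB4A3F, whose long ID (3786C5262ECB4A3F) differs from the 511B62C09D50CD28 passed to --recv-keys. A possible next step, sketched below, is to import the key that gpg names and verify again. This assumes the key is published on the same keyserver, that its fingerprint has first been confirmed through a trusted channel (taking it from the signature alone proves nothing), and that gpg prints English messages. Both helper names are hypothetical.

```shell
# Hypothetical helper: pull the signing-key fingerprint out of the
# "using ... key <FINGERPRINT>" line of gpg --verify output (read on stdin).
signing_key_from_output() {
    sed -n 's/^gpg: *using [A-Za-z0-9]* key \([0-9A-Fa-f]\{40\}\)$/\1/p'
}

# Hypothetical helper: import KEY from the keyserver used above, then
# verify FILE against its detached signature SIG.
fetch_and_verify() {
    key="$1"
    sig="$2"
    file="$3"
    gpg --batch --keyserver keyserver.ubuntu.com --recv-keys "$key" &&
        gpg --batch --verify "$sig" "$file"
}

# Possible usage (gpg writes its messages to stderr, hence 2>&1):
#   gpg --batch --verify SHA256SUMS.sig SHA256SUMS 2>&1 | signing_key_from_output
#   fetch_and_verify 7784930957807690A66EBDBE3786C5262ECB4A3F SHA256SUMS.sig SHA256SUMS
```

In a Dockerfile the fingerprint would be pinned as a literal rather than parsed from the output; the parsing helper is only for finding out which key a signature expects.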

@hasufell
Member

It's a bit confusing I agree. The way I am reading the FAQ is that the docker-library build process cannot currently cache intermediate stages of multi-stage builds. So if you modify only the final image, it still has to rebuild the previous stages which is not the case when just using docker build.

I don't think that's true at all. I've just tried it and caching works fine.

@AlistairB AlistairB mentioned this pull request Sep 12, 2021
@AlistairB AlistairB marked this pull request as draft September 12, 2021 22:42
@AlistairB AlistairB closed this Oct 8, 2021
Development

Successfully merging this pull request may close these issues.

Continually blocked on upstream packages
3 participants