Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce size of images #339

Draft
wants to merge 4 commits into
base: dev
Choose a base branch
from
Draft

Conversation

lap1nou
Copy link
Contributor

@lap1nou lap1nou commented Apr 20, 2024

Description

Greetings,

This PR aims to reduce the size of the image (by around 775 MB) by removing files that aren't used and that take place in the layers of the final image.

Image size before: ~25,34 GB, after: ~24,374 GB, I used dive (https://github.com/wagoodman/dive) to know which file was wasting space.

I also have other optimisations that could reduce the size of the image, I can create another PR if this one is merged.

Regards.

Related issues

N/A

Point of attention

I only ran dive on the web image, I will probably use it on the full image if you find this PR interesting in order to find more optimizations.

@qu35t-code
Copy link
Member

Very good job @lap1nou !

@qu35t-code qu35t-code requested a review from ShutdownRepo April 20, 2024 07:51
@qu35t-code qu35t-code added enhancement New feature or request under review labels Apr 20, 2024
@gbe
Copy link
Contributor

gbe commented Apr 20, 2024

Nice catch !
Question: instead of deleting individual files, wouldn't it be better to simply wipe the content of the folder /tmp in the post install phase ? That way we'd know for sure there isn't any files left.
I can't check at the moment.

@ShutdownRepo
Copy link
Member

Nice catch ! Question: instead of deleting individual files, wouldn't it be better to simply wipe the content of the folder /tmp in the post install phase ? That way we'd know for sure there isn't any files left. I can't check at the moment.

already the case
image

@lap1nou any idea why the files you removed on your PR were still here?

@lap1nou
Copy link
Contributor Author

lap1nou commented Apr 20, 2024

Greetings,

@ShutdownRepo yes I think it's a Docker specificity, while you are indeed removing them, this happens in another layer (because it's another RUN command in the Dockerfile, each command create another layer), but the file are still present in one of the layer so this take space in the final Docker image.

References:

I'm not a Docker expert so I might be wrong, altough the size of the image do indeed shrink with these changes.

Regards.

@Dramelac
Copy link
Member

any idea why the files you removed on your PR were still here?

Indeed, Docker store every file of the layer at the end of each RUN commands, so removing them in the last layer is not enough.

Thanks @lap1nou for the PR, don't hesitates to submit more if you have time to dive into it ;)

@gbe
Copy link
Contributor

gbe commented Apr 21, 2024

Instead of merging this PR , I think a refactoring of the function post_install should be considered as this function already takes care of wiping /tmp.

It means that the function post_install should :

It could also be a new dedicated function like clean_docker_layer.

@lap1nou
Copy link
Contributor Author

lap1nou commented Apr 21, 2024

I agree @gbe,

However I have to say that removing pip folder after they are written is less efficient than telling pip to not write them in the first place, but a clean function would indeed be more clear / readable / maintenable.

@lap1nou
Copy link
Contributor Author

lap1nou commented Apr 23, 2024

Hey,

Here is some more space we could save:

  • Use GOFLAGS='-ldflags=-s' before any go build / install, this should save around 4 MB per binary (it would save more combined with the -w LDFLAGS but I can't use it as the go command is handled by catch_and_retry()), this should represent something like 150 MB in total
  • Remove bundler cache with --no-cache (or remove the cache folder from a dedicated function as suggested)
  • Run npm prune --production, this should remove all devDependencies (I think this should save around 200 MB)

I need your feedback to know which of these optimisations I can apply to this PR, I already tested them a bit offline and it seems to work fine.

My full image seems to be at 41 GB now but this may not be precise.

@gbe
Copy link
Contributor

gbe commented Apr 23, 2024

I had already suggested in a previous PR to remove the .git folders but @ShutdownRepo wants to keep them so users can pull the latest versions if needed.

@ShutdownRepo
Copy link
Member

Instead of merging this PR , I think a refactoring of the function post_install should be considered as this function already takes care of wiping /tmp.

It means that the function post_install should :

* not be called at the latest layer, but at the end of each layer to avoid the oversize

* include the work done in [Reduce image size - remove pip cache dir #341](https://github.com/ThePorgs/Exegol-images/pull/341)

It could also be a new dedicated function like clean_docker_layer.

agreed let's do that

Hey,

Here is some more space we could save:

* [ ]  Use `GOFLAGS='-ldflags=-s'` before any `go build / install`, this should save around 4 MB per binary (it would save more combined with the `-w` LDFLAGS but I can't use it as the `go` command is handled by `catch_and_retry()`), this should represent something like 150 MB in total

* [ ]  Remove `bundler` cache with `--no-cache` (or remove the cache folder from a dedicated function as suggested)

* [ ]  Run `npm prune --production`, this should remove all devDependencies (I think this should save around 200 MB)

I need your feedback to know which of these optimisations I can apply to this PR, I already tested them a bit offline and it seems to work fine.

My full image seems to be at 41 GB now but this may not be precise.

Let's do 2 (bundler) and 3 (npm prune)

I had already suggested in a previous PR to remove the .git folders but @ShutdownRepo wants to keep them so users can pull the latest versions if needed.

Yes, imo we need to keep .git. But I'd be curious to have a benchmark of how much space we could gain. Because if we were to save a lot of space, maybe I would be wiser to reconsider.

Anyways, thank you @gbe and @lap1nou for sharing your insight and investing your time in this thread!! Reducing the size of the images is important for the project and its users.

@lap1nou
Copy link
Contributor Author

lap1nou commented Apr 30, 2024

Good I'm going to apply all of this, the bundler thing could be applied in the clean layer function but not the npm one.

To answer your question @ShutdownRepo, .git folder takes 1.52 GB on the full image, here is the script used (remove the last result as we don't count the mounted resources .git folder):

#!/bin/bash

# Find all .git folders and calculate their sizes
total_size=0
while IFS= read -r -d '' git_folder; do
    git_size=$(du -b -s "$git_folder" | awk '{print $1}')
    total_size=$(echo $total_size + $git_size | bc)
done < <(find / -type d -name ".git" -print0 2>/dev/null)

# Display the total size in human-readable format
echo "Total size of all .git folders: $total_size"

@gbe
Copy link
Contributor

gbe commented Apr 30, 2024

Maybe the .git folders could be left exclusively in the nightly image for users willing to have cutting edge versions?

@qu35t-code qu35t-code marked this pull request as draft June 9, 2024 13:04
@dafta
Copy link

dafta commented Oct 20, 2024

You could also do a shallow clone with git clone --depth=1. This will only download the latest commit and won't have the commit history older than that, saving a ton of space, and yet you can still git pull and get the newest version. And if, for some reason, someone might need the history, it's easy to get it back with git fetch --unshallow.

@qu35t-code
Copy link
Member

It's already in the code !

@dafta
Copy link

dafta commented Oct 20, 2024

Woops, sorry, my bad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants