Image build process Freezes on Taking snapshot of full filesystem... #1333

Open
abhi1git opened this issue Jun 29, 2020 · 51 comments
Labels: area/ci-cd, area/multi-stage builds (issues related to kaniko multi-stage builds), area/performance (issues related to kaniko performance enhancement), categorized, differs-from-docker, gitlab, issue/hang, issue/oom, ok-to-close?, possible-dupe, priority/p2 (High impact feature/bug. Will get a lot of users happy), works-with-docker

abhi1git commented Jun 29, 2020

Actual behavior
While building an image using gcr.io/kaniko-project/executor:debug in a GitLab CI runner hosted on Kubernetes (deployed via Helm chart), the image build process freezes on "Taking snapshot of full filesystem..." until the runner times out (1 hr).
This behaviour is intermittent: for the same project, the image build stage sometimes works.

Issue arises in multistage as well as single stage Dockerfile.

Expected behavior
Image build should not freeze at "Taking snapshot of full filesystem..." and should succeed every time.

To Reproduce
As the behaviour is intermittent, I am not sure how it can be reproduced.

| Description | Yes/No |
| --- | --- |
| Please check if this a new feature you are proposing | |
| Please check if the build works in docker but not in kaniko | Yes |
| Please check if this error is seen when you use --cache flag | |
| Please check if your dockerfile is a multistage dockerfile | |

@tejal29

@abhi1git
Author

Can you please elaborate on which filesystem is being snapshotted while building the image, so that we can see whether filesystem size is causing this issue? We are using kaniko to build images in GitLab CI/CD, and the runner is deployed on Kubernetes using a Helm chart.
Previously this issue arose randomly, but now all of our kaniko image build jobs freeze on "Taking snapshot of full filesystem...".
@tejal29

@tejal29
Contributor

tejal29 commented Jul 28, 2020

@abhi1git can you try the newer snapshot modes --snapshotMode=redo?

@tejal29 tejal29 added the area/performance issues related to kaniko performance enhancement label Jul 28, 2020
@tejal29
Contributor

tejal29 commented Aug 12, 2020

@abhi1git please switch to using --snapshotMode=redo. See comments here #1305 (comment)

@tejal29 tejal29 closed this as completed Aug 12, 2020
@Kiddinglife

Kiddinglife commented Dec 8, 2020

> @abhi1git please switch to using --snapshotMode=redo. See comments here #1305 (comment)

I suffered from the same issue and --snapshotMode=redo did not resolve it.
@abhi1git did you get it to work?

@tejal29
Contributor

tejal29 commented Dec 8, 2020

@Kiddinglife Can you provide your dockerfile or some stats on number of files in your repo?
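
For anyone who wants to answer this with numbers, here is a minimal sketch of collecting a file count. `count_files` is a hypothetical helper name, not part of kaniko; it only assumes a POSIX shell with `find`, `wc` and `tr` available (as in a busybox-based image).

```shell
# count_files DIR: print the number of regular files under DIR.
# -xdev keeps find on one filesystem, so mounts such as /proc are skipped.
count_files() {
  find "$1" -xdev -type f 2>/dev/null | wc -l | tr -d ' '
}
```

Calling `count_files .` in the build context, or `count_files /` inside the image, gives a figure that can be posted alongside the Dockerfile.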

@leoschet

leoschet commented May 6, 2021

I am experiencing this problem while building an image smaller than 1 GB. Interestingly, it fails silently: the GitLab CI job is marked as successful, but no image is actually pushed.

We are using kaniko for several other projects but this error only happens on two projects. Both are monorepos and use lerna for extending yarn commands to sub packages.

I must say it was working at some point and it does work normally when using docker to build the image

Here is a snippet of the build logs:

INFO[0363] RUN yarn install --network-timeout 100000    
INFO[0363] cmd: /bin/sh                                 
INFO[0363] args: [-c yarn install --network-timeout 100000] 
INFO[0363] util.Lookup returned: &{Uid:1000 Gid:1000 Username:node Name: HomeDir:/home/node} 
INFO[0363] performing slow lookup of group ids for node 
INFO[0363] Running: [/bin/sh -c yarn install --network-timeout 100000] 
yarn install v1.22.5
info No lockfile found.
[1/4] Resolving packages...
INFO[0368] Pushed image to 1 destinations               
... A bunch of yarn logs ...
[4/4] Building fresh packages...
success Saved lockfile.
$ lerna bootstrap
lerna notice cli v3.22.1
lerna info bootstrap root only
yarn install v1.22.5
[1/4] Resolving packages...
success Already up-to-date.
$ lerna bootstrap
lerna notice cli v3.22.1
lerna WARN bootstrap Skipping recursive execution
Done in 20.00s.
Done in 616.92s.
INFO[0982] Taking snapshot of full filesystem...   

Interesting to note that RUN yarn install --network-timeout 100000 is not the last step in the dockerfile.

Neither --snapshotMode=redo nor --use-new-run solved the problem.
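
One possible guard against a job that is marked successful without an image being pushed is to check for the digest file afterwards. This is only a sketch: it assumes the executor is run with `--image-name-with-digest-file <path>` (the flag appears in a command line elsewhere in this thread) and that the file is left empty or missing when nothing was pushed. `require_digest_file` is a hypothetical helper name.

```shell
# require_digest_file PATH: succeed only if PATH exists and is non-empty.
# Assumption: kaniko was started with --image-name-with-digest-file PATH.
require_digest_file() {
  if [ -s "$1" ]; then
    echo "image reference recorded: $(cat "$1")"
  else
    echo "no image reference in $1; treating the build as failed" >&2
    return 1
  fi
}
```

Running `require_digest_file /path/to/digest-file` as the last script line of the CI job would make the job fail when the file is absent.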

@sneerin

sneerin commented Jun 29, 2021

Same issue. Nothing changed except the version of kaniko.

@Bobgy

Bobgy commented Jul 30, 2021

I'm hitting the same problem, tried --snapshotMode=redo, but it does not always help.
What will help us resolve the issue here? Does reproducible dockerfile + number of files help with debugging?
I'm trying --use-new-run now.

@Bobgy

Bobgy commented Jul 30, 2021

Adding a data point: I initially observed the build process freezing when I had not set any memory/CPU requests or limits.
After I added memory/CPU requests and limits, the process started to get OOM-killed. I increased the memory limit to 6GB, but it is still OOM-killed. Memory usage skyrockets at the end, when the log reaches "Taking snapshot of full filesystem...".
EDIT: I tried building the same image in local docker, and maximum memory usage is less than 1GB.

image

logs

+ dockerfile=v2/container/driver/Dockerfile
+ context_uri=
+ context_artifact_path=/tmp/inputs/context_artifact/data
+ context_sub_path=
+ destination=gcr.io/kfp-ci/4674c4982ab8fcf476e610f372fc0e4a38686805/v2-sample-test/test/kfp-driver
+ digest_output_path=/tmp/outputs/digest/data
+ cache=true
+ cache_ttl=24h
+ context=
+ '[['  '!='  ]]
+ context=dir:///tmp/inputs/context_artifact/data
+ dirname /tmp/outputs/digest/data
+ mkdir -p /tmp/outputs/digest
+ /kaniko/executor --dockerfile v2/container/driver/Dockerfile --context dir:///tmp/inputs/context_artifact/data --destination gcr.io/kfp-ci/4674c4982ab8fcf476e610f372fc0e4a38686805/v2-sample-test/test/kfp-driver --snapshotMode redo --image-name-with-digest-file /tmp/outputs/digest/data '--cache=true' '--cache-ttl=24h'
E0730 12:20:40.314406      21 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO[0000] Resolved base name golang:1.15-alpine to builder
INFO[0000] Using dockerignore file: /tmp/inputs/context_artifact/data/.dockerignore
INFO[0000] Retrieving image manifest golang:1.15-alpine
INFO[0000] Retrieving image golang:1.15-alpine from registry index.docker.io
E0730 12:20:40.518068      21 metadata.go:166] while reading 'google-dockercfg-url' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg-url
INFO[0001] Retrieving image manifest golang:1.15-alpine
INFO[0001] Returning cached image manifest
INFO[0001] No base image, nothing to extract
INFO[0001] Built cross stage deps: map[0:[/build/v2/build/driver]]
INFO[0001] Retrieving image manifest golang:1.15-alpine
INFO[0001] Returning cached image manifest
INFO[0001] Retrieving image manifest golang:1.15-alpine
INFO[0001] Returning cached image manifest
INFO[0001] Executing 0 build triggers
INFO[0001] Checking for cached layer gcr.io/kfp-ci/4674c4982ab8fcf476e610f372fc0e4a38686805/v2-sample-test/test/kfp-driver/cache:9164be18ba887abd9388518d533d79a6e2fda9f81f33e57e0c71319d7a6da78e...
INFO[0001] No cached layer found for cmd RUN apk add --no-cache make bash
INFO[0001] Unpacking rootfs as cmd RUN apk add --no-cache make bash requires it.
INFO[0009] RUN apk add --no-cache make bash
INFO[0009] Taking snapshot of full filesystem...
INFO[0016] cmd: /bin/sh
INFO[0016] args: [-c apk add --no-cache make bash]
INFO[0016] Running: [/bin/sh -c apk add --no-cache make bash]
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/community/x86_64/APKINDEX.tar.gz
(1/5) Installing ncurses-terminfo-base (6.2_p20210612-r0)
(2/5) Installing ncurses-libs (6.2_p20210612-r0)
(3/5) Installing readline (8.1.0-r0)
(4/5) Installing bash (5.1.4-r0)
Executing bash-5.1.4-r0.post-install
(5/5) Installing make (4.3-r0)
Executing busybox-1.33.1-r2.trigger
OK: 9 MiB in 20 packages
INFO[0016] Taking snapshot of full filesystem...
INFO[0017] Pushing layer gcr.io/kfp-ci/4674c4982ab8fcf476e610f372fc0e4a38686805/v2-sample-test/test/kfp-driver/cache:9164be18ba887abd9388518d533d79a6e2fda9f81f33e57e0c71319d7a6da78e to cache now
INFO[0017] WORKDIR /build
INFO[0017] cmd: workdir
INFO[0017] Changed working directory to /build
INFO[0017] Creating directory /build
INFO[0017] Taking snapshot of files...
INFO[0017] Pushing image to gcr.io/kfp-ci/4674c4982ab8fcf476e610f372fc0e4a38686805/v2-sample-test/test/kfp-driver/cache:9164be18ba887abd9388518d533d79a6e2fda9f81f33e57e0c71319d7a6da78e
INFO[0017] COPY api/go.mod api/go.sum api/
INFO[0017] Taking snapshot of files...
INFO[0017] COPY v2/go.mod v2/go.sum v2/
INFO[0017] Taking snapshot of files...
INFO[0017] RUN cd v2 && go mod download
INFO[0017] cmd: /bin/sh
INFO[0017] args: [-c cd v2 && go mod download]
INFO[0017] Running: [/bin/sh -c cd v2 && go mod download]
INFO[0018] Pushed image to 1 destinations
INFO[0140] Taking snapshot of full filesystem...
Killed

version: gcr.io/kaniko-project/executor:v1.6.0-debug
args: I added snapshotMode redo, cache=true
env: GKE 1.19, using Kubeflow Pipelines to run kaniko containers
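
Since reports in this thread mix two symptoms (a log that simply stops at the snapshot line, and a log that ends in `Killed`), a small sketch for telling them apart from a saved log file may help when comparing cases. `classify_log` and its three output labels are hypothetical names chosen for illustration; the matching is purely textual and based on the log lines quoted above.

```shell
# classify_log FILE: inspect the last non-empty line of a captured build log.
# Prints "killed", "stalled-at-snapshot" or "other".
classify_log() {
  last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
  case "$last" in
    Killed*) echo "killed" ;;
    *"Taking snapshot of full filesystem"*) echo "stalled-at-snapshot" ;;
    *) echo "other" ;;
  esac
}
```

A result of `killed` points towards the process being terminated (for example by a memory limit), while `stalled-at-snapshot` only says the log ended there; it cannot distinguish a real hang from a container that was killed without printing anything.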

@Bobgy

Bobgy commented Aug 1, 2021

I guess the root cause is actually insufficient memory, but when we do not allocate enough memory it will freeze on taking snapshot of full filesystem... as a symptom.

@Bobgy

Bobgy commented Aug 2, 2021

Edit: my guess is wrong, I reverted to kaniko:1.3.0-debug and added enough memory requests & limit, but I'm still observing the image build freezing problem from time to time.

@oussemos

Hi @abhi1git, did you find a solution for your issue ? I am facing the same.

@sph3rex

sph3rex commented Oct 26, 2021

This is still an issue for me too. Any updates?

@pY4x3g

pY4x3g commented Dec 9, 2021

Same issue here, the system has enough memory (not hitting any memory limits),
--snapshotMode=redo and --use-new-run are not changing the behavior at all, I do not see any problems when using trace verbosity. I am currently using 1.17.0-debug

@oussemos

> Hi @abhi1git, did you find a solution for your issue ? I am facing the same.

For us, after investigations, we found that the WAF in front of our Gitlab was blocking the requests. After whitelisting it, all is working fine.

@jsravn
Contributor

jsravn commented Jan 20, 2022

Still an issue, can you reopen @tejal29? Building an image like this shouldn't be OOMKilling/using GBs of RAM - seems like a clear cut bug to me.

@imjasonh imjasonh reopened this Jan 20, 2022
@gtskaushik

> > Hi @abhi1git, did you find a solution for your issue ? I am facing the same.
>
> For us, after investigations, we found that the WAF in front of our Gitlab was blocking the requests. After whitelisting it, all is working fine.

What kind of whitelisting was required for this? Can you help me to clarify how to set it up?

@oussemos

> > > Hi @abhi1git, did you find a solution for your issue ? I am facing the same.
> >
> > For us, after investigations, we found that the WAF in front of our Gitlab was blocking the requests. After whitelisting it, all is working fine.
>
> What kind of whitelisting was required for this? Can you help me to clarify how to set it up?

If you have a WAF in front of GitLab, it would be good to check your logs first and confirm what kind of requests it is blocking.

@irizzant

Anyone tried with version 1.7.0?

@imjasonh
Collaborator

> Anyone tried with version 1.7.0?

v1.7.0 is about 4 months old, and had some showstopper auth issues, and :latest currently points to :v1.6.0, so I would guess that not many folks are using :v1.7.0

Instead, while we wait for v1.8.0 (#1871) you can try a commit-tagged image, the latest of which is currently :09e70e44d9e9a3fecfcf70cb809a654445837631

@irizzant

Thanks @imjasonh I'm going to try gcr.io/kaniko-project/executor:09e70e44d9e9a3fecfcf70cb809a654445837631-debug

@irizzant

irizzant commented Feb 16, 2022

I've tried gcr.io/kaniko-project/executor:09e70e44d9e9a3fecfcf70cb809a654445837631-debug with --snapshotMode=redo --use-new-run, my pipeline is still stuck in

INFO[0009] Taking snapshot of full filesystem...        

Guess the only solution is waiting for another commit-tagged image or 1.8.0 to be released

@imjasonh
Collaborator

> Guess the only solution is waiting for another commit-tagged image or 1.8.0 to be released

It sounds like whatever bug is causing that is still present, so it won't be fixed by releasing the latest image as v1.8.0. We just need someone to figure out why it gets stuck and fix it.

Unfortunately Kaniko is not really actively staffed at the moment, so it's probably going to fall to you or me or some other kind soul reading this to investigate and get us back on the track to solving this. Any takers?

@irizzant

> It sounds like whatever bug is causing that is still present, so it won't be fixed by releasing the latest image as v1.8.0. We just need someone to figure out why it gets stuck and fix it.

Hold on a second, maybe I spoke too soon!

My pipeline currently builds multiple images in parallel.
I hadn't noticed that one of them, which was previously stuck in taking a snapshot, now goes through smoothly with --snapshotMode=redo --use-new-run and gcr.io/kaniko-project/executor:09e70e44d9e9a3fecfcf70cb809a654445837631-debug.

The images that are still stuck are basically the same Postgres image built with different build-arg values, so the same layers end up being built (and cached) in parallel.

I therefore removed this parallelism and built these Postgres images in sequence.
I ended up with a Postgres image stuck in taking a snapshot in parallel with a totally different NodeJs image, which was also stuck in taking snapshots.

So from my tests it looks like, when images are built in parallel against the same registry mirror used as cache, an image gets stuck if it is taking snapshots at the same time as another one.

It may be a coincidence, maybe not.
I repeat: this is from my tests, it could be totally unrelated to the problem

@chenlein

same issue:

  containers:
  - args:
    - --dockerfile=/workspace/Dockerfile
    - --context=dir:///workspace/
    - --destination=xxxx/xxx/xxx:1.0.0
    - --skip-tls-verify
    - --verbosity=debug
    - --build-arg="http_proxy='http://xxxx'"
    - --build-arg="https_proxy='http://xxxx'"
    - --build-arg="HTTP_PROXY='http://xxxx'"
    - --build-arg="HTTPS_PROXY='http://xxxx'"
    image: gcr.io/kaniko-project/executor:v1.7.0
    imagePullPolicy: IfNotPresent
    name: kaniko
    volumeMounts:
    - mountPath: /kaniko/.docker
      name: secret
    - mountPath: /workspace
      name: code

here are some logs, maybe useful

......
DEBU[0021] Whiting out /usr/share/doc/linux-libc-dev/.wh..wh..opq 
DEBU[0021] not including whiteout files                 
DEBU[0021] Whiting out /usr/share/doc/make/.wh..wh..opq 
DEBU[0021] not including whiteout files                 
DEBU[0021] Whiting out /usr/share/doc/pkg-config/.wh..wh..opq 
DEBU[0021] not including whiteout files                 
DEBU[0021] Whiting out /usr/share/gdb/auto-load/lib/.wh..wh..opq 
DEBU[0021] not including whiteout files                 
DEBU[0021] Whiting out /usr/share/glib-2.0/.wh..wh..opq 
DEBU[0021] not including whiteout files                 
DEBU[0021] Whiting out /usr/share/perl5/Dpkg/.wh..wh..opq 
DEBU[0021] not including whiteout files                 
DEBU[0021] Whiting out /usr/share/pkgconfig/.wh..wh..opq 
DEBU[0021] not including whiteout files                 
DEBU[0021] Whiting out /usr/local/go/.wh..wh..opq       
DEBU[0021] not including whiteout files                 
DEBU[0030] Whiting out /go/.wh..wh..opq                 
DEBU[0030] not including whiteout files                 
INFO[0030] ENV GOPRIVATE "gitee.com/dmcca/*"            
DEBU[0030] build: skipping snapshot for [ENV GOPRIVATE "gitee.com/dmcca/*"] 
INFO[0030] ENV GOPROXY "https://goproxy.cn,direct"      
DEBU[0030] build: skipping snapshot for [ENV GOPROXY "https://goproxy.cn,direct"] 
DEBU[0030] Resolved ./.netrc to .netrc                  
DEBU[0030] Resolved /root/.netrc to /root/.netrc        
DEBU[0030] Getting files and contents at root /workspace/ for /workspace/.netrc 
DEBU[0030] Using files from context: [/workspace/.netrc] 
INFO[0030] COPY ./.netrc /root/.netrc                   
DEBU[0030] Resolved ./.netrc to .netrc                  
DEBU[0030] Resolved /root/.netrc to /root/.netrc        
DEBU[0030] Getting files and contents at root /workspace/ for /workspace/.netrc 
DEBU[0030] Copying file /workspace/.netrc to /root/.netrc 
INFO[0030] Taking snapshot of files...                  
DEBU[0030] Taking snapshot of files [/root/.netrc / /root] 
INFO[0030] RUN chmod 600 /root/.netrc                   
INFO[0030] Taking snapshot of full filesystem...        

@max-au

max-au commented Mar 14, 2022

Same issue here:

INFO[0163] Taking snapshot of full filesystem...        
fatal error: runtime: out of memory
runtime stack:
runtime.throw({0x12f3614, 0x16})
	/usr/local/go/src/runtime/panic.go:1198 +0x54
runtime.sysMap(0x4041c00000, 0x20000000, 0x220fdd0)
	/usr/local/go/src/runtime/mem_linux.go:169 +0xbc

<...>
github.com/google/go-containerregistry/pkg/v1/tarball.WithCompressedCaching.func1()
	/src/vendor/github.com/google/go-containerregistry/pkg/v1/tarball/layer.go:119 +0x6c fp=0x40005d3b10 sp=0x40005d3a80 pc=0xa6134c
github.com/google/go-containerregistry/pkg/v1/tarball.computeDigest(0x40008a5d70)
	/src/vendor/github.com/google/go-containerregistry/pkg/v1/tarball/layer.go:278 +0x44 fp=0x40005d3b80 sp=0x40005d3b10 pc=0xa624e4
github.com/google/go-containerregistry/pkg/v1/tarball.LayerFromOpener(0x400000d2c0, {0x40005d3cf8, 0x1, 0x1})
	/src/vendor/github.com/google/go-containerregistry/pkg/v1/tarball/layer.go:247 +0x3f4 fp=0x40005d3c20 sp=0x40005d3b80 pc=0xa62174
github.com/google/go-containerregistry/pkg/v1/tarball.LayerFromFile({0x4000a22018, 0x12}, {0x40005d3cf8, 0x1, 0x1})
	/src/vendor/github.com/google/go-containerregistry/pkg/v1/tarball/layer.go:188 +0x8c fp=0x40005d3c70 sp=0x40005d3c20 pc=0xa61cbc
github.com/GoogleContainerTools/kaniko/pkg/executor.pushLayerToCache(0x21d93a0, {0x40008b75c0, 0x40}, {0x4000a22018, 0x12}, {0x400016d940, 0x3a})
	/src/pkg/executor/push.go:295 +0x68 fp=0x40005d3ee0 sp=0x40005d3c70 pc=0xf1d4a8
github.com/GoogleContainerTools/kaniko/pkg/executor.(*stageBuilder).build.func3()
	/src/pkg/executor/build.go:425 +0xa4 fp=0x40005d3f60 sp=0x40005d3ee0 pc=0xf16474
<...>
compress/gzip.(*Writer).Write(0x40006780b0, {0x40014f6000, 0x8000, 0x8000})
	/usr/local/go/src/compress/gzip/gzip.go:196 +0x388
io.copyBuffer({0x1678960, 0x40006780b0}, {0x167dfe0, 0x40006163e8}, {0x0, 0x0, 0x0})
	/usr/local/go/src/io/io.go:425 +0x224
io.Copy(...)
	/usr/local/go/src/io/io.go:382
github.com/google/go-containerregistry/internal/gzip.ReadCloserLevel.func1(0x400064be80, 0x1, 0x40006163f8, {0x16902e0, 0x40006163e8})
	/src/vendor/github.com/google/go-containerregistry/internal/gzip/zip.go:60 +0xb4
created by github.com/google/go-containerregistry/internal/gzip.ReadCloserLevel
	/src/vendor/github.com/google/go-containerregistry/internal/gzip/zip.go:52 +0x230

Docker works fine (yet requires privileged mode).

@aleksey-masl

Hello everyone! I found a solution here: https://stackoverflow.com/questions/67748472/can-kaniko-take-snapshots-by-each-stage-not-each-run-or-copy-operation. It consists of adding the --single-snapshot option to kaniko:

/kaniko/executor \
  --context "${CI_PROJECT_DIR}" \
  --dockerfile "${CI_PROJECT_DIR}/Dockerfile" \
  --destination "${YC_CI_REGISTRY}/${YC_CI_REGISTRY_ID}/${CI_PROJECT_PATH}:${CI_COMMIT_SHA}" \
  --single-snapshot

@aaron-prindle aaron-prindle added this to the v1.15.0 milestone Aug 8, 2023
@iamkhalidbashir

> I have this problem too in Gitlab CI/CD

Same for me too

@aleksey-masl

If that doesn't work, try adding --use-new-run and --snapshot-mode=redo.
All flags: https://github.com/GoogleContainerTools/kaniko/blob/main/README.md#
For me it is working!

- mkdir -p /kaniko/.docker
- echo "{\"auths\":{\"${YC_CI_REGISTRY}\":{\"auth\":\"$(printf "%s:%s" "${YC_CI_REGISTRY_USER}" "${YC_CI_REGISTRY_PASSWORD}" | base64 | tr -d '\n')\"}}}" > /kaniko/.docker/config.json
- >-
  /kaniko/executor
  --context "${CI_PROJECT_DIR}"
  --use-new-run
  --snapshot-mode=redo
  --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
  --destination "${YC_CI_REGISTRY}/${YC_CI_REGISTRY_ID}/${CI_PROJECT_PATH}:${CI_COMMIT_REF_SLUG}-${CI_COMMIT_SHA}"
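
The `echo` line above that writes `/kaniko/.docker/config.json` is easy to get wrong because of the nested quoting. As a sketch only, the same JSON can be produced by a small function; `docker_auth_json` is a hypothetical helper name, and it assumes the registry name, user and password contain no characters that need JSON escaping.

```shell
# docker_auth_json REGISTRY USER PASSWORD: print a Docker config.json body
# with one basic-auth entry, equivalent to the echo line in the CI script.
docker_auth_json() {
  # base64 may wrap long output, so newlines are stripped as in the original.
  auth=$(printf '%s:%s' "$2" "$3" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$1" "$auth"
}
```

In the CI script this could be used as `docker_auth_json "${YC_CI_REGISTRY}" "${YC_CI_REGISTRY_USER}" "${YC_CI_REGISTRY_PASSWORD}" > /kaniko/.docker/config.json`.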

@aaron-prindle aaron-prindle self-assigned this Aug 15, 2023
@aaron-prindle aaron-prindle modified the milestones: v1.15.0, v1.16.0 Aug 29, 2023
@aaron-prindle aaron-prindle removed their assignment Oct 10, 2023
@bhack

bhack commented Nov 24, 2023

I have the same issue. Is it a disk size issue?

@aminya

aminya commented Nov 24, 2023

We were able to fix the GitLab CI/CD pipeline error

Taking snapshot of full filesystem....
Killed

with --compressed-caching=false and v1.8.0-debug. The image is around 2 GB. Alpine reported around 4 GB in around 100 packages.

I see this answer being lost in this thread, but it fixed the issue for me. Just pass this flag to Kaniko

--compressed-caching=false
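
To make it easier to try the workarounds mentioned in this thread one at a time, here is a sketch that prints (does not run) an executor command line. All flags used are ones that already appear in the comments above; `kaniko_command` and the `KANIKO_SINGLE_SNAPSHOT` toggle are hypothetical names invented for this example, and the Dockerfile is assumed to sit at the root of the context.

```shell
# kaniko_command CONTEXT DESTINATION: print an executor invocation with
# --compressed-caching=false always set.
# KANIKO_SINGLE_SNAPSHOT=1 (hypothetical toggle) also adds --single-snapshot.
kaniko_command() {
  set -- /kaniko/executor \
    --context "$1" \
    --dockerfile "$1/Dockerfile" \
    --destination "$2" \
    --compressed-caching=false
  if [ "${KANIKO_SINGLE_SNAPSHOT:-0}" = "1" ]; then
    set -- "$@" --single-snapshot
  fi
  echo "$*"
}
```

Printing the command first makes it simple to confirm in the job log which combination of flags a given run actually used.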

@bhack

bhack commented Nov 24, 2023

> --compressed-caching=false

It is not available in the Skaffold schema for Kaniko, so I am trying to understand the root cause of this issue.

@neighbour-oldhuang

Same here.

@kirin-13

kirin-13 commented Aug 8, 2024

> Same here.

Try adding the --single-snapshot flag.

@hottehead

I had this problem when trying to install terraform in an alpine linux image with the recommendations from this page https://www.hashicorp.com/blog/installing-hashicorp-tools-in-alpine-linux-containers

However the apk del .deps command in the very last line triggered the issue. Presumably this changes a lot of files?
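
To test that presumption, one rough approach is to create a marker file before the step and count what is newer afterwards. This is a sketch under assumptions: `count_changed_since` is a hypothetical helper, and directories are counted as well as files because a deletion only shows up as a changed modification time on the parent directory.

```shell
# count_changed_since MARKER DIR: count files and directories under DIR
# whose modification time is newer than MARKER's.
# Deleted files are not listed themselves; their parent directory is.
count_changed_since() {
  find "$2" -xdev -newer "$1" 2>/dev/null | wc -l | tr -d ' '
}
```

Inside a single RUN step this could look like `touch /tmp/marker && apk del .deps && count_changed_since /tmp/marker /`, which prints the count into the build log.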

@DaDummy

DaDummy commented Sep 16, 2024

It seems that GitLab Kubernetes Runner pods may appear stuck when the build container is OOMKilled, as the helper container keeps running in that case until job timeout. In our case this is exactly what is happening.

So the issue for us is not about performance at all, but rather about memory usage.
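
For setups where the executor's exit status is visible (for example when it is wrapped in a script), the status can help separate a kill from an ordinary failure: a shell reports a process terminated by signal N as status 128+N, so SIGKILL, which the kernel OOM killer uses, shows up as 137. The helper below is a sketch; `classify_exit` is a hypothetical name, and 137 only indicates SIGKILL, not necessarily an OOM kill.

```shell
# classify_exit CODE: describe a process exit status.
# 137 = 128 + 9 (SIGKILL), the signal sent by the kernel OOM killer.
classify_exit() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "success"
  elif [ "$code" -eq 137 ]; then
    echo "killed by SIGKILL (possible OOM kill)"
  elif [ "$code" -gt 128 ]; then
    echo "terminated by signal $((code - 128))"
  else
    echo "failed with exit code $code"
  fi
}
```

A wrapper could run the build, capture `$?`, and print `classify_exit "$?"` so the reason is visible in the job log before any timeout.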
