Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pushing Images to zarf registry hangs forever #1218

Closed
corang opened this issue Jan 18, 2023 · 28 comments
Closed

Pushing Images to zarf registry hangs forever #1218

corang opened this issue Jan 18, 2023 · 28 comments
Assignees

Comments

@corang
Copy link
Contributor

corang commented Jan 18, 2023

Environment

Device and OS: Ubuntu 22
App version: 0.23.2+
Kubernetes distro being used: k3s in k3d
k8s version: v1.24.4+k3s1
Other:

Steps to reproduce

  1. Try to deploy zarf init package or any package, more images more chance of happening

Expected result

All images are pushed to registry and package deployment continues

Actual Result

Sometimes the package is pushed successfully (looking at registry logs) but the zarf CLI just hangs forever on "Updating Image xxx"

Visual Proof (screenshots, videos, text, etc)

image
Has been on this image effectively the entire time

Severity/Priority

High, I can't use zarf as it is now

Additional Context

@corang
Copy link
Contributor Author

corang commented Jan 18, 2023

The image push eventually finished but is happening on other images as well where they're taking way too long to push to registry for their size/this hardware

@corang
Copy link
Contributor Author

corang commented Jan 18, 2023

Last relevant registry log when CLI is hanging
image

@corang
Copy link
Contributor Author

corang commented Jan 18, 2023

CLI while hanging with debug out
image

@corang
Copy link
Contributor Author

corang commented Jan 18, 2023

Pod that was handling the image push was scaled out and deleted

@corang
Copy link
Contributor Author

corang commented Jan 18, 2023

CLI error 2023-01-18T16:36:24-06:00 - "msg"="lost connection to pod\n" "error"=null
The next attempt at pushing the image worked

@corang
Copy link
Contributor Author

corang commented Jan 18, 2023

Same error when pod isn't scaled out 2023-01-18T16:46:43-06:00 - "msg"="lost connection to pod\n" "error"=null

@corang
Copy link
Contributor Author

corang commented Jan 18, 2023

All http PUTs seem to work fine, every time a hang starts it's on an http PATCH but not every PATCH fails

@corang
Copy link
Contributor Author

corang commented Jan 18, 2023

PATCH log that never completes

127.0.0.1 - - [18/Jan/2023:22:46:58 +0000] "PATCH /v2/ironbank/opensource/jaegertracing/all-in-one-3462929528/blobs/uploads/fef99b5e-dda6-4a1f-bf76-53513b7f24ac?_state=yt_eiCMM8qAiT65Sl3QCxUhXdi9TSwI2B0QB33s74017Ik5hbWUiOiJpcm9uYmFuay9vcGVuc291cmNlL2phZWdlcnRyYWNpbmcvYWxsLWluLW9uZS0zNDYyOTI5NTI4IiwiVVVJRCI6ImZlZjk5YjVlLWRkYTYtNGExZi1iZjc2LTUzNTEzYjdmMjRhYyIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAyMy0wMS0xOFQyMjo0Njo1OC44NTY4MTMwMTlaIn0%3D HTTP/1.1" 202 0 "" "go-containerregistry/v0.12.1"

@corang
Copy link
Contributor Author

corang commented Jan 18, 2023

New error

  DEBUG   2023-01-18T16:58:29-06:00  -  "msg"="lost connection to pod\n" "error"=null                                                                                                 
  ⠸  Updating image registry1.dso.mil/ironbank/opensource/jaegertracing/all-in-one:1.37.0 (11m39s)                                                                                    
  DEBUG   2023-01-18T16:58:30-06:00  -  tunnel.Close()
  DEBUG   2023-01-18T16:58:35-06:00  -  unable to deploy all components in this Zarf Package: unable to deploy component big-bang: unable to push images to the registry: Head "http://127.0.0.1:46217/v2/ironbank/opensource/jaegertracing/all-in-one-3462929528/blobs/sha256:e2873b3fbbec402bafc59c62563e7264f20ad5dccce4dae4ebfb5edd55afb4cb": dial tcp 127.0.0.1:46217: connect: connection refused
     ERROR:  Failed to deploy package: unable to deploy all components in this Zarf Package: unable to deploy
             component big-bang: unable to push images to the registry: Head
             "http://127.0.0.1:46217/v2/ironbank/opensource/jaegertracing/all-in-one-3462929528/blobs/sha256:e2873b3fbbec402bafc59c62563e7264f20ad5dccce4dae4ebfb5edd55afb4cb":
             dial tcp 127.0.0.1:46217: connect: connection refused
  DEBUG   2023-01-18T16:58:35-06:00  -  goroutine 1 [running]:
          runtime/debug.Stack()
                /opt/hostedtoolcache/go/1.19.4/x64/src/runtime/debug/stack.go:24 +0x65
          github.com/defenseunicorns/zarf/src/pkg/message.Fatalf({0x2f7bfa0?, 0xc0000ba380?}, {0x3489a1f, 0x1c}, {0xc001dbbd38, 0x1, 0x1})
                /home/runner/work/zarf/zarf/src/pkg/message/message.go:150 +0x1df
          github.com/defenseunicorns/zarf/src/cmd.glob..func12(0x6031d20?, {0xc000b4a680?, 0x4?, 0x4?})
                /home/runner/work/zarf/zarf/src/cmd/package.go:86 +0x11f
          github.com/spf13/cobra.(*Command).execute(0x6031d20, {0xc000b4a640, 0x4, 0x4})
                /home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:920 +0x847
          github.com/spf13/cobra.(*Command).ExecuteC(0x6032e60)
                /home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1044 +0x3bd
          github.com/spf13/cobra.(*Command).Execute(...)
                /home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:968
          github.com/defenseunicorns/zarf/src/cmd.Execute()
                /home/runner/work/zarf/zarf/src/cmd/root.go:64 +0x25
          main.main()
                /home/runner/work/zarf/zarf/main.go:23 +0x6f

@corang
Copy link
Contributor Author

corang commented Jan 19, 2023

Restarting my machine seems to have solved this however I want to leave this issue open to discuss building a better mechanism around detected lost connections when pushing images

@corang
Copy link
Contributor Author

corang commented Jan 20, 2023

Restarting only works for first cluster stood up after restart, if that cluster is torn down and another is started the issue resurfaces. My current workaround is to run sync && echo 3 | sudo tee /proc/sys/vm/drop_caches between tearing down and starting a new cluster.

@jeff-mccoy
Copy link
Contributor

When the tunnel is failing it would be great to see the k8s events and the pod logs at the same time.

@supcom234
Copy link

OS is Ubuntu Server 20.04 LTS

root@rke1:# zarf version
v0.23.5
root@rke1:
# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.9+rke2r2", GitCommit:"9710807c82740b9799453677c977758becf0acbb", GitTreeState:"clean", BuildDate:"2023-01-04T18:07:51Z", GoVersion:"go1.18.9b7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.9+rke2r2", GitCommit:"9710807c82740b9799453677c977758becf0acbb", GitTreeState:"clean", BuildDate:"2023-01-04T18:07:51Z", GoVersion:"go1.18.9b7", Compiler:"gc", Platform:"linux/amd64"}
root@rke1:~# rke2 --version
rke2 version v1.24.9+rke2r2 (2f4571a879954e1ea8d4560023eaf57c567df737)
go version go1.18.7b7

Running into same error with zarf package deploy command. It hangs on deploying the fluxcd/helm-controller. Furthermore, it has hung on zarf init pulling the gitea package as well.

image

@corang
Copy link
Contributor Author

corang commented Jan 24, 2023

Failing on zarf init @jeff-mccoy
k3d cluster command:

k3d cluster create di2me   --volume /etc/machine-id:/etc/machine-id   --k3s-arg "--disable=traefik@server:0"   --port 80:80@loadbalancer   --port 443:443@loadbalancer   --api-port 6443 --agents 3

registry log:

time="2023-01-24T22:00:33.064345161Z" level=info msg="debug server listening :5001" 
time="2023-01-24T22:00:33.064653723Z" level=info msg="redis not configured" go.version=go1.16.15 instance.id=dc1047f5-e870-4edc-b6d8-a925530aeec9 service=registry version="v2.8.1+unknown" 
time="2023-01-24T22:00:33.06466825Z" level=info msg="Starting upload purge in 39m0s" go.version=go1.16.15 instance.id=dc1047f5-e870-4edc-b6d8-a925530aeec9 service=registry version="v2.8.1+unknown" 
time="2023-01-24T22:00:33.069429855Z" level=info msg="using inmemory blob descriptor cache" go.version=go1.16.15 instance.id=dc1047f5-e870-4edc-b6d8-a925530aeec9 service=registry version="v2.8.1+unknown" 
time="2023-01-24T22:00:33.069468617Z" level=info msg="listening on [::]:5000" go.version=go1.16.15 instance.id=dc1047f5-e870-4edc-b6d8-a925530aeec9 service=registry version="v2.8.1+unknown" 
10.42.3.1 - - [24/Jan/2023:22:00:33 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
127.0.0.1 - - [24/Jan/2023:22:00:37 +0000] "GET /v2/ HTTP/1.1" 401 87 "" "Go-http-client/1.1"
time="2023-01-24T22:00:37.211732262Z" level=warning msg="error authorizing context: basic authentication challenge for realm "Registry Realm": invalid authorization credential" go.version=go1.16.15 http.request.host="127.0.0.1:33469" http.request.id=74b6bf73-2d53-47cc-a17d-effbfd895b17 http.request.method=GET http.request.remoteaddr="127.0.0.1:51450" http.request.uri="/v2/" http.request.useragent="Go-http-client/1.1" 
time="2023-01-24T22:00:37.266681636Z" level=info msg="authorized request" go.version=go1.16.15 http.request.host="127.0.0.1:33469" http.request.id=3f46f4a8-7ae8-4802-b5c0-abcd2f7d4cce http.request.method=HEAD http.request.remoteaddr="127.0.0.1:51474" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/sha256:d367ade92ae80e1dc31fb2ef62e4fdd5544ee2e5ed137daaec6f9346da771249" http.request.useragent="go-containerregistry/v0.12.1" vars.digest="sha256:d367ade92ae80e1dc31fb2ef62e4fdd5544ee2e5ed137daaec6f9346da771249" vars.name="defenseunicorns/zarf/dev-agent" 
127.0.0.1 - - [24/Jan/2023:22:00:37 +0000] "HEAD /v2/defenseunicorns/zarf/dev-agent/blobs/sha256:d367ade92ae80e1dc31fb2ef62e4fdd5544ee2e5ed137daaec6f9346da771249 HTTP/1.1" 404 157 "" "go-containerregistry/v0.12.1"
time="2023-01-24T22:00:37.266866647Z" level=error msg="response completed with error" auth.user.name=zarf-push err.code="blob unknown" err.detail=sha256:d367ade92ae80e1dc31fb2ef62e4fdd5544ee2e5ed137daaec6f9346da771249 err.message="blob unknown to registry" go.version=go1.16.15 http.request.host="127.0.0.1:33469" http.request.id=3f46f4a8-7ae8-4802-b5c0-abcd2f7d4cce http.request.method=HEAD http.request.remoteaddr="127.0.0.1:51474" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/sha256:d367ade92ae80e1dc31fb2ef62e4fdd5544ee2e5ed137daaec6f9346da771249" http.request.useragent="go-containerregistry/v0.12.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration=52.52265ms http.response.status=404 http.response.written=157 vars.digest="sha256:d367ade92ae80e1dc31fb2ef62e4fdd5544ee2e5ed137daaec6f9346da771249" vars.name="defenseunicorns/zarf/dev-agent" 
time="2023-01-24T22:00:37.267881029Z" level=info msg="authorized request" go.version=go1.16.15 http.request.host="127.0.0.1:33469" http.request.id=50c3fa3a-3398-4a10-aca2-f6c1db61635e http.request.method=HEAD http.request.remoteaddr="127.0.0.1:51450" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/sha256:9a4281627a6ffb9d5778b50a48ff8e25778f26ad164d4c526da45f85908ae052" http.request.useragent="go-containerregistry/v0.12.1" vars.digest="sha256:9a4281627a6ffb9d5778b50a48ff8e25778f26ad164d4c526da45f85908ae052" vars.name="defenseunicorns/zarf/dev-agent" 
time="2023-01-24T22:00:37.267965678Z" level=error msg="response completed with error" auth.user.name=zarf-push err.code="blob unknown" err.detail=sha256:9a4281627a6ffb9d5778b50a48ff8e25778f26ad164d4c526da45f85908ae052 err.message="blob unknown to registry" go.version=go1.16.15 http.request.host="127.0.0.1:33469" http.request.id=50c3fa3a-3398-4a10-aca2-f6c1db61635e http.request.method=HEAD http.request.remoteaddr="127.0.0.1:51450" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/sha256:9a4281627a6ffb9d5778b50a48ff8e25778f26ad164d4c526da45f85908ae052" http.request.useragent="go-containerregistry/v0.12.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration=55.519982ms http.response.status=404 http.response.written=157 vars.digest="sha256:9a4281627a6ffb9d5778b50a48ff8e25778f26ad164d4c526da45f85908ae052" vars.name="defenseunicorns/zarf/dev-agent" 
127.0.0.1 - - [24/Jan/2023:22:00:37 +0000] "HEAD /v2/defenseunicorns/zarf/dev-agent/blobs/sha256:9a4281627a6ffb9d5778b50a48ff8e25778f26ad164d4c526da45f85908ae052 HTTP/1.1" 404 157 "" "go-containerregistry/v0.12.1"
time="2023-01-24T22:00:37.276177678Z" level=info msg="authorized request" go.version=go1.16.15 http.request.host="127.0.0.1:33469" http.request.id=df8f27f4-f265-47cd-91dd-16ce8660fafb http.request.method=HEAD http.request.remoteaddr="127.0.0.1:51460" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/sha256:fbad7aa519f7da86dde82ab83e3130d8024e9ffebc315cead1fecb230e208340" http.request.useragent="go-containerregistry/v0.12.1" vars.digest="sha256:fbad7aa519f7da86dde82ab83e3130d8024e9ffebc315cead1fecb230e208340" vars.name="defenseunicorns/zarf/dev-agent" 
127.0.0.1 - - [24/Jan/2023:22:00:37 +0000] "HEAD /v2/defenseunicorns/zarf/dev-agent/blobs/sha256:fbad7aa519f7da86dde82ab83e3130d8024e9ffebc315cead1fecb230e208340 HTTP/1.1" 404 157 "" "go-containerregistry/v0.12.1"
time="2023-01-24T22:00:37.276285584Z" level=error msg="response completed with error" auth.user.name=zarf-push err.code="blob unknown" err.detail=sha256:fbad7aa519f7da86dde82ab83e3130d8024e9ffebc315cead1fecb230e208340 err.message="blob unknown to registry" go.version=go1.16.15 http.request.host="127.0.0.1:33469" http.request.id=df8f27f4-f265-47cd-91dd-16ce8660fafb http.request.method=HEAD http.request.remoteaddr="127.0.0.1:51460" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/sha256:fbad7aa519f7da86dde82ab83e3130d8024e9ffebc315cead1fecb230e208340" http.request.useragent="go-containerregistry/v0.12.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration=62.035245ms http.response.status=404 http.response.written=157 vars.digest="sha256:fbad7aa519f7da86dde82ab83e3130d8024e9ffebc315cead1fecb230e208340" vars.name="defenseunicorns/zarf/dev-agent" 
time="2023-01-24T22:00:37.326187629Z" level=info msg="authorized request" go.version=go1.16.15 http.request.contenttype="application/json" http.request.host="127.0.0.1:33469" http.request.id=4ea14d5b-68cc-4dff-b6ce-018ba6945ed7 http.request.method=POST http.request.remoteaddr="127.0.0.1:51474" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/" http.request.useragent="go-containerregistry/v0.12.1" vars.name="defenseunicorns/zarf/dev-agent" 
time="2023-01-24T22:00:37.32655807Z" level=info msg="authorized request" go.version=go1.16.15 http.request.contenttype="application/json" http.request.host="127.0.0.1:33469" http.request.id=22948833-368c-4397-b6ee-a37cf4da0ac2 http.request.method=POST http.request.remoteaddr="127.0.0.1:51450" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/" http.request.useragent="go-containerregistry/v0.12.1" vars.name="defenseunicorns/zarf/dev-agent" 
time="2023-01-24T22:00:37.332268616Z" level=info msg="authorized request" go.version=go1.16.15 http.request.contenttype="application/json" http.request.host="127.0.0.1:33469" http.request.id=0a125555-18ef-45de-9d58-f939415c4a49 http.request.method=POST http.request.remoteaddr="127.0.0.1:51460" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/" http.request.useragent="go-containerregistry/v0.12.1" vars.name="defenseunicorns/zarf/dev-agent" 
time="2023-01-24T22:00:37.353378374Z" level=info msg="response completed" go.version=go1.16.15 http.request.contenttype="application/json" http.request.host="127.0.0.1:33469" http.request.id=22948833-368c-4397-b6ee-a37cf4da0ac2 http.request.method=POST http.request.remoteaddr="127.0.0.1:51450" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/" http.request.useragent="go-containerregistry/v0.12.1" http.response.duration=84.736628ms http.response.status=202 http.response.written=0 
127.0.0.1 - - [24/Jan/2023:22:00:37 +0000] "POST /v2/defenseunicorns/zarf/dev-agent/blobs/uploads/ HTTP/1.1" 202 0 "" "go-containerregistry/v0.12.1"
127.0.0.1 - - [24/Jan/2023:22:00:37 +0000] "POST /v2/defenseunicorns/zarf/dev-agent/blobs/uploads/ HTTP/1.1" 202 0 "" "go-containerregistry/v0.12.1"
time="2023-01-24T22:00:37.353481112Z" level=info msg="response completed" go.version=go1.16.15 http.request.contenttype="application/json" http.request.host="127.0.0.1:33469" http.request.id=0a125555-18ef-45de-9d58-f939415c4a49 http.request.method=POST http.request.remoteaddr="127.0.0.1:51460" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/" http.request.useragent="go-containerregistry/v0.12.1" http.response.duration=76.39968ms http.response.status=202 http.response.written=0 
time="2023-01-24T22:00:37.356954764Z" level=info msg="response completed" go.version=go1.16.15 http.request.contenttype="application/json" http.request.host="127.0.0.1:33469" http.request.id=4ea14d5b-68cc-4dff-b6ce-018ba6945ed7 http.request.method=POST http.request.remoteaddr="127.0.0.1:51474" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/" http.request.useragent="go-containerregistry/v0.12.1" http.response.duration=89.178218ms http.response.status=202 http.response.written=0 
127.0.0.1 - - [24/Jan/2023:22:00:37 +0000] "POST /v2/defenseunicorns/zarf/dev-agent/blobs/uploads/ HTTP/1.1" 202 0 "" "go-containerregistry/v0.12.1"
time="2023-01-24T22:00:37.413000724Z" level=info msg="authorized request" go.version=go1.16.15 http.request.contenttype="application/octet-stream" http.request.host="127.0.0.1:33469" http.request.id=6f104919-ab37-4a28-b517-4339e948ff0b http.request.method=PATCH http.request.remoteaddr="127.0.0.1:51460" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/12319e4a-169e-4d00-a293-2c927b2b6850?_state=zTvk_4-hPiPgkXyx84aZCfd8rw6dBwD-gK7XKJkadyB7Ik5hbWUiOiJkZWZlbnNldW5pY29ybnMvemFyZi9kZXYtYWdlbnQiLCJVVUlEIjoiMTIzMTllNGEtMTY5ZS00ZDAwLWEyOTMtMmM5MjdiMmI2ODUwIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTAxLTI0VDIyOjAwOjM3LjMyNjU5MTE3NVoifQ%3D%3D" http.request.useragent="go-containerregistry/v0.12.1" vars.name="defenseunicorns/zarf/dev-agent" vars.uuid=12319e4a-169e-4d00-a293-2c927b2b6850 
time="2023-01-24T22:00:37.425766062Z" level=info msg="response completed" go.version=go1.16.15 http.request.contenttype="application/octet-stream" http.request.host="127.0.0.1:33469" http.request.id=6f104919-ab37-4a28-b517-4339e948ff0b http.request.method=PATCH http.request.remoteaddr="127.0.0.1:51460" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/12319e4a-169e-4d00-a293-2c927b2b6850?_state=zTvk_4-hPiPgkXyx84aZCfd8rw6dBwD-gK7XKJkadyB7Ik5hbWUiOiJkZWZlbnNldW5pY29ybnMvemFyZi9kZXYtYWdlbnQiLCJVVUlEIjoiMTIzMTllNGEtMTY5ZS00ZDAwLWEyOTMtMmM5MjdiMmI2ODUwIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTAxLTI0VDIyOjAwOjM3LjMyNjU5MTE3NVoifQ%3D%3D" http.request.useragent="go-containerregistry/v0.12.1" http.response.duration=71.307514ms http.response.status=202 http.response.written=0 
127.0.0.1 - - [24/Jan/2023:22:00:37 +0000] "PATCH /v2/defenseunicorns/zarf/dev-agent/blobs/uploads/12319e4a-169e-4d00-a293-2c927b2b6850?_state=zTvk_4-hPiPgkXyx84aZCfd8rw6dBwD-gK7XKJkadyB7Ik5hbWUiOiJkZWZlbnNldW5pY29ybnMvemFyZi9kZXYtYWdlbnQiLCJVVUlEIjoiMTIzMTllNGEtMTY5ZS00ZDAwLWEyOTMtMmM5MjdiMmI2ODUwIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTAxLTI0VDIyOjAwOjM3LjMyNjU5MTE3NVoifQ%3D%3D HTTP/1.1" 202 0 "" "go-containerregistry/v0.12.1"
time="2023-01-24T22:00:37.427377033Z" level=info msg="authorized request" go.version=go1.16.15 http.request.contenttype="application/octet-stream" http.request.host="127.0.0.1:33469" http.request.id=3c52741b-2166-41b4-a87f-d6353ab878aa http.request.method=PATCH http.request.remoteaddr="127.0.0.1:51450" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/7f26765d-5f60-4d7e-b8dd-bc71590974a8?_state=7TJeZVvW6CeHWuMoPktU2pcPE7sBCL5CWMYkSNIaIO57Ik5hbWUiOiJkZWZlbnNldW5pY29ybnMvemFyZi9kZXYtYWdlbnQiLCJVVUlEIjoiN2YyNjc2NWQtNWY2MC00ZDdlLWI4ZGQtYmM3MTU5MDk3NGE4IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTAxLTI0VDIyOjAwOjM3LjMzMjI5NDY2N1oifQ%3D%3D" http.request.useragent="go-containerregistry/v0.12.1" vars.name="defenseunicorns/zarf/dev-agent" vars.uuid=7f26765d-5f60-4d7e-b8dd-bc71590974a8 
time="2023-01-24T22:00:37.430973258Z" level=info msg="authorized request" go.version=go1.16.15 http.request.contenttype="application/octet-stream" http.request.host="127.0.0.1:33469" http.request.id=5345f7a8-1088-4322-a3c9-b0b264d4424c http.request.method=PATCH http.request.remoteaddr="127.0.0.1:51474" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/d166ea9d-22d5-45c1-bd51-7e4b374781ce?_state=TJiARm0BfgW5voExWr80mwkR9oIgf6Wtw3QfZQCWMql7Ik5hbWUiOiJkZWZlbnNldW5pY29ybnMvemFyZi9kZXYtYWdlbnQiLCJVVUlEIjoiZDE2NmVhOWQtMjJkNS00NWMxLWJkNTEtN2U0YjM3NDc4MWNlIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTAxLTI0VDIyOjAwOjM3LjMyNjI0Mjk0NFoifQ%3D%3D" http.request.useragent="go-containerregistry/v0.12.1" vars.name="defenseunicorns/zarf/dev-agent" vars.uuid=d166ea9d-22d5-45c1-bd51-7e4b374781ce 
time="2023-01-24T22:00:37.438797287Z" level=info msg="response completed" go.version=go1.16.15 http.request.contenttype="application/octet-stream" http.request.host="127.0.0.1:33469" http.request.id=3c52741b-2166-41b4-a87f-d6353ab878aa http.request.method=PATCH http.request.remoteaddr="127.0.0.1:51450" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/7f26765d-5f60-4d7e-b8dd-bc71590974a8?_state=7TJeZVvW6CeHWuMoPktU2pcPE7sBCL5CWMYkSNIaIO57Ik5hbWUiOiJkZWZlbnNldW5pY29ybnMvemFyZi9kZXYtYWdlbnQiLCJVVUlEIjoiN2YyNjc2NWQtNWY2MC00ZDdlLWI4ZGQtYmM3MTU5MDk3NGE4IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTAxLTI0VDIyOjAwOjM3LjMzMjI5NDY2N1oifQ%3D%3D" http.request.useragent="go-containerregistry/v0.12.1" http.response.duration=82.975148ms http.response.status=202 http.response.written=0 
127.0.0.1 - - [24/Jan/2023:22:00:37 +0000] "PATCH /v2/defenseunicorns/zarf/dev-agent/blobs/uploads/7f26765d-5f60-4d7e-b8dd-bc71590974a8?_state=7TJeZVvW6CeHWuMoPktU2pcPE7sBCL5CWMYkSNIaIO57Ik5hbWUiOiJkZWZlbnNldW5pY29ybnMvemFyZi9kZXYtYWdlbnQiLCJVVUlEIjoiN2YyNjc2NWQtNWY2MC00ZDdlLWI4ZGQtYmM3MTU5MDk3NGE4IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTAxLTI0VDIyOjAwOjM3LjMzMjI5NDY2N1oifQ%3D%3D HTTP/1.1" 202 0 "" "go-containerregistry/v0.12.1"
time="2023-01-24T22:00:37.539174431Z" level=info msg="response completed" go.version=go1.16.15 http.request.contenttype="application/octet-stream" http.request.host="127.0.0.1:33469" http.request.id=5345f7a8-1088-4322-a3c9-b0b264d4424c http.request.method=PATCH http.request.remoteaddr="127.0.0.1:51474" http.request.uri="/v2/defenseunicorns/zarf/dev-agent/blobs/uploads/d166ea9d-22d5-45c1-bd51-7e4b374781ce?_state=TJiARm0BfgW5voExWr80mwkR9oIgf6Wtw3QfZQCWMql7Ik5hbWUiOiJkZWZlbnNldW5pY29ybnMvemFyZi9kZXYtYWdlbnQiLCJVVUlEIjoiZDE2NmVhOWQtMjJkNS00NWMxLWJkNTEtN2U0YjM3NDc4MWNlIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTAxLTI0VDIyOjAwOjM3LjMyNjI0Mjk0NFoifQ%3D%3D" http.request.useragent="go-containerregistry/v0.12.1" http.response.duration=173.209445ms http.response.status=202 http.response.written=0 
127.0.0.1 - - [24/Jan/2023:22:00:37 +0000] "PATCH /v2/defenseunicorns/zarf/dev-agent/blobs/uploads/d166ea9d-22d5-45c1-bd51-7e4b374781ce?_state=TJiARm0BfgW5voExWr80mwkR9oIgf6Wtw3QfZQCWMql7Ik5hbWUiOiJkZWZlbnNldW5pY29ybnMvemFyZi9kZXYtYWdlbnQiLCJVVUlEIjoiZDE2NmVhOWQtMjJkNS00NWMxLWJkNTEtN2U0YjM3NDc4MWNlIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTAxLTI0VDIyOjAwOjM3LjMyNjI0Mjk0NFoifQ%3D%3D HTTP/1.1" 202 0 "" "go-containerregistry/v0.12.1"
10.42.3.1 - - [24/Jan/2023:22:00:42 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:00:42 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:00:52 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:00:52 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:02 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:02 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:12 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:12 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:22 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:22 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:32 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:32 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:42 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:42 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:52 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:01:52 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:02:02 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"
10.42.3.1 - - [24/Jan/2023:22:02:

zarf.log
02 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.24"

@corang
Copy link
Contributor Author

corang commented Jan 24, 2023

events in zarf namespace during failure:

LAST SEEN   TYPE      REASON                         OBJECT                                         MESSAGE
11m         Normal    Scheduled                      pod/injector                                   Successfully assigned zarf/injector to k3d-di2me-agent-2
11m         Normal    Pulled                         pod/injector                                   Container image "rancher/mirrored-coredns-coredns:1.9.1" already present on machine
11m         Normal    Created                        pod/injector                                   Created container injector
11m         Normal    Started                        pod/injector                                   Started container injector
11m         Normal    NoPods                         poddisruptionbudget/zarf-docker-registry       No matching pods found
11m         Normal    WaitForFirstConsumer           persistentvolumeclaim/zarf-docker-registry     waiting for first consumer to be created before binding
11m         Normal    ScalingReplicaSet              deployment/zarf-docker-registry                Scaled up replica set zarf-docker-registry-6fd8f65cb to 1
11m         Normal    SuccessfulCreate               replicaset/zarf-docker-registry-6fd8f65cb      Created pod: zarf-docker-registry-6fd8f65cb-wpqw7
11m         Normal    ExternalProvisioning           persistentvolumeclaim/zarf-docker-registry     waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator
11m         Normal    Provisioning                   persistentvolumeclaim/zarf-docker-registry     External provisioner is provisioning volume for claim "zarf/zarf-docker-registry"
11m         Normal    ProvisioningSucceeded          persistentvolumeclaim/zarf-docker-registry     Successfully provisioned volume pvc-bcaa07f8-09d2-4161-a8dd-a8e52f212f6f
10m         Normal    Scheduled                      pod/zarf-docker-registry-6fd8f65cb-wpqw7       Successfully assigned zarf/zarf-docker-registry-6fd8f65cb-wpqw7 to k3d-di2me-agent-1
10m         Normal    Pulling                        pod/zarf-docker-registry-6fd8f65cb-wpqw7       Pulling image "127.0.0.1:32108/library/registry:2.8.1"
10m         Normal    Pulled                         pod/zarf-docker-registry-6fd8f65cb-wpqw7       Successfully pulled image "127.0.0.1:32108/library/registry:2.8.1" in 188.526384ms
10m         Normal    Created                        pod/zarf-docker-registry-6fd8f65cb-wpqw7       Created container docker-registry
10m         Normal    Started                        pod/zarf-docker-registry-6fd8f65cb-wpqw7       Started container docker-registry
10m         Normal    Killing                        pod/injector                                   Stopping container injector
10m         Normal    ScalingReplicaSet              deployment/zarf-docker-registry                Scaled up replica set zarf-docker-registry-c485bf8f7 to 1
10m         Normal    SuccessfulCreate               replicaset/zarf-docker-registry-c485bf8f7      Created pod: zarf-docker-registry-c485bf8f7-ml8j6
10m         Normal    Scheduled                      pod/zarf-docker-registry-c485bf8f7-ml8j6       Successfully assigned zarf/zarf-docker-registry-c485bf8f7-ml8j6 to k3d-di2me-agent-1
10m         Normal    Pulling                        pod/zarf-docker-registry-c485bf8f7-ml8j6       Pulling image "127.0.0.1:31999/library/registry:2.8.1"
10m         Normal    Pulled                         pod/zarf-docker-registry-c485bf8f7-ml8j6       Successfully pulled image "127.0.0.1:31999/library/registry:2.8.1" in 116.219991ms
10m         Normal    Created                        pod/zarf-docker-registry-c485bf8f7-ml8j6       Created container docker-registry
10m         Normal    Started                        pod/zarf-docker-registry-c485bf8f7-ml8j6       Started container docker-registry
10m         Normal    ScalingReplicaSet              deployment/zarf-docker-registry                Scaled down replica set zarf-docker-registry-6fd8f65cb to 0
10m         Normal    SuccessfulDelete               replicaset/zarf-docker-registry-6fd8f65cb      Deleted pod: zarf-docker-registry-6fd8f65cb-wpqw7
10m         Normal    Killing                        pod/zarf-docker-registry-6fd8f65cb-wpqw7       Stopping container docker-registry
10m         Warning   FailedGetResourceMetric        horizontalpodautoscaler/zarf-docker-registry   failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
10m         Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/zarf-docker-registry   invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
10m         Warning   FailedGetResourceMetric        horizontalpodautoscaler/zarf-docker-registry   failed to get cpu utilization: did not receive metrics for any ready pods
10m         Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/zarf-docker-registry   invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
34s         Normal    ScalingReplicaSet              deployment/agent-hook                          Scaled up replica set agent-hook-b45ffcf55 to 2
34s         Normal    SuccessfulCreate               replicaset/agent-hook-b45ffcf55                Created pod: agent-hook-b45ffcf55-6pgbv
34s         Normal    Scheduled                      pod/agent-hook-b45ffcf55-6pgbv                 Successfully assigned zarf/agent-hook-b45ffcf55-6pgbv to k3d-di2me-agent-1
34s         Normal    Scheduled                      pod/agent-hook-b45ffcf55-vnbc7                 Successfully assigned zarf/agent-hook-b45ffcf55-vnbc7 to k3d-di2me-server-0
34s         Normal    SuccessfulCreate               replicaset/agent-hook-b45ffcf55                Created pod: agent-hook-b45ffcf55-vnbc7
34s         Normal    Pulling                        pod/agent-hook-b45ffcf55-6pgbv                 Pulling image "127.0.0.1:31999/defenseunicorns/zarf/dev-agent:e32f41ab50f994302614adf62ab6f13a7ecfbb25"
34s         Normal    Pulling                        pod/agent-hook-b45ffcf55-vnbc7                 Pulling image "127.0.0.1:31999/defenseunicorns/zarf/dev-agent:e32f41ab50f994302614adf62ab6f13a7ecfbb25"
33s         Normal    Pulled                         pod/agent-hook-b45ffcf55-6pgbv                 Successfully pulled image "127.0.0.1:31999/defenseunicorns/zarf/dev-agent:e32f41ab50f994302614adf62ab6f13a7ecfbb25" in 746.466542ms
33s         Normal    Pulled                         pod/agent-hook-b45ffcf55-vnbc7                 Successfully pulled image "127.0.0.1:31999/defenseunicorns/zarf/dev-agent:e32f41ab50f994302614adf62ab6f13a7ecfbb25" in 754.266892ms
33s         Normal    Created                        pod/agent-hook-b45ffcf55-6pgbv                 Created container server
33s         Normal    Created                        pod/agent-hook-b45ffcf55-vnbc7                 Created container server
33s         Normal    Started                        pod/agent-hook-b45ffcf55-vnbc7                 Started container server
33s         Normal    Started                        pod/agent-hook-b45ffcf55-6pgbv                 Started container server

@corang
Copy link
Contributor Author

corang commented Jan 24, 2023

k3d.agent.log
Log from the node that the registry is running on

@corang
Copy link
Contributor Author

corang commented Jan 24, 2023

Zarf init managed to finish with some retries. Waiting on a different deployment with more/larger images to fail now. Cycle seems to be as follows

  1. Zarf pushes image
  2. Some layers get pushed
  3. A push of a layer fails ALWAYS a PATCH
  4. Registry sits idle and zarf thinks its pushing the image until it hits a 600s timeout
  5. Zarf retries image push and gets through more layers
  6. rinse and repeat depending on how many images/layers need to be pushed
  7. If zarf fails 3 times on the same image then it errors out

@corang
Copy link
Contributor Author

corang commented Jan 25, 2023

zarf-2023-01-24-16-13-38-334878943.log
Log for package that actually failed

@corang
Copy link
Contributor Author

corang commented Jan 25, 2023

As a side note while zarf is failing to push an image, even if you delete the registry pod it doesn't realize that there is no longer a connection

@corang
Copy link
Contributor Author

corang commented Jan 25, 2023

Restarting only works for first cluster stood up after restart, if that cluster is torn down and another is started the issue resurfaces. My current workaround is to run sync && echo 3 | sudo tee /proc/sys/vm/drop_caches between tearing down and starting a new cluster.

This is not a reliable fix. Still can happen when running this command.

@corang
Copy link
Contributor Author

corang commented Jan 25, 2023

And it's also still happening directly after a restart

@corang
Copy link
Contributor Author

corang commented Jan 25, 2023

The lines that crane hangs on are either here or here

@corang
Copy link
Contributor Author

corang commented Jan 31, 2023

Happened again but on different hardware/software:
64 cores epyc
128 GB memory
ZFS storage
VM with host cpu passthrough
bare k3s v1.24.10+k3s1
zarf version v0.24.0-rc3
DI2ME package from this branch https://github.com/defenseunicorns/zarf-package-software-factory/tree/kyverno
Zarf log: zarf-2023-01-31-20-49-21-152043956.log

@dgershman
Copy link
Contributor

I have the same issue. Intermittent hanging on pushes to registry.

@dgershman
Copy link
Contributor

I'm using 0.25.0 and facing this issue.

Racer159 pushed a commit that referenced this issue Apr 5, 2023
## Description

This enables some additional logging information for pushes to
registries.

## Related Issue

Relates to #1218 

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Checklist before merging

- [ ] Test, docs, adr added or updated as needed
- [x] [Contributor Guide
Steps](https://github.com/defenseunicorns/zarf/blob/main/CONTRIBUTING.md#developer-workflow)
followed
@Racer159 Racer159 self-assigned this Apr 18, 2023
@Racer159 Racer159 added this to the v0.26.1 milestone Apr 18, 2023
Racer159 added a commit that referenced this issue Apr 18, 2023
…ushes (#1590)

## Description

This PR creates a tunnel per image push (making it easier to implement
concurrency - may do that in this PR if we can confirm that issues are
mitigated) moves the CRC from the image name to the tag and changes the
UI to use a progressbar instead of a spinner for better user feedback.

## Related Issue

Relates to #1568 , #1433, #1218, #1364

This also will make #1594 slightly easier.

(See aws/containers-roadmap#853)

Fixes: #1541

## Type of change

- [X] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Checklist before merging

- [X] Test, docs, adr added or updated as needed
- [X] [Contributor Guide
Steps](https://github.com/defenseunicorns/zarf/blob/main/CONTRIBUTING.md#developer-workflow)
followed
@Racer159 Racer159 modified the milestones: v0.26 (m2), v0.26 (m3) Apr 20, 2023
@Racer159
Copy link
Contributor

@corang have you gotten a chance to try v0.26.0 and have you seen any intermittent issues with that?

@corang
Copy link
Contributor Author

corang commented Apr 25, 2023

I have not, but I also haven't had any issues with image pushes since I stopped using k3s/rke2 for things and now use kind for everything

@Racer159
Copy link
Contributor

Racer159 commented May 2, 2023

Going to close in favor of troubleshooting this further in #1568

@Racer159 Racer159 closed this as completed May 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants