git push timeouts results in excessive flux-write-sync tags #3075

coopernetes · 2020-05-24T17:33:54Z

Describe the bug
Original thread on Slack: https://kubernetes.slack.com/archives/CBT6N1ASG/p1589473771368100

When Flux attempts to push the write-check tag during the sync loop and the upstream git server doesn't respond within the timeout, the push fails. Because of the pseudo-random suffix introduced in #2684, the next attempt creates a new tag, so each sync interval can add another one. In large environments with multiple clusters and Flux deployments watching the same repo, this can result in an explosion of randomized git tags. In my situation, the upstream repo received 8000 flux-write-check-<randombits> tags!

To Reproduce
This is a bit tricky to reproduce since it was originally an issue with the upstream git repo (GitHub Enterprise). A fairly standard installation was used, and the repo used for synchronizing Kubernetes manifests was Kustomize-based with only a handful of resources (about a dozen NetworkPolicy objects). The Flux deployment args are included below:

Args:
      --memcached-hostname=memcached-<suffix>
      --ssh-keygen-dir=/var/fluxd/keygen
      [email protected]/org/flux-repo-network-policy
      --git-branch=master
      --git-path=kustomize/...
      --git-label=flux
      --git-user=flux-user
      [email protected]
      --sync-garbage-collection
      --manifest-generation=false
      --listen-metrics=:3031
      --registry-exclude-image=*
      --k8s-secret-name=flux-github-ssh-key-<suffix>-2cbgbggf67

Expected behavior

Not really sure how to change this behaviour except to always bail out on error & delete the created tag.

Logs

ts=2020-05-13T21:31:41.327147223Z caller=loop.go:133 component=sync-loop event=refreshed url=ssh://[email protected]/org/flux-repo branch=master HEAD=4e390fd8ff13f7695f586f2875b690ed1072b6ef
--
ts=2020-05-13T21:31:57.880366208Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:31:57.952299504Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:36:57.880525147Z caller=loop.go:107 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:\n Cloning into bare repository '/tmp/flux-gitclone838059889'...\nssh: Could not resolve hostname rbcgithub.fg.rbc.com: Try again\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"
ts=2020-05-13T21:36:57.952652565Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:36:57.952717369Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:41:57.880946143Z caller=loop.go:107 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:\n Cloning into bare repository '/tmp/flux-gitclone109254341'...\nssh: Could not resolve hostname rbcgithub.fg.rbc.com: Try again\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"
ts=2020-05-13T21:41:57.953294779Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:41:57.953115467Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:47:05.036321887Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-4dc7e6f133]: context deadline exceeded"
ts=2020-05-13T21:47:05.036377091Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:47:05.03636719Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:52:05.036860787Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:52:05.036710577Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-a1458a05d2]: context deadline exceeded"
ts=2020-05-13T21:52:05.03690049Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:54:55.43926669Z caller=main.go:796 exiting=terminated
ts=2020-05-13T21:54:55.439343696Z caller=warming.go:127 component=warmer stopping=true
ts=2020-05-13T21:54:55.439336796Z caller=loop.go:76 component=sync-loop stopping=true
ts=2020-05-13T21:55:03.11210037Z caller=main.go:396 msg="using kube config: \"/root/.kube/config\" to connect to the cluster"
ts=2020-05-13T21:55:03.112047666Z caller=main.go:256 version=1.18.0
ts=2020-05-13T21:55:03.152914199Z caller=main.go:481 component=cluster identity=/etc/fluxd/ssh/identity
ts=2020-05-13T21:55:03.153039209Z caller=main.go:499 kubectl=/usr/local/bin/kubectl
ts=2020-05-13T21:55:03.152995305Z caller=main.go:487 host=https://172.18.0.1:443 version=kubernetes-v1.15.10
ts=2020-05-13T21:55:03.152965403Z caller=main.go:482 component=cluster identity.pub="ssh-rsa <redacted>"
ts=2020-05-13T21:55:03.154067485Z caller=main.go:514 ping=true
ts=2020-05-13T21:55:03.157391132Z caller=main.go:759 upstream="no upstream URL given"
ts=2020-05-13T21:55:03.157546743Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:55:03.157601847Z caller=loop.go:107 component=sync-loop err="git repo not ready: git repo has not been cloned yet"
ts=2020-05-13T21:55:03.157572245Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:55:03.157481639Z caller=main.go:790 metrics-addr=:3031
ts=2020-05-13T21:55:03.157345728Z caller=main.go:653 url=ssh://[email protected]/org/flux-repo user=flux-user [email protected] signing-key= verify-signatures=false sync-tag=flux state=git readonly=false registry-disable-scanning=false notes-ref=flux set-author=false git-secret=false sops=false
ts=2020-05-13T21:55:03.157953974Z caller=main.go:782 addr=:3030
ts=2020-05-13T22:00:03.157976153Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:00:03.157918949Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:00:03.157829043Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-464b6ef6a0]: context deadline exceeded"
ts=2020-05-13T22:05:03.158505593Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:05:03.158411787Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-adbf38c5f9]: context deadline exceeded"
ts=2020-05-13T22:05:03.158527495Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:10:03.158920893Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:10:03.158942395Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:10:03.158814886Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-124072bd8e]: context deadline exceeded"
ts=2020-05-13T22:15:03.15917571Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:15:03.159189511Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:15:03.159105505Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-124072bd8e]: context deadline exceeded"
ts=2020-05-13T22:20:03.159401855Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-124072bd8e]: context deadline exceeded"
ts=2020-05-13T22:20:03.159449958Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:20:03.159459159Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:25:17.158941074Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:25:17.158958376Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:25:17.158856969Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-66655f708b]: context deadline exceeded"
ts=2020-05-13T22:30:17.159093642Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-22efec1a4c]: context deadline exceeded"
ts=2020-05-13T22:30:17.159163146Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:30:17.159150845Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:35:17.15961281Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:35:17.159603409Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:35:17.159554906Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-4dae0039f9]: context deadline exceeded"
ts=2020-05-13T22:40:17.159935098Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:40:17.159855493Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-48f984b7c7]: context deadline exceeded"
ts=2020-05-13T22:40:17.159920497Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:45:17.160347788Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:45:17.160410392Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-1144038ee5]: context deadline exceeded"
ts=2020-05-13T22:45:17.160266183Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:50:17.160712872Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:50:17.160633767Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:50:17.160747574Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-eef76b879c]: context deadline exceeded"
ts=2020-05-13T22:55:17.160966605Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:55:17.16103821Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:55:17.161084913Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-c867df598d]: context deadline exceeded"
ts=2020-05-13T23:00:17.161421618Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-c867df598d]: context deadline exceeded"
ts=2020-05-13T23:00:17.161381016Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T23:00:17.161314111Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T23:05:17.161735941Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T23:05:17.161812546Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T23:05:17.161856349Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-0c9785064d]: context deadline exceeded"
ts=2020-05-13T23:10:17.162260423Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T23:10:17.162163017Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T23:10:17.162311127Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-d9c8e4afab]: context deadline exceeded"
ts=2020-05-13T23:15:17.162544196Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T23:15:17.162481092Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T23:15:17.162888119Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-d9c8e4afab]: context deadline exceeded"
ts=2020-05-13T23:20:17.162936773Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T23:20:17.163213291Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-ba25919858]: context deadline exceeded"

Additional context

Flux version: 1.18
Kubernetes version: 1.15.10
Git provider: GitHub Enterprise
Container registry provider: Azure Container Registry

maximmold · 2020-06-26T04:23:17Z

When I first filed #2683, I think we talked about just making the tag have the branch name as the suffix rather than something pseudorandom... not sure if that would be a viable option or not.

maximmold · 2020-06-26T12:20:27Z

One other option I'm curious about: could we add an option to assume repository write access whenever read access is available, and defer any resulting errors to the point where Flux actually pushes the HelmReleases or whatever? We don't have to make this the default, but I have gotten some complaints about how often we are pushing this flux-write-check tag.

Marx2 · 2020-08-13T11:43:06Z

I'm coming here with the same problem. My Git repo is holding a number of those tags. Is it safe to delete them?

zepptron · 2021-02-03T09:47:18Z

@Marx2 have you tried deleting them? I'm currently in the same situation: I've got 32k tags, but I'm not sure whether deleting them would cause issues.

Marx2 · 2021-02-03T09:55:00Z

Yes, I've deleted them. After a Flux upgrade and some configuration changes I don't remember now, they have not appeared anymore.

kingdonb · 2021-02-04T15:12:06Z

That is an excessive number of tags and sounds like something important to fix, since it must be quite a pain to delete them all. The flux-write-check tags should be safe to delete. If I add a long delay to my git server, should I be able to reproduce this?

If I understand what is happening correctly, the timeout fires before the write check succeeds, and the write-check tag is not cleaned up because nothing indicates that creating it was successful. The git server eventually catches up and creates the tag anyway, and this repeats, creating tags over and over, until one write check manages to succeed within the timeout period.
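One way to check whether the server really did end up holding tags from timed-out pushes is to ask the remote directly rather than trusting local state. The following is a minimal sketch, not something Flux provides: the function name is a hypothetical helper, and "origin" is only an assumed default remote name.

```shell
#!/usr/bin/env bash
# Sketch: count the flux-write-check-* tags that a remote actually holds.
# count_write_check_tags is a hypothetical helper; "origin" is an assumed default.
count_write_check_tags() (
  remote="${1:-origin}"
  # --refs hides peeled "^{}" entries so annotated tags are not counted twice.
  git ls-remote --refs --tags "$remote" 'refs/tags/flux-write-check-*' | wc -l | tr -d ' '
)
```

Running `count_write_check_tags origin` before and after a period of slow pushes should show whether the number keeps growing.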

dewe · 2021-02-04T15:37:05Z

We got fewer of those tags after increasing the timeout.

Deleting the tags is safe. Run this in the repo:

#!/usr/bin/env bash
# Delete every flux-write-check-* tag from the remote, then from the local repo.
while IFS= read -r tag; do
  git push --delete origin "$tag"
  git tag --delete "$tag"
done < <(git tag --list 'flux-write-check-*')
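With tens of thousands of tags, one push per tag can take a long time, and the loop above only sees tags that have been fetched locally. A possible variant, sketched here under assumptions, asks the remote which write-check tags it holds and deletes them in batches. The function name and the default batch size of 100 are illustrative choices, not anything from Flux.

```shell
#!/usr/bin/env bash
# Sketch: delete flux-write-check-* tags in batches instead of one push per tag.
# delete_write_check_tags and the default batch size (100) are assumptions.
delete_write_check_tags() (
  remote="${1:-origin}"
  batch="${2:-100}"

  # Ask the remote which write-check tags it holds; --refs hides peeled "^{}" entries.
  remote_refs=$(git ls-remote --refs --tags "$remote" 'refs/tags/flux-write-check-*' | cut -f2)
  if [ -n "$remote_refs" ]; then
    # Full refs/tags/... names avoid ambiguity with any same-named branch.
    printf '%s\n' "$remote_refs" | xargs -n "$batch" git push --quiet --delete "$remote"
  fi

  # Remove local copies too, so a later "git push --tags" cannot recreate them.
  local_tags=$(git tag --list 'flux-write-check-*')
  if [ -n "$local_tags" ]; then
    printf '%s\n' "$local_tags" | xargs -n "$batch" git tag --delete >/dev/null
  fi
)
```

Call it as `delete_write_check_tags origin 200` from inside the repo. A smaller batch size may be safer if the git server is already struggling with timeouts.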

kingdonb · 2021-02-18T23:18:52Z

This can be reopened if there is a non-breaking change suggested that we can implement to make this kind of catastrophic overtagging less likely. A simple process was provided to help delete the extraneous tags without RSI. Unless a contributor has a suggestion for how to make it better, I'm going to close this issue for now. Thanks for responding, thanks for using Flux!

coopernetes added the blocked-needs-validation and bug labels May 24, 2020

kingdonb self-assigned this Feb 5, 2021

kingdonb mentioned this issue Feb 12, 2021

flux-write-check pipeline randomly fails #3424

Closed

kingdonb closed this as completed Feb 18, 2021

kingdonb mentioned this issue Mar 11, 2021

Multiple tags flux-write-check #3444

Closed
