This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

git push timeouts results in excessive flux-write-sync tags #3075

Closed
coopernetes opened this issue May 24, 2020 · 8 comments
Labels
blocked-needs-validation (Issue is waiting to be validated before we can proceed), bug

Comments

@coopernetes

Describe the bug
Original thread on Slack: https://kubernetes.slack.com/archives/CBT6N1ASG/p1589473771368100

When Flux pushes its write-check tag during the sync loop and the upstream Git server doesn't respond in time, the push fails with a timeout. Because of the pseudo-randomization introduced in #2684, every new attempt pushes a tag with a different name, so an additional tag is left behind on each sync interval. In large environments with multiple clusters and Flux deployments watching the same repo, this can result in an explosion of randomized Git tags. In my situation, the upstream repo received 8000 flux-write-check-<randombits> tags!
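To make the failure mode concrete, here is a rough shell approximation of the write check as this thread describes it (create a tag, push it, delete it again on success). It is not Flux's Go code: the 10-hex-character suffix (5 random bytes) is an assumption chosen to match the tag names in the logs, and both function names are hypothetical. The cleanup only runs after a successful push, and each call picks a fresh name, so every timed-out attempt can leave one more tag on the server.

```bash
#!/usr/bin/env bash
# Hypothetical approximation of the write check; not Flux's implementation.
# Assumes a 10-hex-character random suffix, matching the tags in the logs.

# Print a new write-check tag name on every call.
random_write_check_tag() {
  local suffix
  # 5 random bytes rendered as 10 lowercase hex characters.
  suffix=$(od -An -N5 -tx1 /dev/urandom | tr -d ' \n')
  printf 'flux-write-check-%s\n' "$suffix"
}

# Create a tag, push it, and delete it only if the push succeeded.
write_check() {
  local upstream=$1 tag
  tag=$(random_write_check_tag)
  git tag "$tag" || return 1
  # A timeout here returns early: the server may still create the tag,
  # but nothing deletes it, and the next call uses a different name.
  git push "$upstream" tag "$tag" || return 1
  git push --delete "$upstream" "$tag" && git tag --delete "$tag"
}
```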

To Reproduce
This is a bit tricky to reproduce, since the root cause was a problem with the upstream Git server (GitHub Enterprise). A fairly standard installation was used, and the repo used for synchronizing Kubernetes manifests was Kustomize-based, with only a handful of resources (about a dozen NetworkPolicies). The Flux deployment args are included below:

Args:
      --memcached-hostname=memcached-<suffix>
      --ssh-keygen-dir=/var/fluxd/keygen
      --git-url=<redacted>/org/flux-repo-network-policy
      --git-branch=master
      --git-path=kustomize/...
      --git-label=flux
      --git-user=flux-user
      --git-email=<redacted>
      --sync-garbage-collection
      --manifest-generation=false
      --listen-metrics=:3031
      --registry-exclude-image=*
      --k8s-secret-name=flux-github-ssh-key-<suffix>-2cbgbggf67

Expected behavior

I'm not really sure how this behaviour should change, other than always bailing out on error and deleting the tag that was created.
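A minimal sketch of the "bail out and delete the created tag" idea, again as hypothetical shell rather than Flux's Go code. The function name is invented for illustration. On a failed or timed-out push it still makes a best-effort attempt to remove the tag remotely and locally, then reports the original failure.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not Flux's implementation: push a write-check tag and
# always try to clean it up, even when the push fails or times out.
write_check_with_cleanup() {
  local upstream=$1 tag=$2 status=0
  git tag "$tag" || return 1
  # Remember the push result instead of returning immediately.
  git push "$upstream" tag "$tag" || status=$?
  # Best-effort cleanup on both paths: the server may have created the tag
  # even though the client saw a timeout. Cleanup errors are ignored.
  git push --delete "$upstream" "$tag" >/dev/null 2>&1 || true
  git tag --delete "$tag" >/dev/null 2>&1 || true
  return "$status"
}
```

For example, `write_check_with_cleanup origin flux-write-check-example || echo "write check failed"`. One caveat: a server slow enough to time out the first push may also time out the delete, so this narrows the window for leftover tags rather than closing it.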

Logs

ts=2020-05-13T21:31:41.327147223Z caller=loop.go:133 component=sync-loop event=refreshed url=ssh://[email protected]/org/flux-repo branch=master HEAD=4e390fd8ff13f7695f586f2875b690ed1072b6ef
--
ts=2020-05-13T21:31:57.880366208Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:31:57.952299504Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:36:57.880525147Z caller=loop.go:107 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:\n Cloning into bare repository '/tmp/flux-gitclone838059889'...\nssh: Could not resolve hostname rbcgithub.fg.rbc.com: Try again\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"
ts=2020-05-13T21:36:57.952652565Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:36:57.952717369Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:41:57.880946143Z caller=loop.go:107 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:\n Cloning into bare repository '/tmp/flux-gitclone109254341'...\nssh: Could not resolve hostname rbcgithub.fg.rbc.com: Try again\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"
ts=2020-05-13T21:41:57.953294779Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:41:57.953115467Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:47:05.036321887Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-4dc7e6f133]: context deadline exceeded"
ts=2020-05-13T21:47:05.036377091Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:47:05.03636719Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:52:05.036860787Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:52:05.036710577Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-a1458a05d2]: context deadline exceeded"
ts=2020-05-13T21:52:05.03690049Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:54:55.43926669Z caller=main.go:796 exiting=terminated
ts=2020-05-13T21:54:55.439343696Z caller=warming.go:127 component=warmer stopping=true
ts=2020-05-13T21:54:55.439336796Z caller=loop.go:76 component=sync-loop stopping=true
ts=2020-05-13T21:55:03.11210037Z caller=main.go:396 msg="using kube config: \"/root/.kube/config\" to connect to the cluster"
ts=2020-05-13T21:55:03.112047666Z caller=main.go:256 version=1.18.0
ts=2020-05-13T21:55:03.152914199Z caller=main.go:481 component=cluster identity=/etc/fluxd/ssh/identity
ts=2020-05-13T21:55:03.153039209Z caller=main.go:499 kubectl=/usr/local/bin/kubectl
ts=2020-05-13T21:55:03.152995305Z caller=main.go:487 host=https://172.18.0.1:443 version=kubernetes-v1.15.10
ts=2020-05-13T21:55:03.152965403Z caller=main.go:482 component=cluster identity.pub="ssh-rsa <redacted>"
ts=2020-05-13T21:55:03.154067485Z caller=main.go:514 ping=true
ts=2020-05-13T21:55:03.157391132Z caller=main.go:759 upstream="no upstream URL given"
ts=2020-05-13T21:55:03.157546743Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T21:55:03.157601847Z caller=loop.go:107 component=sync-loop err="git repo not ready: git repo has not been cloned yet"
ts=2020-05-13T21:55:03.157572245Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T21:55:03.157481639Z caller=main.go:790 metrics-addr=:3031
ts=2020-05-13T21:55:03.157345728Z caller=main.go:653 url=ssh://[email protected]/org/flux-repo user=flux-user [email protected] signing-key= verify-signatures=false sync-tag=flux state=git readonly=false registry-disable-scanning=false notes-ref=flux set-author=false git-secret=false sops=false
ts=2020-05-13T21:55:03.157953974Z caller=main.go:782 addr=:3030
ts=2020-05-13T22:00:03.157976153Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:00:03.157918949Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:00:03.157829043Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-464b6ef6a0]: context deadline exceeded"
ts=2020-05-13T22:05:03.158505593Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:05:03.158411787Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-adbf38c5f9]: context deadline exceeded"
ts=2020-05-13T22:05:03.158527495Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:10:03.158920893Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:10:03.158942395Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:10:03.158814886Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-124072bd8e]: context deadline exceeded"
ts=2020-05-13T22:15:03.15917571Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:15:03.159189511Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:15:03.159105505Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-124072bd8e]: context deadline exceeded"
ts=2020-05-13T22:20:03.159401855Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-124072bd8e]: context deadline exceeded"
ts=2020-05-13T22:20:03.159449958Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:20:03.159459159Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:25:17.158941074Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:25:17.158958376Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:25:17.158856969Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-66655f708b]: context deadline exceeded"
ts=2020-05-13T22:30:17.159093642Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-22efec1a4c]: context deadline exceeded"
ts=2020-05-13T22:30:17.159163146Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:30:17.159150845Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:35:17.15961281Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:35:17.159603409Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:35:17.159554906Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-4dae0039f9]: context deadline exceeded"
ts=2020-05-13T22:40:17.159935098Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:40:17.159855493Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-48f984b7c7]: context deadline exceeded"
ts=2020-05-13T22:40:17.159920497Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:45:17.160347788Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:45:17.160410392Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-1144038ee5]: context deadline exceeded"
ts=2020-05-13T22:45:17.160266183Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:50:17.160712872Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:50:17.160633767Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:50:17.160747574Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-eef76b879c]: context deadline exceeded"
ts=2020-05-13T22:55:17.160966605Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T22:55:17.16103821Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T22:55:17.161084913Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-c867df598d]: context deadline exceeded"
ts=2020-05-13T23:00:17.161421618Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-c867df598d]: context deadline exceeded"
ts=2020-05-13T23:00:17.161381016Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T23:00:17.161314111Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T23:05:17.161735941Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T23:05:17.161812546Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T23:05:17.161856349Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-0c9785064d]: context deadline exceeded"
ts=2020-05-13T23:10:17.162260423Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T23:10:17.162163017Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T23:10:17.162311127Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-d9c8e4afab]: context deadline exceeded"
ts=2020-05-13T23:15:17.162544196Z caller=images.go:27 component=sync-loop msg="no automated workloads"
ts=2020-05-13T23:15:17.162481092Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T23:15:17.162888119Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-d9c8e4afab]: context deadline exceeded"
ts=2020-05-13T23:20:17.162936773Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-05-13T23:20:17.163213291Z caller=loop.go:107 component=sync-loop err="git repo not ready: attempt to push tag: running git command: git [push [email protected]/org/flux-repo tag flux-write-check-ba25919858]: context deadline exceeded"

Additional context

  • Flux version: 1.18
  • Kubernetes version: 1.15.10
  • Git provider: GitHub Enterprise
  • Container registry provider: Azure Container Registry
@coopernetes added the blocked-needs-validation and bug labels on May 24, 2020
@maximmold

When I first filed #2683, I think we talked about simply using the branch name as the tag suffix instead of something pseudo-random. I'm not sure whether that would be a viable option.
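The branch-name suggestion can be sketched as a hypothetical shell helper. Because the name is deterministic, a retry after a timeout reuses the same tag instead of minting a new one, so leftovers no longer grow with each sync interval. Two trade-offs to validate, stated as assumptions: several Flux instances watching the same branch would share one tag name, so concurrent checks could collide; and Git rejects pushing an existing tag name that points at a different commit unless the push is forced, so the check would need to handle that case.

```bash
#!/usr/bin/env bash
# Hypothetical helper: name the write-check tag after the branch instead of
# using a random suffix, so every retry reuses the same tag name.
branch_write_check_tag() {
  local branch=$1
  printf 'flux-write-check-%s\n' "$branch"
}
```

For example, `branch_write_check_tag master` prints `flux-write-check-master` on every call.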

@maximmold

Another option I'm curious about: could we add a setting that assumes write access to the repository whenever read access is available, and defers any errors until Flux actually pushes (HelmRelease updates or similar)? It doesn't have to be the default, but I have had complaints about how often Flux pushes this flux-write-check tag.
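The "assume write access" idea could look roughly like this hypothetical shell sketch. `ASSUME_WRITE_ACCESS` is an invented setting, not an existing Flux flag, and the tag name in the default branch is illustrative. In that mode only a read-only `git ls-remote` probe is made, and a missing write permission would surface later, on the first real push.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: ASSUME_WRITE_ACCESS is an invented setting, not a Flux flag.
check_repo_access() {
  local upstream=$1
  if [ "${ASSUME_WRITE_ACCESS:-false}" = true ]; then
    # Read-only probe: list the remote's refs without pushing anything.
    git ls-remote "$upstream" >/dev/null
  else
    # Default: push-based write check with an illustrative tag name.
    local tag=flux-write-check-example
    git tag "$tag" &&
      git push "$upstream" tag "$tag" &&
      git push --delete "$upstream" "$tag" &&
      git tag --delete "$tag"
  fi
}
```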

@Marx2

Marx2 commented Aug 13, 2020

I'm running into the same problem: my Git repository is holding a large number of these tags. Is it safe to delete them?

@zepptron

zepptron commented Feb 3, 2021

@Marx2, have you tried deleting them? I'm in the same situation with 32k tags, but I'm not sure whether deleting them would cause issues.

@Marx2

Marx2 commented Feb 3, 2021

Yes, I've deleted them. After a Flux upgrade and some configuration changes I no longer remember, the tags have not appeared again.

@kingdonb
Member

kingdonb commented Feb 4, 2021

That is an excessive number of tags and sounds like something important to fix, since it must be quite a pain to delete them all. The flux-write-check tags should be safe to delete. If I add a long delay to my Git server, I should be able to reproduce this, right?

If I understand correctly, the push times out before the write check succeeds, so the write-check tag is never cleaned up: Flux has no indication that creating it was successful. The Git server eventually catches up and creates the tag anyway, and this repeats, creating a new tag each time, until one write check succeeds within the timeout period.
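One way to test that hypothesis locally, sketched under the assumption that git and GNU coreutils `timeout` are installed: Git runs a `post-receive` hook only after the server has updated its refs, so a slow hook keeps the client waiting until it times out even though the tag already exists on the server. The delay values, paths, and tag name are illustrative.

```bash
#!/usr/bin/env bash
# Local reproduction sketch; assumes git and GNU coreutils `timeout`.
# post-receive runs after the server has updated its refs, so a slow hook
# makes the client time out even though the tag already exists.
reproduce_leaked_tag() {
  local workdir=$1 delay=${2:-5} client_timeout=${3:-1}
  local server="$workdir/server.git" client="$workdir/client"

  git init --quiet --bare "$server" || return 1
  mkdir -p "$server/hooks"
  # Simulated slow server: sleep after the refs have been updated.
  printf '#!/bin/sh\nsleep %s\n' "$delay" > "$server/hooks/post-receive"
  chmod +x "$server/hooks/post-receive"

  git init --quiet "$client" || return 1
  git -C "$client" -c user.name=repro -c user.email=repro@example.com \
    commit --quiet --allow-empty -m init || return 1
  git -C "$client" tag flux-write-check-repro || return 1

  # The client gives up, like the "context deadline exceeded" in the logs.
  if timeout "$client_timeout" git -C "$client" push --quiet "$server" tag flux-write-check-repro; then
    echo "push finished within the timeout; increase the delay" >&2
    return 1
  fi
  # The server kept the tag anyway, and nothing will clean it up.
  git -C "$server" tag --list 'flux-write-check-*'
}
```

Running `reproduce_leaked_tag "$(mktemp -d)"` should print `flux-write-check-repro` after about a second, even though the push itself was cut off by the timeout.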

@dewe
Contributor

dewe commented Feb 4, 2021

We got fewer of these tags after increasing the timeout.
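If the timeout in question is fluxd's `--git-timeout` flag (an assumption; the comment doesn't name it), one way to raise it is a JSON patch on the Flux Deployment. The helper below only builds the patch; it assumes fluxd is the first container in the pod spec, and the namespace and Deployment name in the usage line are assumptions too.

```bash
#!/usr/bin/env bash
# Build a JSON patch that appends --git-timeout to the first container's args.
# Assumes fluxd is containers[0] in the Deployment's pod template.
git_timeout_patch() {
  local timeout=$1
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--git-timeout=%s"}]\n' "$timeout"
}
```

Usage, assuming a Deployment named `flux` in the `flux` namespace: `kubectl -n flux patch deployment flux --type=json -p "$(git_timeout_patch 60s)"`. If the args already contain `--git-timeout`, editing that entry in place is cleaner than appending a second one.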

Deleting the tags is safe. Run this in the repo:

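#!/usr/bin/env bash
# Fetch remote tags first so that `git tag --list` also sees tags that
# were never fetched into this clone.
git fetch --tags origin
while read -r tag; do
  git push --delete origin "$tag"
  git tag --delete "$tag"
done < <(git tag --list 'flux-write-check-*')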
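With tens of thousands of tags (32k were mentioned above), one push per tag is slow. A sketch of a batched variant, relying on `git push --delete` and `git tag --delete` both accepting several names at once; the function name and the default batch size of 100 are arbitrary assumptions to tune for your Git server.

```bash
#!/usr/bin/env bash
# Sketch: delete leftover write-check tags in batches, one `git push --delete`
# per batch instead of one per tag. Batch size 100 is an arbitrary default.
delete_write_check_tags() {
  local remote=${1:-origin} batch_size=${2:-100} tag i
  local -a tags
  tags=()
  # Fetch remote tags first so `git tag --list` sees all of them.
  git fetch --quiet --tags "$remote" || return 1
  while IFS= read -r tag; do
    tags+=("$tag")
  done < <(git tag --list 'flux-write-check-*')
  for ((i = 0; i < ${#tags[@]}; i += batch_size)); do
    # Delete this slice on the remote first, then in the local clone.
    git push --quiet --delete "$remote" "${tags[@]:i:batch_size}" || return 1
    git tag --delete "${tags[@]:i:batch_size}" >/dev/null || return 1
  done
}
```

Run it from a clone of the repository, for example `delete_write_check_tags origin 200`.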

@kingdonb
Member

This can be reopened if someone suggests a non-breaking change we can implement to make this kind of catastrophic over-tagging less likely. A simple script has been provided to delete the extraneous tags without RSI, so unless a contributor has a suggestion for how to make it better, I'm going to close this issue for now. Thanks for responding, and thanks for using Flux!


6 participants