This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

--sync-timeout is capped at 60s #3049

Closed
awwithro opened this issue May 8, 2020 · 7 comments · Fixed by #3228
Labels
blocked-needs-validation (Issue is waiting to be validated before we can proceed), bug

Comments

@awwithro

awwithro commented May 8, 2020

Describe the bug

We're running a sync that takes longer than a minute and hits a timeout. We tried raising --sync-timeout to account for this, but we still get a context deadline exceeded error. If we lower the timeout to less than 60s, we see timeouts sooner. I suspect another timeout is being hit first and is overriding the effective --sync-timeout.

To Reproduce

  1. Create a job that will take longer than 60s (sleep 300)
  2. Run flux with a --sync-timeout greater than the job's run time

Expected behavior

The sync timeout should be respected

Logs

our args

      - --log-format=fmt
      - --ssh-keygen-dir=/var/fluxd/keygen
      - --ssh-keygen-format=RFC4716
      - --k8s-secret-name=flux-git-deploy
      - --memcached-hostname=fluxcd-memcached
      - --sync-state=git
      - --sync-timeout=2m
      - --memcached-service=
      - --git-url=ssh://{{our repo}}
      - --git-branch=master
      - --git-path={{ our path }}
      - --git-readonly=true
      - --git-user={{our user }}
      - --git-email={{ our email }}
      - --git-verify-signatures=false
      - --git-set-author=false
      - --git-poll-interval=5m
      - --git-timeout=20s
      - --sync-interval=5m
      - --git-ci-skip=false
      - --manifest-generation=true
      - --automation-interval=5m
      - --registry-rps=200
      - --registry-burst=125
      - --registry-trace=false
      - --registry-disable-scanning
ts=2020-05-08T15:37:36.922163159Z caller=sync.go:73 component=daemon info="trying to sync git changes to the cluster" old=99ccd2c99ce8a771caf3139db6807d8af2b5b78e new=d4fc25c5a4a2d545007cff0e3433b59a452308d2
ts=2020-05-08T15:38:40.902810713Z caller=loop.go:107 component=sync-loop err="loading resources from repo: error executing generator command \"../../bin/flux-generator.sh\" from file \"../../.flux.yaml\": context deadline exceeded\nerror output:\n\ngenerated output:\n"

Additional context

  • Flux version: 1.19.0
  • Kubernetes version: 1.18.2
  • Git provider: AWS code commit
  • Container registry provider: bintray
@awwithro added the blocked-needs-validation and bug labels May 8, 2020
@sara4dev

sara4dev commented May 11, 2020

The command execution times out at 60s. The timeout is set here: https://github.com/fluxcd/flux/blob/v1.19.0/pkg/manifests/configfile.go#L24 and used here to execute the command: https://github.com/fluxcd/flux/blob/v1.19.0/pkg/manifests/configfile.go#L490

It would be nice to make it configurable, or use the --sync-timeout here.

@stefanprodan
Member

I agree that the command timeout could use --sync-timeout, since you can put whatever script you like in .flux.yaml, including a kubectl apply. @squaremo what are your thoughts on this?

@marshallford
Contributor

Kustomize tends to take quite a while when remote resources in git are referenced, so it would be nice to see the command timeout respect the --sync-timeout flag.

@mmckane

mmckane commented Oct 1, 2020

Does anyone have time to look at the PR from @marshallford, possibly @stefanprodan? We are running into this issue because we use kustomize to install everything in our cluster. Our manifest generation is now timing out 90% of the time, and changing sync-timeout has no effect: you can change it to 20min and it still times out. After 2 or 3 days, though, one of the syncs seems to sneak through successfully.

@rdubya16

rdubya16 commented Oct 1, 2020

This is also impacting us in certain regions. We recently spun up a cluster in Australia, and it's taking 1m 30s to perform a kustomize build due to a large number of remote git repos. We have no way to adjust the timeout.

@uberspot

Hi, could someone look at merging the PR? It seems this issue is also impacting us.
Flux fails to sync with:

{"caller":"loop.go:108","component":"sync-loop","err":"collating resources in cluster for sync: the server was unable to return a response in the time allotted, but may still be processing the request","ts":"2020-10-16T16:54:53.135656371Z"}

@marshallford
Contributor

marshallford commented Oct 16, 2020

I'll be working on the PR review items this weekend.

EDIT: That error is different than the one I'm trying to fix.
