Deploying harbor tile might fail due to canary timeout #3735
The root cause seems to be that the 10-minute timeout is too short. Although BOSH returns an error code, the Harbor instance has actually already started successfully. We may need to increase the timeout threshold. Whether this failure appears depends on the Ops Manager environment: if the network is poor, the timeout is easily hit. The issue is seen in the local environment and local VC, but not in the US environment.
Logs: ==== [Sun Dec 3 14:14:17 UTC 2017] Starting Harbor 1.2.0 at https://testing.harbor.vmware.com
Concourse CI log: Task 608 | 14:08:07 | Preparing deployment: Preparing deployment (00:00:00) Task 608 Started Sun Dec 3 14:08:07 UTC 2017 Updating deployment:
It seems the timeout is always 10 minutes.
The update watch time used when deploying Harbor with the tile still seems to be the default value: =====
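For reference, here is a hedged sketch of the shape of a BOSH manifest `update` block; the values below are illustrative, not taken from the actual tile manifest. BOSH expresses watch times in milliseconds, so an upper bound of 600000 ms matches the 10-minute timeout observed above.

```yaml
# Illustrative only; actual canaries/max_in_flight/watch-time values in the
# Harbor tile manifest may differ.
update:
  canaries: 1
  max_in_flight: 1
  canary_watch_time: 30000-600000   # ms; 600000 ms = 10 minutes
  update_watch_time: 30000-600000   # ms
```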
Adding more logs: harbor-app/168b4d39-0633-4268-984e-3ad52887dca3:/var/vcap/sys/log/harbor$ cat ctl.stdout.log
From the two ctl.stdout.log files, we can see that the interval between the two "Starting Harbor 1.2.0" log entries is exactly the same: 6 minutes 53 seconds. There is probably some timeout setting that causes a second invocation of harbor ctl start while the first harbor ctl start is still running.
Per investigation,
Let's remove it and extend the timeout for
extend it to 900 or 1200 seconds (as sketched below). We believe that with this change the "multiple start" problem will be fixed. However, there may be another issue that the
let's work with Pivotal to see if there's a fix for that.
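A hedged sketch of how the watch times could be extended in the manifest's update block, assuming the same millisecond-range syntax as above; the exact bounds are illustrative.

```yaml
# Illustrative only; 900000-1200000 ms corresponds to the proposed 900-1200 seconds.
update:
  canaries: 1
  max_in_flight: 1
  canary_watch_time: 30000-1200000
  update_watch_time: 30000-1200000
```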
This issue is resolved by these patches (we're running CI to verify the fix; a minimal pre-start sketch follows the list):
- Load the Harbor docker images in the harbor job pre-start instead of the job start.
- Remove the restart logic from the harbor monit spec.
- Wait for dockerd to come up after starting the dockerd daemon.
- Restart the harbor job when a harbor service failure is detected.
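A minimal sketch of what such a pre-start hook could look like; the binary paths, package names, and tar file name are assumptions, not the actual contents of jobs/harbor/templates/bin/pre-start.erb.

```bash
#!/bin/bash
# Hypothetical pre-start sketch; paths and file names are assumptions.
set -e

DOCKER=/var/vcap/packages/docker/bin/docker
IMAGES_TAR=/var/vcap/packages/harbor-app/harbor-images.tar

# Wait for dockerd to come up before trying to load images.
for _ in $(seq 1 60); do
  if "${DOCKER}" info >/dev/null 2>&1; then
    break
  fi
  sleep 2
done

# Load the pre-extracted tar; pre-start time is not counted against canary_watch_time.
"${DOCKER}" load -i "${IMAGES_TAR}"
```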
Verified in CI.
Unzip the Harbor images tgz file into a tar file in the packaging phase; docker loading this tar file is faster than loading the tgz file. This reduces the possibility of the canary timeout issue #3735. Issue: goharbor/harbor#3735
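A minimal sketch of that packaging step, assuming a gzip-compressed image archive bundled with the release; the blob path and output file name are illustrative.

```bash
# Illustrative BOSH packaging-script step; blob path and file names are assumptions.
set -e
mkdir -p "${BOSH_INSTALL_TARGET}"

# Decompress once at compile time so pre-start only has to docker-load a plain tar.
gunzip -c harbor-app/harbor-images.tgz > "${BOSH_INSTALL_TARGET}/harbor-images.tar"
```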
This can solve the canary watch timeout issue #3735, because the time spent in pre-start execution is not counted against the canary_watch_time. This patch moves the docker-related utils and BOSH env vars to src/common/utils.sh, and the key code for loading images is in jobs/harbor/templates/bin/pre-start.erb. Issue: goharbor/harbor#3735
This always happens in the first deployment.