Skip to content
This repository has been archived by the owner on May 6, 2020. It is now read-only.

Controller fails sometimes during build process #1204

Closed
Bregor opened this issue Jan 17, 2017 · 5 comments
Closed

Controller fails sometimes during build process #1204

Bregor opened this issue Jan 17, 2017 · 5 comments

Comments

@Bregor
Copy link
Contributor

Bregor commented Jan 17, 2017

Environment:

  • k8s-1.5.1
  • deis-workflow-2.10.0
  • helm-2.1.3
  • bare metal servers

Log:

...
2017-01-17T14:43:50.765913714Z INFO [ebay-mag-2]: waited 110s and 2 pods are in service
2017-01-17T14:44:01.400580036Z INFO [ebay-mag-2]: waited 120s and 2 pods are in service
2017-01-17T14:44:11.865622405Z INFO [ebay-mag-2]: waited 130s and 2 pods are in service
2017-01-17T14:44:22.311102246Z INFO [ebay-mag-2]: waited 140s and 2 pods are in service
2017-01-17T14:44:32.819195791Z ERROR [ebay-mag-2]: (app::deploy): There was a problem while deploying v117 of ebay-mag-2-web. Additional information:
2017-01-17T14:44:32.819438994Z Readiness probe failed: Get http://192.168.48.11:5000/health: dial tcp 192.168.48.11:5000: getsockopt: connection refused
2017-01-17T14:44:32.819783064Z ERROR [ebay-mag-2]: (app::deploy): (app::deploy): There was a problem while deploying v117 of ebay-mag-2-web. Additional information:
2017-01-17T14:44:32.819841269Z Readiness probe failed: Get http://192.168.48.11:5000/health: dial tcp 192.168.48.11:5000: getsockopt: connection refused
2017-01-17T14:44:32.866680654Z ERROR:root:(app::deploy): (app::deploy): There was a problem while deploying v117 of ebay-mag-2-web. Additional information:
2017-01-17T14:44:32.866843178Z Readiness probe failed: Get http://192.168.48.11:5000/health: dial tcp 192.168.48.11:5000: getsockopt: connection refused
2017-01-17T14:44:32.867032926Z Traceback (most recent call last):
2017-01-17T14:44:32.867084354Z   File "/app/scheduler/__init__.py", line 270, in deploy
2017-01-17T14:44:32.867243865Z     namespace, name, image, entrypoint, command, **kwargs
2017-01-17T14:44:32.867553414Z   File "/app/scheduler/resources/deployment.py", line 138, in update
2017-01-17T14:44:32.867685122Z     self.wait_until_ready(namespace, name, **kwargs)
2017-01-17T14:44:32.867738913Z   File "/app/scheduler/resources/deployment.py", line 353, in wait_until_ready
2017-01-17T14:44:32.867979613Z     self.pod._handle_not_ready_pods(namespace, labels)
2017-01-17T14:44:32.868180507Z   File "/app/scheduler/resources/pod.py", line 720, in _handle_not_ready_pods
2017-01-17T14:44:32.868400739Z     raise KubeException(message)
2017-01-17T14:44:32.868469760Z scheduler.exceptions.KubeException: Readiness probe failed: Get http://192.168.48.11:5000/health: dial tcp 192.168.48.11:5000: getsockopt: connection refused
2017-01-17T14:44:32.868609130Z
2017-01-17T14:44:32.868696406Z The above exception was the direct cause of the following exception:
2017-01-17T14:44:32.868925044Z
2017-01-17T14:44:32.869044298Z Traceback (most recent call last):
2017-01-17T14:44:32.869186538Z   File "/app/api/models/app.py", line 493, in deploy
2017-01-17T14:44:32.869286426Z     async_run(tasks)
2017-01-17T14:44:32.869409438Z   File "/app/api/utils.py", line 169, in async_run
2017-01-17T14:44:32.869568113Z     raise error
2017-01-17T14:44:32.869818258Z   File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
2017-01-17T14:44:32.869858910Z     result = coro.throw(exc)
2017-01-17T14:44:32.870301740Z   File "/app/api/utils.py", line 182, in async_task
2017-01-17T14:44:32.870341577Z     yield from loop.run_in_executor(None, params)
2017-01-17T14:44:32.870566900Z   File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
2017-01-17T14:44:32.870703792Z     yield self  # This tells Task to wait for completion.
2017-01-17T14:44:32.870892427Z   File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
2017-01-17T14:44:32.870954657Z     future.result()
2017-01-17T14:44:32.871112001Z   File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
2017-01-17T14:44:32.871186216Z     raise self._exception
2017-01-17T14:44:32.871434413Z   File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
2017-01-17T14:44:32.871479018Z     result = self.fn(*self.args, **self.kwargs)
2017-01-17T14:44:32.871622828Z   File "/app/scheduler/__init__.py", line 276, in deploy
2017-01-17T14:44:32.871774240Z     ) from e
2017-01-17T14:44:32.873469294Z scheduler.exceptions.KubeException: There was a problem while deploying v117 of ebay-mag-2-web. Additional information:
2017-01-17T14:44:32.873544691Z Readiness probe failed: Get http://192.168.48.11:5000/health: dial tcp 192.168.48.11:5000: getsockopt: connection refused
2017-01-17T14:44:32.873646094Z
2017-01-17T14:44:32.873673318Z The above exception was the direct cause of the following exception:
2017-01-17T14:44:32.873768882Z
2017-01-17T14:44:32.873822916Z Traceback (most recent call last):
2017-01-17T14:44:32.874037404Z   File "/app/api/models/app.py", line 502, in deploy
2017-01-17T14:44:32.874095885Z     self.deploy(release.previous(), force_deploy=True, rollback_on_failure=False)
2017-01-17T14:44:32.874203931Z   File "/app/api/models/app.py", line 512, in deploy
2017-01-17T14:44:32.874244296Z     raise ServiceUnavailable(err) from e
2017-01-17T14:44:32.874421577Z api.exceptions.ServiceUnavailable: (app::deploy): There was a problem while deploying v117 of ebay-mag-2-web. Additional information:
2017-01-17T14:44:32.874479111Z Readiness probe failed: Get http://192.168.48.11:5000/health: dial tcp 192.168.48.11:5000: getsockopt: connection refused
2017-01-17T14:44:32.874661794Z
2017-01-17T14:44:32.874700852Z The above exception was the direct cause of the following exception:
2017-01-17T14:44:32.874880355Z
2017-01-17T14:44:32.874918072Z Traceback (most recent call last):
2017-01-17T14:44:32.875017049Z   File "/app/api/models/build.py", line 65, in create
2017-01-17T14:44:32.875074610Z     self.app.deploy(new_release)
2017-01-17T14:44:32.875220869Z   File "/app/api/models/app.py", line 512, in deploy
2017-01-17T14:44:32.875250461Z     raise ServiceUnavailable(err) from e
2017-01-17T14:44:32.875458402Z api.exceptions.ServiceUnavailable: (app::deploy): (app::deploy): There was a problem while deploying v117 of ebay-mag-2-web. Additional information:
2017-01-17T14:44:32.875504292Z Readiness probe failed: Get http://192.168.48.11:5000/health: dial tcp 192.168.48.11:5000: getsockopt: connection refused
2017-01-17T14:44:32.875904162Z
2017-01-17T14:44:32.875951563Z The above exception was the direct cause of the following exception:
2017-01-17T14:44:32.876060714Z
2017-01-17T14:44:32.876087876Z Traceback (most recent call last):
2017-01-17T14:44:32.876250523Z   File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 474, in dispatch
2017-01-17T14:44:32.876316003Z     response = handler(request, *args, **kwargs)
2017-01-17T14:44:32.876445112Z   File "/app/api/views.py", line 527, in create
2017-01-17T14:44:32.876525601Z     super(BuildHookViewSet, self).create(request, *args, **kwargs)
2017-01-17T14:44:32.876856525Z   File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
2017-01-17T14:44:32.876931534Z     self.perform_create(serializer)
2017-01-17T14:44:32.877093936Z   File "/app/api/viewsets.py", line 21, in perform_create
2017-01-17T14:44:32.877157884Z     self.post_save(obj)
2017-01-17T14:44:32.877280365Z   File "/app/api/views.py", line 533, in post_save
2017-01-17T14:44:32.877347150Z     build.create(self.user)
2017-01-17T14:44:32.877533890Z   File "/app/api/models/build.py", line 79, in create
2017-01-17T14:44:32.877564441Z     raise DeisException(str(e)) from e
2017-01-17T14:44:32.877723698Z api.exceptions.DeisException: (app::deploy): (app::deploy): There was a problem while deploying v117 of ebay-mag-2-web. Additional information:
2017-01-17T14:44:32.877756724Z Readiness probe failed: Get http://192.168.48.11:5000/health: dial tcp 192.168.48.11:5000: getsockopt: connection refused
2017-01-17T14:44:32.885184165Z 192.168.48.5 "POST /v2/hooks/build/ HTTP/1.1" 400 249 "deis-builder"
@bacongobbler
Copy link
Member

What do your application logs say? That error indicates your pods were not ready in time based on the configured healthcheck, so the release was rolled back.

@Bregor
Copy link
Contributor Author

Bregor commented Jan 17, 2017

In application logs I see strange behaviour of buildpack (node or multi):

2017-01-17T14:45:27.562530731Z tar: ./node_modules/postcss-svgo/node_modules/.bin: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562717454Z tar: ./node_modules/postcss-svgo/node_modules: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562753301Z tar: ./node_modules/postcss-svgo: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562779941Z tar: ./node_modules/recast/node_modules/.bin: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562805074Z tar: ./node_modules/recast/node_modules: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562829651Z tar: ./node_modules/recast: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562853204Z tar: ./node_modules/doiuse/node_modules/yargs/node_modules/.bin: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562876847Z tar: ./node_modules/doiuse/node_modules/yargs/node_modules: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562896314Z tar: ./node_modules/doiuse/node_modules/yargs: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562912167Z tar: ./node_modules/doiuse/node_modules/.bin: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562925047Z tar: ./node_modules/doiuse/node_modules: Directory renamed before its status could be extracted
2017-01-17T14:45:27.562937661Z tar: ./node_modules/doiuse: Directory renamed before its status could be extracted
...
2017-01-17T14:45:27.569175972Z tar: Exiting with failure status due to previous errors

But why controller's readiness probe failed?

@bacongobbler
Copy link
Member

That is actually your app's healthcheck failing. The controller does a health check on the app if configured after it is deployed. My guess is the nodejs app did not start based on the logs shown above.

@Bregor
Copy link
Contributor Author

Bregor commented Jan 17, 2017

OK, I understand, thank you.
So we can close this issue?

@bacongobbler
Copy link
Member

Yep :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants