Quickstart script reports containers as ready while flowapi fails to start up #745

Closed
maxalbert opened this issue May 10, 2019 · 2 comments · Fixed by #747
Labels: bug, deployment

Comments

@maxalbert
Contributor

Describe the bug
The quick-start script fails to abort and report an error if one of the services (here: FlowAPI) fails to fully start up.

Product
Quickstart script.

Version
Current master (36ab9af2).

To Reproduce

On a fresh VM, run:

$ bash <(curl -s https://raw.githubusercontent.com/Flowminder/FlowKit/master/quick_start.sh)

Output:

Starting containers (this may take a few minutes)
Pulling flowdb                   ... done
Pulling flowdb_testdata          ... done
Pulling flowdb_synthetic_data    ... done
Pulling worked_examples          ... done
Pulling flowapi                  ... done
Pulling flowauth                 ... done
Pulling flowmachine_query_locker ... done
Pulling flowmachine              ... done
Pulling flowetl                  ... done
Pulling flowetl_db               ... done
Creating network "flowkit_db" with the default driver
Creating network "flowkit_redis" with the default driver
Creating network "flowkit_zero" with the default driver
Creating network "flowkit_default" with the default driver
Creating network "flowkit_flowetl_db" with the default driver
Creating flowmachine_query_locker ... done
Creating flowauth                 ... done
Creating flowdb_testdata          ... done
Creating flowapi                  ... done
Creating flowmachine              ... done
Waiting for containers to be ready..
127.0.0.1:5432 - no response
Waiting 10s
127.0.0.1:5432 - no response
Waiting 10s
127.0.0.1:5432 - no response
Waiting 10s
127.0.0.1:5432 - no response
Waiting 10s
127.0.0.1:5432 - accepting connections
FlowDB ready.
Waiting 10s
FlowMachine ready
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
Waiting 10s
FlowAPI ready.
FlowAuth ready.
Worked examples ready.
All containers ready!
Access FlowDB using 'PGHOST=localhost PGPORT=9000 PGDATABASE=flowdb PGUSER=flowmachine PGPASSWORD=foo psql'
Access FlowAPI using FlowClient at http://localhost:9090
View the FlowAPI spec at http://localhost:9090/api/0/spec/redoc
Generate FlowAPI access tokens using FlowAuth with user TEST_USER and password DUMMY_PASSWORD at http://localhost:9091

So the script reports that all containers are ready. However, FlowAPI is actually stuck in a restart loop:

$ docker ps
CONTAINER ID        IMAGE                               COMMAND                  CREATED             STATUS                          PORTS                           NAMES
37998660aff4        flowminder/flowmachine:latest       "pipenv run flowmach…"   13 minutes ago      Up 12 minutes                   0.0.0.0:5555->5555/tcp          flowmachine
727854ee4293        flowminder/flowapi:latest           "pipenv run ./start.…"   13 minutes ago      Restarting (1) 53 seconds ago                                   flowapi
00af61326972        flowminder/flowdb-testdata:latest   "docker-entrypoint.s…"   13 minutes ago      Up 13 minutes                   0.0.0.0:9000->5432/tcp          flowdb_testdata
5abbd9173121        flowminder/flowauth:latest          "/entrypoint.sh /sta…"   13 minutes ago      Up 13 minutes                   443/tcp, 0.0.0.0:9091->80/tcp   flowauth
011fd778c962        bitnami/redis                       "/entrypoint.sh /run…"   13 minutes ago      Up 13 minutes                   0.0.0.0:6379->6379/tcp          flowmachine_query_locker

FlowAPI fails because the FLOWAPI_IDENTIFIER environment variable is not being passed to its container:

$ docker logs flowapi
[...]
Running on 0.0.0.0:9090 over http (CTRL + C to quit)
Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/flowapi-GLBIuhGh/bin/hypercorn", line 10, in <module>
    sys.exit(main())
  File "/root/.local/share/virtualenvs/flowapi-GLBIuhGh/lib/python3.7/site-packages/hypercorn/__main__.py", line 190, in main
    run(config)
  File "/root/.local/share/virtualenvs/flowapi-GLBIuhGh/lib/python3.7/site-packages/hypercorn/run.py", line 34, in run
    worker_func(config)
  File "/root/.local/share/virtualenvs/flowapi-GLBIuhGh/lib/python3.7/site-packages/hypercorn/asyncio/run.py", line 198, in asyncio_worker
    app = load_application(config.application_path)
  File "/root/.local/share/virtualenvs/flowapi-GLBIuhGh/lib/python3.7/site-packages/hypercorn/utils.py", line 76, in load_application
    return eval(app_name, vars(module))
  File "<string>", line 1, in <module>
  File "/flowapi/flowapi/main.py", line 113, in create_app
    app.config.from_mapping(get_config())
  File "/flowapi/flowapi/config.py", line 47, in get_config
    "FLOWAPI_IDENTIFIER", os.environ["FLOWAPI_IDENTIFIER"]
  File "/root/.local/share/virtualenvs/flowapi-GLBIuhGh/lib/python3.7/os.py", line 678, in __getitem__
    raise KeyError(key) from None
KeyError: 'FLOWAPI_IDENTIFIER'
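
The KeyError above can also be confirmed from the host by listing the environment the container was created with. The following is a minimal sketch, not part of quick_start.sh: the helper name container_has_env is hypothetical, and it only assumes the standard `docker inspect --format` Go-template output.

```bash
# Hypothetical helper (not part of quick_start.sh): succeed only if the
# named container was created with the given environment variable set.
# $1 = container name, $2 = variable name
container_has_env() {
    # .Config.Env is the list of NAME=value strings the container was created with.
    docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1" 2>/dev/null \
        | grep -q "^$2="
}

# Example (container and variable names taken from this report):
# container_has_env flowapi FLOWAPI_IDENTIFIER || echo "FLOWAPI_IDENTIFIER is not set in flowapi" >&2
```

If the variable is missing here, the problem lies in how the container is created rather than in FlowAPI itself.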

Expected behavior
The quickstart script should report FlowAPI as failed if it can't start up successfully.
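
One way to get this behaviour is a retry loop with a bounded number of attempts that returns a non-zero status when the budget runs out, so the caller can abort. This is only a sketch under assumptions: wait_for is a hypothetical helper, not the function used in quick_start.sh, and the readiness command shown in the example comment (curl against the spec URL) is an assumed check rather than the one the script actually performs.

```bash
# Hypothetical helper: run a readiness check repeatedly until it succeeds
# or the retry budget is exhausted.
# $1 = service name, $2 = maximum attempts, $3 = delay in seconds,
# remaining arguments = the readiness command to run.
wait_for() {
    local name="$1" max_tries="$2" delay="$3"
    shift 3
    local try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "$name ready."
            return 0
        fi
        echo "Waiting ${delay}s"
        sleep "$delay"
        try=$((try + 1))
    done
    # Timed out: report on stderr and signal failure to the caller.
    echo "$name failed to start after $max_tries attempts." >&2
    return 1
}

# Example (the curl check is an assumption, not the script's actual probe):
# wait_for "FlowAPI" 30 10 curl -sf http://localhost:9090/api/0/spec/redoc || exit 1
```

The important part is that "ready" is only printed on the success path, and that the caller acts on the return status instead of falling through after the last attempt.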

Additional context
It feels to me that we hit these kinds of deployment issues repeatedly and so far haven't fully resolved them in a satisfactory way. We previously had issues with one service (e.g. flowapi) not reliably knowing whether a corresponding service (e.g. flowmachine) has been successfully started up. Similarly, from the "outside" we want to be able to reliably check whether individual services and/or the whole system are fully started up and in a functional state.

It would be really helpful if we had unique, clearly defined and documented "hooks" which can be used both internally by the various FlowKit services and by an external user to do a one-step verification and diagnostic for each service and the whole system.

My impression (which may be wrong) is that we use several different methods in different places to check that services are up and running, and that some of these are ad hoc or only work in specific circumstances. Without a single reliable, well-documented check, we keep running into the same problems.
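
As a rough illustration of what a one-step external check could look like, the sketch below asks Docker for each container's state and restart count and fails if any container is not running or has been restarted. It is an assumption-laden example, not an existing FlowKit hook: check_containers is a hypothetical name, and treating any restart as a failure is a deliberately strict choice made so that a crash-looping container (which can briefly report "running" between restarts) is still caught.

```bash
# Hypothetical diagnostic: report the state of each named container and
# return non-zero if any of them is not running or has been restarted.
check_containers() {
    local failed=0 name info state restarts
    for name in "$@"; do
        # Prints e.g. "running 0" or "restarting 7"; falls back to
        # "missing 0" if the container does not exist.
        info="$(docker inspect --format '{{.State.Status}} {{.RestartCount}}' "$name" 2>/dev/null)" \
            || info="missing 0"
        state="${info%% *}"
        restarts="${info##* }"
        if [ "$state" = "running" ] && [ "$restarts" -eq 0 ]; then
            echo "$name: ok"
        else
            echo "$name: $state (restarts: $restarts)" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example (container names as shown in the `docker ps` output above):
# check_containers flowdb_testdata flowmachine flowapi flowauth flowmachine_query_locker || exit 1
```

A check like this only shows that the processes are alive; a per-service readiness probe would still be needed to confirm that each service is functional.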

@maxalbert added the bug and deployment labels on May 10, 2019
@greenape
Member

Ahhh. I obviously neglected to put it in the compose file!

Looks like I fail at bash as well: the timeout isn’t triggering the failure condition as it should.

@maxalbert
Contributor Author

Ah, ok. I'll add it there. 👌

@maxalbert maxalbert mentioned this issue May 10, 2019
8 tasks