[RFC] Configuration & Environment

Romain Dorgueil edited this page Oct 8, 2017 · 5 revisions
Subject: Configuration & Environment
Authors: CW Andrews
Romain Dorgueil
Created: Sep 7, 2017
Modified: Oct 7, 2017
Target: 0.5
Status: First bits released, draft needs cleanup.

THIS IS A DRAFT

TL;DR

ETL jobs need to be parametrizable by the end user.

  • For the simplest needs, system environment is sufficient and one can read from os.environ (easy to do since 0.5).
  • There may be a need for "validation". For example, a variable may be required (an API key...) or may need to be an int (number of queries...). This is not yet possible but should improve the developer experience (target: future).
  • There should be a way to change the graph topology depending on configuration. For example, the "slack api" branch of the graph may be added only if SLACK_KEY is present.
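
As an illustration of the last two points, here is a minimal sketch in plain Python. It is not part of Bonobo: the helper names (require_env, env_int, build_steps) and the step names are hypothetical, and the "graph" is reduced to a list of step names to keep the example self-contained.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of a required variable, or raise if it is missing."""
    try:
        return environ[name]
    except KeyError:
        raise RuntimeError("Missing required environment variable: " + name)


def env_int(name, default, environ=os.environ):
    """Return an integer variable, falling back to `default` when unset."""
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        raise RuntimeError("{} must be an integer, got {!r}".format(name, raw))


def build_steps(environ=os.environ):
    """Return step names, adding the Slack branch only when SLACK_KEY is set."""
    steps = ["extract", "transform", "load"]  # hypothetical step names
    if environ.get("SLACK_KEY"):
        steps.append("notify_slack")
    return steps
```

Passing the mapping explicitly (instead of always reading os.environ) keeps such helpers easy to test without touching the real environment.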

Environment variables

Runtime configuration should be done using environment variables.

In the future (0.7+), there may be a way to add validation, but not for now.

Order of priority should be, from lowest to highest (the highest wins, if set):

  1. default values
    • os.getenv("VARNAME", default_value)
    • The user/writer/creator of the graph is responsible for setting these.
  2. --default-env-file values (not yet implemented)
    • Specifies a file to read default env values from. Each variable in the file is used only if it is not already set in the system environment (system environment variables are not overwritten).
    • If --default-env-file is not passed but a '.env' file exists in the working directory, then '.env' will be used as the default-env-file.
  3. --default-env values (not yet implemented)
    • Works like #2, but the default NAME=value pairs are passed individually, one pair per --default-env flag, rather than read from a file.
  4. system environment values
    • Env vars already set at the system level. Note that variables passed as NAME=value bonobo run ... fall at this level in the order of priority.
  5. --env-file values (not yet implemented)
    • Env vars specified in this file are read like those in #2, but they take priority over those set at the system level.
  6. --env values
    • Env vars set using the --env / -e flag work like #3 but take priority over all other env vars.
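
The order of priority above can be sketched as a simple merge where later sources override earlier ones. This is only an illustration, not Bonobo's implementation: resolve_env and its parameters are hypothetical names, and each source is assumed to be already parsed into a dict.

```python
def resolve_env(default_env_file=None, default_env=None, system_env=None,
                env_file=None, env=None):
    """Merge the environment sources; later sources override earlier ones."""
    resolved = {}
    # Levels 2 to 6 of the list above, from lowest to highest priority.
    for source in (default_env_file, default_env, system_env, env_file, env):
        resolved.update(source or {})
    return resolved


def read_value(resolved, name, default_value=None):
    """Level 1: the graph author's default applies only if nothing else set it."""
    return resolved.get(name, default_value)
```

For instance, a variable present in both the default env file and the system environment resolves to the system value, while a variable passed with --env wins over every other source.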

Notes

  • Way to go for runtime configuration.
  • Reading a value from environment is done using the standard os.getenv("VARNAME", default_value).
  • There is no way to "validate" those options yet, not sure about whether it's needed so for now, let's do nothing.

Overriding from shell

If you have a bash-like shell, you can override variables directly on the command line.

FOO=bar bonobo run ...
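
The shell syntax above sets FOO for the launched process only. When launching a job from Python instead of a shell, the same effect can be sketched with the standard subprocess module. In this sketch, run_with_env and read_foo_in_child are hypothetical helpers, and a small Python child process stands in for "bonobo run ..." so that the example stays self-contained.

```python
import os
import subprocess
import sys


def run_with_env(command, overrides):
    """Run `command` with `overrides` added to a copy of the current environment."""
    child_env = dict(os.environ)
    child_env.update(overrides)
    return subprocess.run(command, env=child_env, stdout=subprocess.PIPE,
                          universal_newlines=True, check=True)


def read_foo_in_child():
    # Stand-in for "bonobo run ..."; the child process just prints FOO.
    child = [sys.executable, "-c", "import os; print(os.environ['FOO'])"]
    return run_with_env(child, {"FOO": "bar"}).stdout.strip()
```

The parent's own environment is left untouched, which mirrors what FOO=bar command does in a bash-like shell.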

Overriding with arguments

Some shells apparently make it harder to override env from the command line. Bonobo now includes the --env / -e flag to pass vars in a shell-agnostic way.

  • bonobo run --env FOO=bar ...
  • bonobo run -e FOO=bar ...
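
A repeatable NAME=value flag of this kind can be handled with the standard argparse module. The following is a sketch under assumptions, not Bonobo's actual command line code: parse_env_pairs is a hypothetical helper, and unrelated arguments are simply ignored.

```python
import argparse


def parse_env_pairs(argv):
    """Parse repeated --env / -e NAME=value flags into a dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", "-e", action="append", default=[])
    # parse_known_args ignores arguments this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    env = {}
    for pair in args.env:
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError("Expected NAME=value, got {!r}".format(pair))
        env[name] = value
    return env
```

When the same name is passed twice, the last occurrence wins in this sketch; whether that is the desired behaviour is a design choice left open here.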

Environment file (__future__)

_Not implemented yet._

  • If it exists in the working directory, '.env' will be used for --default-env-file values unless --default-env-file is passed explicitly.

  • --default-env-file and --env-file will both be usable in the future. They differ in where each falls in the order of priority (see above), and in that --default-env-file will use '.env' by default, if it exists, while --env-file will have to be passed explicitly.

  • Environment file format will be key=value pairs, with one pair per line.
    • For example:

        # .env
        FOO=bar
        FIZZ=buzz
        SECRET='my secret'

  • Although not set in stone, it is likely we will add a dependency such as https://github.com/theskumar/python-dotenv to parse environment variables rather than using our own implementation.
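
To make the proposed format and the '.env' fallback concrete, here is a minimal hand-written sketch. It does not use python-dotenv and is not the planned implementation: parse_env_file and default_env_file_path are hypothetical names, and the handling of comments, quotes and malformed lines is an assumption.

```python
import os


def parse_env_file(text):
    """Parse key=value lines; skip blank lines and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # assumption: lines without "=" are silently skipped
        name, value = name.strip(), value.strip()
        # Strip one pair of matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[name] = value
    return env


def default_env_file_path(explicit=None, cwd="."):
    """Return the explicit path if given, else '.env' in `cwd` if it exists."""
    if explicit:
        return explicit
    candidate = os.path.join(cwd, ".env")
    return candidate if os.path.isfile(candidate) else None
```

Applied to the example file above, parse_env_file yields FOO, FIZZ and SECRET, with the quotes around 'my secret' removed.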

Documentation

This needs complete documentation.