Skip to content

Commit

Permalink
Store config description in Airflow configuration object (apache#32669)
Browse files Browse the repository at this point in the history
We would like to use the config.yml approach as our main source of truth
for airflow configuration. So far it has been split amongst multiple
files:

  * confg.yml -> descroption
  * default_airflow.cfg -> JINJA template to keep cofiguration, examples
    and description (used to generate airflow.cfg when airflow starts)
  * default_test.cfg -> storing test-only configuration used in some
    test cases - used to generate unittest.cfg
  * scripts/in_container/airflow_ci.cfg where dCI-specific configuration
    overwrote the unittest.cfg

This change consolidates it all into unified appraoch, where all
configuration information is retrieved from .yml files stored in
config_templates. No more additional template files processed by
JINJA, no more special CI versions of it, no more unittestdb.cfg file
where such configuration would be generated, no more unittestdb to
be used separately for tests.

* The default_*.cfg files were not real configuration files, becuase
  they were really JINJA templates and it got peoeple confused when
  copying the files. This change leaves the file empty with the
  comment that instructs the user how they can get the default
  configuration.
* The default_airflow.cfg is gone and instead, we have a way to
  show defaults via `airflow config list --defaults` command
* Unittest config is generated on-the-flight using defaults stored
  in confg_templates. constituing a single place where they need
  to be changed for the tests to use them
* internally, description of the configuration is stored in
  AirflowConfigurationParser and used by the parser to generate
  the default configuration when needed.
* we got rid of `{{{{` in templated config defaults by marking
  the templates with ``is_template`` and getting rid of processing
  those entries with regular formatting when generating the default
  values. This only concerns defaults from config.yml. Reading
  those configuration entries from file is unaffected.

This change aims to be 100% backwards compatible with the previous
implementation when it comes to functionality, even if internals
changed. It also does not add provider-specific changes that are
coming separately.

The only changes visible to the user are:

* generated airflow.cfg is slightly more
  readable and displays names of variables that can be used to override
  each configuration (which is very useful for copy&pasting)

* user are advised, instead of copying the default_airflow.cfg to use
  `airflow config list --defaults` to generate production config. This
  configuration has all the entries commented out, so that they can
  selectively uncomment and change the values they want. This is now
  promoted as "best practice" in the documentation.
  • Loading branch information
potiuk authored Jul 20, 2023
1 parent 7123dc1 commit faa8f54
Show file tree
Hide file tree
Showing 37 changed files with 1,139 additions and 2,422 deletions.
5 changes: 0 additions & 5 deletions .github/mergeable.yml
Original file line number Diff line number Diff line change
Expand Up @@ -39,8 +39,3 @@ mergeable:
changed:
file: 'package.json'
files: ['yarn.lock']
# If config.yml is updated, so should default_airflow.cfg
- do: dependent
changed:
file: 'config.yml'
files: ['default_airflow.cfg']
2 changes: 0 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,14 +1,12 @@
# Airflow configuration
airflow.cfg
unittests.cfg
airflow_login.py
dbinit.py
initdb.py
secrets.py

# Airflow sqlite databases
airflow.db
unittests.db

# Airflow temporary artifacts
airflow/git_version
Expand Down
8 changes: 0 additions & 8 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -618,14 +618,6 @@ repos:
pass_filenames: true
additional_dependencies: ['click', 'rich>=12.4.4', 'pyyaml']
require_serial: true
- id: check-airflow-config-yaml-consistent
name: Check consistency between config.yml and default_config.cfg
language: python
entry: ./scripts/ci/pre_commit/pre_commit_yaml_to_cfg.py
files: config\.yml$|default_airflow\.cfg$|default\.cfg$
pass_filenames: false
require_serial: true
additional_dependencies: ['pyyaml', 'packaging']
- id: check-boring-cyborg-configuration
name: Checks for Boring Cyborg configuration consistency
language: python
Expand Down
1 change: 0 additions & 1 deletion .rat-excludes
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,6 @@ metastore_db
.*md5
.*zip
.*lock
unittests.cfg
logs
.bash_aliases
venv
Expand Down
1 change: 0 additions & 1 deletion Dockerfile.ci
Original file line number Diff line number Diff line change
Expand Up @@ -837,7 +837,6 @@ if [[ ${SKIP_ENVIRONMENT_INITIALIZATION=} != "true" ]]; then
unset AIRFLOW__CORE__UNIT_TEST_MODE

mkdir -pv "${AIRFLOW_HOME}/logs/"
cp -f "${IN_CONTAINER_DIR}/airflow_ci.cfg" "${AIRFLOW_HOME}/unittests.cfg"

# Change the default worker_concurrency for tests
export AIRFLOW__CELERY__WORKER_CONCURRENCY=8
Expand Down
2 changes: 0 additions & 2 deletions STATIC_CODE_CHECKS.rst
Original file line number Diff line number Diff line change
Expand Up @@ -146,8 +146,6 @@ require Breeze Docker image to be built locally.
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| check-aiobotocore-optional | Check if aiobotocore is an optional dependency only | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| check-airflow-config-yaml-consistent | Check consistency between config.yml and default_config.cfg | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| check-airflow-provider-compatibility | Check compatibility of Providers with Airflow | |
+-----------------------------------------------------------+--------------------------------------------------------------+---------+
| check-apache-license-rat | Check if licenses are OK for Apache | |
Expand Down
33 changes: 31 additions & 2 deletions TESTING.rst
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,37 @@ Follow the guidelines when writing unit tests:
tests, so we run Pytest with ``--disable-warnings`` but instead we have ``pytest-capture-warnings`` plugin that
overrides ``recwarn`` fixture behaviour.

**NOTE:** We plan to convert all unit tests to standard "asserts" semi-automatically, but this will be done later
in Airflow 2.0 development phase. That will include setUp/tearDown/context managers and decorators.

.. note::

We are in the process of converting all unit tests to standard "asserts" and pytest fixtures
so if you find some tests that are still using classic setUp/tearDown approach or unittest asserts, feel
free to convert them to pytest.

Airflow configuration for unit tests
------------------------------------

Some of the unit tests require special configuration set as the ``default``. This is done automatically by
adding ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` to the environment variables in Pytest auto-used
fixture. This in turn makes Airflow load test configuration from the file
``airflow/config_templates/unit_tests.cfg``. Test configuration from there replaces the original
defaults from ``airflow/config_templates/config.yml``. If you want to add some test-only configuration,
as default for all tests you should add the value to this file.

You can also of course override the values in individual test by patching environment variables following
the usual ``AIRFLOW__SECTION__KEY`` pattern or ``conf_vars`` context manager.

.. note::

The test configuration for Airflow before July 2023 was automatically generated in a file named
``AIRFLOW_HOME/unittest.cfg``. The template for it was stored in "config_templates" next to the yaml file.
However writing the file was only done for the first time you run airflow and you had to manually
maintain the file. It was pretty arcane knowledge, and this generated file in {AIRFLOW_HOME}
has been overwritten in the Breeze environment with another CI-specific file. Using ``unit_tests.cfg``
as a single source of the configuration for tests - coming from Airflow sources
rather than from {AIRFLOW_HOME} is much more convenient and it is automatically used by pytest.

The unittest.cfg file generated in {AIRFLOW_HOME} will no longer be used and can be removed.

Airflow test types
------------------
Expand Down
37 changes: 36 additions & 1 deletion airflow/cli/cli_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -311,6 +311,31 @@ def string_lower_type(val):
ARG_MARK_SUCCESS = Arg(
("-m", "--mark-success"), help="Mark jobs as succeeded without running them", action="store_true"
)
ARG_INCLUDE_DESCRIPTIONS = Arg(
("-d", "--include-descriptions"),
help="Show descriptions for the configuration variables",
action="store_true",
)
ARG_INCLUDE_EXAMPLES = Arg(
("-e", "--include-examples"), help="Show examples for the configuration variables", action="store_true"
)
ARG_INCLUDE_SOURCES = Arg(
("-s", "--include-sources"), help="Show source of the configuration variable", action="store_true"
)
ARG_INCLUDE_ENV_VARS = Arg(
("-V", "--include-env-vars"), help="Show environment variable for each option", action="store_true"
)
ARG_COMMENT_OUT_EVERYTHING = Arg(
("-c", "--comment-out-everything"),
help="Comment out all configuration options. Useful as starting point for new installation",
action="store_true",
)
ARG_DEFAULTS = Arg(
("-a", "--defaults"),
help="Show only defaults - do not include local configuration, sources,"
" includes descriptions, examples, variables. Comment out everything.",
action="store_true",
)
ARG_VERBOSE = Arg(("-v", "--verbose"), help="Make logging output more verbose", action="store_true")
ARG_LOCAL = Arg(("-l", "--local"), help="Run the task using the LocalExecutor", action="store_true")
ARG_DONOT_PICKLE = Arg(
Expand Down Expand Up @@ -2002,7 +2027,17 @@ class GroupCommand(NamedTuple):
name="list",
help="List options for the configuration",
func=lazy_load_command("airflow.cli.commands.config_command.show_config"),
args=(ARG_OPTIONAL_SECTION, ARG_COLOR, ARG_VERBOSE),
args=(
ARG_OPTIONAL_SECTION,
ARG_COLOR,
ARG_INCLUDE_DESCRIPTIONS,
ARG_INCLUDE_EXAMPLES,
ARG_INCLUDE_SOURCES,
ARG_INCLUDE_ENV_VARS,
ARG_COMMENT_OUT_EVERYTHING,
ARG_DEFAULTS,
ARG_VERBOSE,
),
),
)

Expand Down
11 changes: 10 additions & 1 deletion airflow/cli/commands/config_command.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,16 @@
def show_config(args):
"""Show current application configuration."""
with io.StringIO() as output:
conf.write(output, section=args.section)
conf.write(
output,
section=args.section,
include_examples=args.include_examples or args.defaults,
include_descriptions=args.include_descriptions or args.defaults,
include_sources=args.include_sources and not args.defaults,
include_env_vars=args.include_env_vars or args.defaults,
comment_out_everything=args.comment_out_everything or args.defaults,
only_defaults=args.defaults,
)
code = output.getvalue()
if should_use_colors(args):
code = pygments.highlight(code=code, formatter=get_terminal_formatter(), lexer=IniLexer())
Expand Down
34 changes: 19 additions & 15 deletions airflow/config_templates/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -44,10 +44,10 @@ core:
default: "airflow.utils.net.getfqdn"
might_contain_dag_callable:
description: |
A callable to check if a python file has airflow dags defined or not
with argument as: `(file_path: str, zip_file: zipfile.ZipFile | None = None)`
return True if it has dags otherwise False
If this is not provided, Airflow uses its own heuristic rules.
A callable to check if a python file has airflow dags defined or not
with argument as: `(file_path: str, zip_file: zipfile.ZipFile | None = None)`
return True if it has dags otherwise False
If this is not provided, Airflow uses its own heuristic rules.
version_added: 2.6.0
type: string
example: ~
Expand Down Expand Up @@ -288,8 +288,8 @@ core:
default: "0"
default_task_retry_delay:
description: |
The number of seconds each task is going to wait by default between retries. Can be overridden at
dag or task level.
The number of seconds each task is going to wait by default between retries. Can be overridden at
dag or task level.
version_added: 2.4.0
type: integer
example: ~
Expand Down Expand Up @@ -681,7 +681,7 @@ logging:
description: |
The remote_task_handler_kwargs param is loaded into a dictionary and passed to __init__ of remote
task handler and it overrides the values provided by Airflow config. For example if you set
`delete_local_logs=False` and you provide ``{{"delete_local_copy": true}}``, then the local
`delete_local_logs=False` and you provide ``{"delete_local_copy": true}``, then the local
log files will be deleted after they are uploaded to remote location.
version_added: 2.6.0
type: string
Expand Down Expand Up @@ -804,24 +804,27 @@ logging:
Specify prefix pattern like mentioned below with stream handler TaskHandlerWithCustomFormatter
version_added: 2.0.0
type: string
example: "{{ti.dag_id}}-{{ti.task_id}}-{{execution_date}}-{{try_number}}"
example: "{ti.dag_id}-{ti.task_id}-{execution_date}-{try_number}"
is_template: true
default: ""
log_filename_template:
description: |
Formatting for how airflow generates file names/paths for each task run.
version_added: 2.0.0
type: string
example: ~
default: "dag_id={{{{ ti.dag_id }}}}/run_id={{{{ ti.run_id }}}}/task_id={{{{ ti.task_id }}}}/\
{{%% if ti.map_index >= 0 %%}}map_index={{{{ ti.map_index }}}}/{{%% endif %%}}\
attempt={{{{ try_number }}}}.log"
is_template: true
default: "dag_id={{ ti.dag_id }}/run_id={{ ti.run_id }}/task_id={{ ti.task_id }}/\
{%% if ti.map_index >= 0 %%}map_index={{ ti.map_index }}/{%% endif %%}\
attempt={{ try_number }}.log"
log_processor_filename_template:
description: |
Formatting for how airflow generates file names for log
version_added: 2.0.0
type: string
example: ~
default: "{{{{ filename }}}}.log"
is_template: true
default: "{{ filename }}.log"
dag_processor_manager_log_location:
description: |
Full path of dag_processor_manager logfile.
Expand Down Expand Up @@ -1071,7 +1074,7 @@ secrets:
The backend_kwargs param is loaded into a dictionary and passed to __init__ of secrets backend class.
See documentation for the secrets backend you are using. JSON is expected.
Example for AWS Systems Manager ParameterStore:
``{{"connections_prefix": "/airflow/connections", "profile_name": "default"}}``
``{"connections_prefix": "/airflow/connections", "profile_name": "default"}``
version_added: 1.10.10
type: string
sensitive: true
Expand Down Expand Up @@ -1825,7 +1828,7 @@ webserver:
default: "True"
auth_rate_limit:
description: |
Rate limit for authentication endpoints.
Rate limit for authentication endpoints.
version_added: 2.6.0
type: string
example: ~
Expand Down Expand Up @@ -2688,7 +2691,8 @@ elasticsearch:
version_added: 1.10.4
type: string
example: ~
default: "{{dag_id}}-{{task_id}}-{{run_id}}-{{map_index}}-{{try_number}}"
is_template: true
default: "{dag_id}-{task_id}-{run_id}-{map_index}-{try_number}"
end_of_log_mark:
description: |
Used to mark the end of a log stream for a task
Expand Down
5 changes: 5 additions & 0 deletions airflow/config_templates/config.yml.schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,11 @@
"sensitive": {
"type": "boolean",
"description": "When true, this option is sensitive and can be specified using AIRFLOW__{section}___{name}__SECRET or AIRFLOW__{section}___{name}_CMD environment variables. See: airflow.configuration.AirflowConfigParser.sensitive_config_values"
},
"is_template": {
"type": "boolean",
"description": "When true, the {VARS} have no special meaning there - they won't be expanded with env vars/local/global variables.",
"default": false
}
},
"required": [
Expand Down
Loading

0 comments on commit faa8f54

Please sign in to comment.