Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store config description in Airflow configuration object #32669

Merged

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Jul 18, 2023

We would like to use the config.yml approach as our main source of truth
for airflow configuration. So far it has been split amongst multiple
files:

  • confg.yml -> descroption
  • default_airflow.cfg -> JINJA template to keep cofiguration, examples
    and description (used to generate airflow.cfg when airflow starts)
  • default_test.cfg -> storing test-only configuration used in some
    test cases - used to generate unittest.cfg
  • scripts/in_container/airflow_ci.cfg where dCI-specific configuration
    overwrote the unittest.cfg

This change consolidates it all into unified appraoch, where all
configuration information is retrieved from .yml files stored in
config_templates. No more additional template files processed by
JINJA, no more special CI versions of it, no more unittestdb.cfg file
where such configuration would be generated, no more unittestdb to
be used separately for tests.

  • The default_*.cfg files were not real configuration files, becuase
    they were really JINJA templates and it got peoeple confused when
    copying the files. This change leaves the file empty with the
    comment that instructs the user how they can get the default
    configuration.
  • The default_airflow.cfg is gone and instead, we have a way to
    show defaults via airflow config list --defaults command
  • Unittest config is generated on-the-flight using defaults stored
    in confg_templates. constituing a single place where they need
    to be changed for the tests to use them
  • internally, description of the configuration is stored in
    AirflowConfigurationParser and used by the parser to generate
    the default configuration when needed.
  • we got rid of {{{{ in templated config defaults by marking
    the templates with is_template and getting rid of processing
    those entries with regular formatting when generating the default
    values. This only concerns defaults from config.yml. Reading
    those configuration entries from file is unaffected.

This change aims to be 100% backwards compatible with the previous
implementation when it comes to functionality, even if internals
changed. It also does not add provider-specific changes that are
coming separately.

The only changes visible to the user are:

  • generated airflow.cfg is slightly more
    readable and displays names of variables that can be used to override
    each configuration (which is very useful for copy&pasting)

  • user are advised, instead of copying the default_airflow.cfg to use
    airflow config list --defaults to generate production config. This
    configuration has all the entries commented out, so that they can
    selectively uncomment and change the values they want. This is now
    promoted as "best practice" in the documentation.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Jul 18, 2023

This is the most important part extracted from #32604 (I will shortly rebase #32604) on top of this one. This one DOES NOT contain any provider-related changes. Those will come as finall installment of the provider config contributed changes and is a crucial refactor to be able to add provider-contributed configurations.

I rewrote a little what I have done initially after I run more thorough testing - I came back (and got friends with) using ConfigParser as "default" configuration storage (because of interpolation). I also think that having unit_test.cfg as source of defaults for tests is better and more readable than .yml as it was in the original PR, so I changed it back in this one.

The gist of this change remains the same as described in the original PR - make config.yaml the ONLY source of truth for configuration defaults, keep test config entirely in the sources of airflow - and do not genarate the unittest.cfg file on the filesystem but use test configuration from memory - thus making our tests for configuraiton repeatable and rerunnable, no matter what environment they are running it (breeze, local venv, pycharm).

I also included some comments from the review in #32604 by @jedcunningham @ferruzzi and @o-nikolas (some of them still will need to be addressed separately and I resolved all those that are applied).

@potiuk potiuk force-pushed the store-config-description-in-airflow-config-parser branch 5 times, most recently from 8b2a8fc to 49a47db Compare July 18, 2023 07:03
Copy link
Contributor

@amoghrajesh amoghrajesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did one pass at the PR and have few nits. Nothing major

@potiuk potiuk force-pushed the store-config-description-in-airflow-config-parser branch 2 times, most recently from df0d88a to e12ea1f Compare July 18, 2023 18:21
@potiuk
Copy link
Member Author

potiuk commented Jul 18, 2023

Applied first round of comments.

@potiuk potiuk force-pushed the store-config-description-in-airflow-config-parser branch from e12ea1f to 4699c1c Compare July 18, 2023 18:22
@potiuk potiuk force-pushed the store-config-description-in-airflow-config-parser branch 2 times, most recently from b24d5d4 to d71812a Compare July 19, 2023 05:15
@potiuk
Copy link
Member Author

potiuk commented Jul 19, 2023

Applied second round of comments from @jedcunningham.

For those who did not try it yet, the nice part of this change (after @jedcunningham comment) is that we can start promoting the "good" pattern of only changing the configuration variables that users actually want to change.

Part of the change is to make the airflow config list much more versatile and useful. The default output will (as it did) contain all the configuration (including currently set variables) in a very simple plain format without extra comments and boilerplate as it used to:

Screenshot 2023-07-19 at 07 08 39

But this command is now a bit of "swiss-army-knife" of configuration:

Screenshot 2023-07-19 at 07 09 38

Especially the --defaults switch is particularly useful to generate default "production" grade configuration. The configuration will have only the built-in airflow defaults (so it does not include the env variables set locally) and it will contains all the descriptions, and generated corresponding variable name for all the configurations, but all the values are commented out by default - so that user can very selectively only uncomment and change those values they want.

We've been promoting this kind of behaviour as a good pattern (only set those values that you want to be changed from defaults) and with such generated config file it is, I think, perfect way of doing it, because the user can see all the descriptions and examples right in the place the values can be edited, but the values are actually commented out, so it requires deliberate effort to change them - and users will easily see which of the values they actually changed.

Screenshot 2023-07-19 at 07 16 09

@potiuk potiuk force-pushed the store-config-description-in-airflow-config-parser branch 2 times, most recently from 6e930dc to b7e990e Compare July 19, 2023 08:33
@potiuk potiuk closed this Jul 19, 2023
@potiuk potiuk force-pushed the store-config-description-in-airflow-config-parser branch from b7e990e to 0fbef49 Compare July 19, 2023 09:01
@potiuk potiuk reopened this Jul 19, 2023
@potiuk
Copy link
Member Author

potiuk commented Jul 19, 2023

Green now. Looking for some more reviews and finally merging to continue with config move to providers and executors move :)

@potiuk potiuk force-pushed the store-config-description-in-airflow-config-parser branch from 132bf9a to 698ca1f Compare July 19, 2023 16:15
Copy link
Contributor

@o-nikolas o-nikolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at most of the PR, except mainly the changes in configuration.py

@potiuk potiuk force-pushed the store-config-description-in-airflow-config-parser branch from 30c88e4 to 9f00983 Compare July 20, 2023 04:37
We would like to use the config.yml approach as our main source of truth
for airflow configuration. So far it has been split amongst multiple
files:

  * confg.yml -> descroption
  * default_airflow.cfg -> JINJA template to keep cofiguration, examples
    and description (used to generate airflow.cfg when airflow starts)
  * default_test.cfg -> storing test-only configuration used in some
    test cases - used to generate unittest.cfg
  * scripts/in_container/airflow_ci.cfg where dCI-specific configuration
    overwrote the unittest.cfg

This change consolidates it all into unified appraoch, where all
configuration information is retrieved from .yml files stored in
config_templates. No more additional template files processed by
JINJA, no more special CI versions of it, no more unittestdb.cfg file
where such configuration would be generated, no more unittestdb to
be used separately for tests.

* The default_*.cfg files were not real configuration files, becuase
  they were really JINJA templates and it got peoeple confused when
  copying the files. This change leaves the file empty with the
  comment that instructs the user how they can get the default
  configuration.
* The default_airflow.cfg is gone and instead, we have a way to
  show defaults via `airflow config list --defaults` command
* Unittest config is generated on-the-flight using defaults stored
  in confg_templates. constituing a single place where they need
  to be changed for the tests to use them
* internally, description of the configuration is stored in
  AirflowConfigurationParser and used by the parser to generate
  the default configuration when needed.
* we got rid of `{{{{` in templated config defaults by marking
  the templates with ``is_template`` and getting rid of processing
  those entries with regular formatting when generating the default
  values. This only concerns defaults from config.yml. Reading
  those configuration entries from file is unaffected.

This change aims to be 100% backwards compatible with the previous
implementation when it comes to functionality, even if internals
changed. It also does not add provider-specific changes that are
coming separately.

The only changes visible to the user are:

* generated airflow.cfg is slightly more
  readable and displays names of variables that can be used to override
  each configuration (which is very useful for copy&pasting)

* user are advised, instead of copying the default_airflow.cfg to use
  `airflow config list --defaults` to generate production config. This
  configuration has all the entries commented out, so that they can
  selectively uncomment and change the values they want. This is now
  promoted as "best practice" in the documentation.
@potiuk potiuk force-pushed the store-config-description-in-airflow-config-parser branch from 9f00983 to 9191b33 Compare July 20, 2023 05:34
@potiuk
Copy link
Member Author

potiuk commented Jul 20, 2023

Random infra-related stability problems. Merging finally :)

@potiuk potiuk merged commit faa8f54 into apache:main Jul 20, 2023
@potiuk potiuk deleted the store-config-description-in-airflow-config-parser branch July 20, 2023 06:20
pateash pushed a commit to pateash/airflow that referenced this pull request Jul 23, 2023
We would like to use the config.yml approach as our main source of truth
for airflow configuration. So far it has been split amongst multiple
files:

  * confg.yml -> descroption
  * default_airflow.cfg -> JINJA template to keep cofiguration, examples
    and description (used to generate airflow.cfg when airflow starts)
  * default_test.cfg -> storing test-only configuration used in some
    test cases - used to generate unittest.cfg
  * scripts/in_container/airflow_ci.cfg where dCI-specific configuration
    overwrote the unittest.cfg

This change consolidates it all into unified appraoch, where all
configuration information is retrieved from .yml files stored in
config_templates. No more additional template files processed by
JINJA, no more special CI versions of it, no more unittestdb.cfg file
where such configuration would be generated, no more unittestdb to
be used separately for tests.

* The default_*.cfg files were not real configuration files, becuase
  they were really JINJA templates and it got peoeple confused when
  copying the files. This change leaves the file empty with the
  comment that instructs the user how they can get the default
  configuration.
* The default_airflow.cfg is gone and instead, we have a way to
  show defaults via `airflow config list --defaults` command
* Unittest config is generated on-the-flight using defaults stored
  in confg_templates. constituing a single place where they need
  to be changed for the tests to use them
* internally, description of the configuration is stored in
  AirflowConfigurationParser and used by the parser to generate
  the default configuration when needed.
* we got rid of `{{{{` in templated config defaults by marking
  the templates with ``is_template`` and getting rid of processing
  those entries with regular formatting when generating the default
  values. This only concerns defaults from config.yml. Reading
  those configuration entries from file is unaffected.

This change aims to be 100% backwards compatible with the previous
implementation when it comes to functionality, even if internals
changed. It also does not add provider-specific changes that are
coming separately.

The only changes visible to the user are:

* generated airflow.cfg is slightly more
  readable and displays names of variables that can be used to override
  each configuration (which is very useful for copy&pasting)

* user are advised, instead of copying the default_airflow.cfg to use
  `airflow config list --defaults` to generate production config. This
  configuration has all the entries commented out, so that they can
  selectively uncomment and change the values they want. This is now
  promoted as "best practice" in the documentation.
@ephraimbuddy ephraimbuddy added the type:improvement Changelog: Improvements label Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants