Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mechanism to suspend providers #30422

Merged
merged 1 commit into from
Apr 4, 2023
Merged

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Apr 1, 2023

As agreed in https://lists.apache.org/thread/g8b3k028qhzgw6c3yz4jvmlc67kcr9hj we introduce mechanism to suspend providers from our suite of providers when they are holding us back to older version of dependencies.

Provider's suspension is controlled from a single suspend flag in provider.yaml - this flag is used to generate
providers_dependencies.json in generated folders (provider is skipped if it has suspended flag set to true. This is enough to exclude provider from the extras of airflow and (automatically) from being used when CI image is build and constraints are being generated, as well as from provider documentation/generation.

Also several parts of the CI build use the flag to filter out such suspended provider from:

  • verification of provider.yaml files in pre-commit is skipped in terms of importing and checking if classes are defined and listed in the provider.yaml
  • the "tests" folders for providers are skipped automatically if the provider has "suspend" = true set
  • in case of PR that is aimed to modify suspended providers directory tree (when it is not a global provider refactor) selective checks will detect it and fail such PR with appropriate message suggesting to fix the reason for suspention first
  • documentation build is skipped for suspended providers
  • mypy static checks will skip suspended provider folders while we will still run ruff checks on them (unlike mypy ruff does not expect the libraries that it imports to be available and we are running ruff in a separate environment where no airflow dependencies are installed anyway

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Apr 2, 2023

I've run a number of tests and here is an example of how it works.

Example builds with those suspended (just picked some arbitrary / representative ones): yandex, influxdb, apache.hdfs, pinot.

< # This constraints file was automatically generated on 2023-04-01T16:28:00Z
---
> # This constraints file was automatically generated on 2023-04-02T14:57:11Z
110c110
< blinker==1.5
---
> blinker==1.6
129d128
< ciso8601==2.3.0
261d259
< influxdb-client==1.36.1
367d364
< pinotdb==0.4.14
441d437
< reactivex==4.0.4
466c462
< semver==2.13.0
---
> semver==3.0.0
  • Not building documentation for suspended providers:

https://github.com/apache/airflow/actions/runs/4589278601/jobs/8104136038#step:6:606

  • Pytest not collecting the tests from suspended providers:

https://github.com/apache/airflow/actions/runs/4589278601/jobs/8104136262#step:6:346

Running command ['pytest',
 '--ignore=tests/providers/yandex', '--ignore=tests/providers/influxdb', '--ignore=tests/providers/apache/pinot', '--ignore=tests/providers/apache/hdfs', 
'--ignore=tests/system/providers/yandex', '--ignore=tests/system/providers/influxdb', '--ignore=tests/system/providers/apache/pinot', '--ignore=tests/system/providers/apache/hdfs', 
'--ignore=tests/integration/providers/yandex', '--ignore=tests/integration/providers/influxdb', '--ignore=tests/integration/providers/apache/pinot', '--ignore=tests/integration/providers/apache/hdfs',
 '--collect-only', '-qqqq', '--disable-warnings', 'tests']
  • Suspended packages not being prepared (cc: @eladkal):

https://github.com/apache/airflow/actions/runs/4589278601/jobs/8104136217#step:7:8343

 Starting the tests with those pytest arguments: --verbosity=0 --strict-markers --durations=100
 --maxfail=50 --color=yes --junitxml=/files/test_result-Providers-mysql.xml 
--timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60
--output=/files/warnings-Providers-mysql.txt --disable-warnings -rfEX
--ignore=tests/providers/yandex --ignore=tests/system/providers/yandex --ignore=tests/integration/providers/yandex
--ignore=tests/providers/influxdb --ignore=tests/system/providers/influxdb --ignore=tests/integration/providers/influxdb
--ignore=tests/providers/apache/pinot --ignore=tests/system/providers/apache/pinot --ignore=tests/integration/providers/apache/pinot
--ignore=tests/providers/apache/hdfs --ignore=tests/system/providers/apache/hdfs --ignore=tests/integration/providers/apache/hdfs
--with-db-init tests/providers

Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of changes in here but LGTM

As agreed in https://lists.apache.org/thread/g8b3k028qhzgw6c3yz4jvmlc67kcr9hj
we introduce mechanism to suspend providers from our suite of providers
when they are holding us back to older version of dependencies.

Provider's suspension is controlled from a single `suspend` flag in
`provider.yaml` - this flag is used to generate
providers_dependencies.json in generated folders (provider is skipped
if it has `suspended` flag set to `true`. This is enough to exclude
provider from the extras of airflow and (automatically) from being
used when CI image is build and constraints are being generated,
as well as from provider documentation/generation.

Also several parts of the CI build use the flag to filter out
such suspended provider from:

* verification of provider.yaml files in pre-commit is skipped
  in terms of importing and checking if classes are defined and
  listed in the provider.yaml
* the "tests" folders for providers are skipped automatically
  if the provider has "suspend" = true set
* in case of PR that is aimed to modify suspended providers directory
  tree (when it is not a global provider refactor) selective checks
  will detect it and fail such PR with appropriate message suggesting
  to fix the reason for suspention first
* documentation build is skipped for suspended providers
* mypy static checks will skip suspended provider folders while we
  will still run ruff checks on them (unlike mypy ruff does not
  expect the libraries that it imports to be available and we are
  running ruff in a separate environment where no airflow dependencies
  are installed anyway
@potiuk potiuk force-pushed the add-possibilty-to-suspend-providers branch from 77d95bf to b44b5be Compare April 4, 2023 18:21
@potiuk
Copy link
Member Author

potiuk commented Apr 4, 2023

A lot of changes in here but LGTM

Big number of those is adding "suspended: false", for explicitness :)

BTW. The policy is merged, time for implementation :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right place for this doc?
The provider folder is user facing this doc is more for developers of Airflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really user facing. It won't find it's way to any user artifacts (similarly for example provider.yaml files are not user visible). The idea is to make it close to the place where it will be lkely to be seen by all contributors who check-out airflow and want to understand what to do when they want to suspend/resume. This is why I put it in providers.

But If we have a better place I can move it elsewhere though - do you have a better proposal that fits it's (above) purpose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have a better proposal that fits it's (above) purpose?

I have but that involves moving many of our dev READMEs into a dedicated folder leaving only CONTRIBUTING.MD in the main tree level. I will find some time to work on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW. My philosophy is that the docs for contributors, should be as close as possible to the code you contribute to, which increases chances of finding out what you need to find (contributors check out the code and look at the relevant directory structure in their IDE), where for the users, it can be far, because you anyhow produce a documentation that might be completely differently structured than the source (see airflow documentation for example). But I'd love to see what you propose :)

@potiuk potiuk merged commit d23a3bb into main Apr 4, 2023
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 11, 2023
potiuk added a commit to potiuk/airflow that referenced this pull request Apr 20, 2023
When provider's summary index gets generated it should not
include suspended providers. This has been missed in apache#30422
potiuk added a commit that referenced this pull request Apr 20, 2023
)

When provider's summary index gets generated it should not
include suspended providers. This has been missed in #30422
potiuk added a commit to potiuk/airflow that referenced this pull request Apr 22, 2023
This is a follow-up after apache#30422 and apache#30763 - it turns out that
locally building index of providers failed when some providers
are suspended. It only impacts dev workflow locally.
potiuk added a commit that referenced this pull request Apr 22, 2023
This is a follow-up after #30422 and #30763 - it turns out that
locally building index of providers failed when some providers
are suspended. It only impacts dev workflow locally.
ephraimbuddy pushed a commit that referenced this pull request Apr 23, 2023
This is a follow-up after #30422 and #30763 - it turns out that
locally building index of providers failed when some providers
are suspended. It only impacts dev workflow locally.

(cherry picked from commit 29fb38c)
@potiuk potiuk deleted the add-possibilty-to-suspend-providers branch May 6, 2023 10:19
@alexott
Copy link
Contributor

alexott commented May 24, 2023

Maybe instead of adding suspended: false to all providers, it would be easier to use something like: provider.get("suspended", False) ?

@potiuk
Copy link
Member Author

potiuk commented May 24, 2023

I chose (pretty deliberately) to do it explicitly.

According to Python's ZEN "explicit is better than implicit" - and I chose to add "suspended" explicitly - I have not optimized for size of change. Part of the change was also to make it clear that any provider can be suspended and make it more "discoverable". It was more to optimize the "target state" than the "PR size" -> even if it was 107 files, it was actually an easy one to review. I do not really like implicit parameters like that because you have to look in two places to understand a) what is default value and b) that there is such an option at all.

@alexott - why do you think it's better/easier (other than size of commit) ?

@alexott
Copy link
Contributor

alexott commented May 24, 2023

It was mostly about size of the commit…

@potiuk
Copy link
Member Author

potiuk commented May 24, 2023

It was mostly about size of the commit…

One time effort. I prefer better target state than smaller size in this case - especailly that those are all "text" (or yaml in this case) files, so it really has no risk/review dificulty involved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:celery provider:cncf-kubernetes Kubernetes provider related issues provider:common-sql provider:databricks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants