Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mechanism to suspend providers #30422

Merged
merged 1 commit into from
Apr 4, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,7 @@ jobs:
docs-filter: ${{ steps.selective-checks.outputs.docs-filter }}
skip-pre-commits: ${{ steps.selective-checks.outputs.skip-pre-commits }}
debug-resources: ${{ steps.selective-checks.outputs.debug-resources }}
suspended-providers-folders: ${{ steps.selective-checks.outputs.suspended-providers-folders }}
source-head-repo: ${{ steps.source-run-info.outputs.source-head-repo }}
pull-request-labels: ${{ steps.source-run-info.outputs.pr-labels }}
in-workflow-build: ${{ steps.source-run-info.outputs.in-workflow-build }}
Expand Down Expand Up @@ -884,6 +885,7 @@ jobs:
env:
RUNS_ON: "${{needs.build-info.outputs.runs-on}}"
PARALLEL_TEST_TYPES: "${{needs.build-info.output.parallel-test-types}}"
SUSPENDED_PROVIDERS_FOLDERS: "${{ needs.build-info.outputs.suspended-providers-folders }}"
PR_LABELS: "${{needs.build-info.outputs.pull-request-labels}}"
FULL_TESTS_NEEDED: "${{needs.build-info.outputs.full-tests-needed}}"
DEBUG_RESOURCES: "${{needs.build-info.outputs.debug-resources}}"
Expand Down Expand Up @@ -924,6 +926,7 @@ jobs:
env:
RUNS_ON: "${{needs.build-info.outputs.runs-on}}"
PARALLEL_TEST_TYPES: "${{needs.build-info.output.parallel-test-types}}"
SUSPENDED_PROVIDERS_FOLDERS: "${{ needs.build-info.outputs.suspended-providers-folders }}"
PR_LABELS: "${{needs.build-info.outputs.pull-request-labels}}"
FULL_TESTS_NEEDED: "${{needs.build-info.outputs.full-tests-needed}}"
DEBUG_RESOURCES: "${{needs.build-info.outputs.debug-resources}}"
Expand Down Expand Up @@ -976,6 +979,7 @@ jobs:
PR_LABELS: "${{needs.build-info.outputs.pull-request-labels}}"
FULL_TESTS_NEEDED: "${{needs.build-info.outputs.full-tests-needed}}"
PARALLEL_TEST_TYPES: "${{needs.build-info.output.parallel-test-types}}"
SUSPENDED_PROVIDERS_FOLDERS: "${{ needs.build-info.outputs.suspended-providers-folders }}"
DEBUG_RESOURCES: "${{needs.build-info.outputs.debug-resources}}"
BACKEND: "mysql"
PYTHON_MAJOR_MINOR_VERSION: "${{matrix.python-version}}"
Expand Down Expand Up @@ -1019,6 +1023,7 @@ jobs:
env:
RUNS_ON: "${{needs.build-info.outputs.runs-on}}"
PARALLEL_TEST_TYPES: "${{needs.build-info.output.parallel-test-types}}"
SUSPENDED_PROVIDERS_FOLDERS: "${{ needs.build-info.outputs.suspended-providers-folders }}"
PR_LABELS: "${{needs.build-info.outputs.pull-request-labels}}"
FULL_TESTS_NEEDED: "${{needs.build-info.outputs.full-tests-needed}}"
DEBUG_RESOURCES: "${{needs.build-info.outputs.debug-resources}}"
Expand Down Expand Up @@ -1061,6 +1066,7 @@ jobs:
env:
RUNS_ON: "${{needs.build-info.outputs.runs-on}}"
PARALLEL_TEST_TYPES: "${{needs.build-info.output.parallel-test-types}}"
SUSPENDED_PROVIDERS_FOLDERS: "${{ needs.build-info.outputs.suspended-providers-folders }}"
PR_LABELS: "${{needs.build-info.outputs.pull-request-labels}}"
PYTHON_MAJOR_MINOR_VERSION: "${{matrix.python-version}}"
FULL_TESTS_NEEDED: "${{needs.build-info.outputs.full-tests-needed}}"
Expand Down Expand Up @@ -1097,6 +1103,7 @@ jobs:
RUNS_ON: "${{needs.build-info.outputs.runs-on}}"
PARALLEL_TEST_TYPES: "${{needs.build-info.output.parallel-test-types}}"
PR_LABELS: "${{needs.build-info.outputs.pull-request-labels}}"
SUSPENDED_PROVIDERS_FOLDERS: "${{ needs.build-info.outputs.suspended-providers-folders }}"
FULL_TESTS_NEEDED: "${{needs.build-info.outputs.full-tests-needed}}"
DEBUG_RESOURCES: "${{needs.build-info.outputs.debug-resources}}"
BACKEND: "postgres"
Expand Down Expand Up @@ -1156,6 +1163,7 @@ jobs:
env:
RUNS_ON: "${{needs.build-info.outputs.runs-on}}"
PARALLEL_TEST_TYPES: "${{needs.build-info.output.parallel-test-types}}"
SUSPENDED_PROVIDERS_FOLDERS: "${{ needs.build-info.outputs.suspended-providers-folders }}"
PR_LABELS: "${{needs.build-info.outputs.pull-request-labels}}"
FULL_TESTS_NEEDED: "${{needs.build-info.outputs.full-tests-needed}}"
DEBUG_RESOURCES: "${{needs.build-info.outputs.debug-resources}}"
Expand Down Expand Up @@ -1197,6 +1205,7 @@ jobs:
env:
RUNS_ON: "${{needs.build-info.outputs.runs-on}}"
PARALLEL_TEST_TYPES: "Quarantined"
SUSPENDED_PROVIDERS_FOLDERS: "${{ needs.build-info.outputs.suspended-providers-folders }}"
PR_LABELS: "${{needs.build-info.outputs.pull-request-labels}}"
PYTHON_MAJOR_MINOR_VERSION: "${{needs.build-info.outputs.default-python-version}}"
DEBUG_RESOURCES: "${{needs.build-info.outputs.debug-resources}}"
Expand Down
23 changes: 14 additions & 9 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -321,7 +321,7 @@ repos:
name: Checks setup extra packages
description: Checks if all the libraries in setup.py are listed in extra-packages-ref.rst file
language: python
files: ^setup\.py$|^docs/apache-airflow/extra-packages-ref\.rst$
files: ^setup\.py$|^docs/apache-airflow/extra-packages-ref\.rst$|^airflow/providers/.*/provider\.yaml$
pass_filenames: false
entry: ./scripts/ci/pre_commit/pre_commit_check_setup_extra_packages_ref.py
additional_dependencies: ['rich>=12.4.4']
Expand Down Expand Up @@ -350,7 +350,7 @@ repos:
name: Update extras in documentation
entry: ./scripts/ci/pre_commit/pre_commit_insert_extras.py
language: python
files: ^setup\.py$|^CONTRIBUTING\.rst$|^INSTALL$
files: ^setup\.py$|^CONTRIBUTING\.rst$|^INSTALL$|^airflow/providers/.*/provider\.yaml$
pass_filenames: false
additional_dependencies: ['rich>=12.4.4']
- id: check-extras-order
Expand Down Expand Up @@ -698,7 +698,7 @@ repos:
files: ^dev/breeze/.*$
pass_filenames: false
require_serial: true
additional_dependencies: ['click', 'rich>=12.4.4']
additional_dependencies: ['click', 'rich>=12.4.4', 'pyyaml']
- id: check-system-tests-present
name: Check if system tests have required segments of code
entry: ./scripts/ci/pre_commit/pre_commit_check_system_tests.py
Expand Down Expand Up @@ -844,10 +844,15 @@ repos:
name: Update output of breeze commands in BREEZE.rst
entry: ./scripts/ci/pre_commit/pre_commit_breeze_cmd_line.py
language: python
files: ^BREEZE\.rst$|^dev/breeze/.*$|^\.pre-commit-config\.yaml$
files: >
(?x)
^BREEZE\.rst$|^dev/breeze/.*$|
^\.pre-commit-config\.yaml$|
^scripts/ci/pre_commit/pre_commit_breeze_cmd_line.py$|
^generated/provider_dependencies.json$
require_serial: true
pass_filenames: false
additional_dependencies: ['rich>=12.4.4', 'rich-click>=1.5', 'inputimeout']
additional_dependencies: ['rich>=12.4.4', 'rich-click>=1.5', 'inputimeout', 'pyyaml']
- id: check-example-dags-urls
name: Check that example dags url include provider versions
entry: ./scripts/ci/pre_commit/pre_commit_update_example_dags_paths.py
Expand Down Expand Up @@ -894,31 +899,31 @@ repos:
entry: ./scripts/ci/pre_commit/pre_commit_mypy.py
files: ^dev/.*\.py$
require_serial: true
additional_dependencies: ['rich>=12.4.4', 'inputimeout']
additional_dependencies: ['rich>=12.4.4', 'inputimeout', 'pyyaml']
- id: run-mypy
name: Run mypy for core
language: python
entry: ./scripts/ci/pre_commit/pre_commit_mypy.py --namespace-packages
files: \.py$
exclude: ^.*/_vendor/|^airflow/migrations|^airflow/providers|^dev|^docs|^provider_packages|^tests/providers|^tests/system/providers
require_serial: true
additional_dependencies: ['rich>=12.4.4', 'inputimeout']
additional_dependencies: ['rich>=12.4.4', 'inputimeout', 'pyyaml']
- id: run-mypy
name: Run mypy for providers
language: python
entry: ./scripts/ci/pre_commit/pre_commit_mypy.py --namespace-packages
files: ^airflow/providers/.*\.py$|^tests/providers/\*\.py$|^tests/system/providers/\*\.py$
exclude: ^.*/_vendor/
require_serial: true
additional_dependencies: ['rich>=12.4.4', 'inputimeout']
additional_dependencies: ['rich>=12.4.4', 'inputimeout', 'pyyaml']
- id: run-mypy
name: Run mypy for /docs/ folder
language: python
entry: ./scripts/ci/pre_commit/pre_commit_mypy.py
files: ^docs/.*\.py$
exclude: ^docs/rtd-deprecation
require_serial: true
additional_dependencies: ['rich>=12.4.4', 'inputimeout']
additional_dependencies: ['rich>=12.4.4', 'inputimeout', 'pyyaml']
- id: check-provider-yaml-valid
name: Validate provider.yaml files
entry: ./scripts/ci/pre_commit/pre_commit_check_provider_yaml_files.py
Expand Down
11 changes: 11 additions & 0 deletions Dockerfile.ci
Original file line number Diff line number Diff line change
Expand Up @@ -890,6 +890,17 @@ EXTRA_PYTEST_ARGS=(
"-rfEX"
)

if [[ ${SUSPENDED_PROVIDERS_FOLDERS=} != "" ]]; then
for provider in ${SUSPENDED_PROVIDERS_FOLDERS=}; do
echo "Skipping tests for suspended provider: ${provider}"
EXTRA_PYTEST_ARGS+=(
"--ignore=tests/providers/${provider}"
"--ignore=tests/system/providers/${provider}"
"--ignore=tests/integration/providers/${provider}"
)
done
fi

if [[ "${TEST_TYPE}" == "Helm" ]]; then
_cpus="$(grep -c 'cpu[0-9]' /proc/stat)"
echo "Running tests with ${_cpus} CPUs in parallel"
Expand Down
5 changes: 5 additions & 0 deletions airflow/provider.yaml.schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,10 @@
"type": "string"
}
},
"suspended": {
"description": "If set to true, the provider is suspended and it's not a candidate for release nor contributes dependencies to constraint calculations/CI image. Tests are excluded.",
"type:": "boolean"
},
"dependencies": {
"description": "Dependencies that should be added to the provider",
"type": "array",
Expand Down Expand Up @@ -326,6 +330,7 @@
"name",
"package-name",
"description",
"suspended",
"dependencies",
"versions"
]
Expand Down
150 changes: 150 additions & 0 deletions airflow/providers/SUSPENDING_AND_RESUMING_PROVIDERS.rst
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right place for this doc?
The provider folder is user facing this doc is more for developers of Airflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really user facing. It won't find it's way to any user artifacts (similarly for example provider.yaml files are not user visible). The idea is to make it close to the place where it will be lkely to be seen by all contributors who check-out airflow and want to understand what to do when they want to suspend/resume. This is why I put it in providers.

But If we have a better place I can move it elsewhere though - do you have a better proposal that fits it's (above) purpose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have a better proposal that fits it's (above) purpose?

I have but that involves moving many of our dev READMEs into a dedicated folder leaving only CONTRIBUTING.MD in the main tree level. I will find some time to work on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW. My philosophy is that the docs for contributors, should be as close as possible to the code you contribute to, which increases chances of finding out what you need to find (contributors check out the code and look at the relevant directory structure in their IDE), where for the users, it can be far, because you anyhow produce a documentation that might be completely differently structured than the source (see airflow documentation for example). But I'd love to see what you propose :)

Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

Suspending providers
====================

As of April 2023, we have the possibility to suspend individual providers, so that they are not holding
back dependencies for Airflow and other providers. The process of suspending providers is described
in `description of the process <https://github.com/apache/airflow/blob/main/README.md#suspending-releases-for-providers>`_

Technically, suspending a provider is done by setting ``suspended : true``, in the provider.yaml of the
provider. This should be followed by committing the change and either automatically or manually running
pre-commit checks that will either update derived configuration files or ask you to update them manually.
Note that you might need to run pre-commit several times until all the static checks pass,
because modification from one pre-commit might impact other pre-commits.

If you have pre-commit installed, pre-commit will be run automatically on commit. If you want to run it
manually after commit, you can run it via ``breeze static-checks --last-commit`` some of the tests might fail
because suspension of the provider might cause changes in the dependencies, so if you see errors about
missing dependencies imports, non-usable classes etc., you will need to build the CI image locally
via ``breeze build-image --python 3.7 --upgrade-to-newer-dependencies`` after the first pre-commit run
and then run the static checks again.

If you want to be absolutely sure to run all static checks you can always do this via
``pre-commit run --all-files`` or ``breeze static-checks --all-files``.

Some of the manual modifications you will have to do (in both cases ``pre-commit`` will guide you on what
to do.

* You will have to run ``breeze setup regenerate-command-images`` to regenerate breeze help files
* you will need to update ``extra-packages-ref.rst`` and in some cases - when mentioned there explicitly -
``setup.py`` to remove the provider from list of dependencies.

What happens under-the-hood as the result, is that ``generated/providers.json`` file is updated with
the information about available providers and their dependencies and it is used by our tooling to
exclude suspended providers from all relevant parts of the build and CI system (such as building CI image
with dependencies, building documentation, running tests, etc.)


Additional changes needed for cross-dependent providers
=======================================================

Those steps above are usually enough for most providers that are "standalone" and not imported or used by
other providers (in most cases we will not suspend such providers). However some extra steps might be needed
for providers that are used by other providers, or that are part of the default PROD Dockerfile:

* Most of the tests for the suspended provider, will be automatically excluded by pytest collection. However,
in case a provider is dependent on by another provider, the relevant tests might fail to be collected or
run by ``pytest``. In such cases you should skip the whole test module failing to be collected by
adding ``pytest.importorskip`` at the top of the test module.
For example if your tests fail because they need to import ``apache.airflow.providers.google``
and you have suspended it, you should add this line at the top of the test module that fails.

Example failing collection after ``google`` provider has been suspended:

.. code-block:: txt

_____ ERROR collecting tests/providers/apache/beam/operators/test_beam.py ______
ImportError while importing test module '/opt/airflow/tests/providers/apache/beam/operators/test_beam.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/providers/apache/beam/operators/test_beam.py:25: in <module>
from airflow.providers.apache.beam.operators.beam import (
airflow/providers/apache/beam/operators/beam.py:35: in <module>
from airflow.providers.google.cloud.hooks.dataflow import (
airflow/providers/google/cloud/hooks/dataflow.py:32: in <module>
from google.cloud.dataflow_v1beta3 import GetJobRequest, Job, JobState, JobsV1Beta3AsyncClient, JobView
E ModuleNotFoundError: No module named 'google.cloud.dataflow_v1beta3'
_ ERROR collecting tests/providers/microsoft/azure/transfers/test_azure_blob_to_gcs.py _


The fix is to add this line at the top of the ``tests/providers/apache/beam/operators/test_beam.py`` module:

.. code-block:: python

pytest.importorskip("apache.airflow.providers.google")


* Some of the other providers might also just import unconditionally the suspended provider and they will
fail during provider verification step in CI. In this case you should turn the provider imports
into conditional imports. For example when import fails after ``amazon`` provider has been suspended:

.. code-block:: txt

Traceback (most recent call last):
File "/opt/airflow/scripts/in_container/verify_providers.py", line 266, in import_all_classes
_module = importlib.import_module(modinfo.name)
File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name, package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.7/site-packages/airflow/providers/mysql/transfers/s3_to_mysql.py", line 23, in <module>
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
ModuleNotFoundError: No module named 'airflow.providers.amazon'

or:

.. code-block:: txt

Error: The ``airflow.providers.microsoft.azure.transfers.azure_blob_to_gcs`` object in transfers list in
airflow/providers/microsoft/azure/provider.yaml does not exist or is not a module:
No module named 'gcloud.aio.storage'

The fix for that is to turn the feature into an optional provider feature (in the place where the excluded
``airflow.providers`` import happens:

.. code-block:: python

try:
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
except ImportError as e:
from airflow.exceptions import AirflowOptionalProviderFeatureException

raise AirflowOptionalProviderFeatureException(e)


* In case we suspend an important provider, which is part of the default Dockerfile you might want to
update the tests for PROD docker image in ``docker_tests/test_prod_image.py``.

* Some of the suspended providers might also fail ``breeze`` unit tests that expect a fixed set of providers.
Those tests should be adjusted (but this is not very likely to happen, because the tests are using only
the most common providers that we will not be likely to suspend).


Resuming providers
==================

Resuming providers is done by reverting the original change that suspended it. In case there are changes
needed to fix problems in the reverted provider, our CI will detect them and you will have to fix them
as part of the PR reverting the suspension.
1 change: 1 addition & 0 deletions airflow/providers/airbyte/provider.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ name: Airbyte
description: |
`Airbyte <https://airbyte.io/>`__

suspended: false
versions:
- 3.2.1
- 3.2.0
Expand Down
1 change: 1 addition & 0 deletions airflow/providers/alibaba/provider.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ name: Alibaba
description: |
Alibaba Cloud integration (including `Alibaba Cloud <https://www.alibabacloud.com//>`__).

suspended: false
versions:
- 2.3.0
- 2.2.0
Expand Down
1 change: 1 addition & 0 deletions airflow/providers/amazon/provider.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ name: Amazon
description: |
Amazon integration (including `Amazon Web Services (AWS) <https://aws.amazon.com/>`__).

suspended: false
versions:
- 7.4.0
- 7.3.0
Expand Down
1 change: 1 addition & 0 deletions airflow/providers/apache/beam/provider.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ name: Apache Beam
description: |
`Apache Beam <https://beam.apache.org/>`__.

suspended: false
versions:
- 4.3.0
- 4.2.0
Expand Down
1 change: 1 addition & 0 deletions airflow/providers/apache/cassandra/provider.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ name: Apache Cassandra
description: |
`Apache Cassandra <http://cassandra.apache.org/>`__.

suspended: false
versions:
- 3.1.1
- 3.1.0
Expand Down
1 change: 1 addition & 0 deletions airflow/providers/apache/drill/provider.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ name: Apache Drill
description: |
`Apache Drill <https://drill.apache.org/>`__.

suspended: false
versions:
- 2.3.2
- 2.3.1
Expand Down
1 change: 1 addition & 0 deletions airflow/providers/apache/druid/provider.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ name: Apache Druid
description: |
`Apache Druid <https://druid.apache.org/>`__.

suspended: false
versions:
- 3.3.1
- 3.3.0
Expand Down
Loading