Split out providers into "standalone" python packages #33909
Comments
cc: @eladkal @uranusjr @Taragolis -> this captures some of the recent discussion from Slack: https://apache-airflow.slack.com/archives/CCPRP7943/p1692612215901809 At some point, when (if?) we have a good idea of how it can be done, we should bring it to the devlist.
Another thought.... Maybe we do NOT have to solve the challenge of all-editable build for airflow for providers. I think we have a viable alternative....
Breeze could remain the "CI" driver and the "all-editable" environment where the latest Airflow sources and the latest providers could be developed and tested together (which is absolutely necessary BTW). Also cc: @mobuchowski and @hussein-awala, as I know they are interested in DevEx; I think it would be great to run some discussion/brainstorming about that, and about potential future changes to it, in a bit more of an "interest group". Of course others are invited too. We could design the future of the Airflow dev env together, and once we have the providers as "standalone" packages, we could then start a discussion: "Does it make sense / do we want to split providers into a separate repo?" I have no ready answer for that (lots of thoughts though). But I think this discussion should happen after we have the providers as "standalone" packages in our main "monorepo".
Honestly I’m not particularly keen on moving the packages to separate repositories since that’d just make contributing more difficult instead of easier. But making providers separate packages from the core Airflow could be useful.
I tend to agree with that statement. But that discussion is yet to happen :). It's a bit of a theoretical possibility, and I only wanted to start it after we see how separate packages inside the airflow repo work for us.
I like providers being standalone packages by default. In addition to the benefits described here, it would allow installing providers straight from GitHub via pip. I agree with @uranusjr that being a monorepo instead of multiple separate repos is a net benefit, especially when also thinking about the local development process, not only CI.
Ideally, there would also be no functional differences between Airflow running this way and the production image of Airflow, besides the selection of installed/running providers.
Agree. But this one is tricky as we rely on entrypoints (and this is what ProvidersManager and INSTALL_PROVIDERS_FROM_SOURCES attempt to work around). But if we find another solution, I would gladly remove those hacks. That's why I was eyeing Hatch pypa/hatch#233 (hence my original comment about it).
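For readers less familiar with the entrypoint dependency mentioned above: installed packages advertise themselves through distribution metadata, so provider code that exists only as a source directory is invisible to that kind of discovery. Below is a minimal sketch of metadata-based discovery using only the standard library. The group name apache_airflow_provider is an assumption about how providers register themselves (check it against a provider's generated metadata), and discover_providers is a hypothetical helper, not Airflow's ProvidersManager.

```python
from __future__ import annotations

from importlib.metadata import distributions


def discover_providers(group: str = "apache_airflow_provider") -> list[tuple[str, str]]:
    """Return sorted (distribution name, "module:attr") pairs registered under `group`.

    Only *installed* distributions are visible here; a plain source checkout
    without package metadata contributes nothing.
    """
    found = []
    for dist in distributions():
        for ep in dist.entry_points:
            if ep.group == group:  # group name is an assumption, see lead-in
                found.append((dist.metadata.get("Name", "<unknown>"), ep.value))
    return sorted(found)
```

In an environment with no providers installed as packages this returns an empty list, which is the gap the workarounds mentioned above have to cover.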
Body
As part of making it possible to move providers out of the Airflow core repository (we have not decided on that yet; we just want to make it possible), we should turn Airflow providers into "real" packages.
Currently those packages are built "dynamically": https://github.com/apache/airflow/blob/main/dev/provider_packages/prepare_provider_packages.py is used as part of breeze release-management prepare-provider-packages to extract parts of the "airflow/providers/" sources, generate the setup.* and pyproject.toml files, and build the providers from those generated temporary folders.
This has some disadvantages: for example, it does not make reproducible builds possible, and it requires a complex breeze command, the CI image, and the Python script in the image to build the packages. (The CI image is used to make sure all dependencies are installed and to provide isolation and cleanup between builds; it also isolates the host, for security, from the container building the packages when providers are built from contributor forks.)
But this got us through the last 3 years of releasing Airflow and providers separately :).
With recent changes (#32604, the upcoming #32048, the upcoming #33907, and a number of other changes already implemented in the past), we are quite close to making it possible to split out providers into "standalone" packages, where each provider is a separate, standards-compliant package with a completely independent directory in which you can build the package without moving the sources around, using standard Python tooling. This would require everything that relates to the providers to move to those directories (docs and tests, notably, are currently shared between Airflow and providers in "docs" and "tests" and would have to be moved).
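To make the "everything moves into the provider's own directory" idea concrete, here is a small sketch that computes which shared paths would have to move for one provider. The source paths assume the monorepo layout described above (airflow/providers/, tests/providers/, docs/apache-airflow-providers-*). The target layout (a top-level providers/ directory with src, tests and docs inside) and the helper name plan_provider_moves are purely hypothetical; the actual layout is still open.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def plan_provider_moves(provider_id: str, target_root: str = "providers") -> dict[str, str]:
    """Map current shared locations to a hypothetical standalone provider directory.

    `provider_id` is the dotted id, e.g. "amazon" or "apache.hive".
    """
    pkg_path = provider_id.replace(".", "/")
    docs_name = "apache-airflow-providers-" + provider_id.replace(".", "-")
    base = PurePosixPath(target_root) / pkg_path  # hypothetical target directory
    return {
        f"airflow/providers/{pkg_path}": str(base / "src" / "airflow" / "providers" / pkg_path),
        f"tests/providers/{pkg_path}": str(base / "tests"),
        f"docs/{docs_name}": str(base / "docs"),
    }
```

Nothing is moved on disk; the function only produces the mapping, so a migration script could print it for review before acting on it.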
We should also eventually automate checking whether anything left in core still refers to providers (#11435).
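One possible shape for such a check, as a sketch only: walk the core sources, parse each file, and report absolute imports of airflow.providers. The function name find_provider_imports and the rule of skipping a top-level providers directory are assumptions for illustration. It does not catch dynamic imports, string references, or "from airflow import providers".

```python
from __future__ import annotations

import ast
from pathlib import Path

PROVIDERS_PREFIX = "airflow.providers"


def find_provider_imports(core_root: Path) -> list[tuple[str, int, str]]:
    """Return sorted (relative file, line, module) for imports of airflow.providers."""
    hits = []
    for path in core_root.rglob("*.py"):
        rel = path.relative_to(core_root)
        if rel.parts[0] == "providers":
            continue  # provider sources themselves are not "core"
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError:
            continue  # unparsable files are skipped in this sketch
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                names = [node.module]
            else:
                continue
            for name in names:
                if name == PROVIDERS_PREFIX or name.startswith(PROVIDERS_PREFIX + "."):
                    hits.append((str(rel), node.lineno, name))
    return sorted(hits)
```

A pre-commit hook or CI step could fail when the returned list is non-empty, with an allowlist for references that are known and intended.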
The early draft POC attempts to add scripts to automate such migration and result of it can be seen here:
However, the challenge to solve here is to make it easy for contributors to contribute to Airflow and providers together. We want to keep, as today, an easy environment where you can edit both Airflow and provider code, run tests, run Airflow, and run integration tests without the extra hassle of installing and reinstalling the packages in editable mode.
The ideal developer workflow is one where they can run breeze and be able to edit any sources locally on the host, with Airflow in breeze automatically picking up the changes in both Airflow and providers when the interpreter is started.
How to do it, and which (ideally standard) Python tooling to use for such a move, is not yet determined and is open for discussion, POCs and proposals.
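Whatever tooling ends up being chosen, the property described above can be checked mechanically: every module of interest should resolve to a file inside the local checkout rather than to an installed copy. A minimal sketch follows, assuming a hypothetical helper name resolves_inside and using only the standard library.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def resolves_inside(module_name: str, source_root: Path) -> bool:
    """Return True if `module_name` would be imported from under `source_root`."""
    try:
        # Note: for dotted names this imports the parent packages as a side effect.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return False
    if spec is None or spec.origin in ("built-in", "frozen"):
        return False
    # Namespace packages have no origin file; fall back to their search locations.
    candidates = [spec.origin] if spec.origin else list(spec.submodule_search_locations or [])
    root = source_root.resolve()
    return any(root in Path(c).resolve().parents for c in candidates)
```

For example, inside the dev environment one could assert resolves_inside("airflow.providers.amazon", Path("/opt/airflow")), where both the module and the path are placeholders for whatever the environment actually mounts.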
Note: Some of the modern tools from the PyPA world (Hatch, flit) have been evolving recently and adding features that might make it possible to combine multiple packages from a monorepo into a single development installation, and we would likely want to use one of the standard tools for that rather than develop our own. We might consider contributing to some of those tools to make them more suitable for us. Possibly we could combine several tools: for example, use flit for providers and hatch for Airflow to combine repos, as hatch seems to be better suited and has a roadmap for monorepo/multi-project setups, while flit is slick and very focused (or so it seems).