-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement provider versioning tools #13767
Conversation
707dc10
to
80f591d
Compare
80f591d
to
2a3a540
Compare
Hey - especially @kaxil and @ashb - you've been waiting for this one and I think it is finally ready for review. I've refactored the original code of 'prepare_provider_packages.py" to simplify it a bit and to prepare for removal of backports in a few month's time. It should still work fine for backports (we will see it soon in CI) - but I mainly focused on getting the workflow agreed before in #11425 (comment) I separated it out into two ommits:
I described all the process in the dev/readme\s as usual, but this time iI also explained how you can easily debug the process step by step by manually running all the scripts. You do not have to use
Breeze encapsulates all of that in two commands, and has support for "bulk" operations on multiple providers. So for the "release" process More about how the process looks like from the release manager perspective:
First, you can quickly see which packages should be released. When running This is what you will see for packages that do not have a new version set in This way you will see the summary of all those changes and providers that might be candidates for releases. The summary of changes has all the links to commits - you can click through from the terminal directly. Additionally, you will see information if all the provider classes are properly named (the step will fail here and tell you if they are not): For packages that already have a new version in As a release manager, you can re-run this after merging new commits and it will be correctly updated (and you will see the incremental diffs every time you update). it's safe to commit those updated index.rst files. At the end of such run, you will see a summary showing for which packages documentation was generated and for which it was skipped (or errors). Skipped is when no newer version was added to The documentation generated is added to index.rst in the "doc" folder, so when we release the docs, we can simply point to it from the README (rather than add the changelog to the package README). The documentation in the provider's index contains both - High Level CHANGELOG.rst content and detailed per-commit changelog. You can see example Google Provider 2.0 docs generated this way: http://gabby-cough.surge.sh/ Once you (as release manager) update versions of the providers in other provider.yaml files this
Similar to generating documentation, provider packages can be generated in-bulk:
The above will go through all providers and generate the providers which are ready to be released (i.e those that have updated version in provider.yaml (and hopefully documentation generated and committed in the previous step). You will see the information about skipped packages (and why they are skipped) and green confirmation for those packages that are generated: and neo4j (it has never been generated so this is the first time it is generated): At the end you will see the summary showing the generated/skipped/errored packages: I think it is quite robust already, and the same code is run in the CI, so I hope it will be green soon, in the meantime - that's quite a lot of code, so probably good time to start reviewing. Unfortunately the change is significant and it's hard to split it , the best split I could do is to separate out the "docs" from the code (which I did - in separate commits). But I am also open if you have other suggestions on how to split it (but i think it ain't easy). |
Also - here is the content of README.md file generated for Google 2.0.0 package (this is what will be available in PyPI). As discussed in #11425 the content is much shorter - it contains only the most important information and link to detailed release notes (basically to this page: http://gabby-cough.surge.sh/) . Note that those pages eventually will contain changelogss for all the previous versions as well (we will incrementally add new versions to the changelo - latest at the top).
|
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*. |
The Workflow run is cancelling this PR. Building image for the PR has been cancelled |
The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason. |
af27fbf
to
8cef255
Compare
The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason. |
will take a look tomorrow |
|
||
{{ CROSS_PROVIDERS_DEPENDENCIES_TABLE_RST | safe }} | ||
|
||
{%- endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, this page has too much content and some are missing (but the missing ones do not need to be added now). I like the README files. which is available in the documentation of Google libraries. https://googleapis.dev/python/firestore/latest/index.html
It is standardized too, but contains all the information you need.
-
Starting at the top. So I think it's worth adding a description of the package itself. This information is available in the
provider.yaml
file, although so far the descriptions are not very. precise.h ttps://github.com/apache/airflow/blob/1602ec97c8d5bc7a7a8b42e850ac6c7a7030e47d/airflow/provider.yaml.schema.json#L13-L16
I hope that we will be able to prepare descriptions in the future gradually, but so far I have made a minimum step and added a link to the product page for each provider. -
The next step is to link to the documentation, if we want to use the same file in setup.py and in the documentation. This will make it easier to find the documentation when you are in pypi.
We have these links in the side menu, but this is quite a new feature and not everyone uses this menu. Especially on mobile devices, I start reading from the package description, not the side menu.
-
The division into a detailed changelog and a normal changelog is strange for me. As for me, the contents of the ADDITIONAL_INFO.rst file should be a normal documentation page and have the title "Migration guide" because this is exactly what this file contains. As this file is fully human written, I don't think other scripts need to process it as well.
-
I think the detailed changelog and migration guides worth moving to new subpages, because it is not important for the new user. This then will look similar to the Google documentation
.
When we move the guides to new pages, we will also be able to easily prepare the migration guide from Airflow 1.10, because we already have a table that facilitates these migrations.
- The links in the table use markdown syntax instead of rst.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, this page has too much content and some are missing (but the missing ones do not need to be added now). I like the README files. which is available in the documentation of Google libraries. https://googleapis.dev/python/firestore/latest/index.html
It is standardized too, but contains all the information you need.
- Starting at the top. So I think it's worth adding a description of the package itself. This information is available in the
provider.yaml
file, although so far the descriptions are not very. precise.h ttps://github.com/apache/airflow/blob/1602ec97c8d5bc7a7a8b42e850ac6c7a7030e47d/airflow/provider.yaml.schema.json#L13-L16
Sure. Happy to add it. This will be easy to incorporate.
- The next step is to link to the documentation, if we want to use the same file in setup.py and in the documentation. This will make it easier to find the documentation when you are in pypi.
I thiink the PyPI and documentation should be different than documentation. PyPI should be more focused on how to install packages and what are the limitations/requirements, but the "full" documentation" should be reached (including the ChangeLog which I think should not be part of README as it will grow over tim). This is how I approached it and I think this is a better idea than to keep the same documentation index and README.
We have these links in the side menu, but this is quite a new feature and not everyone uses this menu. Especially on mobile devices, I start reading from the package description, not the side menu.
Yeah. I will modify setup.cfg to include few more links (Documentation especially).
- The division into a detailed changelog and a normal changelog is strange for me. As for me, the contents of the ADDITIONAL_INFO.rst file should be a normal documentation page and have the title "Migration guide" because this is exactly what this file contains. As this file is fully human written, I don't think other scripts need to process it as well.
ADDITIONAL_INFO is for backport providers. IMHO this README is something that will evolve evertime. ADDITIONAL_INFO was more about migrating from 1.0 to backports. Effectively CHANGELOG should be written by human as a summary of changes in each version. The assumption is that there will be very rare "migration needed" for those. Google provider is a bit different than the others, because we migrate python APIs. But in most other packages this will not be a migration guide but human description on what changed in each version. Which I think CHANGELOG is a great name for.
- I think the detailed changelog and migration guides worth moving to new subpages, because it is not important for the new user. This then will look similar to the Google documentation
Sure. The Detailed guide can be added as a separate page in the docs and linked to. I thought about it as well. Happy to hear why others think but I think we should have this:
-
README.md in PyPI describing gerneral "what it is about" + all things important to installing PyPI + link to the Provider "index" documentation
-
Provider's index documentation with the same information added + link to detailed commit changelog. It might contain more information (but each Provider is free to add it in the index.rst above the automatically generated part). I do not want to discuss "standard" for this documentation here - we can discuss it separately. In this change I only want to focus on automatically generated part.
-
Detailed changelog - this might be separate page (happy to name it differently if someone has a good proposal).
When we move the guides to new pages, we will also be able to easily prepare the migration guide from Airflow 1.10, because we already have a table that facilitates these migrations.
We already have it in Backport providers. I think Provider's documentation should be more "forward" looking rather than focuing in the migration from 1.10.
We might instead point out to backport's PYPI readme and mention that for migration from 1.10 people
- The links in the table use markdown syntax instead of rst.
I think I fixed that, but I can take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
human description on what changed in each version. Which I think CHANGELOG is a great name for.
I am afraid nobody will write it by hand. Google automatically generates CHANGELOG.md files. Everyone has similar references to Github.
https://github.com/googleapis/python-bigquery/blob/master/CHANGELOG.md
googleapis/python-bigquery@b0e074f
To generate the changelog, they use thegoogleapis/release-please tool, which analyzes the message and generates a changelog based on it.
Release Please assumes you are using Conventional Commit messages.
The most important prefixes you should have in mind are:
fix: which represents bug fixes, and correlates to a SemVer patch.
feat: which represents a new feature, and correlates to a SemVer minor.
feat!:, or fix!:, refactor!:, etc., which represent a breaking change (indicated by the !) and will result in a SemVer major.
In the first version, I think that we don't need the changes to be divided into several categories, although that would be super functional, we can limit ourselves to listing the changes. If we publish these packages regularly, each release will not have a very long list of changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like what you have suggested here Jarek.
- Concise README
- Link to commit changelog
- detailed changelog is the Updating guide right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaxil For now, I can see that we have two changelogs, one with only a list of breaking changes and the detailed changelog with a list of all commits.
http://gabby-cough.surge.sh/#changelog
We don't have any updating guide yet.
I only suggest changing the name from "Changelog" to "Upgrading guide" and from "Detailed changelog" to "Changelog", because I am concerned that we will be able to maintain the changelog if we have as many contributions as we do now. After time, I suspect we'll be creating a handwritten changelog by copying only the titles of a few changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment here @kaxil @mik-laj about the naming/change process.
I am not totally against naming it UPDATING.md, but I think we are confusing a bit airflow and providers release cycle. I think we should get the naming right and content of this "manually written" parts.
I believe most of provider releases will be incremental updates - new hooks/operators added basically or new (backwards-compatible) changes in the operators. The changes that we have now in Google Providers are exception rather than rule. We only need to add the "upgrading" guide because we make backwards-incompatible changes due to 2.0 APIs. In vast majority of cases people should be able to do 'pip install --upgrade apache-airflow-providers-<NN>
and there should be no need to follow any "updating" guide. This is the main premise of semver, which communicates breaking changes. The only time where we need to add "updating" information is when we go 1.0.0 -> 2.0.0 etc.
Now - this means that vast majority of changes will be simply "new operator "xxx" added". That's why I think CHANGELOG.rst is a good name, and this is also a good name for the Heading in the docs.
While idally indeed we should be able to automaticaly get it from semantic commits (and I am more and more convinced that we should switch to those), this is a decision that will time to discuss and vote and apply and it will take some time for people to adjust to, and we can do it as a follow-up. For now I think this will be very little overhead if we just agree between committers, that every time a new stuff is added to provider, we ask the PR author to add entry in the CHANELOG.md and provider.yaml:
- Add new version in provider.yaml patchelevel number (for bugfixes) or minor version (for features)
- Add a line in CHANGELOG.txt to describe the change added.
I think this makes it nicely distributed among the PR authors, and decreases a lot the work needed by release manager (@kaxil :) - then PR authors and commiters will take care about updating the versions and the role of release manager will be just to review whether everything was included. Remember that release manager will have to sometimes release 10-20 providers at a time and mostly providers that the release manager knows very little about, so the less work is put on the shoulders of that person, the better.
Let me know WDYT @kaxil @mik-laj and @ashb (I am mentioning @ashb here for the release management part mainly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also one small comment here.
Looking at https://apache-airflow.slack.com/archives/CCR6P6JRL/p1611225089006700 - this is why I think we should have the detailed commit log (as a separate document linked from the index). Users are sometimes aware of the PRs they are interested in and want to know in which version they were included. And it's not very easy to find it out. We will have a separate set of tags for each provider, so for some people, it would be great to have it in this nicely formatted document rather than having to run some complex git commands to find out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh we should definitely have a Changelog which contains the commit log whether it is automated or done manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make it separate page then next to index.rst and link to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will also add a few hyperlinks here and there, so that we will be able to freely jump between those different pages -
- README in PyPI,
- index.rst in docs
- commit changes in docs
Those will be interlinked with version numbers so that we will be able to jump from PYPI 2.0.0 package straight to detailed commit log for 2.0.0 and 2.0.0 Changelog (and all the cross-links between those).
--> | ||
|
||
|
||
# Package apache-airflow-providers-amazon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want this Readme, but I thought we will generate the new readme -- is that only during release ? Would be good to have it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Only during release. We do not really need them here. There is very little value in keeping them in the repo, they will be available in PyPI. But If you really think they are needed, I can add them back. I just thought - since we already have the information there in setup.py, dependencies.json , the value of having it in the repo is very little. But I can be convince otherwise..
airflow/providers/res.txt
Outdated
@@ -0,0 +1,61 @@ | |||
./salesforce | |||
./celery | |||
./amazon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah not sure what this file is about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug file :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file needs to be removed right?
The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason. |
This change implements per-provider versioning tools. Version of the providers is retrieved from provider.yaml file (top-level verion). Documentation is generated in the documentation folder rather than in sources and embedded in provider's index. Backport providers remain as they were until we delete all the backport references in April 2021 nd then the code can be simplified and the backport functionality can be removed then. When generating multiple providers, only those that have version that has no corresponding `providers-<PROVIDER>/<VERSION>` are generated. Other providers are skipped with warnings. Old documentation is removed and new CHANGELOG.rst have been prepared for all providers to accomodate to the new process (which is comming as a follow-up commit) Fixes: apache#13272, apache#13271, apache#13274, apache#13276, apache#13277, apache#13275, apache#13273
9461fb0
to
f79cf30
Compare
Fixed comments @turbaszek and rebased to latest master: I also run it in my fork's master to verify no typos etc. https://github.com/potiuk/airflow/actions/runs/527627362 |
The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason. |
f79cf30
to
28c96ee
Compare
This change implements per-provider versioning tools. Version of the providers is retrieved from provider.yaml file (top-level verion). Documentation is generated in the documentation folder rather than in sources and embedded in provider's index. Backport providers remain as they were until we delete all the backport references in April 2021 nd then the code can be simplified and the backport functionality can be removed then. When generating multiple providers, only those that have version that has no corresponding `providers-<PROVIDER>/<VERSION>` are generated. Other providers are skipped with warnings. Old documentation is removed and new CHANGELOG.rst have been prepared for all providers to accomodate to the new process (which is comming as a follow-up commit) Fixes: #13272, #13271, #13274, #13276, #13277, #13275, #13273 (cherry picked from commit ac2f72c)
This change implements per-provider versioning tools. Version of the providers is retrieved from provider.yaml file (top-level verion). Documentation is generated in the documentation folder rather than in sources and embedded in provider's index. Backport providers remain as they were until we delete all the backport references in April 2021 nd then the code can be simplified and the backport functionality can be removed then. When generating multiple providers, only those that have version that has no corresponding `providers-<PROVIDER>/<VERSION>` are generated. Other providers are skipped with warnings. Old documentation is removed and new CHANGELOG.rst have been prepared for all providers to accomodate to the new process (which is comming as a follow-up commit) Fixes: #13272, #13271, #13274, #13276, #13277, #13275, #13273 (cherry picked from commit ac2f72c)
This change implements per-provider versioning tools. Version of the providers is retrieved from provider.yaml file (top-level verion). Documentation is generated in the documentation folder rather than in sources and embedded in provider's index. Backport providers remain as they were until we delete all the backport references in April 2021 nd then the code can be simplified and the backport functionality can be removed then. When generating multiple providers, only those that have version that has no corresponding `providers-<PROVIDER>/<VERSION>` are generated. Other providers are skipped with warnings. Old documentation is removed and new CHANGELOG.rst have been prepared for all providers to accomodate to the new process (which is comming as a follow-up commit) Fixes: #13272, #13271, #13274, #13276, #13277, #13275, #13273 (cherry picked from commit ac2f72c)
This change implements per-provider versioning tools. Version of the
providers is retrieved from provider.yaml file (top-level verion).
Documentation is generated in the documentation folder rather than
in sources and embedded in provider's index. Backport providers
remain as they were until we delete all the backport references in
April 2021 nd then the code can be simplified and the
backport functionality can be removed then.
When generating multiple providers, only those that have version
that has no corresponding
providers-<PROVIDER>/<VERSION>
aregenerated. Other providers are skipped with warnings.
Fixes: #13272, #13271, #13274, #13276, #13277, #13275, #13273
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.