Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add provider for Apache Kafka #30175

Merged
merged 14 commits into from
Apr 21, 2023
Merged

Add provider for Apache Kafka #30175

merged 14 commits into from
Apr 21, 2023

Conversation

dylanbstorey
Copy link
Contributor

@dylanbstorey dylanbstorey commented Mar 17, 2023


Pulls in a series of integrations to Kafka from airflow-provider-kafka (https://pypi.org/project/airflow-provider-kafka/) to core airflow.

I am the maintainer of that package and have permission to submit to OSS Airflow.

^ Add meaningful description above

@boring-cyborg
Copy link

boring-cyborg bot commented Mar 17, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@kaxil kaxil changed the title Kafka Provider Add provider for Apache Kafka Mar 17, 2023
@josh-fell
Copy link
Contributor

How does a user define an Airflow Connection to use these hooks?

@dylanbstorey
Copy link
Contributor Author

dylanbstorey commented Mar 18, 2023

How does a user define an Airflow Connection to use these hooks?

The config dict is the primary mechanism for describing the connection. What I'd really like is to make a connection that is "just" the config dictionary so people can use that instead. Beyond "read the docs" if you've got a few minutes to discuss what that would look like I'd gladly take the help.

The README.md and example_dags folder in the parent repo shows how to do this, however i'm not 100% certain what the standard is for incorporating this information into Airflow doc site.

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do more thorough review next week but in the meantime Im putting a request change to prevent merge before I/someone can review docs related to the provider and the file names

  1. connections.rst / operators.rst
  2. operators/_.py file name preferably be on service level and not action level. The action is the operator itself. That way we can have several operators organized in a single file.

@dylanbstorey dylanbstorey requested a review from ashb as a code owner March 18, 2023 18:24
@dylanbstorey dylanbstorey marked this pull request as draft March 18, 2023 18:26
@dylanbstorey dylanbstorey marked this pull request as ready for review March 24, 2023 12:42
@dylanbstorey dylanbstorey requested review from eladkal and removed request for ashb and potiuk March 29, 2023 16:36
@potiuk
Copy link
Member

potiuk commented Mar 30, 2023

FYI: I just started discussion on our devlist about formalising policies about approving new providers https://lists.apache.org/thread/xf4wn1hw08cs725fp962252y7mmptk3b

My proposal is to be pretty receptive to providers like Kafka (I think we need it). One question there which I have a preference but I will see what others think is to expect from providers like Kafka to also have integration tests implemented (we already have them for mongo, pinot. trino, cassandra, celery and kerberos - so it seems appropriate to also have them for Kafka).

@dylanbstorey -> what do you think about adding integration tests for the Kafka provider.

In short:

The nice thing about it is that it also makes it easier for the author to be sure that things work as expected - with real services, and when there are regressions, it makes it much easier for anyone to start such integration and fix problems, without having to set up everything from scratch (also it allows to test if the integration works with newever version of Kafka for example - usually by just changing a version of the integration docker image used and making a PR).

Ultimately, this is a way also for the author to be able to perform a hand-over of such provider to the community and make it easier for others to step-in and fix problems.

I am happy to help with implementing those tests. So @dylanbstorey - WDYT? Would that be too much to ask?

@kaxil
Copy link
Member

kaxil commented Mar 30, 2023

For other reviewers on why this is needed: This Provider gets ~180k monthly downloads.

https://pypistats.org/packages/airflow-provider-kafka

@dylanbstorey
Copy link
Contributor Author

@potiuk

Your note gave me a nice chuckle, I had already started down this path a bit but got "distracted" with some of the failing tests instead.

So my short answer (and i'll add it to the dev mailing list is), I think that requiring integration tests will raise the bar on entry, require additional work/support from the community, and ultimately will support a higher quality code base.

I'd glady accept your help here. Thanks.

@eladkal eladkal closed this Apr 7, 2023
@eladkal eladkal reopened this Apr 7, 2023
Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good to go, no block from me.

I can work on refactoring the get_conn part after the PR is merged, as it seems like an optimization rather than a convention, given that no one but me has commented on it.

@hussein-awala
Copy link
Member

Need to merge main and resolve the conflicts

@dylanbstorey
Copy link
Contributor Author

@hussein-awala - already did it at your request.

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets merge when CI is green

Copy link
Contributor

@josh-fell josh-fell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments and making sure test_connection still works, but LGTM otherwise. Lots of diligent, hard work here and it shows.

@potiuk
Copy link
Member

potiuk commented Apr 21, 2023

And you need to rebuild the breeze help images.

@potiuk
Copy link
Member

potiuk commented Apr 21, 2023

I restarted the last failing MSSQL test (flaky) but for all practical purposes, it's green. Merging.

@potiuk potiuk merged commit 522661b into apache:main Apr 21, 2023
@potiuk
Copy link
Member

potiuk commented Apr 21, 2023

🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉

@josh-fell
Copy link
Contributor

Huge. Congrats Dylan 🚀

@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) kind:documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants