-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add provider for Apache Kafka #30175
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
How does a user define an Airflow Connection to use these hooks? |
The config dict is the primary mechanism for describing the connection. What I'd really like is to make a connection that is "just" the config dictionary so people can use that instead. Beyond "read the docs" if you've got a few minutes to discuss what that would look like I'd gladly take the help. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do more thorough review next week but in the meantime Im putting a request change to prevent merge before I/someone can review docs related to the provider and the file names
connections.rst
/operators.rst
- operators/_.py file name preferably be on service level and not action level. The action is the operator itself. That way we can have several operators organized in a single file.
FYI: I just started discussion on our devlist about formalising policies about approving new providers https://lists.apache.org/thread/xf4wn1hw08cs725fp962252y7mmptk3b My proposal is to be pretty receptive to providers like Kafka (I think we need it). One question there which I have a preference but I will see what others think is to expect from providers like Kafka to also have integration tests implemented (we already have them for mongo, pinot. trino, cassandra, celery and kerberos - so it seems appropriate to also have them for Kafka). @dylanbstorey -> what do you think about adding integration tests for the Kafka provider. In short:
The nice thing about it is that it also makes it easier for the author to be sure that things work as expected - with real services, and when there are regressions, it makes it much easier for anyone to start such integration and fix problems, without having to set up everything from scratch (also it allows to test if the integration works with newever version of Kafka for example - usually by just changing a version of the integration docker image used and making a PR). Ultimately, this is a way also for the author to be able to perform a hand-over of such provider to the community and make it easier for others to step-in and fix problems. I am happy to help with implementing those tests. So @dylanbstorey - WDYT? Would that be too much to ask? |
For other reviewers on why this is needed: This Provider gets ~180k monthly downloads. |
Your note gave me a nice chuckle, I had already started down this path a bit but got "distracted" with some of the failing tests instead. So my short answer (and i'll add it to the dev mailing list is), I think that requiring integration tests will raise the bar on entry, require additional work/support from the community, and ultimately will support a higher quality code base. I'd glady accept your help here. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good to go, no block from me.
I can work on refactoring the get_conn
part after the PR is merged, as it seems like an optimization rather than a convention, given that no one but me has commented on it.
Need to merge main and resolve the conflicts |
@hussein-awala - already did it at your request. |
Pulls in a series of integrations to Kafka from airflow-provider-kafka (https://pypi.org/project/airflow-provider-kafka/) to core airflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets merge when CI is green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small comments and making sure test_connection
still works, but LGTM otherwise. Lots of diligent, hard work here and it shows.
Co-authored-by: Josh Fell <[email protected]>
Pulls in a series of integrations to Kafka from airflow-provider-kafka (https://pypi.org/project/airflow-provider-kafka/) to core airflow.
And you need to rebuild the breeze help images. |
I restarted the last failing MSSQL test (flaky) but for all practical purposes, it's green. Merging. |
🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 |
Huge. Congrats Dylan 🚀 |
Pulls in a series of integrations to Kafka from
airflow-provider-kafka
(https://pypi.org/project/airflow-provider-kafka/) to core airflow.I am the maintainer of that package and have permission to submit to OSS Airflow.
^ Add meaningful description above