-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove taskflow code from Airflow intro example #26486
Conversation
How about in all the prominent "first time" places we promote both versions and clearly distinguish them and even explain who is the target for both ? I personally think writing taskflow api is easier/better for some groups of users and writing "classic" operators is easier/better for some other groups. And we do not know who reads this when they first see Airflow. So rather than choosing one or the other, maybe we should simply say: * Example here.
Example here I think this would be much better and paint better picture of Airlfow in the mind of both groups of users and make Airflow feel much more modern (I have a feeling that promoting classic operators in the first page makes it more "old-school" and when we deliberately explain that both ways are equivalent and targetting different kinds of users, it's not only showing the 'modern" face but also showing that we delibereately gave our users the choice and even helped them to make the choice. |
I thought a bit about that - and I think we could even visually distinguish them - use different theme for TaskFlow examples and diferent theme for Classic examples - this could decrease the cognitive load on users whou would have to otherwise parse those to understand which "realm" they come from., but by visually separating the example and possibly even showing them side-by side, we could achieve much more universal message. |
I don't think the intro page is the right place for introducing multiple ways to do one thing. An intro page is for an "Airflow-curious" person, that just wants to grasp what Airflow is all about. The code sample should exemplify that Airflow can run tasks using different technologies (Bash & Python in this case), but is not there to demonstrate all ways one can write code in Airflow. I suggest introducing the TaskFlow API at a later stage in the docs. |
Just wondering - shouldn't the "Airfllow Curious" person know that there are two ways - largely equivalent in terms of what you can achive but vastly different in terms of who is doing it ? Shouldn't that "curious" person not be aware about taskflow? I think we risk that that person stops right there and gets put off by the "old, not too pythonic" style that they're used to from elsewhere (mainly data science world). There are many ways you can do it - inclufing tabs switching between different variants - to not cause too much of "cognitive overload". For example https://cloud.google.com/secret-manager/docs/create-secret#secretmanager-quickstart-python. I think this page should - at the very least mention task flow and have a link to "and this is the 'functional' way of doing things" (with the strong preference of having both approaches largely equivalent) |
I expect an "Airflow curious" person to understand what Airflow is after reading the home page: an orchestration tool that allows you to manage the execution of tasks using different technologies. I don't think implementation details are relevant on this page. |
Why do we show the Classic DAG then? It's the same kind of implementation detail. All what I am saying is that showing detail like this suggests that this is the way of doing things. And the user might suspect we put thise details on the first page because we think it is important for the user to know that you can write the DAGs in this way. Is it not important to tell the user it can be done in a different way at least ? |
My take: Intro should be focused on what you can do and how easily you can do it in the highest level possible and with the minimal skill set required from users. This is the basic of the basic. I'd like to point that there are Airflow users who knows basic Python and have no idea what decorators are. Personaly, I don't want to discuorage them or ask them to go to Python docs read about decorators and then come back to Airflow docs. |
At the end of the day, Airflow is a framework that you use to write code. So I think it's relevant to show a code sample to convey "here's how a workflow might look". But the main page is not the place to demonstrate all possible options, so any code sample should be as simple as possible. So
Not on this page, no. |
I feel we should stick to a single style here, classic or taskflow. I'd be equally happy to have hello be a taskflow task too - I don't think we gain a ton showing we can run arbitrary bash here. I think a simple hello world showing task dependencies is enough. I do like the classic DAG context manager though (but that's my bias showing 🙃). |
I'm hesitant to show the TaskFlow style in this example because decorators are typically not learned on day 1 of Python. The home page should be understandable to all audiences. Support for TaskFlow style operators is still limited. You can't run Bash code using the TaskFlow API, causing this example to be unnecessarily complex when converted to the TaskFlow API because you have to mix concepts: "standard" style operators and "taskflow" style operators to achieve the same. Also, I think the beauty of the example in this PR is that it demonstrates the ability to run tasks using different technologies which is one of Airflow's key strengths. If you were to convert this example to a full TaskFlow API style example, you'd have two PythonOperators. Additionally, this proposed (or put back) code sample demonstrates nicely in the UI: you clearly see one BashOperator, and one PythonOperator. Using the TaskFlow operators displays Not trying to bring down the TaskFlow API but I think it's better introduced elsewhere. |
I thought a lot about it (while running yesterday - 18km is long enough to clear the head). And I am still not really convinced we should do it this way. I am even willing to try to attempt to make a small POC of "tab" approach where we show both "Classic" approach and "TaskFlow" one when we click on a separate tab (and even apply it to many examples that @josh-fell had already converted to the TaskFlow). Looking at the comments in this thread I tried to convince myself that not showing at all TaskFlow there is a good idea, but - I could not. Why I think so? There are few reasons, let me elaborate on my thinking: I think there are many more users that use currently mostly the Classic approach. And this is fine. And it will stay like that. And it gives huge competitve edge over other solutions that work the other way round - using decorator's approach as a primary way of doing things. Airflow has literally thousands of operators/sensors/transfers that work out-of-the-box and you can treat them mostly like black-box). And we want to attract those users for sure (however for those users declarative approach would be even better and this is something that we will hopefully have some day as another way of writing DAGs. Some day, possibly not that far in the future). And I fully agree when those users open the Airlfow "main" page, showing the "classic" approach is way better. However I think we do not only want to attract those kinds of users, we also want to attract those that know and love Python well and they have gone (similarly like us) through the recent Python 2.7 -> Python 3.7+ and they are eager to have more flexibility, less boilerplate, more functional/immutable approach - all those things that make your Python code slick and neat and much easier to read and reason about actually (and those are things that make you so enthusiastic about Python that idea of going back to "old days" might put you off). Those users are often using other solutions possibly, that they can write their DAGs more "pythonically". And we want to promote Airflow to those users, showing the more "modern" and "slick" face of Airflow. And I think if I were using such a solution currently and "Airlfow" would become an option for me, I would likely take a look at the main page and try to figure out if Airflow is for me - without going any deeper. And (trying to put myself in the shoes of such person) - I coudl be put off, if it was not clear from the main page that you can do things in a more "slick" way. Have you actually tried to find out that Airflow has a task flow using the current documentation? It's referred in a few places. There is a "Task Flow Tutorial" but It's not quite prominent enought that a busy person who does a first pass evaluation of a tool like Airlfow would find easily. And for many of those "power users" this might end up with "meh, not worth it" after seeing the example. I think it is something that we cannot forget that not everyone there knows what Airflow is about. They've heard something maybe, maybe they read a blog post listing Airlfow as a "must have" tool or maybe they saw it in some "Modern Data Stack" diagrams among around a million other logos. But we should assume they came to the main page of airlfow docs to learn what Airlfow is. And they might stop right there. Unfortunately - we do not know who is looking. I think if you consider someone who does not know Airflow already - they will end up on the main page to learn what Airflow is - REGARDLESS if they are power Python users or newbies. We have no way to check it, but I think this is what will happen for both types of users. And not giving a clue that Airlfow DAGs can be written in task flow way is a risk that they will leave the page with that impression in their mind - that it works using "kinda dated approach to define DAGs". What's even more - those Power users are more likely than Python newbies to spread the news. There are various ways information about a tool spreads, but one of the most powerful ways is that your local respected Python Guru says "Yeah, I looked at it and Airflow is popular but it does not really have a modern way of writing your DAGs" vs. "Yeah I have not used it but from a quick look it seems to be popular and ALSO suports the modern way of writing your DAGs". This is what I am afraid of when it comes to the "first imipression". There is this very true statement - "You never get a second chance to make the First impression". And what I am really about is to give a chance to make good first impression for both kinds of users. One more comment that I want to add and something that we should take into account. I think making airflow "main" page showing only the "classic' way, make our docs extremely inconistent in the way how we treat "new users". For example when new people want to learn about all kinds of PythonOperators including ShortCircuit operator - they will ONLY learn about the TaskFlow approach. There are not even examples there on how you can use the "classic version" of those any more. Given the popularity of Python operators, this is far more confusing for users who do not know how to use decorators. Imagine you just looked at Airlfow is, you saw the main page with "classic" approach, and then you want to try your first Python operator and look at the examples and you see the decorators. Confusing? Like crazy. I think no matter what approach we take, but whatever we do - we should do it consistently. Like Now. The "current" 2.4.0 where classic and taskflow apprach are mixed at least have a chance to prepare the user that there is something like TaskFlow. And I think working out an approach where you can somehow address both kind of users is important to make a conscious decision that we apply consistently. cc: @rossturk, I know you've been thinking a lot about the main page of Airlfow and with your "newcomer" view you can probably provide a bit more of a "fresh" look/view. Would love to hear your opinion on that one. |
BTW. I think this has been addressed in Airlfow 2.4.0 |
We might consider not using lambda then too. That said, even if folks didn't understand how decorators actually work, I still think they will get the gist of what's going on - it's pretty easy to grok what happens I think. |
Actually, yes I do. I understand perfectly fine there's a taskflow API. I think you're not understanding the context. As I said, this is the first page a reader will see when opening the documentation. This introduces a narrative: what is Airflow, what can it do, why would I use Airflow? Etc. The main page is not the place to show a user all possible ways to write a DAG. Not everybody is a full time Python engineer and most people that will read this page are just starting to understand what Airflow is about, so examples should be as simple as possible. Have you ever had a training, read a book, or heard a conference talk where multiple concepts were introduced in the first minute? No. People process new information in steps, you can't introduce 10 new concepts in one go. You explain new concepts gradually. For this reason, I think it's a bad idea to show both a "regular" operator and a "taskflow" operator. Assuming the reader is new to Airflow, there's already a DAG, operators, and dependencies. You want to display as few as possible concepts to introduce an example. The "regular" operators easily show what it's running: BashOperator runs Bash, PythonOperator runs Python. TaskFlow is different from regular operators, it uses decorators and I don't think it's immediately clear why you have to call a TaskFlow task, i.e. The introduction to Airflow should focus on what Airflow is and display an absolute bare-minimum example of what Airflow code might look like. It should not demonstrate all possible ways to write a DAG. |
I do understand context. And I don't want to show "all possible ways". I just want to avoid people judging book (wrongly) by it's cover. Because this is what most people do (judge book by the cover) and we should IMHO make sure that we are not wrongly judged. This is my only point. |
And let me explain this. I also think the current way it is presented is wrong. But I think we disagree on the way it shoudl be fixed. I think we should hide the taskflow behind the extra "tab" that user will have to click to reveal the task flow version. All I am asking for is to have it discoverable in the first page that there is another way and be able to see it without going to a separate page. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
So, it is too bad that we are now stuck with a confirmed bad situation because we seem not to be able to decide between option A and option B. I suggest accepting the change by @BasPH (it's an improvement) and if @potiuk wants to do a follow up that can be a subsequent PR (also an improvement). Let's get out of the bad situation first and let's not let get perfect get in the way of good. For the discussion: I think we should assume familiarity with Python for people coming to Airflow. Afterall, if you would like to use Airflow it requires python code to write a pipeline. As Python has a much larger ecosystem than Airflow and the "classic" style is basically a DSL on top of Python. That's an extra cognitive load as I basically need to learn another language. Therefore, I am in favour of exposing Taskflow as the default as it is more natural to Pythonistas. The classic approach is, as mentioned, a DSL and an opinionated way of doing certain things. This is great if you want a certain kind of standardization or want to limit what people can do (hey maybe we should even have a policy that only allows certain operators and disables taskflow for those use cases!). (And maybe this should be on the mailinglist :-P) @BasPH can you please rebase? |
I think we actually differ in assesment whether the current situation is bad or not. It's not at all obvious. I personally think the current situation is not bad. And yes, absolutely. I think it should be discussed in the devlist whether we should change it and in which direction before we do any action. I searched the root cause of this. Converting most of our examples to Functional examples have been done based on this issue #9415 (which I was not an author nor even supporter BTW :) . And many of the changes done there were approved/proposed by vairous people. I could not find any deliberate decision on doing it. I think we are just about to improve our documenatation structure: #27235 and this is perfect opportunity to discuss (in the devlist) what should be general approach for taskflow/classic examples in the documentation. Changing status quo for one exmple without deliberate decision on our approach without discussing in devlist how we should generally approach it is repeating the same mistake where we implemented changes to many examples without thinking if it's good or bad. Let's not repeat the same mistake again - let's just agree on the direction we want to take before we make any change. |
@potiuk quoting yourself That discussion is where to focus in the future, but that should not stop us from improving the now. #27235 is great! But we should also not let its scope creep. That PR is about the structure and does not concern the content and how it is presented. Follow ups will happen to that PR. |
'And let me explain this. I also think the current way it is presented is wrong' -> yep I was referring to the fact that we do the same thing in multiple other places - almost the same prominent (or even more) as the "main" page. We just have not decided what we want to to do. I am ok for now to switch everything non-taskflow to classic if we are unsure decide what to do next. Let someone proposes it on the devlist and ask for lasy consensus on it and proposes next steps. But we should agree to it and diligently review it and apply our docs to fix it, not fix "one instance" of it even if we want to fix the "main" page. I am ALL for consistency, The problem with this fix is that it changes it in one place while leaving way worse "taskflow" / "classic" mix approach in others. Just quick one: https://airflow.apache.org/docs/apache-airflow/stable/tutorial/pipeline.html - this is the tutorial where we explain how to build the pipeline. And this is FAR more inconsistent than what you see on the main page. The main page is at least consistent that it only shows the taskflow approach. Fixing it in the intro, does not solve the problem that was raised in this PR. I do not know if there are more - but it's likely.
All that even without explaining "classic" vs. "taskflow" This is the regular "buiilding pipeline" tutorial. The first document the user gets to when you want to do something. If anyone is concerned about not surprising the user, this place (and likely few other places) should be corrected. Not the "intro" page where we are just showing "concept" of how airflow works. @bolkedebruin - do you seriously think we are solving the problem by changing the intro and leaving the tutorial with mix of concepts ? Really? In what way? |
ecb744f
to
3ff4e1b
Compare
3ff4e1b
to
443d6de
Compare
I've rebased. @bolkedebruin good to merge? @potiuk feel free to propose an alternative, but for now, I suggest merging this PR to:
|
As you wish - I still don't think that this is only part of the problem and changing it here changes literally nothing, because we are mixing both styles everywhere, but if you feel better about it - no problem. Note - I never "requested changes" here, I just stated my opjnion that this does not treat the root cause of the problem which is that we have hell of a mess when it comes to our examples and no consistent approach for it and that we are not bringing consistency and not removing confusion by this change - instead, at most, we just don't give a chance to a user who wants to find out what airflow is by a quick glance, to understand that they can write DAGs in much more modern way as well. But if I am just one person who thinks that way - I am certainly not dying on that hill. I just hope that the refactor of the structure will happen sooner rather than later and that we will be able to get a better chance to change the perception of our docs as well. I just read a comment about Airflow docs that it is "freaking abysmal" and well, to be honest, it kinda is. Hopefully we can make it consistent in both - structure and approach and consistent message to the users, rather than keep on the current "band-aid over band-aid approach". I just have a rule that whenever I see a problem I try to fix it "in general" rather than in single place when I know there are other places with the same problem - because it pretty much always leads to even more mess, but I do not have the bandwidh to do it now unfortunately. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
This PR reverts a change made in #25657 to the Airflow docs home page. I understand the desire to push TaskFlow usage but don't think this is the right place. This is the very first piece of Airflow code a reader will see, and mixing multiple concepts adds unnecessary complexity. This example should be as simple as possible.
Also removed
as dag
now that it's not needed anymore since Airflow 2.4.0.Before:
After:
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.