-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read only endpoint for XCom #8134 #9170
Conversation
e13159b
to
530a5e9
Compare
@ashb It looks good, but I don't know if it suits our code style. We use explicit joins instead of relying on ORM generated more often. Generating joints by ORM together with the use of multi-column primary keys can lead to spaghetti. Each model will have a relationship with each model because many values are repeated. We only use ORM-based relationships in one place - DagModel.tags It fits here because we can easily retrieve the object(DagModel) and metadata. (DagTag). In other places, we need careful control to ensure good performance. I also prepared an example. query = session.query(XCom)
query = query.filter(and_(XCom.dag_id == dag_id,
XCom.task_id == task_id,
XCom.key == xcom_key))
query = query.join(DR, and_(XCom.dag_id == DR.dag_id, XCom.execution_date == DR.execution_date))
query = query.filter(DR.run_id == dag_run_id)
q_object = query.one_or_none() |
@mik-laj I agree we don't currently use relationships; I guess I am proposing that we currently start using them. The queries generated is the same/similar. Using explicit relationships feels like a good idea to me, as it documents "yes, this is an 'official' relationship between these two tables, not just something that happens to have the same named columns". And the example query I had in my commit message was more complex than we needed. This works: ti = s.query(TI).filter_by(
dag_id='example_dag1', task_id='print_date2'
).join(TI.dag_run).filter_by(
run_id='scheduled__2020-06-08T11:00:00+00:00'
).one_or_none() |
@ashb Personally, I prefer explicite joins. Most of our contributors are SQL experts, so the current approach for them is more natural. We can start using them, but I think it's worth trying to migrate some of the existing code to find out the real limitations and problems. We may not have the shortest code now, but it is readable by our contributors and trusted. We already had a similar situation in the project. We wanted to start using assertions, fixtures and other new solutions for writing the test. We had discussions and votes on whether this is a good idea. Few people who tried to use it in practice. I wanted to use it for writing the API and gave up. Based on this experience, I think it's best to introduce a new idea by trying to rewrite part of the code. Otherwise, the idea you are proposing affects many users, so it will be better when we discuss it on the mailing list. |
Maybe experts at making SQL do what they need, but given the number of n+1 queries you had to fix, maybe not expert at using SQL efficiently 😉 |
This lets us select like this: ``` ( s.query(DagRun) .filter_by(dag_id=='example_dag1', run_id=='scheduled__2020-06-08T11:00:00+00:00') .join(DagRun.task_instances) .with_entities(TaskInstance).filter_by(task_id='print_date2') ``` This lets us easily go from the URL arguments of the API to the task instance(s)
Co-authored-by: Ash Berlin-Taylor <[email protected]> Co-authored-by: Kamil Breguła <[email protected]>
Closes #8134
Make sure to mark the boxes below before creating PR: [x]
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.