Read only endpoint for XCom #8134 #9170

randr97 · 2020-06-07T18:23:52Z

Closes #8134

Make sure to mark the boxes below before creating PR: [x]

Description above provides context of the change
Unit tests coverage for changes (not needed for documentation changes)
Target Github ISSUE in description if exists
Commits follow "How to write a good git commit message"
Relevant documentation is updated including usage instructions.
I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

airflow/api_connexion/openapi/v1.yaml

airflow/api_connexion/schemas/xcom_schema.py

ashb · 2020-06-09T12:59:54Z

@mik-laj Are you happy with c973a07?

mik-laj · 2020-06-10T00:41:27Z

@ashb It looks good, but I don't know if it suits our code style. We use explicit joins instead of relying on ORM generated more often. Generating joints by ORM together with the use of multi-column primary keys can lead to spaghetti. Each model will have a relationship with each model because many values are repeated. We only use ORM-based relationships in one place - DagModel.tags It fits here because we can easily retrieve the object(DagModel) and metadata. (DagTag). In other places, we need careful control to ensure good performance.

I also prepared an example.
16f8299

    query = session.query(XCom)
    query = query.filter(and_(XCom.dag_id == dag_id,
                              XCom.task_id == task_id,
                              XCom.key == xcom_key))
    query = query.join(DR, and_(XCom.dag_id == DR.dag_id, XCom.execution_date == DR.execution_date))
    query = query.filter(DR.run_id == dag_run_id)

    q_object = query.one_or_none()

ashb · 2020-06-10T10:26:29Z

It looks good, but I don't know if it suits our code style. We use explicit joins instead of relying on ORM generated more often.

@mik-laj I agree we don't currently use relationships; I guess I am proposing that we currently start using them. The queries generated is the same/similar. Using explicit relationships feels like a good idea to me, as it documents "yes, this is an 'official' relationship between these two tables, not just something that happens to have the same named columns".

And the example query I had in my commit message was more complex than we needed. This works:

ti = s.query(TI).filter_by(
    dag_id='example_dag1', task_id='print_date2'
).join(TI.dag_run).filter_by(
    run_id='scheduled__2020-06-08T11:00:00+00:00'
).one_or_none()

mik-laj · 2020-06-10T11:04:50Z

@ashb Personally, I prefer explicite joins. Most of our contributors are SQL experts, so the current approach for them is more natural. We can start using them, but I think it's worth trying to migrate some of the existing code to find out the real limitations and problems. We may not have the shortest code now, but it is readable by our contributors and trusted.

We already had a similar situation in the project. We wanted to start using assertions, fixtures and other new solutions for writing the test. We had discussions and votes on whether this is a good idea. Few people who tried to use it in practice. I wanted to use it for writing the API and gave up.
There were two main problems. assert statements The assertion expression displays much less readable error messages. Fixtures are not compatible with provide_session decorator.

Based on this experience, I think it's best to introduce a new idea by trying to rewrite part of the code. Otherwise, the idea you are proposing affects many users, so it will be better when we discuss it on the mailing list.

ashb · 2020-06-10T12:56:43Z

Most of our contributors are SQL experts

Maybe experts at making SQL do what they need, but given the number of n+1 queries you had to fix, maybe not expert at using SQL efficiently 😉

airflow/api_connexion/endpoints/xcom_endpoint.py

tests/api_connexion/endpoints/test_xcom_endpoint.py

tests/api_connexion/schemas/test_xcom_schema.py

airflow/api_connexion/endpoints/xcom_endpoint.py

airflow/models/dagrun.py

airflow/api_connexion/endpoints/xcom_endpoint.py

tests/api_connexion/endpoints/test_xcom_endpoint.py

airflow/api_connexion/endpoints/xcom_endpoint.py

tests/api_connexion/schemas/test_xcom_schema.py

airflow/api_connexion/schemas/xcom_schema.py

This lets us select like this: ``` ( s.query(DagRun) .filter_by(dag_id=='example_dag1', run_id=='scheduled__2020-06-08T11:00:00+00:00') .join(DagRun.task_instances) .with_entities(TaskInstance).filter_by(task_id='print_date2') ``` This lets us easily go from the URL arguments of the API to the task instance(s)

Co-authored-by: Ash Berlin-Taylor <[email protected]> Co-authored-by: Kamil Breguła <[email protected]>

boring-cyborg bot added the area:API Airflow's REST/HTTP API label Jun 7, 2020

randr97 marked this pull request as draft June 7, 2020 18:24

mik-laj reviewed Jun 7, 2020

View reviewed changes

airflow/api_connexion/openapi/v1.yaml Outdated Show resolved Hide resolved

mik-laj reviewed Jun 7, 2020

View reviewed changes

airflow/api_connexion/openapi/v1.yaml Outdated Show resolved Hide resolved

randr97 force-pushed the REST_API_XCOMS branch 2 times, most recently from e13159b to 530a5e9 Compare June 7, 2020 20:52

randr97 requested a review from mik-laj June 7, 2020 20:54

mik-laj reviewed Jun 9, 2020

View reviewed changes

airflow/api_connexion/schemas/xcom_schema.py Outdated Show resolved Hide resolved

randr97 force-pushed the REST_API_XCOMS branch from 16f8299 to c7ca028 Compare June 13, 2020 19:25

randr97 requested a review from mik-laj June 13, 2020 19:27

randr97 marked this pull request as ready for review June 13, 2020 19:27

randr97 force-pushed the REST_API_XCOMS branch from c7ca028 to e0a02b7 Compare June 13, 2020 19:30