Airflow Lineage DAG serializer #2735

Closed
pmbrull opened this issue Feb 12, 2022 · 4 comments · Fixed by #2738 or #2749
Labels
bug Something isn't working

Comments

@pmbrull
Collaborator

pmbrull commented Feb 12, 2022

Affected module
Lineage Backend

Describe the bug
Some tasks fail during lineage backend processing because Airflow raises an internal error when serialize_dag is called.

We should find out what is going on so that no task information is lost during processing.

[2022-02-12, 19:35:32 UTC] {utils.py:291} INFO - Parsing Lineage for OpenMetadata
[2022-02-12, 19:35:32 UTC] {openmetadata.py:113} ERROR - Traceback (most recent call last):
  File "/Users/pmbrull/projects/OpenMetadata/venv/lib/python3.9/site-packages/airflow/serialization/serialized_objects.py", line 826, in serialize_dag
    serialize_dag["tasks"] = [cls._serialize(task) for _, task in dag.task_dict.items()]
  File "/Users/pmbrull/projects/OpenMetadata/venv/lib/python3.9/site-packages/airflow/serialization/serialized_objects.py", line 826, in <listcomp>
    serialize_dag["tasks"] = [cls._serialize(task) for _, task in dag.task_dict.items()]
  File "/Users/pmbrull/projects/OpenMetadata/venv/lib/python3.9/site-packages/airflow/serialization/serialized_objects.py", line 304, in _serialize
    return SerializedBaseOperator.serialize_operator(var)
  File "/Users/pmbrull/projects/OpenMetadata/venv/lib/python3.9/site-packages/airflow/serialization/serialized_objects.py", line 574, in serialize_operator
    serialize_op['params'] = cls._serialize_params_dict(op.params)
  File "/Users/pmbrull/projects/OpenMetadata/venv/lib/python3.9/site-packages/airflow/serialization/serialized_objects.py", line 447, in _serialize_params_dict
    if f'{v.__module__}.{v.__class__.__name__}' == 'airflow.models.param.Param':
AttributeError: 'str' object has no attribute '__module__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pmbrull/projects/OpenMetadata/venv/lib/python3.9/site-packages/airflow_provider_openmetadata/lineage/openmetadata.py", line 109, in send_lineage
    parse_lineage_to_openmetadata(
  File "/Users/pmbrull/projects/OpenMetadata/venv/lib/python3.9/site-packages/airflow_provider_openmetadata/lineage/utils.py", line 294, in parse_lineage_to_openmetadata
    dag_properties = get_properties(dag, SerializedDAG.serialize_dag, ALLOWED_FLOW_KEYS)
  File "/Users/pmbrull/projects/OpenMetadata/venv/lib/python3.9/site-packages/airflow_provider_openmetadata/lineage/utils.py", line 99, in get_properties
    props: Dict[str, str] = {key: value for (key, value) in serializer(obj).items()}
  File "/Users/pmbrull/projects/OpenMetadata/venv/lib/python3.9/site-packages/airflow/serialization/serialized_objects.py", line 847, in serialize_dag
    raise SerializationError(f'Failed to serialize DAG {dag.dag_id!r}: {e}')
airflow.exceptions.SerializationError: Failed to serialize DAG 'lineage_tutorial': 'str' object has no attribute '__module__'
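The last frame of the first traceback shows the root cause. `_serialize_params_dict` reads `v.__module__` on each value in `op.params`. Instances of user-defined classes inherit `__module__` from their class, but instances of builtin types such as `str` do not expose it, so a task whose params contain a plain string fails. The sketch below reproduces that failure in isolation. `Param` is a hypothetical stand-in for `airflow.models.param.Param`. `is_param_robust` shows one possible guard, reading the attribute from the type instead of the instance. It is not necessarily the change that was made upstream.

```python
class Param:
    """Hypothetical stand-in for airflow.models.param.Param (illustration only)."""

    def __init__(self, default):
        self.default = default


# Fully qualified name used for the comparison, analogous to
# 'airflow.models.param.Param' in the traceback.
PARAM_PATH = f"{Param.__module__}.{Param.__name__}"


def is_param_fragile(v) -> bool:
    # Mirrors the failing check: reads __module__ from the *instance*.
    # Instances of builtin types (str, int, ...) do not expose it.
    return f"{v.__module__}.{v.__class__.__name__}" == PARAM_PATH


def is_param_robust(v) -> bool:
    # Reads __module__ from the type, which every class provides.
    return f"{type(v).__module__}.{type(v).__name__}" == PARAM_PATH


params = {"owner": "pmbrull", "threshold": Param(10)}

try:
    [is_param_fragile(v) for v in params.values()]
except AttributeError as err:
    print(f"fragile check failed: {err}")

print({k: is_param_robust(v) for k, v in params.items()})
# → {'owner': False, 'threshold': True}
```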


Version:

  • OS:
  • Python version: 3.9
  • OpenMetadata version: main
  • OpenMetadata Ingestion package version: main


@pmbrull pmbrull added the bug Something isn't working label Feb 12, 2022
@pmbrull pmbrull self-assigned this Feb 12, 2022
@pmbrull
Collaborator Author

pmbrull commented Feb 12, 2022

Found this apache/airflow#20875.

It looks like the issue has been fixed in Airflow's main branch, but we will need to wait for the next release. This also means that the backend will only work safely with Airflow > 2.2.3. @harshach
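Until that release is available, the backend could check up front whether calling `serialize_dag` is safe. The following is a minimal sketch. It assumes the running Airflow version is available as a string, for example from `airflow.__version__`, and it takes the "> 2.2.3" threshold from the comment above. The helper names `parse_release` and `serializer_is_safe` are hypothetical. Only the numeric `major.minor.patch` prefix is compared, so pre-release suffixes are ignored.

```python
import re
from typing import Tuple

# Threshold from the discussion: the serializer is considered safe only on
# Airflow versions strictly greater than 2.2.3.
LAST_AFFECTED_VERSION: Tuple[int, int, int] = (2, 2, 3)


def parse_release(version: str) -> Tuple[int, int, int]:
    """Extract the numeric major.minor.patch prefix, e.g. '2.2.3' -> (2, 2, 3).

    A missing patch component defaults to 0; suffixes such as 'rc1' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def serializer_is_safe(airflow_version: str) -> bool:
    # Hypothetical helper: True when SerializedDAG.serialize_dag is expected
    # not to hit the str-params bug discussed in this issue.
    return parse_release(airflow_version) > LAST_AFFECTED_VERSION


for v in ("2.2.3", "2.2.4", "2.3.0"):
    print(v, serializer_is_safe(v))
```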

@pmbrull
Collaborator Author

pmbrull commented Feb 12, 2022

I'll try to prepare a workaround that retrieves the information, even if only part of it.

@harshach
Collaborator

@pmbrull we should also support Airflow 1.x in the lineage backend work. The earlier version of the lineage backend works with both 1.x and 2.x. Can you please make sure we support both?
Also, is this bug causing us to miss only some details, or are we unable to see any tasks or their details at all?

@pmbrull
Collaborator Author

pmbrull commented Feb 13, 2022

@pmbrull we should also support Airflow 1.x in the lineage backend work. The earlier version of the lineage backend works with both 1.x and 2.x. Can you please make sure we support both? Also, is this bug causing us to miss only some details, or are we unable to see any tasks or their details at all?

Hi @harshach, this error makes the lineage backend crash while extracting the task information, so the task remains unprocessed. I'll work on a workaround to see whether we can skip the serializer and obtain the properties from somewhere else.
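One way to keep processing the task when the serializer fails is to catch the error and read the allowed keys directly from the object's attributes. The sketch below is hypothetical. It borrows only the `get_properties(obj, serializer, allowed_keys)` shape visible in the traceback. The attribute fallback assumes that the allowed keys match attribute names on the DAG object, which may not hold for every key. `FakeDag` and `broken_serializer` are illustration-only stand-ins.

```python
import logging
from typing import Any, Callable, Dict, Iterable

logger = logging.getLogger(__name__)


def get_properties(
    obj: Any,
    serializer: Callable[[Any], Dict[str, Any]],
    allowed_keys: Iterable[str],
) -> Dict[str, str]:
    """Return the allowed properties of obj as strings.

    Hypothetical sketch: tries the serializer first and, if it raises,
    falls back to reading the same keys as plain attributes of obj.
    """
    key_list = list(allowed_keys)  # preserves caller order for the fallback
    key_set = set(key_list)        # fast membership checks
    try:
        raw = serializer(obj)
    except Exception as err:  # broad on purpose, e.g. airflow's SerializationError
        logger.warning("Serializer failed, falling back to attributes: %s", err)
        raw = {key: getattr(obj, key) for key in key_list if hasattr(obj, key)}
    return {key: str(value) for key, value in raw.items() if key in key_set}


class FakeDag:
    """Illustration-only stand-in for an Airflow DAG."""

    dag_id = "lineage_tutorial"
    description = "Demo DAG"
    params = {"owner": "pmbrull"}


def broken_serializer(obj: Any) -> Dict[str, Any]:
    # Simulates the failure from the traceback in this issue.
    raise AttributeError("'str' object has no attribute '__module__'")


print(get_properties(FakeDag(), broken_serializer, ["dag_id", "description"]))
# → {'dag_id': 'lineage_tutorial', 'description': 'Demo DAG'}
```

With this design, a failing task still yields the properties that can be read directly, instead of crashing the whole lineage extraction.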

2 participants