Databricks Provider _get_databricks_task_id only cleanses task id #44250
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so, no need to wait for approval.
PR: #43106 will also fix this issue
LefterisXefteris pushed a commit to LefterisXefteris/airflow that referenced this issue on Jan 5, 2025:
…(apache#44960) This PR introduces the ability for users to explicitly specify databricks_task_key as a parameter for the DatabricksNotebookOperator. If databricks_task_key is not provided, a default value is generated using the hash of the dag_id and task_id.
Key changes:
- Users can now define databricks_task_key explicitly.
- When not provided, the key defaults to a deterministic hash based on dag_id and task_id.
Fixes: apache#41816
Fixes: apache#44250
Related: apache#43106
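Based on the commit description above, the key can now be set explicitly on the operator. A minimal sketch (the notebook path and connection id are placeholders, not values from this issue):

```python
from airflow.providers.databricks.operators.databricks import DatabricksNotebookOperator

notebook_task = DatabricksNotebookOperator(
    task_id="my_airflow_task",
    databricks_task_key="my_explicit_task_key",  # explicit key instead of the dag_id/task_id-derived default
    databricks_conn_id="databricks_default",      # placeholder
    notebook_path="/Workspace/path/to/notebook",  # placeholder
    source="WORKSPACE",
)
```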
agupta01 pushed the same commit to agupta01/airflow on Jan 6, 2025, and got686-yandex pushed it to got686-yandex/airflow on Jan 30, 2025.
Apache Airflow Provider(s)
databricks
Versions of Apache Airflow Providers
apache-airflow-providers-databricks==6.13.*
Apache Airflow version
2.10.2
Operating System
Debian GNU/Linux 12 (bookworm)
Deployment
Astronomer
Deployment details
No response
What happened
_get_databricks_task_id only cleanses the task id, ref:
- airflow/providers/src/airflow/providers/databricks/plugins/databricks_workflow.py (line 67 at a924284)
- airflow/providers/src/airflow/providers/databricks/operators/databricks.py (line 1077 at a924284)
However, the dag_id may also contain ".", so the replacement of "." with "__" should be applied to the whole string, not just the task id portion. Otherwise, periods in the dag name result in errors, because the invalid characters are silently stripped by Databricks: the task key on the Databricks side becomes myairflowdagwithperiods__my_airflow_task rather than my.airflow.dag.with.periods__my_airflow_task. A sketch of the current behaviour is below.
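A minimal sketch of the current behaviour (this paraphrases the logic described above; the helper name and signature are illustrative, not the provider's exact code):

```python
# Only the task_id has "." replaced with "__"; the dag_id is used as-is.
def build_databricks_task_id(dag_id: str, task_id: str) -> str:
    return f"{dag_id}__{task_id.replace('.', '__')}"

key = build_databricks_task_id("my.airflow.dag.with.periods", "my_airflow_task")
print(key)  # my.airflow.dag.with.periods__my_airflow_task
# Databricks silently strips the "." characters, so the task key it stores is
# "myairflowdagwithperiods__my_airflow_task", which no longer matches what Airflow expects.
```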
What you think should happen instead
The replacement of "." with "__" should be applied to the whole task key / run name string, not just the task id portion; see the sketch below.
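A minimal sketch of the expected behaviour (illustrative only, not the actual patch):

```python
# Cleanse the whole combined key, so "." in the dag_id is replaced as well.
def build_databricks_task_id(dag_id: str, task_id: str) -> str:
    return f"{dag_id}__{task_id}".replace(".", "__")

print(build_databricks_task_id("my.airflow.dag.with.periods", "my_airflow_task"))
# my__airflow__dag__with__periods__my_airflow_task (nothing left for Databricks to strip)
```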
How to reproduce
Use the affected operator(s), e.g. DatabricksNotebookOperator, on a DAG whose dag_id contains "."; a minimal example is sketched below.
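A minimal reproduction sketch (the notebook path, connection id and cluster id are placeholders, not values from the original report):

```python
import pendulum

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksNotebookOperator

with DAG(
    dag_id="my.airflow.dag.with.periods",  # periods in the dag_id trigger the key mismatch
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
):
    DatabricksNotebookOperator(
        task_id="my_airflow_task",
        databricks_conn_id="databricks_default",      # placeholder
        notebook_path="/Workspace/path/to/notebook",  # placeholder
        source="WORKSPACE",
        existing_cluster_id="1234-567890-abcdefgh",   # placeholder
    )
```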
Anything else
Every time
Are you willing to submit PR?
Code of Conduct