Acceptance Criteria:
Find a Python library that supports an asynchronous implementation of RedshiftDataOperator if the official library does not support it.
Document the possible options and the reasons for selecting a particular library in this GitHub issue via a Summary comment.
Ensure that the connection is set up and working.
phanikumv changed the title from "[SPIKE/RESEARCH] RedshiftDataOperator" to "[SPIKE/RESEARCH] Async RedshiftDataOperator" on Jun 6, 2022
The RedshiftDataOperator has the same objective as the RedshiftSQLOperator: submitting a SQL statement for execution on the Redshift cluster. The difference between the two is that the RedshiftSQLOperator needs a postgres endpoint connection to be created for the Redshift cluster (the default connection name being redshift_default), whereas the RedshiftDataOperator does not need any additional connection (postgres endpoint of the cluster) to be created. Instead, it uses the AWS connection (default connection name aws_default) together with the boto library to connect to the Redshift cluster.
The execution time of the RedshiftDataOperator varies with the SQL statement submitted: it takes as long as the Redshift cluster needs to run the statement. The same is true of the RedshiftSQLOperator. In my opinion, this qualifies it for an async version. We already have an async version of RedshiftSQLOperator and hence believe that we should also implement one for RedshiftDataOperator. The RedshiftDataOperator returns the query ID for the submitted SQL, and we can poll the status of this query ID asynchronously.
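The polling idea described above could be sketched roughly as follows. This is only a minimal illustration, not the operator's actual implementation: the function name `wait_for_statement`, its `describe` callable parameter, and `poll_interval` are hypothetical. It assumes the status lookup behaves like the Redshift Data API `describe_statement` call, which takes an `Id` and returns a `Status` such as `STARTED`, `FINISHED`, `FAILED`, or `ABORTED`. Because that call is synchronous in boto3, the sketch runs it in an executor so the event loop is not blocked.

```python
import asyncio
from functools import partial
from typing import Any, Callable, Dict

# Terminal states, assumed to match the Redshift Data API statement statuses.
SUCCESS_STATE = "FINISHED"
FAILURE_STATES = {"FAILED", "ABORTED"}


async def wait_for_statement(
    describe: Callable[..., Dict[str, Any]],
    statement_id: str,
    poll_interval: float = 5.0,
) -> str:
    """Poll a statement's status until it reaches a terminal state.

    `describe` is a hypothetical injection point: any callable that accepts
    `Id=<statement id>` and returns a dict with a "Status" key.
    """
    loop = asyncio.get_running_loop()
    while True:
        # Run the blocking status lookup in the default thread pool.
        response = await loop.run_in_executor(
            None, partial(describe, Id=statement_id)
        )
        status = response["Status"]
        if status == SUCCESS_STATE:
            return statement_id
        if status in FAILURE_STATES:
            raise RuntimeError(
                f"Statement {statement_id} ended with status {status}: "
                f"{response.get('Error')}"
            )
        # Yield control so other tasks can run between polls.
        await asyncio.sleep(poll_interval)
```

With boto3, `describe` would presumably be `boto3.client("redshift-data").describe_statement`, and `statement_id` the `Id` returned by `execute_statement`. Injecting the callable keeps the sketch testable without AWS credentials.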
Airflow reference PR where the RedshiftDataOperator was added: apache/airflow#19137
This PR includes the full discussion of why this operator was added and of the fundamental difference between RedshiftDataOperator and RedshiftSQLOperator.