[SPIKE/RESEARCH] Async RedshiftDataOperator #417

phanikumv · 2022-06-06T11:42:16Z

Acceptance Criteria:
Find a Python library that supports an asynchronous implementation for RedshiftDataOperator if the official library does not support it. Document the possible options, and the reasons for selecting a particular library, in this GitHub issue via a summary comment.

Ensure that connection is set up and working.

pankajkoti · 2022-06-21T05:23:44Z

The RedshiftDataOperator has the same objective as the RedshiftSQLOperator: submitting a SQL statement to the Redshift cluster for execution. The difference between the two is that the RedshiftSQLOperator needs a Postgres endpoint connection to be created for the Redshift cluster (the default connection name being redshift_default), whereas the RedshiftDataOperator does not need any additional connection (the Postgres endpoint of the cluster). Instead, it uses the AWS connection (default connection name aws_default) together with the boto library to connect to the Redshift cluster.

The execution time of the RedshiftDataOperator varies with the SQL statement submitted: it takes as long as the Redshift cluster needs to run the statement. The same is true of the RedshiftSQLOperator. In my opinion, this qualifies it for an async version. We already have an async version of the RedshiftSQLOperator, so I believe we should implement one for the RedshiftDataOperator as well. The RedshiftDataOperator returns the query ID for the submitted SQL, and we can poll the status of that query ID asynchronously.
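To illustrate the polling idea, here is a minimal sketch, not the planned implementation. It assumes the client exposes a boto-style `describe_statement(Id=...)` call that returns a `Status` field, as the Redshift Data API client does. The names `wait_for_statement`, `TERMINAL_STATES`, and `FakeRedshiftDataClient` are hypothetical and exist only for this example; the fake client stands in for `boto3.client("redshift-data")` so the sketch runs without AWS credentials.

```python
import asyncio
from typing import Any, Iterable

# Assumption: these are the statuses after which a statement no longer changes.
TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}


async def wait_for_statement(
    client: Any, statement_id: str, poll_interval: float = 5.0
) -> str:
    """Poll describe_statement until the statement reaches a terminal state.

    Hypothetical helper: `client` is anything with a boto-style
    describe_statement(Id=...) method returning a dict with a "Status" key.
    """
    loop = asyncio.get_running_loop()
    while True:
        # The boto client is synchronous, so run the call in the default
        # thread pool to avoid blocking the event loop.
        response = await loop.run_in_executor(
            None, lambda: client.describe_statement(Id=statement_id)
        )
        status = response["Status"]
        if status in TERMINAL_STATES:
            return status
        # Yield control to other coroutines between polls.
        await asyncio.sleep(poll_interval)


class FakeRedshiftDataClient:
    """Hypothetical stand-in for a real client; replays a fixed status sequence."""

    def __init__(self, statuses: Iterable[str]) -> None:
        self._statuses = iter(statuses)

    def describe_statement(self, Id: str) -> dict:
        return {"Id": Id, "Status": next(self._statuses)}
```

With the fake client, `asyncio.run(wait_for_statement(FakeRedshiftDataClient(["SUBMITTED", "STARTED", "FINISHED"]), "example-id", poll_interval=0.01))` returns the final status. In a deferrable operator, a loop of this shape would presumably live in the trigger, with the operator only submitting the statement and deferring on the returned query ID.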

pankajkoti · 2022-06-21T05:26:14Z

Airflow reference PR where the RedshiftDataOperator was added: apache/airflow#19137

This PR includes the full discussion of why this operator was added and what the fundamental difference is between RedshiftDataOperator and RedshiftSQLOperator.

pankajkoti · 2022-06-22T07:34:50Z

Conclusion: Implement the RedshiftDataOperator async operator.

phanikumv changed the title ~~[SPIKE/RESEARCH] RedshiftDataOperator~~ [SPIKE/RESEARCH] Async RedshiftDataOperator Jun 6, 2022

phanikumv added the area/async (Deferrable/async operators) and research (Requires research or investigation) labels Jun 6, 2022

phanikumv assigned pankajkoti Jun 20, 2022

pankajkoti closed this as completed Jun 22, 2022
