Update S3ToRedshift Operator docs to indicate multiple key functionality #28705
Conversation
Force-pushed 4b2f62a to 00906f4
@@ -42,7 +42,7 @@ class S3ToRedshiftOperator(BaseOperator):
     :param schema: reference to a specific schema in redshift database
     :param table: reference to a specific table in redshift database
     :param s3_bucket: reference to a specific S3 bucket
-    :param s3_key: reference to a specific S3 key
+    :param s3_key: reference either to a specific S3 key or a set of keys or folders sharing that prefix
What does “a set of keys” mean? Does it have to be a Python set, or is the term being used more liberally? If the latter, I think “collection” is a more common term.
I suppose a better wording would be:
:param s3_key: key prefix that selects single or multiple objects from S3
@@ -207,6 +208,18 @@ def delete_security_group(sec_group_id: str, sec_group_name: str):
     )
     # [END howto_transfer_s3_to_redshift]

+    # [START howto_transfer_s3_to_redshift_multiple_keys]
+    transfer_s3_to_redshift_multiple = S3ToRedshiftOperator(
Don't forget to add it to the `chain` command.
Force-pushed 00906f4 to aa38594
Thank you all for the suggestions for the changes. I have updated the wording of the documentation in airflow/providers/amazon/aws/transfers/s3_to_redshift.py, and have added the requested change to the system test. Let me know if any other changes are needed.
As shown in issue #27957, the current documentation for the S3ToRedshift Operator seems to indicate that only one key from S3 can be transferred to Redshift. However, as elaborated here and in the aws docs here, the COPY command from S3 to Redshift automatically looks for all keys that match the given prefix, and then copies all of them to Redshift.

For this PR, I wanted to update the docs to make this clear to Airflow users. I also added a system test that demonstrates this functionality.
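The prefix behaviour described above can be sketched in plain Python. This is only an illustration of the matching semantics (Redshift's COPY performs this selection server-side against the real bucket listing); the bucket keys and helper function below are hypothetical, not part of Airflow or the AWS API:

```python
# Illustration: COPY treats the given s3_key as a prefix, so one value
# can select a single object or every object sharing that prefix.
# (Hypothetical key names; not Redshift's actual implementation.)

def keys_matched_by_copy(all_keys: list[str], s3_key: str) -> list[str]:
    """Return the object keys a COPY prefix of `s3_key` would select."""
    return [key for key in all_keys if key.startswith(s3_key)]

bucket_keys = [
    "data/part-0000.csv",
    "data/part-0001.csv",
    "data/part-0002.csv",
    "other/part-0000.csv",
]

# A full key selects exactly one object:
print(keys_matched_by_copy(bucket_keys, "data/part-0000.csv"))
# → ['data/part-0000.csv']

# A shared prefix selects all matching objects:
print(keys_matched_by_copy(bucket_keys, "data/part-"))
# → ['data/part-0000.csv', 'data/part-0001.csv', 'data/part-0002.csv']
```

This is why the docstring change matters: passing a "folder"-style prefix such as `data/part-` to `s3_key` loads multiple files in one COPY, not just one.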