-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redshift - Copy to temp table, prune table based on date, utilization of init_copy and post_copy #1314
Conversation
Can you either keep the PRs seperate or close #1313? |
@@ -103,6 +103,15 @@ def copy_options(self): | |||
""" | |||
return '' | |||
|
|||
def prune_table(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstrings would be nice :)
It would be great if you can find somebody who uses redshift in luigi to review. Thanks :) |
I closed PR #1313 and will include updates here. Luigi redshift users are quiet and unknown. Mortar Data uses some redshift functionality in their implementation of luigi, but they seem to have been inactive in the luigi community for awhile. |
As rdbms.py is a common module for which is to be implemented (not executed itself), I do not see how it needs its own tests. Postgres.py and Redshift.py both implement Rdbms.py; thus, their unit tests indirectly test rdbms.py. |
copy_options = '' | ||
sql = ["insert into dummy_table select * from stage_dummy_table;"] | ||
prune_date = 'current_date - 30' | ||
prune_column = 'dumb_date' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird to me. Is prune_column
(as well as the other 2) a method or a property? I mean, is it myobj.prune_column
or myobj.prune_column()
that's supposed to be a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intend them to be properties...optional properties. If I specify @Property over them in redshift.py, will that allow them to be optional properties with 'None' as their default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Tarrasch Do you have any suggestions on how to specify prune_* as optional properties such that they can be used like the test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just do:
@property
def prune_table(self):
return None
Or maybe even just prune_table = None
? Hmm...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Tarrasch If the prune_*
variables should be marked as properties, shouldn't table_type
and table_attributes
also be marked as such?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Good point, maybe it's better to stay consistent. Do something clever. :)
@dlstadther, oh ok. I didn't know that redshift inherited rdbms. If you think all code paths are tested. I'm fine with that :) |
I'm currently using Luigi with Redshift, so I'll try these out when I get a chance. |
@@ -15,7 +15,7 @@ | |||
# limitations under the License. | |||
# | |||
""" | |||
A common module for postgres like databases, such as postgres or redshift | |||
A common module for posgres like databases, such as postgres or redshift |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh? Introducing a spell error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how that happened. I will fix and merge commits.
…unt of records based on date), init copy, post copy (also added to rdbms.py) Added docstrings to prune variables. Updated do_prune() to raise an exception when only a subset of prune variables are implemented (needs to be all or none). Updated docstring accordingly. Added decorator @Property to prune_*, table_type, and table_attributes.
486aef4
to
f5700be
Compare
+bump (PRs are getting a little stagnate...I understand Luigi is undergoing a change in management, but the development must go on!) Comments, questions, concerns, etc. are welcome. |
Redshift - Copy to temp table, prune table based on date, utilization of init_copy and post_copy
@erikbern Thanks for the quick action! Greatly appreciated! |
yes no problem – I'm happy to merge most stuff that goes into contrib right away :) just forgot about it let me check if other things are pending |
Added support for redshift copy to temp table, prune table (delete some amount of records based on date), init copy, post copy (also added to rdbms.py)
Also includes the rdbms.py post_copy update found in PR #1313