Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redshift - Copy to temp table, prune table based on date, utilization of init_copy and post_copy #1314

Merged
merged 1 commit into from
Nov 6, 2015

Conversation

dlstadther
Copy link
Collaborator

Added support for redshift copy to temp table, prune table (delete some amount of records based on date), init copy, post copy (also added to rdbms.py)

Also includes the rdbms.py post_copy update found in PR #1313

@Tarrasch
Copy link
Contributor

Can you either keep the PRs seperate or close #1313?

@@ -103,6 +103,15 @@ def copy_options(self):
"""
return ''

def prune_table(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings would be nice :)

@Tarrasch
Copy link
Contributor

It would be great if you can find somebody who uses redshift in luigi to review. Thanks :)

@dlstadther
Copy link
Collaborator Author

I closed PR #1313 and will include updates here.

Luigi redshift users are quiet and unknown. Mortar Data uses some redshift functionality in their implementation of luigi, but they seem to have been inactive in the luigi community for awhile.

@dlstadther
Copy link
Collaborator Author

As rdbms.py is a common module for which is to be implemented (not executed itself), I do not see how it needs its own tests. Postgres.py and Redshift.py both implement Rdbms.py; thus, their unit tests indirectly test rdbms.py.

copy_options = ''
sql = ["insert into dummy_table select * from stage_dummy_table;"]
prune_date = 'current_date - 30'
prune_column = 'dumb_date'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird to me. Is prune_column (as well as the other 2) a method or a property? I mean, is it myobj.prune_column or myobj.prune_column() that's supposed to be a string?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intend them to be properties...optional properties. If I specify @Property over them in redshift.py, will that allow them to be optional properties with 'None' as their default value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tarrasch Do you have any suggestions on how to specify prune_* as optional properties such that they can be used like the test case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just do:

@property
def prune_table(self):
    return None

Or maybe even just prune_table = None? Hmm...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tarrasch If the prune_* variables should be marked as properties, shouldn't table_type and table_attributes also be marked as such?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Good point, maybe it's better to stay consistent. Do something clever. :)

@Tarrasch
Copy link
Contributor

@dlstadther, oh ok. I didn't know that redshift inherited rdbms. If you think all code paths are tested. I'm fine with that :)

@moandcompany
Copy link
Contributor

I'm currently using Luigi with Redshift, so I'll try these out when I get a chance.

@@ -15,7 +15,7 @@
# limitations under the License.
#
"""
A common module for postgres like databases, such as postgres or redshift
A common module for posgres like databases, such as postgres or redshift
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? Introducing a spell error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how that happened. I will fix and merge commits.

…unt of records based on date), init copy, post copy (also added to rdbms.py)

Added docstrings to prune variables. Updated do_prune() to raise an exception when only a subset of prune variables are implemented (needs to be all or none). Updated docstring accordingly.

Added decorator @Property to prune_*, table_type, and table_attributes.
@dlstadther
Copy link
Collaborator Author

+bump

(PRs are getting a little stagnate...I understand Luigi is undergoing a change in management, but the development must go on!)

Comments, questions, concerns, etc. are welcome.

erikbern added a commit that referenced this pull request Nov 6, 2015
Redshift - Copy to temp table, prune table based on date, utilization of init_copy and post_copy
@erikbern erikbern merged commit 95d30b6 into spotify:master Nov 6, 2015
@dlstadther
Copy link
Collaborator Author

@erikbern Thanks for the quick action! Greatly appreciated!

@dlstadther dlstadther deleted the rw-redshift branch November 6, 2015 15:45
@erikbern
Copy link
Contributor

erikbern commented Nov 6, 2015

yes no problem – I'm happy to merge most stuff that goes into contrib right away :) just forgot about it

let me check if other things are pending

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants