README updates for 0.5.0 release #72
Conversation
```diff
@@ -7,6 +7,8 @@ A library to load data into Spark SQL DataFrames from Amazon Redshift, and write
 Redshift tables. Amazon S3 is used to efficiently transfer data in and out of Redshift, and
 JDBC is used to automatically trigger the appropriate `COPY` and `UNLOAD` commands on Redshift.

+This library is more suited to ETL than interactive queries. If you plan to perform many queries against Redshift tables then we recommend caching / saving those tables.
```
@marmbrus, is this wording okay?
Maybe the following? I tried to justify why it's better for ETL, and I hate caching (users are bad at figuring out if their data is going to fit), so I left that part out.

> This library is more suited to ETL than interactive queries, since large amounts of data could be extracted to S3 for each query execution. If you plan to perform many queries against the same Redshift tables then we recommend saving the extracted data in a format such as Parquet.
SGTM. Will update.
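The workflow recommended above — extract from Redshift once, then query a saved Parquet copy — could be sketched as follows. This is an illustration, not text from the PR: the JDBC URL, table name, and S3 paths are placeholder values, and the options shown match the library's documented `format`/`option` reader API of that era.

```scala
// Sketch: extract a Redshift table once via spark-redshift, then persist it
// as Parquet so repeated queries read from S3 instead of re-running UNLOAD.
// All connection details are placeholders, not real values.
val df = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://host:5439/db?user=user&password=pass")
  .option("dbtable", "my_table")
  .option("tempdir", "s3n://bucket/tmp")
  .load()

// Save the extracted data; later queries hit the Parquet copy, not Redshift.
df.write.parquet("s3n://bucket/my_table_parquet")

// Register the saved copy for interactive SQL queries.
val saved = sqlContext.read.parquet("s3n://bucket/my_table_parquet")
saved.registerTempTable("my_table_saved")
```

This avoids the per-query S3 extraction cost the suggested wording warns about, without asking users to judge whether the data fits in cache.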
cross publish to 2.12
This patch updates the README for the 0.5.0 release.