README updates for 0.5.0 release #72

Closed
wants to merge 5 commits into from

JoshRosen (Contributor)

This patch updates the README for the 0.5.0 release.

  • Fixes incorrect Python examples (fixes #56, "Python documentation error").
  • Clarifies the use cases for ETL vs. interactive queries.
  • Provides verbatim examples that users can copy when setting up credentials.

JoshRosen added this to the 0.5.0 milestone Sep 4, 2015
codecov-io

Current coverage is 94.95%

Merging #72 into master will not affect coverage as of 280270b

@@            master     #72   diff @@
======================================
  Files           11      11       
  Stmts          436     436       
  Branches       100     100       
  Methods          0       0       
======================================
  Hit            414     414       
  Partial          0       0       
  Missed          22      22       

JoshRosen changed the title [WIP] README updates for 0.5.0 release to README updates for 0.5.0 release Sep 8, 2015
@@ -7,6 +7,8 @@ A library to load data into Spark SQL DataFrames from Amazon Redshift, and write
Redshift tables. Amazon S3 is used to efficiently transfer data in and out of Redshift, and
JDBC is used to automatically trigger the appropriate `COPY` and `UNLOAD` commands on Redshift.

This library is more suited to ETL than interactive queries. If you plan to perform many queries against Redshift tables then we recommend caching / saving those tables.
Contributor Author

@marmbrus, is this wording okay?

Contributor

Maybe the following? I tried to justify why it's better for ETL, and I hate caching (users are bad at figuring out whether their data is going to fit), so I left that part out.

This library is more suited to ETL than interactive queries, since large amounts of data could be extracted to S3 for each query execution. If you plan to perform many queries against the same Redshift tables then we recommend saving the extracted data in a format such as Parquet.

Contributor Author

SGTM. Will update.

JoshRosen closed this in 7b402e3 Sep 9, 2015
JoshRosen deleted the readme-updates branch September 9, 2015 00:05
munk pushed a commit to ActionIQ-OSS/spark-redshift that referenced this pull request May 4, 2021
Development

Successfully merging this pull request may close these issues.

Python documentation error
3 participants