README updates for 0.5.0 release #72
Conversation
```diff
@@ -7,6 +7,8 @@ A library to load data into Spark SQL DataFrames from Amazon Redshift, and write
 Redshift tables. Amazon S3 is used to efficiently transfer data in and out of Redshift, and
 JDBC is used to automatically trigger the appropriate `COPY` and `UNLOAD` commands on Redshift.

+This library is more suited to ETL than interactive queries. If you plan to perform many queries against Redshift tables then we recommend caching / saving those tables.
```
@marmbrus, is this wording okay?
Maybe the following? I tried to justify why it's better for ETL, and I hate caching (users are bad at figuring out if their data is going to fit), so I left that part out.

> This library is more suited to ETL than interactive queries, since large amounts of data could be extracted to S3 for each query execution. If you plan to perform many queries against the same Redshift tables then we recommend saving the extracted data in a format such as Parquet.
SGTM. Will update.
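The workflow recommended above — extract from Redshift once, then query a saved Parquet copy — could be sketched as follows. This is an illustration, not text from the PR: the JDBC URL, table name, and S3 paths are placeholder values, and the options shown match the library's documented `format`/`option` reader API of that era.

```scala
// Sketch: extract a Redshift table once via spark-redshift, then persist it
// as Parquet so repeated queries read from S3 instead of re-running UNLOAD.
// All connection details are placeholders, not real values.
val df = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://host:5439/db?user=user&password=pass")
  .option("dbtable", "my_table")
  .option("tempdir", "s3n://bucket/tmp")
  .load()

// Save the extracted data; later queries hit the Parquet copy, not Redshift.
df.write.parquet("s3n://bucket/my_table_parquet")

// Register the saved copy for interactive SQL queries.
val saved = sqlContext.read.parquet("s3n://bucket/my_table_parquet")
saved.registerTempTable("my_table_saved")
```

This avoids the per-query S3 extraction cost the suggested wording warns about, without asking users to judge whether the data fits in cache.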
cross publish to 2.12
This patch updates the README for the 0.5.0 release.