Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-1566] consolidate programming guide, and general doc updates #896

Closed
wants to merge 21 commits into from

Conversation

mateiz
Copy link
Contributor

@mateiz mateiz commented May 27, 2014

This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:

  • A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
  • New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
  • Spark-submit guide moved to a separate page and expanded slightly
  • Various cleanups of the menu system, security docs, and others
  • Updated look of title bar to differentiate the docs from previous Spark versions

You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.


# Launching Applications with spark-submit

Once a user application is bundled, it can be launched using the `bin/spark-submit` script
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence needs a period. Wasn't part of your change but I just noticed it.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15236/

@pwendell
Copy link
Contributor

Hey @mateiz looks great. I added two small comments which I think were both typo's unrelated to your patch.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15237/

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15238/

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15239/

@@ -103,7 +103,7 @@ the `make-distribution.sh` script included in a Spark source tarball/checkout.
## Using a Mesos Master URL

The Master URLs for Mesos are in the form `mesos://host:5050` for a single-master Mesos
cluster, or `zk://host:2181` for a multi-master Mesos cluster using ZooKeeper.
cluster, or `mesos://zk://host:2181` for a multi-master Mesos cluster using ZooKeeper.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this change was wrong because the MESOS_REGEX at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1437 is """(mesos|zk)://.*""".r but after reading through the code this change is actually correct.

I'm not sure that mesos://zk://host:2181 is a valid URL though. JDBC for example uses jdbc:postgresql:// like in jdbc:postgresql://localhost/test rather than jdbc://postgresql://... Issue for a later time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd leave this for another time. In general none of these are URLs anyway, perhaps they could be called URIs, but I don't even want to get into the difference ;)

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15245/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15259/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15269/

@mateiz
Copy link
Contributor Author

mateiz commented May 29, 2014

@pwendell is it okay to merge this as is or do you want to look at it more?

@asfgit asfgit closed this in c8bf413 May 30, 2014
asfgit pushed a commit that referenced this pull request May 30, 2014
This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:

* A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
* New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
* Spark-submit guide moved to a separate page and expanded slightly
* Various cleanups of the menu system, security docs, and others
* Updated look of title bar to differentiate the docs from previous Spark versions

You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.

Author: Matei Zaharia <[email protected]>

Closes #896 from mateiz/1.0-docs and squashes the following commits:

03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
0779508 [Matei Zaharia] tweak
ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
1bf4112 [Matei Zaharia] Review comments
4414f88 [Matei Zaharia] tweaks
d04e979 [Matei Zaharia] Fix some old links to Java guide
a34ed33 [Matei Zaharia] tweak
541bb3b [Matei Zaharia] miscellaneous changes
fcefdec [Matei Zaharia] Moved submitting apps to separate doc
61d72b4 [Matei Zaharia] stuff
181f217 [Matei Zaharia] migration guide, remove old language guides
e11a0da [Matei Zaharia] Add more API functions
6a030a9 [Matei Zaharia] tweaks
8db0ae3 [Matei Zaharia] Added key-value pairs section
318d2c9 [Matei Zaharia] tweaks
1c81477 [Matei Zaharia] New section on basics and function syntax
e38f559 [Matei Zaharia] Actually added programming guide to Git
a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
3b6a876 [Matei Zaharia] More CSS tweaks
01ec8bf [Matei Zaharia] More CSS tweaks
e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
(cherry picked from commit c8bf413)

Signed-off-by: Patrick Wendell <[email protected]>
@pwendell
Copy link
Contributor

Thanks Matei, I've merged this!

pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This PR improves and organizes the config option page
and makes a few other changes to config docs. See a preview here:
http://people.apache.org/~pwendell/config-improvements/configuration.html

The biggest changes are:
1. The configs for the standalone master/workers were moved to the
standalone page and out of the general config doc.
2. SPARK_LOCAL_DIRS was missing from the standalone docs.
3. Expanded discussion of injecting configs with spark-submit, including an
example.
4. Config options were organized into the following categories:
- Runtime Environment
- Shuffle Behavior
- Spark UI
- Compression and Serialization
- Execution Behavior
- Networking
- Scheduling
- Security
- Spark Streaming

Author: Patrick Wendell <[email protected]>

Closes apache#880 from pwendell/config-cleanup and squashes the following commits:

93f56c3 [Patrick Wendell] Feedback from Matei
6f66efc [Patrick Wendell] More feedback
16ae776 [Patrick Wendell] Adding back header section
d9c264f [Patrick Wendell] Small fix
e0c1728 [Patrick Wendell] Response to Matei's review
27d57db [Patrick Wendell] Reverting changes to index.html (covered in apache#896)
e230ef9 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
a374369 [Patrick Wendell] Line wrapping fixes
fdff7fc [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
3289ea4 [Patrick Wendell] Pulling in changes from apache#856
106ee31 [Patrick Wendell] Small link fix
f7e79bc [Patrick Wendell] Re-organizing config options.
54b184d [Patrick Wendell] Adding standalone configs to the standalone page
592e94a [Patrick Wendell] Stash
29b5446 [Patrick Wendell] Better discussion of spark-submit in configuration docs
2d719ef [Patrick Wendell] Small fix
4af9e07 [Patrick Wendell] Adding SPARK_LOCAL_DIRS docs
204b248 [Patrick Wendell] Small fixes
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:

* A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
* New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
* Spark-submit guide moved to a separate page and expanded slightly
* Various cleanups of the menu system, security docs, and others
* Updated look of title bar to differentiate the docs from previous Spark versions

You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.

Author: Matei Zaharia <[email protected]>

Closes apache#896 from mateiz/1.0-docs and squashes the following commits:

03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
0779508 [Matei Zaharia] tweak
ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
1bf4112 [Matei Zaharia] Review comments
4414f88 [Matei Zaharia] tweaks
d04e979 [Matei Zaharia] Fix some old links to Java guide
a34ed33 [Matei Zaharia] tweak
541bb3b [Matei Zaharia] miscellaneous changes
fcefdec [Matei Zaharia] Moved submitting apps to separate doc
61d72b4 [Matei Zaharia] stuff
181f217 [Matei Zaharia] migration guide, remove old language guides
e11a0da [Matei Zaharia] Add more API functions
6a030a9 [Matei Zaharia] tweaks
8db0ae3 [Matei Zaharia] Added key-value pairs section
318d2c9 [Matei Zaharia] tweaks
1c81477 [Matei Zaharia] New section on basics and function syntax
e38f559 [Matei Zaharia] Actually added programming guide to Git
a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
3b6a876 [Matei Zaharia] More CSS tweaks
01ec8bf [Matei Zaharia] More CSS tweaks
e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
This PR improves and organizes the config option page
and makes a few other changes to config docs. See a preview here:
http://people.apache.org/~pwendell/config-improvements/configuration.html

The biggest changes are:
1. The configs for the standalone master/workers were moved to the
standalone page and out of the general config doc.
2. SPARK_LOCAL_DIRS was missing from the standalone docs.
3. Expanded discussion of injecting configs with spark-submit, including an
example.
4. Config options were organized into the following categories:
- Runtime Environment
- Shuffle Behavior
- Spark UI
- Compression and Serialization
- Execution Behavior
- Networking
- Scheduling
- Security
- Spark Streaming

Author: Patrick Wendell <[email protected]>

Closes apache#880 from pwendell/config-cleanup and squashes the following commits:

93f56c3 [Patrick Wendell] Feedback from Matei
6f66efc [Patrick Wendell] More feedback
16ae776 [Patrick Wendell] Adding back header section
d9c264f [Patrick Wendell] Small fix
e0c1728 [Patrick Wendell] Response to Matei's review
27d57db [Patrick Wendell] Reverting changes to index.html (covered in apache#896)
e230ef9 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
a374369 [Patrick Wendell] Line wrapping fixes
fdff7fc [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
3289ea4 [Patrick Wendell] Pulling in changes from apache#856
106ee31 [Patrick Wendell] Small link fix
f7e79bc [Patrick Wendell] Re-organizing config options.
54b184d [Patrick Wendell] Adding standalone configs to the standalone page
592e94a [Patrick Wendell] Stash
29b5446 [Patrick Wendell] Better discussion of spark-submit in configuration docs
2d719ef [Patrick Wendell] Small fix
4af9e07 [Patrick Wendell] Adding SPARK_LOCAL_DIRS docs
204b248 [Patrick Wendell] Small fixes
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:

* A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
* New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
* Spark-submit guide moved to a separate page and expanded slightly
* Various cleanups of the menu system, security docs, and others
* Updated look of title bar to differentiate the docs from previous Spark versions

You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.

Author: Matei Zaharia <[email protected]>

Closes apache#896 from mateiz/1.0-docs and squashes the following commits:

03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
0779508 [Matei Zaharia] tweak
ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
1bf4112 [Matei Zaharia] Review comments
4414f88 [Matei Zaharia] tweaks
d04e979 [Matei Zaharia] Fix some old links to Java guide
a34ed33 [Matei Zaharia] tweak
541bb3b [Matei Zaharia] miscellaneous changes
fcefdec [Matei Zaharia] Moved submitting apps to separate doc
61d72b4 [Matei Zaharia] stuff
181f217 [Matei Zaharia] migration guide, remove old language guides
e11a0da [Matei Zaharia] Add more API functions
6a030a9 [Matei Zaharia] tweaks
8db0ae3 [Matei Zaharia] Added key-value pairs section
318d2c9 [Matei Zaharia] tweaks
1c81477 [Matei Zaharia] New section on basics and function syntax
e38f559 [Matei Zaharia] Actually added programming guide to Git
a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
3b6a876 [Matei Zaharia] More CSS tweaks
01ec8bf [Matei Zaharia] More CSS tweaks
e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
wangyum added a commit that referenced this pull request May 26, 2023
* [CARMEL-5873] Upgrade Parquet to 1.12.2 (#896)

* [SPARK-36726] Upgrade Parquet to 1.12.1

### What changes were proposed in this pull request?

Upgrade Apache Parquet to 1.12.1

### Why are the changes needed?

Parquet 1.12.1 contains the following bug fixes:
- PARQUET-2064: Make Range public accessible in RowRanges
- PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream`
- PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding
- PARQUET-1633: Fix integer overflow
- PARQUET-2054: fix TCP leaking when calling ParquetFileWriter.appendFile
- PARQUET-2072: Do Not Determine Both Min/Max for Binary Stats
- PARQUET-2073: Fix estimate remaining row count in ColumnWriteStoreBase
- PARQUET-2078: Failed to read parquet file after writing with the same

In particular PARQUET-2078 is a blocker for the upcoming Apache Spark 3.2.0 release.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests + a new test for the issue in SPARK-36696

Closes #33969 from sunchao/upgrade-parquet-12.1.

Authored-by: Chao Sun <[email protected]>
Signed-off-by: DB Tsai <[email protected]>

(cherry picked from commit a927b08)

* [SPARK-34542][BUILD] Upgrade Parquet to 1.12.0

### What changes were proposed in this pull request?

Parquet 1.12.0 New Feature
- PARQUET-41 - Add bloom filters to parquet statistics
- PARQUET-1373 - Encryption key management tools
- PARQUET-1396 - Example of using EncryptionPropertiesFactory and DecryptionPropertiesFactory
- PARQUET-1622 - Add BYTE_STREAM_SPLIT encoding
- PARQUET-1784 - Column-wise configuration
- PARQUET-1817 - Crypto Properties Factory
- PARQUET-1854 - Properties-Driven Interface to Parquet Encryption

Parquet 1.12.0 release notes:
https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0/CHANGES.md

### Why are the changes needed?

- Bloom filters to improve filter performance
- ZSTD enhancement

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit test.

Closes #31649 from wangyum/SPARK-34542.

Lead-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Yuming Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit cbffc12)

Co-authored-by: Chao Sun <[email protected]>

* [HADP-44647] Parquet file based kms client for encryption keys (#897)

* [HADP-44647] Parquet file based kms client for encryption keys (#82)

create/write parquet encryption table.
```
set spark.sql.parquet.encryption.key.file=/path/to/key/file;

create table parquet_encryption(a int, b int, c int)
using parquet
options (
'parquet.encryption.column.keys' 'columnKey1: a, b; columnKey2: c',
'parquet.encryption.footer.key' 'footerKey');
```

read parquet encryption table;

```
set spark.sql.parquet.encryption.key.file=/path/to/key/file;

select ... from parquet_encryption ...
```

Will raise another pr for default footerKey.

* [HADP-44647][FOLLOWUP] Reuse the kms instance for same key file (#84)

* Fix

Co-authored-by: fwang12 <[email protected]>

Co-authored-by: Chao Sun <[email protected]>
Co-authored-by: fwang12 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants