-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-1637: Clean up examples for 1.0 #571
Conversation
techaddict
commented
Apr 27, 2014
- Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
- Move Python examples into examples/src/main/python
- Update docs to reflect these changes
….apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
@@ -22,10 +22,10 @@ import org.apache.spark.SparkContext._ | |||
import org.apache.spark.util.collection.OpenHashMap | |||
import scala.collection.JavaConversions.mapAsScalaMap | |||
|
|||
private[streaming] | |||
private[spark] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this, needed by RawNetworkGrep and KafkaWordCount. There has to be better way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't use any private methods in examples. The usage of RawTextHelper in KafkaWordCount
is very simple. warmUp
is used in RawNetworkGrep
, is it necessary? @tdas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the RawTextHelper was essentially the way to provide data for running benchmarks. However, I havent quite maintained it, so that example code has been languishing in unknown state. I am inclined to keep it there as it is, as long as it hidden behind private[*]. Though I dont see the logic behind changing it from private[streaming] to private[spark]
Can one of the admins verify this patch? |
Jenkins, add to whitelist and test this please |
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
@techaddict FYI, I moved main methods to |
One question: is |
@mengxr ya exactly, thats why i commented on the private[spark] change. I think better way would be to make the examples use there own methods, in case of private methods. |
If Apache allows to use a name space not starting with |
@mengxr I'm pretty sure we are only allowed to release classes in the org.apache namespace (unfortunately). |
I know this is a little weird, but either |
@mengxr IMO we should re-implement the necessary private functions in the examples, they are not too many just 2-3 methods. It will make using/modifying the examples easier for users. |
I have a general question about what should file and package organization of the examples be. For example, for Spark Streaming, is |
@tdas I'm a little confused. Which one do you prefer, |
Oops, sorry for the confusion. I meant "latter" instead of "former". That is, I prefer
|
Bringing @marmbrus into this discussion as he had added the Spark SQL examples in |
No strong opinions here other than we should be consistent. |
Okay. As I said earlier, I find it more consistent to keep it as "org.apache.spark.streaming.examples.*" |
@mengxr so what changes should i make other than streaming one suggested by tdas. |
@techaddict I don't see anything necessary except the one using |
@mengxr what do you suggest how should we resolve the private[spark] problem ? I'm short on idea. |
Build triggered. |
Build started. |
I would say using |
@mengxr @47ef86c392badc58052a0414115e49c2970b31eb looks good ? |
Build finished. All automated tests passed. |
All automated tests passed. |
@techaddict It looks good to me. |
@mateiz ready for merging |
No, I have strong feelings against "org.apache.spark.examples.streaming.*" This is entirely inconsistent with rest of the API. |
@tdas @techaddict let's not move them IMO. There are two reasons: First, when you browse the folder tree, you want to get into spark/examples and immediately see subfolders, instead of wondering "should I go into spark/examples or spark/streaming or spark/mllib or whatever". Second, it makes it less likely for us to use private[streaming] APIs in there. |
BTW the other point of this was to make them consistent across packages, and MLlib and all the Python examples are under examples/ already. |
@mateiz So this is good to merge. ?
|
@mateiz I dont think that decision is that big a deal. Its fairly obvious if one sees a streaming folder inside spark/ that those are streaming examples. It feels more inconsistent to have |
Talked with TD offline and he was okay with changing the package name, so I've merged this in. Thanks Sandeep! |
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) - [x] Move Python examples into examples/src/main/python - [x] Update docs to reflect these changes Author: Sandeep <[email protected]> This patch had conflicts when merged, resolved by Committer: Matei Zaharia <[email protected]> Closes #571 from techaddict/SPARK-1637 and squashes the following commits: 47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples 8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples 5f96121 [Sandeep] Move Python examples into examples/src/main/python 0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) - [x] Move Python examples into examples/src/main/python - [x] Update docs to reflect these changes Author: Sandeep <[email protected]> This patch had conflicts when merged, resolved by Committer: Matei Zaharia <[email protected]> Closes #571 from techaddict/SPARK-1637 and squashes the following commits: 47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples 8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples 5f96121 [Sandeep] Move Python examples into examples/src/main/python 0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) (cherry picked from commit a000b5c) Signed-off-by: Matei Zaharia <[email protected]>
…e PR apache#571 was last updated. This resulted in Compilation Errors.
…e PR #571 was last updated. This resulted in Compilation Errors. cc @mateiz project not compiling currently. Author: Sandeep <[email protected]> Closes #673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits: b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR #571 was last updated. This resulted in Compilation Errors.
…e PR #571 was last updated. This resulted in Compilation Errors. cc @mateiz project not compiling currently. Author: Sandeep <[email protected]> Closes #673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits: b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR #571 was last updated. This resulted in Compilation Errors. (cherry picked from commit fdae095) Signed-off-by: Patrick Wendell <[email protected]>
SPARK-1072 Use binary search when needed in RangePartioner Author: Holden Karau <[email protected]> Closes apache#571 and squashes the following commits: f31a2e1 [Holden Karau] Swith to using CollectionsUtils in Partitioner 4c7a0c3 [Holden Karau] Add CollectionsUtil as suggested by aarondav 7099962 [Holden Karau] Add the binary search to only init once 1bef01d [Holden Karau] CR feedback a21e097 [Holden Karau] Use binary search if we have more than 1000 elements inside of RangePartitioner
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) - [x] Move Python examples into examples/src/main/python - [x] Update docs to reflect these changes Author: Sandeep <[email protected]> This patch had conflicts when merged, resolved by Committer: Matei Zaharia <[email protected]> Closes apache#571 from techaddict/SPARK-1637 and squashes the following commits: 47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples 8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples 5f96121 [Sandeep] Move Python examples into examples/src/main/python 0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
…e PR apache#571 was last updated. This resulted in Compilation Errors. cc @mateiz project not compiling currently. Author: Sandeep <[email protected]> Closes apache#673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits: b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR apache#571 was last updated. This resulted in Compilation Errors.
Surfacing decoders on KafkaInputDStream
The new role "install-docker" install docker-ce by using get.docker.com script, it support multiple OS and arch, and keep consistent docker-ce version, mark role "install-docker-ce" as deprecated, and apply new role in all of docker-machine jobs. Update tag of docker-machine job from latest-release to 18.06 in order to avoid confusions. Related-Bug: theopenlab/openlab#308