[SPARK-2514] [mllib] Random RDD generator #1520

Closed
wants to merge 13 commits into from

Conversation

dorx
Contributor

@dorx dorx commented Jul 22, 2014

Utilities for generating random RDDs.

RandomRDD and RandomVectorRDD are created instead of using sc.parallelize(range:Range) because Range objects in Scala can only have size <= Int.MaxValue.

The object RandomRDDGenerators can be transformed into a generator class to reduce the number of auxiliary methods for optional arguments.
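
To illustrate the size limit mentioned above, here is a minimal sketch in plain Scala with no Spark dependency. The names `RangeLimitSketch`, `fullIntRangeLength`, and `partitionSizes` are hypothetical and introduced only for illustration; they are not part of this patch, and the splitting scheme is an assumption about how a `Long`-sized data set could be divided among partitions, not a description of how RandomRDD does it.

```scala
import scala.util.Try

object RangeLimitSketch {
  // A Range covering the whole Int domain would hold more than Int.MaxValue
  // elements, so asking for its length fails instead of returning a size.
  def fullIntRangeLength: Try[Int] = Try((Int.MinValue to Int.MaxValue).length)

  // Hypothetical helper: split a Long-valued total size into per-partition
  // sizes, so that each partition can generate its own share of the data
  // without ever materializing one large Range.
  def partitionSizes(size: Long, numPartitions: Int): Seq[Long] = {
    require(size >= 0L && numPartitions > 0, "size must be >= 0 and numPartitions > 0")
    val base = size / numPartitions        // every partition gets at least this many
    val remainder = size % numPartitions   // the first `remainder` partitions get one extra
    (0 until numPartitions).map(i => if (i < remainder) base + 1 else base)
  }
}
```

For example, `RangeLimitSketch.partitionSizes(3000000000L, 4)` describes four partitions whose sizes sum to a total that no single `Range` could represent.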

@SparkQA

SparkQA commented Jul 22, 2014

QA tests have started for PR 1520. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16942/consoleFull

@dorx
Contributor Author

dorx commented Jul 22, 2014

@falaki @jkbradley @mengxr

@SparkQA

SparkQA commented Jul 22, 2014

QA results for PR 1520:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait DistributionGenerator extends Pseudorandom with Serializable {
class UniformGenerator() extends DistributionGenerator {
class StandardNormalGenerator() extends DistributionGenerator {
class PoissonGenerator(val mean: Double) extends DistributionGenerator {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16942/consoleFull

trait DistributionGenerator extends Pseudorandom with Serializable {

/**
* @return An i.i.d sample as a Double from an underlying distribution.
Contributor

Change @return to Returns. Otherwise the summary will be empty in the generated docs.
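
A minimal sketch of the suggested doc style, assuming plain Scala without Spark. The trait and class names ending in `Sketch`, the `nextValue()` method name, and the use of `java.util.Random` in place of Spark's `Pseudorandom` and `XORShiftRandom` are all assumptions made for illustration; this is not the code under review.

```scala
import java.util.Random

/** Sketch of a generator that produces i.i.d. values from a distribution. */
trait DistributionGeneratorSketch extends Serializable {

  /** Returns an i.i.d. sample as a Double from an underlying distribution. */
  def nextValue(): Double

  /** Sets the seed of the underlying random number generator. */
  def setSeed(seed: Long): Unit
}

/** Generates i.i.d. samples from the uniform distribution on [0, 1). */
class UniformGeneratorSketch extends DistributionGeneratorSketch {
  // java.util.Random is a stand-in for the patch's own random number generator.
  private val random = new Random()

  override def nextValue(): Double = random.nextDouble()

  override def setSeed(seed: Long): Unit = random.setSeed(seed)
}

/** Generates i.i.d. samples from the standard normal distribution. */
class StandardNormalGeneratorSketch extends DistributionGeneratorSketch {
  private val random = new Random()

  override def nextValue(): Double = random.nextGaussian()

  override def setSeed(seed: Long): Unit = random.setSeed(seed)
}
```

Because each doc comment opens with a sentence ("Returns ...") rather than a bare `@return` tag, the generated documentation has a summary line to show for the member.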

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1520. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17060/consoleFull

import org.apache.spark.util.random.{XORShiftRandom, Pseudorandom}

/**
* Trait for random number generators that generate i.i.d values from a distribution.
Contributor

i.i.d -> i.i.d. and in other places

@mengxr
Contributor

mengxr commented Jul 25, 2014

@dorx Besides the comments, could you mark the distribution generators and the methods that require distribution generators as @Experimental? Part of the reason is that we don't have this API in Python, and it is not clear whether we should implement the same API there.

@SparkQA

SparkQA commented Jul 25, 2014

QA tests have started for PR 1520. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17197/consoleFull

@SparkQA

SparkQA commented Jul 25, 2014

QA results for PR 1520:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait DistributionGenerator extends Pseudorandom with Serializable {
class UniformGenerator extends DistributionGenerator {
class StandardNormalGenerator extends DistributionGenerator {
class PoissonGenerator(val mean: Double) extends DistributionGenerator {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17197/consoleFull

@dorx
Contributor Author

dorx commented Jul 25, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 25, 2014

QA tests have started for PR 1520. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17205/consoleFull

@SparkQA

SparkQA commented Jul 25, 2014

QA results for PR 1520:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait DistributionGenerator extends Pseudorandom with Serializable {
class UniformGenerator extends DistributionGenerator {
class StandardNormalGenerator extends DistributionGenerator {
class PoissonGenerator(val mean: Double) extends DistributionGenerator {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17205/consoleFull

@mengxr
Contributor

mengxr commented Jul 27, 2014

LGTM. Merged into master. Thanks for adding random RDD generators!!

@asfgit asfgit closed this in 81fcdd2 Jul 27, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Utilities for generating random RDDs.

RandomRDD and RandomVectorRDD are created instead of using `sc.parallelize(range:Range)` because `Range` objects in Scala can only have `size <= Int.MaxValue`.

The object `RandomRDDGenerators` can be transformed into a generator class to reduce the number of auxiliary methods for optional arguments.

Author: Doris Xin

Closes apache#1520 from dorx/randomRDD and squashes the following commits:

01121ac [Doris Xin] reviewer comments
6bf27d8 [Doris Xin] Merge branch 'master' into randomRDD
a8ea92d [Doris Xin] Reviewer comments
063ea0b [Doris Xin] Merge branch 'master' into randomRDD
aec68eb [Doris Xin] newline
bc90234 [Doris Xin] units passed.
d56cacb [Doris Xin] impl with RandomRDD
92d6f1c [Doris Xin] solution for Cloneable
df5bcff [Doris Xin] Merge branch 'generator' into randomRDD
f46d928 [Doris Xin] WIP
49ed20d [Doris Xin] alternative poisson distribution generator
7cb0e40 [Doris Xin] fix for data inconsistency
8881444 [Doris Xin] RandomRDDGenerator: initial design