[SPARK-2514] [mllib] Random RDD generator #1520
Looking for feedback on design decisions. Very rough draft and untested.
QA tests have started for PR 1520. This patch merges cleanly.
QA results for PR 1520:
trait DistributionGenerator extends Pseudorandom with Serializable {

  /**
   * @return An i.i.d sample as a Double from an underlying distribution.
Change `@return` to `Returns`. Otherwise the summary will be empty in the generated docs.
QA tests have started for PR 1520. This patch merges cleanly.
import org.apache.spark.util.random.{XORShiftRandom, Pseudorandom}

/**
 * Trait for random number generators that generate i.i.d values from a distribution.
`i.i.d` -> `i.i.d.`, here and in other places.
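A minimal sketch of the trait under review, with both doc fixes applied: a `Returns ...` summary sentence instead of a bare `@return`, and `i.i.d.` with a trailing period. This is not Spark's code. `SeedableSketch` is a hypothetical stand-in for Spark's `Pseudorandom`, assumed here to expose only `setSeed(seed: Long)`. `DistributionGeneratorSketch`, `nextValue()`, and `StandardNormalSketch` are illustrative names, and the sketch uses `java.util.Random` rather than the `XORShiftRandom` imported in the diff.

```scala
// Hypothetical stand-in for Spark's Pseudorandom: anything that can be reseeded.
trait SeedableSketch {
  def setSeed(seed: Long): Unit
}

/**
 * Trait for random number generators that generate i.i.d. values from a distribution.
 * (Illustrative sketch; not Spark's DistributionGenerator.)
 */
trait DistributionGeneratorSketch extends SeedableSketch with java.io.Serializable {

  /** Returns an i.i.d. sample as a Double from an underlying distribution. */
  def nextValue(): Double
}

/** Generates i.i.d. samples from the standard normal distribution N(0, 1). */
class StandardNormalSketch extends DistributionGeneratorSketch {
  // java.util.Random is Serializable, so instances can be shipped to executors.
  private val rng = new java.util.Random()

  // Reseeding also clears any cached second Gaussian, so equal seeds give equal sequences.
  override def setSeed(seed: Long): Unit = rng.setSeed(seed)

  override def nextValue(): Double = rng.nextGaussian()
}
```

Because the generator is seedable, two instances given the same seed emit the same sequence. A partition of a random RDD therefore produces the same values whenever it is recomputed.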
@dorx Besides comments, could you mark distribution generators and methods that require distribution generators
QA tests have started for PR 1520. This patch merges cleanly.
QA results for PR 1520:
Jenkins, retest this please.
QA tests have started for PR 1520. This patch merges cleanly.
QA results for PR 1520:
LGTM. Merged into master. Thanks for adding random RDD generators!!
Utilities for generating random RDDs.

RandomRDD and RandomVectorRDD are created instead of using `sc.parallelize(range:Range)` because `Range` objects in Scala can only have `size <= Int.MaxValue`. The object `RandomRDDGenerators` can be transformed into a generator class to reduce the number of auxiliary methods for optional arguments.

Author: Doris Xin <[email protected]>

Closes apache#1520 from dorx/randomRDD and squashes the following commits:

01121ac [Doris Xin] reviewer comments
6bf27d8 [Doris Xin] Merge branch 'master' into randomRDD
a8ea92d [Doris Xin] Reviewer comments
063ea0b [Doris Xin] Merge branch 'master' into randomRDD
aec68eb [Doris Xin] newline
bc90234 [Doris Xin] units passed.
d56cacb [Doris Xin] impl with RandomRDD
92d6f1c [Doris Xin] solution for Cloneable
df5bcff [Doris Xin] Merge branch 'generator' into randomRDD
f46d928 [Doris Xin] WIP
49ed20d [Doris Xin] alternative poisson distribution generator
7cb0e40 [Doris Xin] fix for data inconsistency
8881444 [Doris Xin] RandomRDDGenerator: initial design
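The `Range` limitation can be worked around without Spark. The idea is to describe partitions by a `Long` total size and regenerate each partition's values from its own seed, so no `Range` over the whole dataset is ever built. This is a hypothetical sketch, not the merged `RandomRDD` implementation. `PartitionSpec`, `RandomPartitionSketch`, the `seed ^ index` seed derivation, and the uniform `nextDouble()` values are all assumptions made for illustration.

```scala
// Hypothetical description of one partition: how many values it holds and how to seed them.
final case class PartitionSpec(index: Int, size: Int, seed: Long)

object RandomPartitionSketch {

  /** Splits `size` elements into `numPartitions` nearly equal partitions, each with its own seed. */
  def partitions(size: Long, numPartitions: Int, seed: Long): Seq[PartitionSpec] = {
    require(size >= 0, "size must be non-negative")
    require(numPartitions > 0, "numPartitions must be positive")
    // The total may exceed Int.MaxValue, but each single partition must still fit in an Int.
    require(size / numPartitions + 1 <= Int.MaxValue, "too few partitions for this size")
    (0 until numPartitions).map { i =>
      // Boundaries use Long arithmetic (Int * Long widens to Long), so no Int overflow occurs
      // for realistic sizes.
      val start = i * size / numPartitions
      val end = (i + 1) * size / numPartitions
      // Assumed seed derivation: deterministic per (seed, index), so recomputing a lost
      // partition reproduces exactly the same values.
      PartitionSpec(i, (end - start).toInt, seed ^ i)
    }
  }

  /** Regenerates the values of one partition lazily from its seed. */
  def values(p: PartitionSpec): Iterator[Double] = {
    val rng = new java.util.Random(p.seed)
    Iterator.fill(p.size)(rng.nextDouble())
  }
}
```

With this layout, a dataset of `3L * Int.MaxValue` values is just eight small `PartitionSpec` records. The values themselves exist only while a partition's iterator is consumed, and the per-partition seed keeps repeated evaluations consistent.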