[SPARK-17315][SparkR] Kolmogorov-Smirnov test SparkR wrapper #14881

Closed
wants to merge 4 commits into from

junyangq
Contributor

What changes were proposed in this pull request?

This PR adds a Kolmogorov-Smirnov test wrapper to SparkR. The wrapper currently supports only the one-sample test against a normal distribution.
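
For readers who want to see what a one-sample test against a normal distribution computes, here is a minimal base-R sketch. It uses `stats::ks.test` on a local vector, not the SparkR wrapper from this PR, and it assumes a standard normal (mean 0, sd 1) as the reference distribution. The sample values are the ones used in the example further down this thread.

```r
# One-sample, two-sided Kolmogorov-Smirnov test against N(0, 1) in base R.
# Assumption: standard normal reference; this is not the SparkR wrapper.
x <- c(0.1, 0.15, 0.2, 0.3, 0.25, -1, -0.5)

# Hand-computed statistic: the largest gap between the empirical CDF and
# the normal CDF, checked just after and just before each sorted point.
n <- length(x)
cdf <- pnorm(sort(x), mean = 0, sd = 1)
d_manual <- max(pmax((1:n) / n - cdf, cdf - (0:(n - 1)) / n))

# The same test via the built-in function.
result <- ks.test(x, "pnorm", mean = 0, sd = 1)
print(result$statistic)
print(result$p.value)
```

The hand-computed `d_manual` should agree with `result$statistic`, which makes it a convenient local sanity check on small data.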

How was this patch tested?

R unit test.

@SparkQA

SparkQA commented Aug 30, 2016

Test build #64665 has finished for PR 14881 at commit d4b1459.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

#' @details
#' For more details, see
#' \href{http://spark.apache.org/docs/latest/mllib-statistics.html#hypothesis-testing}{
#' MLlib: Hypothesis Testing}.
Member

Maybe put this in `@seealso`? That seems to be the typical way to add a link in our docs.

@SparkQA

SparkQA commented Sep 1, 2016

Test build #64794 has finished for PR 14881 at commit 0a8ff80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 2, 2016

Test build #64865 has finished for PR 14881 at commit caeb91e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

LGTM

@asfgit asfgit closed this in abb2f92 Sep 3, 2016
ghost pushed a commit to dbtsai/spark that referenced this pull request Sep 22, 2016
…test summary

## What changes were proposed in this pull request?
apache#14881 added a Kolmogorov-Smirnov test wrapper to SparkR. I found that ```print.summary.KSTest``` was implemented incorrectly and had no effect.
Running the following code for KSTest:
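```r
library(SparkR)
sparkR.session()  # start or attach to a Spark session

data <- data.frame(test = c(0.1, 0.15, 0.2, 0.3, 0.25, -1, -0.5))
df <- createDataFrame(data)
# One-sample KS test of column "test" against the normal distribution
testResult <- spark.kstest(df, "test", "norm")
summary(testResult)
```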
Before this PR:
![image](https://cloud.githubusercontent.com/assets/1962026/18615016/b9a2823a-7d4f-11e6-934b-128beade355e.png)
After this PR:
![image](https://cloud.githubusercontent.com/assets/1962026/18615014/aafe2798-7d4f-11e6-8b99-c705bb9fe8f2.png)
The new implementation is similar to [```print.summary.GeneralizedLinearRegressionModel```](https://github.com/apache/spark/blob/master/R/pkg/R/mllib.R#L284) of SparkR and [```print.summary.glm```](https://svn.r-project.org/R/trunk/src/library/stats/R/glm.R) of native R.

BTW, I removed the comparison of ```print.summary.KSTest``` output from the unit test, since it only wraps the summary output, which has already been checked. Another reason is that these comparisons print summary information to the test console, which clutters the test output.
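
The print-method pattern referred to above can be sketched in plain S3 as follows. This is an illustration only: `DemoTest`, `new_demo_test`, and the field names are hypothetical and are not taken from the SparkR source, where the model object itself is not a plain S3 list.

```r
# Hypothetical S3 sketch; DemoTest and its fields are illustrative only.
new_demo_test <- function(statistic, p.value) {
  structure(list(statistic = statistic, p.value = p.value), class = "DemoTest")
}

# summary() returns a plain list tagged with its own class ...
summary.DemoTest <- function(object, ...) {
  ans <- list(statistic = object$statistic, p.value = object$p.value)
  class(ans) <- "summary.DemoTest"
  ans
}

# ... so that printing the result dispatches to this method.
print.summary.DemoTest <- function(x, ...) {
  cat("statistic =", x$statistic, "\n")
  cat("p-value   =", x$p.value, "\n")
  invisible(x)  # return the object invisibly, as print methods usually do
}

s <- summary(new_demo_test(0.38, 0.21))
print(s)
```

Because the print method only formats fields that the summary object already holds, checking the summary values in a unit test covers the substance, and the printed form does not need to be compared separately.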

## How was this patch tested?
Existing test.

Author: Yanbo Liang <[email protected]>

Closes apache#15139 from yanboliang/spark-17315.
3 participants