Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16727][SparkR] Fix expected test output of describe and summary functions #14357

Closed
wants to merge 1 commit into from

Conversation

junyangq
Copy link
Contributor

What changes were proposed in this pull request?

Fix expected test output of describe and summary functions. String columns are not summarized in the output.

How was this patch tested?

SparkR unit test.

@shivaram
Copy link
Contributor

@junyangq Any idea how the tests were passing on Jenkins before this fix ?

@shivaram
Copy link
Contributor

I think this is related to 142df48

cc @dongjoon-hyun

@SparkQA
Copy link

SparkQA commented Jul 26, 2016

Test build #62852 has finished for PR 14357 at commit f7d02ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Hi, @shivaram.
Sorry, but what is the problem?

@dongjoon-hyun
Copy link
Member

String columns works for min and max summarization.

@junyangq
Copy link
Contributor Author

@shivaram I'm also curious...The issue arised when I ran the test locally.

@dongjoon-hyun
Copy link
Member

> collect(describe(read.json('examples/src/main/resources/people.json')))
  summary                age    name
1   count                  2       3
2    mean               24.5    <NA>
3  stddev 7.7781745930520225    <NA>
4     min                 19    Andy
5     max                 30 Michael

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jul 26, 2016

In fact, count, min, and max are supported officially for string in all Scala/Python/R.

@dongjoon-hyun
Copy link
Member

Hi, @junyangq .
Could you give me some pointer to understand the current context? :)

@junyangq
Copy link
Contributor Author

junyangq commented Jul 26, 2016

@dongjoon-hyun That makes sense. That's why I feel confused about the output on my local machine...

> collect(describe(read.json('examples/src/main/resources/people.json')))
  summary                age                                                    
1   count                  2
2    mean               24.5
3  stddev 7.7781745930520225
4     min                 19
5     max                 30

@dongjoon-hyun
Copy link
Member

I see. Which version is it?

@dongjoon-hyun
Copy link
Member

I see. That seems 2.0 branch.

@dongjoon-hyun
Copy link
Member

Hi, @shivaram .
It's due to the branch difference.
The PR is merged to the master only 17 days ago.

Thanks - merging in master.

@junyangq
Copy link
Contributor Author

I think I ran the test under the master branch though... Let me double check :)

@dongjoon-hyun
Copy link
Member

Thank you, @junyangq .

@junyangq
Copy link
Contributor Author

I merged the most recent master branch, rebuilt and installed the package, but the test failed at the same place. @dongjoon-hyun

@dongjoon-hyun
Copy link
Member

Oh, let me try this time. Thank you for double-checking, @junyangq .

@dongjoon-hyun
Copy link
Member

@junyangq . Currently, you are at the most recent master build, right?
And, SparkR does not show the name column as a result of describe command.
I'm wondering the result of spark-shell on your systems. Could you run the following command in spark-shell?

scala> spark.read.json("examples/src/main/resources/people.json").describe().show()

@junyangq
Copy link
Contributor Author

Hmm... It doesn't show the name column either.

@dongjoon-hyun
Copy link
Member

At the most recent master build, I did the following things and all tests are passed.

$ git clean -fdx
$ ./build/sbt -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive-thriftserver -Phive -Psparkr package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
$ R/install-dev.sh 
$ R/run-tests.sh 
...
DONE ===========================================================================
Tests passed.

@dongjoon-hyun
Copy link
Member

I think you did something wrong. :)
Could you close this PR?

@dongjoon-hyun
Copy link
Member

Actually, you can see the Jenkins log, too. There is no problem with the current R testsuite.
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62858/consoleFull

FYI, I'm using JDK 1.8.0_102 and Jenkins is using JDK 1.7.x.

@junyangq
Copy link
Contributor Author

Yeah sure, but just wondering if the clean and additional arguments something we should normally do?

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jul 26, 2016

Never. I did that in order to make it sure. I don't do it frequently.
It is for you. :)

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jul 26, 2016

If you use -fdx, it removes IDE (like Intellij) settings, too.

@junyangq
Copy link
Contributor Author

I see. Thank you for pointing that out :) I'll close the PR.

@junyangq junyangq closed this Jul 26, 2016
@dongjoon-hyun
Copy link
Member

It's my pleasure. See you later around Apache Spark. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants