[SPARK-6007][SQL] Add numRows param in DataFrame.show() #4767

jackylk · 2015-02-25T13:20:13Z

It is useful to let the user decide the number of rows to show in DataFrame.show().

SparkQA · 2015-02-25T13:22:37Z

Test build #27950 has started for PR 4767 at commit 981be52.

This patch merges cleanly.

SparkQA · 2015-02-25T15:00:59Z

Test build #27950 has finished for PR 4767 at commit 981be52.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-25T15:01:03Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27950/

rxin · 2015-02-25T20:21:30Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala

   * @group basic
   */
-  def show(): Unit = println(showString())
+  def show(numRows: Int = 20): Unit = println(showString(numRows))


this won't work in Java. you need to overload show to provide two shows.

I added a test case in Java, and it seems to work. Is there a problem?

did you try just calling show?

I tried

    DataFrame df = context.table("testData");
    df.show(10);
    df.show(1000);
    df.select("*").show(30);

Is that what you mean?

can you try df.show()?

in either case, I wouldn't want to rely on Scala's generated code for default param values for bytecode-level binary compatibility. Please implement two show methods, one with an int param and one without. Thanks.
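To make the concern above concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not Spark code: `ShowOverloadSketch`, `WithDefaultParam`, and `WithOverloads` are hypothetical stand-ins for `DataFrame`, and `show` returns a string instead of printing so the result can be inspected. The first class models what a Java caller typically sees when Scala 2 compiles `def show(numRows: Int = 20)`: a single one-argument method plus a synthetic accessor named `show$default$1`. The second class models the two explicit overloads requested in the review.

```java
public class ShowOverloadSketch {

    // Hypothetical model of the Java-visible shape of a Scala method
    // declared with a default parameter value. There is no
    // zero-argument show(); the default lives in a synthetic accessor.
    public static class WithDefaultParam {
        public String show(int numRows) {
            return "showing up to " + numRows + " rows";
        }

        // '$' is legal in Java identifiers, so this accessor is callable,
        // but relying on it ties callers to compiler-generated names.
        public int show$default$1() {
            return 20;
        }
    }

    // Hypothetical model of the reviewed design: two explicit overloads.
    public static class WithOverloads {
        public String show(int numRows) {
            return "showing up to " + numRows + " rows";
        }

        public String show() {
            return show(20);
        }
    }

    public static void main(String[] args) {
        WithDefaultParam a = new WithDefaultParam();
        // a.show();  // would not compile: no zero-argument method exists
        System.out.println(a.show(a.show$default$1()));

        WithOverloads b = new WithOverloads();
        System.out.println(b.show());   // plain zero-argument call works
        System.out.println(b.show(5));
    }
}
```

With explicit overloads, the zero-argument call is an ordinary method in the bytecode, so Java callers do not depend on how the Scala compiler encodes default values.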

SparkQA · 2015-02-26T04:12:37Z

Test build #27978 has started for PR 4767 at commit d7acc18.

This patch merges cleanly.

SparkQA · 2015-02-26T05:03:03Z

Test build #27983 has started for PR 4767 at commit bb54537.

This patch merges cleanly.

rxin · 2015-02-26T05:06:42Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala

+  /**
+   * Displays the [[DataFrame]] in a tabular form. (For Java compatibility)
+   */
+  def show(): Unit = println(showString(20))


what I mean is...

  /**
   * Displays the [[DataFrame]] in a tabular form. For example:
   * {{{
   *   year  month AVG('Adj Close) MAX('Adj Close)
   *   1980  12    0.503218        0.595103
   *   1981  01    0.523289        0.570307
   *   1982  02    0.436504        0.475256
   *   1983  03    0.410516        0.442194
   *   1984  04    0.450090        0.483521
   *   ...
   * }}}
   * @param numRows Number of rows to show
   * @group basic
   */
  def show(numRows: Int): Unit = println(showString(numRows))

  /**
   * Displays the top 20 rows of the [[DataFrame]] in a tabular form.
   */
  def show(): Unit = show(20)

i.e., don't include "for java compatibility" in the user-facing doc, and don't include the default param value.

Thanks a lot

SparkQA · 2015-02-26T05:50:50Z

Test build #27978 has finished for PR 4767 at commit d7acc18.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-26T05:50:54Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27978/

SparkQA · 2015-02-26T05:57:36Z

Test build #27986 has started for PR 4767 at commit 7cdbe91.

This patch merges cleanly.

SparkQA · 2015-02-26T06:47:31Z

Test build #27983 has finished for PR 4767 at commit bb54537.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-26T06:47:36Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27983/

rxin · 2015-02-26T06:49:35Z

python/pyspark/sql/dataframe.py

        """
-        Print the first 20 rows.
+        Print the first n rows.

        >>> df
        DataFrame[age: int, name: string]


one more thing. can you add one more docstring test to test n = 1?

SparkQA · 2015-02-26T07:46:06Z

Test build #27986 has finished for PR 4767 at commit 7cdbe91.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-26T07:46:11Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27986/

SparkQA · 2015-02-26T13:47:40Z

Test build #28003 has started for PR 4767 at commit a0e0f4b.

This patch merges cleanly.

SparkQA · 2015-02-26T15:31:11Z

Test build #28003 has finished for PR 4767 at commit a0e0f4b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-26T15:31:15Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28003/

rxin · 2015-02-26T18:43:18Z

Thanks. I've merged this.

It is useful to let the user decide the number of rows to show in DataFrame.show

Author: Jacky Li <[email protected]>

Closes #4767 from jackylk/show and squashes the following commits:

a0e0f4b [Jacky Li] fix testcase
7cdbe91 [Jacky Li] modify according to comment
bb54537 [Jacky Li] for Java compatibility
d7acc18 [Jacky Li] modify according to comments
981be52 [Jacky Li] add numRows param in DataFrame.show()

(cherry picked from commit 2358657)
Signed-off-by: Reynold Xin <[email protected]>

add numRows param in DataFrame.show()

981be52

rxin reviewed Feb 25, 2015

modify according to comments

d7acc18

for Java compatibility

bb54537

rxin reviewed Feb 26, 2015

modify according to comment

7cdbe91

rxin reviewed Feb 26, 2015

fix testcase

a0e0f4b

asfgit closed this in 2358657 Feb 26, 2015
