[SPARK-6007][SQL] Add numRows param in DataFrame.show() #4767

Closed
jackylk wants to merge 5 commits from jackylk/show

@jackylk jackylk commented Feb 25, 2015

It is useful to let the user decide how many rows DataFrame.show displays.

@SparkQA

SparkQA commented Feb 25, 2015

Test build #27950 has started for PR 4767 at commit 981be52.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 25, 2015

Test build #27950 has finished for PR 4767 at commit 981be52.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27950/

* @group basic
*/
- def show(): Unit = println(showString())
+ def show(numRows: Int = 20): Unit = println(showString(numRows))
Contributor

This won't work in Java. You need to overload show to provide two show methods.

Contributor Author

I added a test case in Java, and it seems to work. Is there a problem?

Contributor

Did you try just calling show() with no arguments?

Contributor Author

I tried

     DataFrame df = context.table("testData");
     df.show(10);
     df.show(1000);
     df.select("*").show(30);

Is that what you mean?

Contributor

Can you try df.show()?

In either case, I would not want to rely on Scala's generated code for default parameter values for bytecode-level binary compatibility. Please implement two show methods, one with an Int parameter and one without. Thanks.
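The concern above can be checked outside Spark. The following is a minimal sketch, not code from this PR: `WithDefault`, `WithOverloads`, and `DefaultParamDemo` are hypothetical stand-ins for DataFrame. It uses Java reflection to list the methods each class exposes at the bytecode level. With a default parameter value, the Scala compiler emits only the one-argument `show` plus a synthetic getter named `show$default$1`, so a Java caller finds no zero-argument `show()`.

```scala
// Hypothetical stand-ins for DataFrame; only the method shapes matter here.
class WithDefault {
  // Default parameter value: convenient from Scala, invisible to Java callers.
  def show(numRows: Int = 20): Unit = println(s"showing $numRows rows")
}

class WithOverloads {
  // Explicit overloads: both entry points exist as real methods in the bytecode.
  def show(numRows: Int): Unit = println(s"showing $numRows rows")
  def show(): Unit = show(20)
}

object DefaultParamDemo {
  // Returns "name/arity" for every method declared directly on the class.
  def signatures(cls: Class[_]): Set[String] =
    cls.getDeclaredMethods.map(m => s"${m.getName}/${m.getParameterCount}").toSet

  def main(args: Array[String]): Unit = {
    // WithDefault declares show/1 and the synthetic getter, but no show/0.
    println(signatures(classOf[WithDefault]))
    // WithOverloads declares both show/1 and show/0.
    println(signatures(classOf[WithOverloads]))
  }
}
```

From Scala, `new WithDefault().show()` still compiles because the compiler rewrites the call to pass the value returned by the synthetic getter. Java performs no such rewrite, which is why the reviewer asks for two explicit methods.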

@SparkQA

SparkQA commented Feb 26, 2015

Test build #27978 has started for PR 4767 at commit d7acc18.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 26, 2015

Test build #27983 has started for PR 4767 at commit bb54537.

  • This patch merges cleanly.

/**
 * Displays the [[DataFrame]] in a tabular form. (For Java compatibility)
 */
def show(): Unit = println(showString(20))
Contributor

what I mean is...

  /**
   * Displays the [[DataFrame]] in a tabular form. For example:
   * {{{
   *   year  month AVG('Adj Close) MAX('Adj Close)
   *   1980  12    0.503218        0.595103
   *   1981  01    0.523289        0.570307
   *   1982  02    0.436504        0.475256
   *   1983  03    0.410516        0.442194
   *   1984  04    0.450090        0.483521
   *   ...
   * }}}
   * @param numRows Number of rows to show
   * @group basic
   */
  def show(numRows: Int): Unit = println(showString(numRows))

  /**
   * Displays the top 20 rows of the [[DataFrame]] in a tabular form.
   */
  def show(): Unit = show(20)

Contributor

I.e., don't include "for Java compatibility" in the user-facing doc, and don't include the default parameter value.
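To try the suggested overload pair without a Spark build, here is a minimal self-contained sketch. `MiniFrame` and its `showString` body are hypothetical and are not Spark's implementation. The sketch only assumes that `showString(numRows)` renders a header plus at most `numRows` rows, and that the zero-argument `show()` delegates to `show(20)` as in the suggestion above.

```scala
// Hypothetical stand-in for DataFrame; not Spark's implementation.
final class MiniFrame(columns: Seq[String], rows: Seq[Seq[Any]]) {

  // Renders the header and at most numRows rows, padding cells to column width.
  def showString(numRows: Int): String = {
    val cells: Seq[Seq[String]] = columns +: rows.take(numRows).map(_.map(_.toString))
    // Width of each column is the longest cell among the rows being shown.
    val widths = columns.indices.map(i => cells.map(_(i).length).max)
    cells.map { row =>
      row.zip(widths).map { case (cell, width) => cell.padTo(width, ' ') }.mkString(" ")
    }.mkString("\n")
  }

  // Explicit Int overload, callable from both Scala and Java.
  def show(numRows: Int): Unit = println(showString(numRows))

  // Zero-argument overload that supplies the default of 20 rows.
  def show(): Unit = show(20)
}

object MiniFrameDemo {
  def main(args: Array[String]): Unit = {
    val df = new MiniFrame(Seq("age", "name"), Seq(Seq(2, "Alice"), Seq(5, "Bob")))
    df.show(1) // header plus the first row only
    df.show()  // header plus up to 20 rows
  }
}
```

Keeping the default in one place (`show()` calling `show(20)`) means the documented row count and the behavior cannot drift apart.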

Contributor Author

Thanks a lot

@SparkQA

SparkQA commented Feb 26, 2015

Test build #27978 has finished for PR 4767 at commit d7acc18.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27978/

@SparkQA

SparkQA commented Feb 26, 2015

Test build #27986 has started for PR 4767 at commit 7cdbe91.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 26, 2015

Test build #27983 has finished for PR 4767 at commit bb54537.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27983/

"""
- Print the first 20 rows.
+ Print the first n rows.

>>> df
DataFrame[age: int, name: string]
Contributor

One more thing: can you add another docstring test for n = 1?

@SparkQA

SparkQA commented Feb 26, 2015

Test build #27986 has finished for PR 4767 at commit 7cdbe91.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27986/

@SparkQA

SparkQA commented Feb 26, 2015

Test build #28003 has started for PR 4767 at commit a0e0f4b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 26, 2015

Test build #28003 has finished for PR 4767 at commit a0e0f4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28003/

@rxin
Contributor

rxin commented Feb 26, 2015

Thanks. I've merged this.

asfgit pushed a commit that referenced this pull request Feb 26, 2015
It is useful to let the user decide the number of rows to show in DataFrame.show

Author: Jacky Li <[email protected]>

Closes #4767 from jackylk/show and squashes the following commits:

a0e0f4b [Jacky Li] fix testcase
7cdbe91 [Jacky Li] modify according to comment
bb54537 [Jacky Li] for Java compatibility
d7acc18 [Jacky Li] modify according to comments
981be52 [Jacky Li] add numRows param in DataFrame.show()

(cherry picked from commit 2358657)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 2358657 Feb 26, 2015