[SPARK-6007][SQL] Add numRows param in DataFrame.show() #4767
Test build #27950 has started for PR 4767 at commit
Test build #27950 has finished for PR 4767 at commit
Test PASSed.
  * @group basic
  */
-def show(): Unit = println(showString())
+def show(numRows: Int = 20): Unit = println(showString(numRows))
This won't work in Java. You need to overload show and provide two show methods.
I added a test case in Java, and it seems to be working. Is there any problem?
did you try just calling show?
I tried
DataFrame df = context.table("testData");
df.show(10);
df.show(1000);
df.select("*").show(30);
Is that what you mean?
Can you try df.show()?
In either case, I would not want to rely on Scala's generated code for default param values for bytecode-level binary compatibility. Please implement two show methods, one with an int param and one without. Thanks.
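To illustrate the concern about default parameter values, here is a minimal sketch (not code from this PR) of what the Scala compiler emits in each case. The classes WithDefault and WithOverloads and the helper showMethods are hypothetical stand-ins for DataFrame, and show returns a String here instead of printing so the result is easy to check. A default argument compiles to a single show(int) method plus a synthetic show$default$1() accessor, so a Java caller has no zero-argument show() to call; explicit overloads produce both signatures.

```scala
// Hypothetical stand-in for DataFrame using a default argument.
class WithDefault {
  def showString(numRows: Int): String = s"showing $numRows rows"
  // Assumption for illustration: returns the text instead of printing it.
  def show(numRows: Int = 20): String = showString(numRows)
}

// Hypothetical stand-in for DataFrame using two explicit overloads.
class WithOverloads {
  def showString(numRows: Int): String = s"showing $numRows rows"
  def show(numRows: Int): String = showString(numRows)
  def show(): String = show(20)
}

object Demo {
  // Lists the public methods named "show" (or compiler-generated "show$...")
  // as Java reflection sees them, i.e. what a Java caller can link against.
  def showMethods(c: Class[_]): Seq[String] =
    c.getMethods.toSeq
      .filter(m => m.getName == "show" || m.getName.startsWith("show$"))
      .map { m =>
        val params = m.getParameterTypes.map(_.getSimpleName).mkString(", ")
        s"${m.getName}($params)"
      }
      .sorted

  def main(args: Array[String]): Unit = {
    println("default argument:")
    showMethods(classOf[WithDefault]).foreach(println)
    println("explicit overloads:")
    showMethods(classOf[WithOverloads]).foreach(println)
  }
}
```

Running Demo.main should list show$default$1() and show(int) for the first class, and show() and show(int) for the second. Only the second class exposes a show() that Java code can call directly.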
Test build #27978 has started for PR 4767 at commit
Test build #27983 has started for PR 4767 at commit
/**
 * Displays the [[DataFrame]] in a tabular form. (For Java compatibility)
 */
def show(): Unit = println(showString(20))
what I mean is...
/**
 * Displays the [[DataFrame]] in a tabular form. For example:
 * {{{
 *   year  month AVG('Adj Close) MAX('Adj Close)
 *   1980  12    0.503218        0.595103
 *   1981  01    0.523289        0.570307
 *   1982  02    0.436504        0.475256
 *   1983  03    0.410516        0.442194
 *   1984  04    0.450090        0.483521
 *   ...
 * }}}
 * @param numRows Number of rows to show
 * @group basic
 */
def show(numRows: Int): Unit = println(showString(numRows))

/**
 * Displays the top 20 rows of the [[DataFrame]] in a tabular form.
 */
def show(): Unit = show(20)
i.e., don't include "for Java compatibility" in the user-facing doc, and don't include the default param value.
Thanks a lot
Test build #27978 has finished for PR 4767 at commit
Test PASSed.
Test build #27986 has started for PR 4767 at commit
Test build #27983 has finished for PR 4767 at commit
Test PASSed.
 """
-Print the first 20 rows.
+Print the first n rows.

 >>> df
 DataFrame[age: int, name: string]
one more thing. can you add one more docstring test to test n = 1?
Test build #27986 has finished for PR 4767 at commit
Test PASSed.
Test build #28003 has started for PR 4767 at commit
Test build #28003 has finished for PR 4767 at commit
Test PASSed.
Thanks. I've merged this.
It is useful to let the user decide the number of rows to show in DataFrame.show

Author: Jacky Li <[email protected]>

Closes #4767 from jackylk/show and squashes the following commits:

a0e0f4b [Jacky Li] fix testcase
7cdbe91 [Jacky Li] modify according to comment
bb54537 [Jacky Li] for Java compatibility
d7acc18 [Jacky Li] modify according to comments
981be52 [Jacky Li] add numRows param in DataFrame.show()

(cherry picked from commit 2358657)
Signed-off-by: Reynold Xin <[email protected]>