-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13553][SPARK-13554][SQL] Migrates basic inspection and typed relational operations from DataFrame to Dataset #11431
Conversation
if n.outerPointer.isEmpty && | ||
n.cls.isMemberClass && | ||
!Modifier.isStatic(n.cls.getModifiers) => | ||
n.cls.getEnclosingClass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is included in PR #11421. Without this fix, we can't use case classes defined in SQLTestData
as Dataset element type.
f38c016
to
ee9c432
Compare
Test build #52187 has finished for PR 11431 at commit
|
Test build #52188 has finished for PR 11431 at commit
|
Can you rebase to get rid of the merged commit? |
…lational Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@liancheng I just pushed two commits to resolve the conflicts. |
Test build #52191 has finished for PR 11431 at commit
|
val sum = weights.sum | ||
val normalizedCumWeights = weights.map(_ / sum).scanLeft(0.0d)(_ + _) | ||
normalizedCumWeights.sliding(2).map { x => | ||
new Dataset(sqlContext, Sample(x(0), x(1), withReplacement = false, seed, sorted)()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to pass encoder into newly created Datasets at here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there's an implicit encoder defined in the constructor of Dataset.
|
||
/** | ||
* Returns all column names and their data types as an array. | ||
* @since 2.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the API is moved from DataFrame, should we also copy the @since
? cc @rxin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd put 2.0 since it didn't exist on dataset before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since DataFrame will be an alias of Dataset, what will the doc for DataFrame looks like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we copy the versions, It's also weird that see a method of Dataset (1.3) is introduced before Dataset is introduced (1.6).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I'd just have it as 2.0.
Closing this in favor of #11443. |
What changes were proposed in this pull request?
This PR migrates basic inspection and typed relational operations from DataFrame to Dataset. This is the first step of unifying DataFrame and Dataset API.
TODO
explode
operations.How was this patch tested?
Corresponding test cases are migrated from
DataFrameSuite
toDatasetSuite
. These newly added test cases all share the same "df-to-ds" prefix so that we can easily execute them under SBT using:This prefix will be removed after migrating all the DataFrame operations.