[SQL] Various DataFrame DSL update. #4260

Closed
wants to merge 4 commits into from

@rxin rxin commented Jan 29, 2015

  1. Added foreach, foreachPartition, flatMap to DataFrame.
  2. Added col() in dsl.
  3. Support renaming columns in toDataFrame.
  4. Support type inference on arrays (in addition to Seq).
  5. Updated mllib to use the new DSL.


rxin commented Jan 29, 2015

cc @mengxr for mllib changes

SparkQA commented Jan 29, 2015

Test build #26285 has started for PR 4260 at commit 62608c4.

  • This patch merges cleanly.

SparkQA commented Jan 29, 2015

Test build #26285 has finished for PR 4260 at commit 62608c4.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26285/
Test FAILed.

SparkQA commented Jan 29, 2015

Test build #26287 has started for PR 4260 at commit d31fcd2.

  • This patch merges cleanly.

SparkQA commented Jan 29, 2015

Test build #26294 has started for PR 4260 at commit fab3ccc.

  • This patch merges cleanly.

SparkQA commented Jan 29, 2015

Test build #26287 has finished for PR 4260 at commit d31fcd2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26287/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26294/
Test FAILed.

@shaneknapp
Contributor

i will kick this off again once i restart jenkins.

@shaneknapp
Contributor

jenkins, test this please.

@shaneknapp
Contributor

jenkins, test this please.

SparkQA commented Jan 29, 2015

Test build #26299 has started for PR 4260 at commit 73466c1.

  • This patch merges cleanly.

.select($"*", callUDF(predict, Column(map(scoreCol))).as(map(predictionCol)))
val predictFunction: Double => Double = (score) => { if (score > t) 1.0 else 0.0 }
dataset
.select($"*", callUDF(scoreFunction, col(map(featuresCol))).as(map(scoreCol)))

minor: The name col may be confusing, since col is commonly used as a matrix column index in ML algorithms.

This line is still not straightforward to read. I'm thinking of something like the following:

val scoreFunc = UDF((score: Double) => { if (score > t) 1.0 else 0.0 })
dataset.select($"*", scoreFunc(col(map(featuresCol))).as(map(scoreCol)))

mengxr commented Jan 29, 2015

@rxin The ALS join code is much easier to read now :) I hope UDFs can be called directly as functions instead of being passed as an argument to callUDF. Besides, it would be nice to interpret strings as columns by default and require Literal for constant strings. (My view is limited to the ML use cases.) Those could be done in follow-up PRs. The ML changes in this PR look good to me.
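
To make the two suggestions above concrete, here is a minimal, self-contained sketch in plain Scala. It does not use Spark and does not claim to show how Spark implements anything: `Expr`, `Col`, `Lit`, `Named`, `Call`, `Udf1`, `callUdf`, and `stringToCol` are hypothetical names invented for this illustration. It only shows that giving a wrapper class an `apply` method lets a UDF be written as `predict(column)`, and that an implicit conversion can treat bare strings as column names while constant strings need an explicit literal.

```scala
import scala.language.implicitConversions

// Toy model: every type here is hypothetical and only mimics the shape of a
// column DSL. None of these are Spark classes.
object UdfSketch {
  type Row = Map[String, Any]

  // A column expression is anything that can be evaluated against a row.
  sealed trait Expr {
    def eval(row: Row): Any
    def as(alias: String): Expr = Named(this, alias)
  }
  final case class Col(name: String) extends Expr {
    def eval(row: Row): Any = row(name) // look the value up by column name
  }
  final case class Lit(value: Any) extends Expr {
    def eval(row: Row): Any = value // constant, ignores the row
  }
  final case class Named(child: Expr, alias: String) extends Expr {
    def eval(row: Row): Any = child.eval(row)
  }
  final case class Call[A, B](f: A => B, arg: Expr) extends Expr {
    def eval(row: Row): Any = f(arg.eval(row).asInstanceOf[A])
  }

  // Style 1: the function is passed as an argument, as with callUDF in the PR.
  def callUdf[A, B](f: A => B, arg: Expr): Expr = Call(f, arg)

  // Style 2: wrap the function once; `apply` makes the wrapper callable.
  final class Udf1[A, B](f: A => B) {
    def apply(arg: Expr): Expr = Call(f, arg)
  }
  object Udf1 {
    def apply[A, B](f: A => B): Udf1[A, B] = new Udf1(f)
  }

  // Strings are interpreted as column names by default (assumed design).
  implicit def stringToCol(name: String): Expr = Col(name)

  def main(args: Array[String]): Unit = {
    val t = 0.5
    val threshold = (score: Double) => if (score > t) 1.0 else 0.0
    val row: Row = Map("score" -> 0.8, "label" -> "positive")

    val viaCallUdf = callUdf(threshold, Col("score")).as("prediction")
    val predict = Udf1(threshold)
    val viaApply = predict("score").as("prediction") // "score" becomes Col("score")

    // Both styles build the same computation.
    assert(viaCallUdf.eval(row) == viaApply.eval(row))
    // A constant string has to be marked explicitly as a literal.
    assert(Lit("score").eval(row) == "score")
    println(viaApply.eval(row))
  }
}
```

The trade-off in this toy version is that the implicit conversion makes call sites shorter but also makes it easy to pass a constant string where a column name was not intended, which is why the sketch keeps `Lit` explicit.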

SparkQA commented Jan 29, 2015

Test build #26299 has finished for PR 4260 at commit 73466c1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26299/
Test FAILed.

@asfgit asfgit closed this in 5ad78f6 Jan 29, 2015
5 participants