[SQL] Various DataFrame DSL update. #4260
rxin commented on Jan 29, 2015
- Added foreach, foreachPartition, flatMap to DataFrame.
- Added col() in dsl.
- Support renaming columns in toDataFrame.
- Support type inference on arrays (in addition to Seq).
- Updated mllib to use the new DSL.
cc @mengxr for mllib changes
Test build #26285 has started for PR 4260 at commit
Test build #26285 has finished for PR 4260 at commit
Test FAILed.
Test build #26287 has started for PR 4260 at commit
Test build #26294 has started for PR 4260 at commit
Test build #26287 has finished for PR 4260 at commit
Test FAILed.
Test FAILed.
i will kick this off again once i restart jenkins.
jenkins, test this please.
jenkins, test this please.
Test build #26299 has started for PR 4260 at commit
.select($"*", callUDF(predict, Column(map(scoreCol))).as(map(predictionCol)))
val predictFunction: Double => Double = (score) => { if (score > t) 1.0 else 0.0 }
dataset
  .select($"*", callUDF(scoreFunction, col(map(featuresCol))).as(map(scoreCol)))
minor: The word col might be used as a matrix column index in ML algorithms.
This line is still not straightforward to read. I'm thinking of something like the following:
val scoreFunc = UDF((score: Double) => { if (score > t) 1.0 else 0.0 })
dataset.select($"*", scoreFunc(col(map(featuresCol))).as(map(scoreCol)))
@rxin The ALS join code is much easier to read now:) I hope UDFs can be used as functions instead of an argument of |
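To make the suggestion above concrete, here is a minimal, self-contained sketch of a UDF that is applied like a function rather than passed as an argument. Everything in it (`UdfSketch`, `Col`, `Udf1`, `udf`, `select`, the `Row` alias, and the threshold `t`) is a hypothetical stand-in written for illustration. It does not use Spark and is not the API in this PR; it only shows the shape of the call site.

```scala
// Toy stand-ins for illustration only; these are NOT Spark's classes.
object UdfSketch {
  // A row is modelled as a plain map from column name to value.
  type Row = Map[String, Any]

  // A column is a name plus a function that extracts a value from a row.
  final case class Col(name: String, eval: Row => Any) {
    // Rename the column without changing how it is computed.
    def as(alias: String): Col = copy(name = alias)
  }

  // Reference an existing column by name.
  def col(name: String): Col = Col(name, row => row(name))

  // Wraps a one-argument Scala function so it can be applied to a column.
  final class Udf1[A, B](f: A => B) {
    def apply(arg: Col): Col =
      Col(s"udf(${arg.name})", row => f(arg.eval(row).asInstanceOf[A]))
  }

  def udf[A, B](f: A => B): Udf1[A, B] = new Udf1(f)

  // Evaluate the selected columns against every row.
  def select(rows: Seq[Row], cols: Col*): Seq[Row] =
    rows.map(row => cols.map(c => c.name -> c.eval(row)).toMap)

  def demo(): Seq[Row] = {
    val t = 0.5 // hypothetical threshold
    val predict = udf((score: Double) => if (score > t) 1.0 else 0.0)
    val data: Seq[Row] = Seq(Map("score" -> 0.9), Map("score" -> 0.2))
    // The UDF reads like an ordinary function call on a column.
    select(data, col("score"), predict(col("score")).as("prediction"))
  }

  def main(args: Array[String]): Unit = demo().foreach(println)
}
```

The design point is that giving the wrapper an `apply` method lets the call site read `predict(col("score"))`, which keeps the expression close to how the function would be written on plain values.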
Test build #26299 has finished for PR 4260 at commit
Test FAILed.