[SPARK-12247] [ML] [DOC] Documentation for spark.ml's ALS and collaborative filtering in general #10411
Test build #48109 has finished for PR 10411 at commit
cc @thunterdb
Test build #48130 has finished for PR 10411 at commit
Test build #48174 has finished for PR 10411 at commit
Test build #48176 has finished for PR 10411 at commit
Test build #48234 has finished for PR 10411 at commit
Test build #48308 has finished for PR 10411 at commit
DataFrame rawPredictions = model.transform(test);
DataFrame predictions = rawPredictions
  .withColumn("rating", rawPredictions.col("rating").cast(DataTypes.DoubleType))
  .withColumn("prediction", rawPredictions.col("prediction").cast(DataTypes.DoubleType));
There might be a better way to do this, input welcome.
pinging @thunterdb and @jkbradley
Test build #50995 has finished for PR 10411 at commit
Jenkins, retest this please.
Test build #51032 has finished for PR 10411 at commit
…aborative-filtering
Test build #51051 has finished for PR 10411 at commit
@srowen @coderxiang Do you have time to review this PR?
import sqlContext.implicits._

// $example on$
val ratings = sc.textFile("data/mllib/als/sample_movielens_ratings.txt")
It looks like this file was removed, though, right? Is it because we can't distribute even a sample of it?
Nope, the one removed is sample_movielens_movies.txt, as it was only used in MovieLens.scala, which has been removed; cf. the discussion in the JIRA.
@srowen thanks for the review, will make the necessary changes.
It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
clicks, purchases, likes, shares etc.). The approach used in `spark.mllib` to deal with such data is taken
from [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data
@srowen tried to take your remarks into account, I don't know if it's clearer now though.
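For readers following the implicit-feedback wording above, here is a minimal sketch of the transform described in the cited paper, not Spark's actual implementation. It assumes the paper's formulation: a binary preference p = 1 when the raw observation r is positive (0 otherwise), and a confidence c = 1 + alpha * r. The class and method names are hypothetical and chosen only for illustration.

```java
import java.util.Locale;

// Hypothetical helper class, not part of Spark.
final class ImplicitFeedbackSketch {

    // Binary preference: 1 if any interaction was observed, 0 otherwise.
    static double preference(double r) {
        return r > 0.0 ? 1.0 : 0.0;
    }

    // Confidence grows linearly with the strength of the observation
    // (the paper's c = 1 + alpha * r).
    static double confidence(double r, double alpha) {
        return 1.0 + alpha * r;
    }

    public static void main(String[] args) {
        double alpha = 1.0; // illustrative value only
        double[] observations = {0.0, 1.0, 5.0}; // e.g. click or view counts
        for (double r : observations) {
            System.out.printf(Locale.ROOT,
                "r=%.1f preference=%.1f confidence=%.1f%n",
                r, preference(r), confidence(r, alpha));
        }
    }
}
```

Under this formulation the factorization fits the binary preferences and weights each squared error by its confidence, rather than fitting the raw counts directly.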
Test build #51242 has finished for PR 10411 at commit
I'm OK merging this
@BenFradet yeah I like your last edit. If you're willing to make that change and the sentence fragment change I'll merge
Great, I'll do that later today.
Test build #51318 has finished for PR 10411 at commit
Merged to master
This documents the implementation of ALS in `spark.ml` with example code in Scala, Java and Python.