Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2843][MLLIB] add a section about regularization parameter in ALS #2064

Closed
wants to merge 3 commits into from
Closed
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions docs/mllib-collaborative-filtering.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,17 @@ level of confidence in observed user preferences, rather than explicit ratings g
model then tries to find latent factors that can be used to predict the expected preference of a
user for an item.

### Scaling of the regularization parameter

Since v1.1, we scale the regularization parameter `lambda` in solving each least squares problem by
the number of ratings the user generated in updating user factors,
or the number of ratings the product received in updating product factors.
This approach is named "ALS-WR" and introduced in the paper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This technique was used by many groups in the Netflix Prize (see section 5 of this paper for some more references: http://www.cs.toronto.edu/~rsalakhu/papers/weighted_tc.pdf). Perhaps this sentence should be changed to "This approach is discussed in the following paper..."

"[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](http://dx.doi.org/10.1007/978-3-540-68880-8_32)".
It makes `lambda` less dependent on the scale of the dataset.
So we can apply the best parameter learned from a sampled subset to the full dataset
and expect similar performance.

## Examples

<div class="codetabs">
Expand Down