SPARK-2085: [MLlib] Apply user-specific regularization instead of uniform regularization in ALS

The current implementation of ALS takes a single regularization parameter and applies it to both the user factors and the product factors. This kind of regularization can be less effective when the number of users is significantly larger than the number of products (and vice versa). For example, if we have 10M users and 1K products, regularization on the user factors will dominate. Following the discussion in [this thread](http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tt2567.html#a2704), the implementation in this PR regularizes each factor vector by #ratings * lambda.
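As a rough sketch of what "#ratings * lambda" means for the explicit-feedback case, the per-user least-squares problem can be written as below. The symbols are assumptions introduced here for illustration and do not appear in the commit: $n_u$ is the number of ratings given by user $u$, $x_u$ is that user's factor vector, $Y_u$ stacks the factor vectors of the products the user rated, and $r_u$ holds the corresponding ratings.

```latex
% Uniform regularization (before this change): the same \lambda for every user
(Y_u^{\top} Y_u + \lambda I)\, x_u = Y_u^{\top} r_u

% Rating-count-scaled regularization (this change): \lambda is multiplied by n_u
(Y_u^{\top} Y_u + n_u \lambda I)\, x_u = Y_u^{\top} r_u
```

Under this reading, the penalty on each user grows with the number of observed ratings for that user, so the data term and the penalty term scale together. The implicit-preference path adds confidence weights to the data term, which this sketch leaves out.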

Author: Shuo Xiang <[email protected]>

Closes apache#1026 from coderxiang/als-reg and squashes the following commits:

93dfdb4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into als-reg
b98f19c [Shuo Xiang] merge latest master
52c7b58 [Shuo Xiang] Apply user-specific regularization instead of uniform regularization in Alternating Least Squares (ALS)
Shuo Xiang authored and mengxr committed Jun 13, 2014
1 parent 1c04652 commit a6e0afd
Showing 1 changed file with 7 additions and 1 deletion.
@@ -507,6 +507,9 @@ class ALS private (
val tempXtX = DoubleMatrix.zeros(triangleSize)
val fullXtX = DoubleMatrix.zeros(rank, rank)

+ // Count the number of ratings each user gives to provide user-specific regularization
+ val numRatings = Array.fill(numUsers)(0)

// Compute the XtX and Xy values for each user by adding products it rated in each product
// block
for (productBlock <- 0 until numProductBlocks) {
@@ -519,6 +522,7 @@ class ALS private (
if (implicitPrefs) {
var i = 0
while (i < us.length) {
+ numRatings(us(i)) += 1
// Extension to the original paper to handle rs(i) < 0. confidence is a function
// of |rs(i)| instead so that it is never negative:
val confidence = 1 + alpha * abs(rs(i))
@@ -534,6 +538,7 @@ class ALS private (
} else {
var i = 0
while (i < us.length) {
+ numRatings(us(i)) += 1
userXtX(us(i)).addi(tempXtX)
SimpleBlas.axpy(rs(i), x, userXy(us(i)))
i += 1
@@ -550,9 +555,10 @@ class ALS private (
// Compute the full XtX matrix from the lower-triangular part we got above
fillFullMatrix(userXtX(index), fullXtX)
// Add regularization
+ val regParam = numRatings(index) * lambda
var i = 0
while (i < rank) {
- fullXtX.data(i * rank + i) += lambda
+ fullXtX.data(i * rank + i) += regParam
i += 1
}
// Solve the resulting matrix, which is symmetric and positive-definite
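
The regularization step in the last hunk can be tried in isolation with a minimal sketch like the one below. It is not the MLlib code: it swaps the jblas `DoubleMatrix` for a plain `Array[Double]` holding a rank x rank matrix in a flat layout, and the object and method names (`WeightedLambdaSketch`, `addRegularization`) are hypothetical.

```scala
object WeightedLambdaSketch {

  /** Returns a copy of a flat rank x rank matrix with numRatings * lambda
    * added to every diagonal entry, mirroring the loop in the diff above.
    * (Hypothetical helper for illustration only.)
    */
  def addRegularization(
      xtx: Array[Double],
      rank: Int,
      numRatings: Int,
      lambda: Double): Array[Double] = {
    require(xtx.length == rank * rank, "xtx must hold rank * rank entries")
    val out = xtx.clone()
    // Scale lambda by the number of ratings observed for this user
    val regParam = numRatings * lambda
    var i = 0
    while (i < rank) {
      // Entry (i, i) of a flat square matrix sits at index i * rank + i
      out(i * rank + i) += regParam
      i += 1
    }
    out
  }

  def main(args: Array[String]): Unit = {
    val rank = 2
    val xtx = Array(4.0, 1.0, 1.0, 3.0)
    val lambda = 0.1
    // A user with a single rating is penalized lightly ...
    val light = addRegularization(xtx, rank, numRatings = 1, lambda = lambda)
    // ... while a user with 50 ratings gets a 50x larger diagonal shift
    val heavy = addRegularization(xtx, rank, numRatings = 50, lambda = lambda)
    println(light.mkString(", "))
    println(heavy.mkString(", "))
  }
}
```

Only the diagonal changes; the off-diagonal entries are left as they are, so the matrix stays symmetric before it is passed to the solver.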