
Collections of parallel optimization on matrix factorization


ysk24ok/cpmf


cpmf: Collection of Parallel Matrix Factorization

Prerequisite

required

picojson is needed to parse config.json.

$ git clone https://github.com/kazuho/picojson.git vendor/picojson

optional

If you want to use MassiveThreads as the task-parallel library, install it with the following commands.

$ git clone https://github.com/massivethreads/massivethreads.git vendor/massivethreads
$ cd vendor/massivethreads
$ ./configure --prefix=/usr/local
$ make && make install

If you install to a prefix other than /usr/local, be sure to change MYTH_PATH in the Makefile to match.

Converting MovieLens data

Use scripts/convert_movielens.py to convert MovieLens data to the cpmf format.

To convert the MovieLens 100K dataset,

$ python scripts/convert_movielens.py PATH/ml-100k/u.data > input/ml-100k

To convert the MovieLens 1M dataset,

$ python scripts/convert_movielens.py PATH/ml-1m/ratings.dat --separator :: > input/ml-1m

To convert the MovieLens 10M dataset,

$ python scripts/convert_movielens.py PATH/ml-10M100K/ratings.dat --separator :: > input/ml-10m
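As a rough illustration of what such a conversion involves, here is a minimal sketch, not the actual scripts/convert_movielens.py. MovieLens rating rows hold user, item, rating, and timestamp, separated by a tab in ml-100k and by `::` in ml-1m and ml-10M100K. The function name `convert_ratings` and the space-separated `user item rating` output layout are assumptions made for this example; the real cpmf format may differ.

```python
def convert_ratings(lines, separator="\t"):
    """Yield 'user item rating' strings from MovieLens rating lines.

    Each input row is user, item, rating, timestamp; the timestamp is dropped.
    The space-separated output layout is an assumption for illustration only.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating = line.split(separator)[:3]
        yield f"{user} {item} {rating}"


# One ml-100k style row (tab-separated) and one ml-1m style row ('::'-separated).
sample_100k = ["196\t242\t3\t881250949\n"]
sample_1m = ["1::1193::5::978300760\n"]

converted_100k = list(convert_ratings(sample_100k))
converted_1m = list(convert_ratings(sample_1m, separator="::"))
```

The `separator` argument plays the same role as the `--separator ::` option in the commands above.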

Parallel methods

Select the parallel method by setting DPARALLEL in the Makefile.

FPSGD

In FPSGD, the rating matrix is divided into many blocks, and multiple threads work concurrently on blocks that share no row or column with one another.

To use the FPSGD method, specify DPARALLEL = -DFPSGD.
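The block constraint described above can be sketched as follows. This is a simplified illustration, not the code in this repository: the names `block_of`, `partition`, and `pick_free_block` are hypothetical, and the picker simply returns the first eligible block, whereas a real scheduler would also balance how often each block is updated.

```python
def block_of(index, size, num_blocks):
    """Map a row or column index to its block index in an even split."""
    return index * num_blocks // size


def partition(ratings, num_rows, num_cols, grid):
    """Group (row, col, value) ratings into a grid x grid dict of blocks."""
    blocks = {(i, j): [] for i in range(grid) for j in range(grid)}
    for row, col, value in ratings:
        key = (block_of(row, num_rows, grid), block_of(col, num_cols, grid))
        blocks[key].append((row, col, value))
    return blocks


def pick_free_block(busy, grid, done):
    """Return a block that shares no block-row or block-column with busy blocks.

    Returns None when every remaining block conflicts with a busy one.
    """
    busy_rows = {i for i, _ in busy}
    busy_cols = {j for _, j in busy}
    for i in range(grid):
        for j in range(grid):
            if (i, j) not in done and i not in busy_rows and j not in busy_cols:
                return (i, j)
    return None


ratings = [(0, 0, 5.0), (1, 3, 3.0), (2, 1, 4.0), (3, 2, 1.0)]
blocks = partition(ratings, num_rows=4, num_cols=4, grid=2)

# Two threads asking for work one after the other.
first = pick_free_block(busy=set(), grid=2, done=set())
second = pick_free_block(busy={first}, grid=2, done=set())
```

Because the two chosen blocks touch disjoint rows and columns, the threads update disjoint parts of the factor matrices and need no locking on them.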

  • Reference

    Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin, "A fast parallel SGD for matrix factorization in shared memory systems", RecSys'13.

dcMF (by Intel Cilk or MassiveThreads)

dcMF is our proposed method. It parallelizes matrix factorization by recursively dividing the rating matrix into 4 smaller blocks and dynamically assigning the resulting tasks to threads.

To use dcMF, specify DPARALLEL = -DTP_BASED.

To choose the task-parallel library, set DTP = -DTP_CILK for Intel Cilk or DTP = -DTP_MYTH for MassiveThreads.
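The recursive 4-way division can be pictured with the sketch below. It is an illustration under stated assumptions, not the implementation in this repository: it assumes the two diagonal quadrants, which share no rows or columns, are processed together before the two off-diagonal quadrants (an ordering the text above does not state), and it flattens the recursion into sequential "phases" instead of spawning tasks on Cilk or MassiveThreads. The names `dc_schedule`, `merge_phases`, and `min_size` are hypothetical.

```python
from itertools import zip_longest


def merge_phases(a, b):
    """Run two independent quadrants side by side, phase by phase."""
    return [x + y for x, y in zip_longest(a, b, fillvalue=[])]


def dc_schedule(row_lo, row_hi, col_lo, col_hi, min_size):
    """Return a list of phases; each phase is a list of blocks
    (row_lo, row_hi, col_lo, col_hi) with half-open index ranges.

    Blocks within one phase share no rows or columns.
    """
    if row_hi - row_lo <= min_size or col_hi - col_lo <= min_size:
        return [[(row_lo, row_hi, col_lo, col_hi)]]  # leaf block
    row_mid = (row_lo + row_hi) // 2
    col_mid = (col_lo + col_hi) // 2
    top_left = dc_schedule(row_lo, row_mid, col_lo, col_mid, min_size)
    bottom_right = dc_schedule(row_mid, row_hi, col_mid, col_hi, min_size)
    top_right = dc_schedule(row_lo, row_mid, col_mid, col_hi, min_size)
    bottom_left = dc_schedule(row_mid, row_hi, col_lo, col_mid, min_size)
    # Assumed ordering: diagonal pair first, then the off-diagonal pair.
    return merge_phases(top_left, bottom_right) + merge_phases(top_right, bottom_left)


schedule = dc_schedule(0, 4, 0, 4, min_size=2)
```

In a task-parallel runtime, each pair of quadrants would be spawned as tasks and joined before the next pair starts, which lets the runtime assign tasks to threads dynamically.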

  • Reference

    Y. Nishioka and K. Taura, "Scalable task-parallel SGD on matrix factorization in multicore architectures", IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015.

How to use

Just make and run!

$ make
$ ./mf train config.json
