The motivation of this project is to build a recommendation engine that is easy to use. This Python package has been implemented using Python 3.6.
- config: Contains all files related to settings of the project
- dataset: Contains code that will help download and pre-process datasets
- models: Contains code that allows us to define our recommendation algorithms
- transforms: Contains code that allows us to perform various transformations on the dataset, e.g. stopword removal (see the sketch below)
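To give a sense of what lives in transforms, here is a minimal sketch of a stopword-removal transform. The class name StopwordRemoval and its apply method are assumptions for illustration, not the package's actual interface.

# Hypothetical sketch only; the real transforms code may look different.
class StopwordRemoval:
    """Remove common stopwords from a tokenized document."""

    def __init__(self, stopwords=None):
        # A tiny default list; a real transform would use a much fuller set.
        self.stopwords = set(stopwords or ["a", "an", "the", "is", "of", "and", "to"])

    def apply(self, tokens):
        # Keep only the tokens that are not stopwords.
        return [token for token in tokens if token.lower() not in self.stopwords]

print(StopwordRemoval().apply(["The", "quick", "brown", "fox"]))  # ['quick', 'brown', 'fox']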
- Create virtual environment
virtualenv ~/env/recommendations -p /usr/local/bin/python3.6
{Note: You may want to change the path to your Python 3.6 binary}
- Activate virtual environment
source ~/env/recommendations/bin/activate
- Clone this repository
- Install dependencies
pip install -r requirements.txt
from models.content import CountBased
from datasets.content import NewsDataset
# Sample only first 200 records
dataset = NewsDataset(200)
# Create a recommendation engine, train it with our data
recommender = CountBased()
recommender.train(dataset.get_instances())
print("Total dimensions of features:", len(recommender.transform.vocabulary))
# Get recommendation for Document with ID 182
recommender.predict(182)
# Disk persistence is also supported
recommender.save_to_disk()
# Load all pre-computed values from disk
recommender.load_from_disk()
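For intuition, the snippet below shows roughly how count-based content recommendation works: documents become term-count vectors, and the closest vector by cosine similarity is recommended. This is a standalone sketch using scikit-learn, not the internals of the CountBased model, whose actual implementation may differ.

# Standalone illustration of the count-based idea; not the package's code.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "sports news about the league final",
    "election news and political debate",
    "the league final ended in a draw",
]

# Build a document-term count matrix; its columns are the feature vocabulary.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)
print("Total dimensions of features:", len(vectorizer.vocabulary_))

# Recommend the document closest to document 0 by cosine similarity.
similarities = cosine_similarity(counts[0], counts).ravel()
similarities[0] = -1  # exclude the query document itself
print("Most similar document id:", similarities.argmax())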
To run the benchmarking tool, use python tools/benchmark_time_required.py
This should generate a file {or append to the already existing file} called time_required.log
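As a rough idea of what such a benchmark measures, a timing harness along these lines would record how long training takes and append the result to time_required.log. This is an assumption for illustration; the actual script may measure other things and use a different log format.

# Illustrative sketch; tools/benchmark_time_required.py may differ.
import time

from datasets.content import NewsDataset
from models.content import CountBased

start = time.time()
recommender = CountBased()
recommender.train(NewsDataset(200).get_instances())
elapsed = time.time() - start

# Append so that repeated runs accumulate in the same log file.
with open("time_required.log", "a") as log:
    log.write("training took {:.3f} seconds\n".format(elapsed))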
- Collaborative Filtering (see the sketch at the end of this section)
- Content Based
- Graph Based
The code has been based on the MovieLens dataset. A few resources that we've used for building this include Collaborative Filtering Recommendation System and Programming Collective Intelligence.
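To give a flavour of the collaborative filtering approach on MovieLens-style ratings, here is a standalone user-based filtering sketch. It is for illustration only and is not part of this package's API.

# Minimal user-based collaborative filtering sketch; not the package's code.
import numpy as np

# Rows are users, columns are items; 0 means "not rated yet".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

target = 0  # recommend for the first user
similarities = np.array([cosine(ratings[target], ratings[u]) for u in range(len(ratings))])
similarities[target] = 0.0  # ignore self-similarity

# Score each item by the similarity-weighted ratings of the other users.
scores = similarities @ ratings
scores[ratings[target] > 0] = -np.inf  # only recommend items the user has not rated
print("Recommended item index:", int(scores.argmax()))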