GitHub - c-koster/movielens-wikidata

📽 Movie Recommendation with item features from the Wikidata Knowledge Graph

Lately I have been spending a lot of time working with the Wikidata knowledge graph. Wikidata is an open knowledge base that contains structured data about entities, incluing attributes and relationships between concepts, products, books, places, events, etc.

Movies are great examples of linkable entities—almost all of the titles in the Movielens 100k dataset are linkable to a QID in wikidata. For example, the movie Fargo (1996) can be mapped (or linked) to the wikidata ID Q222720, and facts about the movie can be found at this url.

from recommenders.datasets.wikidata import find_wikidata_id
find_wikidata_id('Fargo (1996)') 
# gets us Q222720. find some facts about it at: https://www.wikidata.org/wiki/Q222720

My goal for this project was to extract movie item features from Wikidata and see if this knowledge would improve recommendation performance. Specifically, this repo contains an extraction script for movie features and a training script which uses these features to improve ranking for a hybrid recommender system.

extract.ipynb extracts movie features in two steps. first it searches the movie title for a matching wikidata entry. Next, using the wikidata query service, it extracts the following features from the wikidata knowledge base using SPARQL queries:
- MPA rating
- Bechdel and Mako Mori tests (passes/fails)
- rotten tomatoes score ranging from 0.00 to 1.00
- movie duration in minutes
- country of origin (there are sometimes multiple) delimited with |
- count of academy awards
- count of academy award nominations
train.ipynb includes a hybrid matrix factorization model. I picked LightFM as it is easy to add custom features.
data contains the csv file that comes out of the extract notebook.

Results

Results are listed in a table below.

model	description	precision	recall
baseline	LightFM model with no user or item features	0.148	0.045
grouplens	LightFM model with user and item features from grouplens	0.163	0.062
wiki	LightFM model with grouplens user/item features and item features extracted from wikidata	0.171	0.064

Notes

I only had 1682 movies in the 100k dataset and it was infeasible to include actors/directors as features (if I made a column for each actor and director I would have had many more features than movies).
While writing queries, i focused on extracting features that might be useful for a US audience (e.g. MPA ratings, nominations & award counts from a US-based awards ceremony). This is because most users in the grouplens dataset have american postal codes.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
data		data
README.md		README.md
extract.ipynb		extract.ipynb
train.ipynb		train.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📽 Movie Recommendation with item features from the Wikidata Knowledge Graph

Contents

Results

Notes

About

Releases

Packages

Languages

c-koster/movielens-wikidata

Folders and files

Latest commit

History

Repository files navigation

📽 Movie Recommendation with item features from the Wikidata Knowledge Graph

Contents

Results

Notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages