
Large refactor #1086

Merged: 67 commits, merged on Jun 29, 2020.
Commits (67):
c47fe4d
rename and refactor :boom:
miguelgfierro Apr 16, 2020
5551d4a
rename and refactor :boom:
miguelgfierro Apr 16, 2020
f6b0453
refact
miguelgfierro Apr 27, 2020
6db4148
scenarios
miguelgfierro Apr 27, 2020
485c1f1
retail
miguelgfierro Apr 27, 2020
02bec05
retail
miguelgfierro Apr 27, 2020
9c35983
retail
miguelgfierro Apr 27, 2020
3e3756c
retail
miguelgfierro Apr 27, 2020
7864b8e
retail
miguelgfierro Apr 27, 2020
44772c7
comments @yueguoguo
miguelgfierro May 21, 2020
d79f878
Merge branch 'staging' into miguel/burn_and_destroy
miguelgfierro May 21, 2020
c6c20c5
Merge branch 'staging' into miguel/burn_and_destroy
miguelgfierro May 28, 2020
5db328f
advance
miguelgfierro Jun 4, 2020
8eb19fa
advance
miguelgfierro Jun 4, 2020
f3ddcae
advance
miguelgfierro Jun 4, 2020
f01dcb6
review
miguelgfierro Jun 4, 2020
c1baf1e
Merge branch 'staging' into miguel/burn_and_destroy
miguelgfierro Jun 11, 2020
1e78d52
scenarios
miguelgfierro Jun 11, 2020
60d9587
structure change
miguelgfierro Jun 11, 2020
7f44a9d
glossary
miguelgfierro Jun 11, 2020
61923c7
:boom:
miguelgfierro Jun 11, 2020
78986b7
readme
miguelgfierro Jun 12, 2020
36ed9e6
rewrite of retail readme for readability.
Jun 14, 2020
5b007f7
format
Jun 14, 2020
e1a5f51
glossary
miguelgfierro Jun 15, 2020
33c6e5e
:doc:
miguelgfierro Jun 15, 2020
40560c3
:doc:
miguelgfierro Jun 15, 2020
65dd13c
:doc:
miguelgfierro Jun 15, 2020
573e004
Update README.md
wutaomsft Jun 15, 2020
f42e8f5
wip
miguelgfierro Jun 15, 2020
63930e5
Merge branch 'miguel/burn_and_destroy' of github.com:microsoft/recomm…
miguelgfierro Jun 15, 2020
a442096
glossary
miguelgfierro Jun 15, 2020
47f9d25
glossary
miguelgfierro Jun 15, 2020
f572dcf
kg
miguelgfierro Jun 16, 2020
4930065
fix links
miguelgfierro Jun 16, 2020
97672a9
readme
miguelgfierro Jun 16, 2020
f022427
fix paths
miguelgfierro Jun 16, 2020
a46b18f
fix paths
miguelgfierro Jun 16, 2020
f156c0b
fix paths
miguelgfierro Jun 16, 2020
4f77506
rename
miguelgfierro Jun 16, 2020
5dd3f68
fix :bug: and paths
miguelgfierro Jun 16, 2020
123a737
tests
miguelgfierro Jun 17, 2020
14d7c50
fixing tests
miguelgfierro Jun 17, 2020
1ee91fa
:bug:
miguelgfierro Jun 17, 2020
d90c9a3
:bug:
miguelgfierro Jun 18, 2020
39705c1
typo
miguelgfierro Jun 18, 2020
e1bbd2a
fix :bug: test lightfm
miguelgfierro Jun 19, 2020
9d7c661
papers
miguelgfierro Jun 19, 2020
44b4843
papers
miguelgfierro Jun 19, 2020
da7cdbf
typo
miguelgfierro Jun 19, 2020
a6e441e
fixed :bug: with pymanopt
miguelgfierro Jun 22, 2020
57b0c8a
long tail
miguelgfierro Jun 22, 2020
c0185c1
spark
miguelgfierro Jun 22, 2020
b0f8a59
ignore
miguelgfierro Jun 22, 2020
841fc49
mmlspark lgb criteo
miguelgfierro Jun 22, 2020
871ef72
:bug:
miguelgfierro Jun 22, 2020
1881066
java8
miguelgfierro Jun 22, 2020
24b6ba9
benchmark
miguelgfierro Jun 22, 2020
4e9263a
retail
miguelgfierro Jun 23, 2020
16baaed
spark 2.4.3
miguelgfierro Jun 23, 2020
fd1eb0b
Update README.md
anargyri Jun 25, 2020
a281478
lightgcn
miguelgfierro Jun 25, 2020
d4a5244
fix :bug: in readme
miguelgfierro Jun 25, 2020
845964a
readms
miguelgfierro Jun 25, 2020
d5ae933
update authors
miguelgfierro Jun 25, 2020
f4c1f4d
merge staging
miguelgfierro Jun 29, 2020
930427f
:bug:
miguelgfierro Jun 29, 2020
6 changes: 4 additions & 2 deletions .gitignore
@@ -134,6 +134,7 @@ reco_*.yaml
*.dat
*.csv
*.zip
*.7z
.vscode/
u.item
ml-100k/
@@ -150,7 +151,8 @@ ml-20m/
*.ckpt*
*.png
*.jpg
*.gif
*.jpeg
*.gif
*.model
*.mml
*.mml
nohup.out
19 changes: 10 additions & 9 deletions AUTHORS.md
@@ -12,8 +12,10 @@ They have admin access to the repo and provide support reviewing issues and pull
* Reco utils metrics computations
* Tests for Surprise
* Model selection notebooks (AzureML for SVD, NNI)
* **[Jeremy Reynolds](https://github.com/jreynolds01)**
* Reference architecture
* **[Jianxun Lian](https://github.com/Leavingseason)**
* xDeepFM algorithm
* DKN algorithm
* Review, development and optimization of MSRA algorithms.
* **[Jun Ki Min](https://github.com/loomlike)**
* ALS notebook
* Wide & Deep algorithm
@@ -27,10 +29,6 @@ They have admin access to the repo and provide support reviewing issues and pull
* Reco utils review, development and optimization.
* Github statistics.
* Continuous integration build / test setup.
* **[Nikhil Joglekar](https://github.com/nikhilrj)**
* Improving documentation
* Quick start notebook
* Operationalization notebook
* **[Scott Graham](https://github.com/gramhagen)**
* Improving documentation
* VW notebook
@@ -61,9 +59,8 @@ To contributors: please add your name to the list when you submit a patch to the
* Spark optimization and support
* **[Heather Spetalnick (Shapiro)](https://github.com/heatherbshapiro)**
* AzureML documentation and support
* **[Jianxun Lian](https://github.com/Leavingseason)**
* xDeepFM algorithm
* DKN algorithm
* **[Jeremy Reynolds](https://github.com/jreynolds01)**
* Reference architecture
* **[Markus Cozowicz](https://github.com/eisber)**
* SAR improvements on Spark
* **[Max Kaznady](https://github.com/maxkazmsft)**
@@ -76,6 +73,10 @@ To contributors: please add your name to the list when you submit a patch to the
* Restricted Boltzmann Machine algorithm
* **[Nicolas Hug](https://github.com/NicolasHug)**
* Jupyter notebook demonstrating the use of [Surprise](https://github.com/NicolasHug/Surprise) library for recommendations
* **[Nikhil Joglekar](https://github.com/nikhilrj)**
* Improving documentation
* Quick start notebook
* Operationalization notebook
* **[Pratik Jawanpuria](https://github.com/pratikjawanpuria)**
* RLRMC algorithm
* **[Qi Wan](https://github.com/Qcactus)**
58 changes: 58 additions & 0 deletions GLOSSARY.md
@@ -0,0 +1,58 @@
# Glossary

* A/B testing: Methodology to evaluate the performance of a system in production. In the context of Recommendation Systems, it is used to measure the performance of a machine learning model in real time. It works by randomly splitting the traffic into two groups, A and B: typically, half of the traffic receives the output of the machine learning model and the other half is served without the model. Comparing the metrics of groups A and B shows whether using the model is beneficial. A test with more than two groups is called a Multi-Variate Test.

* Click-through rate (CTR): Ratio of the number of users who click on a link to the total number of users who visited the page. CTR is a measure of user engagement.

* Cold-start problem: The cold-start problem concerns recommendations for users with little or no past history (new users). Providing recommendations to users with a short history is difficult for collaborative filtering models, because their ability to learn and predict is limited. Much research has addressed this problem using content-based filtering models or hybrid models, which use auxiliary information such as user or item metadata to overcome the cold-start problem.

* Collaborative filtering algorithms (CF): CF algorithms predict the likelihood of a user selecting an item based on the behavior of other users [1]. They assume that if user A likes items X and Y, and user B likes item X, user B would probably like item Y. See the [list of CF examples in Recommenders repository](../../examples/02_model_collaborative_filtering).

* Content-based filtering algorithms (CB): CB algorithms predict the likelihood of a user selecting an item based on the similarity of users and items among themselves [1]. They assume that if user A lives in country X, has age Y and likes item Z, and user B lives in country X and has age Y, user B would probably like item Z. See the [list of CB examples in Recommenders repository](../../examples/02_model_content_based_filtering).

* Conversion rate: In the context of e-commerce, the conversion rate is the ratio of the number of conversions (e.g. number of items bought) to the total number of visits. In the context of recommendation systems, conversion rate measures how effective an algorithm is at providing recommendations that the user buys.

* Diversity metrics: In the context of Recommendation Systems, diversity applies to a set of items, and is related to how different the items are with respect to each other [4].

* Explicit interaction data: When a user explicitly rates an item, typically on a scale from 1 to 5, the user is stating how much they like the item.

* Hybrid filtering algorithms: This type of recommendation system can implement a combination of collaborative and content-based filtering models. See the [list of examples in Recommenders repository](../../examples/02_model_hybrid).

* Implicit interaction data: Implicit interactions are views or clicks that show some interest of the user in a specific item. This kind of data is more common, but it does not reveal the user's intention as clearly as explicit data.

* Item information: Information about the item, for example its name, description, price, etc.

* Knowledge graph algorithms: A knowledge graph algorithm is one that uses knowledge graph data. Compared with standard algorithms, it can explore latent connections in the graph and improve the precision of results; the various relations in the graph can extend users' interests and increase the diversity of recommended items; and these algorithms bring explainability to recommendation systems [5].

* Knowledge graph data: A knowledge graph is a directed heterogeneous graph in which nodes correspond to entities (items or item attributes) and edges correspond to relations [5].

* Long tail items: Typically, the item interaction distribution has the form of a long tail, where items in the tail have a small number of interactions, corresponding to unpopular items, and items in the head have a large number of interactions [1,2]. From the algorithmic point of view, items in the tail suffer from the cold-start problem, making them hard for recommendation systems to use. However, from the business point of view, the items in the tail can be highly profitable: since these items are less popular, businesses can apply a higher margin to them. Recommendation systems that optimize metrics like novelty and diversity can help find users willing to get these long tail items.

* Multi-Variate Test (MVT): Methodology to evaluate the performance of a system in production. It is similar to A/B testing, with the difference that instead of having two test groups, MVT has multiple groups.

* Novelty metrics: In Recommendation Systems, the novelty of a piece of information generally refers to how different it is with respect to "what has been previously seen" [4].

* Online metrics: Also called business metrics. These are metrics computed online that reflect how the Recommendation System is helping the business improve user engagement or revenue. They include CTR, conversion rate, etc.

* Offline metrics: Metrics computed offline for measuring the performance of the machine learning model. These metrics include ranking, rating, diversity and novelty metrics.

* Ranking metrics: These are used to evaluate how relevant recommendations are for users. They include precision at k, recall at k, nDCG and MAP. See the [list of metrics in Recommenders repository](../../examples/03_evaluate).

* Rating metrics: These are used to evaluate how accurate a recommender is at predicting ratings that users give to items. They include RMSE, MAE, R squared or explained variance. See the [list of metrics in Recommenders repository](../../examples/03_evaluate).

* Revenue per order: The revenue per order optimization objective is the default optimization objective for the "Frequently bought together" recommendation model type. This optimization objective cannot be specified for any other recommendation model type.

* User information: All information that defines the user, for example name, address, email, demographics, etc.
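
The rating, ranking and online metrics defined above reduce to short formulas. The following plain-Python sketch illustrates one common form of each; it is not the Recommenders repository's own evaluation code, and the function names (`rmse`, `mae`, `precision_at_k`, `recall_at_k`, `rate`, `two_proportion_z`) are hypothetical. Precision at k here divides by k, which is only one of several conventions, and the A/B comparison assumes a pooled two-proportion z-test on click or conversion counts.

```python
import math
from typing import List, Sequence, Set


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both rating lists must be non-empty and aligned item by item.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Rating metric: root mean squared error of predicted ratings."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Rating metric: mean absolute error of predicted ratings."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Ranking metric: share of the top-k recommendations that are relevant (divides by k)."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Ranking metric: share of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def rate(events: int, total: int) -> float:
    """Online metric: CTR (clicks / visitors) or conversion rate (conversions / visits)."""
    return events / total if total else 0.0


def two_proportion_z(events_a: int, total_a: int, events_b: int, total_b: int) -> float:
    """A/B test: z statistic for rate(B) - rate(A), using a pooled standard error."""
    p_a, p_b = rate(events_a, total_a), rate(events_b, total_b)
    pooled = (events_a + events_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / se


if __name__ == "__main__":
    print(rmse([5, 3, 4], [4, 3, 5]))  # sqrt(2/3), about 0.816
    print(precision_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=2))  # 0.5
    print(two_proportion_z(100, 2000, 130, 2000))  # about 2.04
```

In the example A/B call, group A has a CTR of 5% and group B 6.5%; the resulting z of about 2.04 is just above 1.96, the threshold commonly used for a two-sided test at the 5% significance level.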

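The collaborative filtering entry's example (user A likes items X and Y, user B likes item X, so B would probably like Y) can be made concrete with item-to-item co-occurrence counting, one simple CF technique among many. The sketch assumes implicit interaction data given as a hypothetical dict mapping each user to the set of items they interacted with; `cooccurrence` and `recommend` are illustrative names, not functions from the Recommenders repository.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set


def cooccurrence(interactions: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Count, for every pair of items, how many users interacted with both."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in interactions.values():
        for x, y in combinations(sorted(items), 2):
            counts[x][y] += 1
            counts[y][x] += 1
    return counts


def recommend(user: str, interactions: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Score unseen items by how often they co-occur with the user's own items."""
    counts = cooccurrence(interactions)
    seen = interactions[user]
    scores: Dict[str, int] = defaultdict(int)
    for item in seen:
        for other, c in counts[item].items():
            if other not in seen:
                scores[other] += c
    # Highest score first; ties broken alphabetically for deterministic output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


# The glossary example: A likes X and Y, B likes X.
likes = {"A": {"X", "Y"}, "B": {"X"}}
print(recommend("B", likes))  # ['Y']
```

Because it relies only on other users' behavior, this approach cannot score an item that nobody has interacted with yet, which is the cold-start limitation described above.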
## References and resources

[1] Aggarwal, Charu C. "Recommender systems". Vol. 1. Cham: Springer International Publishing, 2016.

[2] Park, Yoon-Joo, and Tuzhilin, Alexander. "The long tail of recommender systems and how to leverage it." In Proceedings of the 2008 ACM conference on Recommender systems, pp. 11-18. 2008. [Link to paper](http://people.stern.nyu.edu/atuzhili/pdf/Park-Tuzhilin-RecSys08-final.pdf).

[3] Armstrong, Robert. "The long tail: Why the future of business is selling less of more." Canadian Journal of Communication 33, no. 1 (2008). [Link to paper](https://www.cjc-online.ca/index.php/journal/article/view/1946/3141).

[4] Castells, P., Vargas, S., and Wang, Jun. "Novelty and diversity metrics for recommender systems: choice, discovery and relevance." (2011). [Link to paper](https://repositorio.uam.es/bitstream/handle/10486/666094/novelty_castells_DDR_2011.pdf?sequence=1).

[5] Wang, Hongwei; Zhao, Miao; Xie, Xing; Li, Wenjie and Guo, Minyi. "Knowledge Graph Convolutional Networks for Recommender Systems". The World Wide Web Conference WWW'19. 2019. [Link to paper](https://arxiv.org/abs/1904.12575).
