diff --git a/NEWS.md b/NEWS.md
index bca26ac7f2..9e902a4689 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,5 +1,8 @@
 # What's New
 
+## Update January 13, 2022
+
+We have a new release [Recommenders 1.0.0](https://github.com/microsoft/recommenders/releases/tag/1.0.0)! The codebase has now migrated to TensorFlow versions 2.6 / 2.7 and to Spark version 3. In addition, there are a few changes in the dependencies and extras installed by `pip` (see [this guide](recommenders/README.md#optional-dependencies)). We have also made improvements in the code and the CI / CD pipelines.
 
 ## Update September 27, 2021
 
@@ -13,7 +16,6 @@ We have also added new evaluation metrics: _novelty, serendipity, diversity and
 
 Code coverage reports are now generated for every PR, using [Codecov](https://about.codecov.io/).
 
-
 ## Update June 21, 2021
 
 We have a new release [Recommenders 0.6.0](https://github.com/microsoft/recommenders/releases/tag/0.6.0)!
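The NEWS entry above defers the details of the changed `pip` extras to the optional-dependencies guide. As a rough illustration of what that installation surface looks like, here is a hedged sketch: `examples` appears verbatim in the README below, while `gpu` and `spark` are assumed extra names based on the guide's mention of GPU and Spark support.

```bash
# Core utilities and CPU algorithms only
pip install recommenders

# With extras, alone or combined (names assumed from the optional-dependencies guide)
pip install recommenders[examples]
pip install recommenders[gpu,spark]
```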
diff --git a/README.md b/README.md
index e5d3389dac..626cfd797f 100644
--- a/README.md
+++ b/README.md
@@ -2,18 +2,15 @@
 
 [![Documentation Status](https://readthedocs.org/projects/microsoft-recommenders/badge/?version=latest)](https://microsoft-recommenders.readthedocs.io/en/latest/?badge=latest)
 
-## What's New (September 27, 2021)
+## What's New (January 13, 2022)
 
-We have a new release [Recommenders 0.7.0](https://github.com/microsoft/recommenders/releases/tag/0.7.0)!
+We have a new release [Recommenders 1.0.0](https://github.com/microsoft/recommenders/releases/tag/1.0.0)! The codebase has now migrated to TensorFlow versions 2.6 / 2.7 and to Spark version 3. In addition, there are a few changes in the dependencies and extras installed by `pip` (see [this guide](recommenders/README.md#optional-dependencies)). We have also made improvements in the code and the CI / CD pipelines.
 
-In this, we have changed the names of the folders which contain the source code, so that they are more informative. This implies that you will need to change any import statements that reference the recommenders package. Specifically, the folder `reco_utils` has been renamed to `recommenders` and its subfolders have been renamed according to [issue 1390](https://github.com/microsoft/recommenders/issues/1390).
+Starting with release 0.6.0, Recommenders has been available on PyPI and can be installed using pip!
 
-The recommenders package now supports three types of environments: [venv](https://docs.python.org/3/library/venv.html), [virtualenv](https://virtualenv.pypa.io/en/latest/index.html#) and [conda](https://docs.conda.io/projects/conda/en/latest/glossary.html?highlight=environment#conda-environment) with Python versions 3.6 and 3.7.
-
-We have also added new evaluation metrics: _novelty, serendipity, diversity and coverage_ (see the [evalution notebooks](examples/03_evaluate/README.md)).
-
-Code coverage reports are now generated for every PR, using [Codecov](https://about.codecov.io/).
+Here you can find the PyPI page: https://pypi.org/project/recommenders/
+Here you can find the package documentation: https://microsoft-recommenders.readthedocs.io/en/latest/
 
 ## Introduction
 
@@ -40,41 +37,51 @@ and currently does not support version 3.8 and above. It is recommended to insta
 
 To set up on your local machine:
 
-To install core utilities, CPU-based algorithms, and dependencies:
+* To install core utilities, CPU-based algorithms, and dependencies:
+
+  1. Ensure software required for compilation and Python libraries
+     is installed.
+
+     + On Linux this can be supported by adding:
+
+       ```bash
+       sudo apt-get install -y build-essential libpython<version>
+       ```
+
+       where `<version>` should be `3.6` or `3.7` as appropriate.
 
-1. Ensure software required for compilation and Python libraries is installed. On Linux this can be supported by adding:
-```bash
-sudo apt-get install -y build-essential libpython<version>
-```
-where `<version>` should be `3.6` or `3.7` as appropriate.
+     + On Windows you will need [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/).
 
-On Windows you will need [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/).
-
-2. Create a conda or virtual environment. See the [setup guide](SETUP.md) for more details.
+  2. Create a conda or virtual environment. See the
+     [setup guide](SETUP.md) for more details.
 
-3. Within the created environment, install the package from [PyPI](https://pypi.org):
+  3. Within the created environment, install the package from
+     [PyPI](https://pypi.org):
 
-```bash
-pip install --upgrade pip
-pip install --upgrade setuptools
-pip install recommenders[examples]
-```
+     ```bash
+     pip install --upgrade pip
+     pip install --upgrade setuptools
+     pip install recommenders[examples]
+     ```
 
-4. Register your (conda or virtual) environment with Jupyter:
+  4. Register your (conda or virtual) environment with Jupyter:
 
-```bash
-python -m ipykernel install --user --name my_environment_name --display-name "Python (reco)"
-```
+     ```bash
+     python -m ipykernel install --user --name my_environment_name --display-name "Python (reco)"
+     ```
 
-5. Start the Jupyter notebook server
+  5. Start the Jupyter notebook server
 
-```bash
-jupyter notebook
-```
+     ```bash
+     jupyter notebook
+     ```
 
-6. Run the [SAR Python CPU MovieLens](examples/00_quick_start/sar_movielens.ipynb) notebook under the `00_quick_start` folder. Make sure to change the kernel to "Python (reco)".
+  6. Run the [SAR Python CPU MovieLens](examples/00_quick_start/sar_movielens.ipynb)
+     notebook under the `00_quick_start` folder. Make sure to
+     change the kernel to "Python (reco)".
 
-For additional options to install the package (support for GPU, Spark etc.) see [this guide](recommenders/README.md).
+* For additional options to install the package (support for GPU,
+  Spark, etc.) see [this guide](recommenders/README.md).
 
 **NOTE** - The [Alternating Least Squares (ALS)](examples/00_quick_start/als_movielens.ipynb) notebooks require a PySpark environment to run. Please follow the steps in the [setup guide](SETUP.md#dependencies-setup) to run these notebooks in a PySpark environment. For the deep learning algorithms, it is recommended to use a GPU machine and to follow the steps in the [setup guide](SETUP.md#dependencies-setup) to set up Nvidia libraries.
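Taken together, the re-indented README steps amount to the following end-to-end session. This is a minimal sketch assuming Linux, a conda environment named `reco`, and Python 3.7; the environment name and Python version are illustrative choices, not requirements of the README.

```bash
# Step 1: build prerequisites (Linux)
sudo apt-get install -y build-essential libpython3.7

# Step 2: create and activate an environment
conda create -n reco python=3.7 -y
conda activate reco

# Step 3: install the package from PyPI
pip install --upgrade pip
pip install --upgrade setuptools
pip install recommenders[examples]

# Step 4: register the environment with Jupyter
python -m ipykernel install --user --name reco --display-name "Python (reco)"

# Steps 5 and 6: start the notebook server, then open the SAR quick-start
# notebook and switch its kernel to "Python (reco)"
jupyter notebook
```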
diff --git a/SETUP.md b/SETUP.md
index 07694b6d17..3fa17a39ee 100644
--- a/SETUP.md
+++ b/SETUP.md
@@ -6,7 +6,6 @@ This document describes how to setup all the dependencies to run the notebooks i
 * [Azure Databricks](https://azure.microsoft.com/en-us/services/databricks/)
 * Docker container
 
-
 ## Table of Contents
 
 - [Compute environments](#compute-environments)
@@ -397,7 +396,7 @@ You can then open the Jupyter notebook server at http://localhost:8888
 
 The process of making a new release and publishing it to pypi is as follows:
 
-First make sure that the tag that you want to add, e.g. `0.6.0`, is added in [recommenders.py/__init__.py](recommenders.py/__init__.py). Follow the [contribution guideline](CONTRIBUTING.md) to add the change.
+First make sure that the tag that you want to add, e.g. `0.6.0`, is added in [`recommenders.py/__init__.py`](recommenders.py/__init__.py). Follow the [contribution guideline](CONTRIBUTING.md) to add the change.
 
 1. Make sure that the code in main passes all the tests (unit and nightly tests).
 1. Create a tag with the version number: e.g. `git tag -a 0.6.0 -m "Recommenders 0.6.0"`.
@@ -406,4 +405,5 @@
    generates a wheel and a tar.gz which are uploaded to a [GitHub draft release](https://github.com/microsoft/recommenders/releases).
 1. Fill up the draft release with all the recent changes in the code.
 1. Download the wheel and tar.gz locally, these files shouldn't have any bug, since they passed all the tests.
+1. Install twine: `pip install twine`
 1. Publish the wheel and tar.gz to pypi: `twine upload recommenders*`
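The release checklist amended above condenses to a short shell session. A sketch, reusing the document's own `0.6.0` example and assuming the tag is pushed to the `origin` remote, which is what triggers the release pipeline described in the checklist:

```bash
# Tag the release; pushing the tag triggers the release pipeline, which runs
# the tests and attaches a wheel and a tar.gz to a GitHub draft release
git tag -a 0.6.0 -m "Recommenders 0.6.0"
git push origin 0.6.0

# After filling in the draft release and downloading the wheel and tar.gz locally
pip install twine
twine upload recommenders*
```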
diff --git a/docs/README.md b/docs/README.md
index 024f6aabdd..e4f49c4744 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -6,7 +6,8 @@ To setup the documentation, first you need to install the dependencies of the fu
 
     conda activate reco_full
     pip install numpy cython
-    pip install --no-binary scikit-surprise .[all,experimental]
+    pip install --no-binary scikit-surprise "scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz"
+    pip install "pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip"
 
     pip install sphinx_rtd_theme
 
diff --git a/examples/00_quick_start/tfidf_covid.ipynb b/examples/00_quick_start/tfidf_covid.ipynb
index 25eda952da..d9a9274753 100644
--- a/examples/00_quick_start/tfidf_covid.ipynb
+++ b/examples/00_quick_start/tfidf_covid.ipynb
@@ -16,7 +16,7 @@
    "# TF-IDF Content-Based Recommendation on the COVID-19 Open Research Dataset\n",
    "This demonstrates a simple implementation of Term Frequency Inverse Document Frequency (TF-IDF) content-based recommendation on the [COVID-19 Open Research Dataset](https://azure.microsoft.com/en-us/services/open-datasets/catalog/covid-19-open-research/), hosted through Azure Open Datasets.\n",
    "\n",
-   "In this notebook, we will create a recommender which will return the top k recommended articles similar to any article of interest (query item) in the COVID-19 Open Reserach Dataset."
+   "In this notebook, we will create a recommender which will return the top k recommended articles similar to any article of interest (query item) in the COVID-19 Open Research Dataset."
   ]
  },
 {
@@ -1229,4 +1229,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
\ No newline at end of file
+}
diff --git a/recommenders/README.md b/recommenders/README.md
index df64ff94c2..4bd5fdedf2 100644
--- a/recommenders/README.md
+++ b/recommenders/README.md
@@ -35,7 +35,7 @@ By default `recommenders` does not install all dependencies used throughout the
 - experimental: current experimental dependencies that are being evaluated (e.g. libraries that require advanced build requirements or might conflict with libraries from other options)
 - nni: dependencies for NNI tuning framework.
 
-Note that, currently, xLearn, Surprise and Vowpal Wabbit are in the experimental group.
+Note that, currently, xLearn and Vowpal Wabbit are in the experimental group.
 
 These groups can be installed alone or in combination:
 ```bash
@@ -64,10 +64,16 @@ When installing with GPU support you will need to point to the PyTorch index to
 We are currently evaluating inclusion of the following dependencies:
 
- - scikit-surprise: due to incompatibilities with `numpy <= 1.19`, proper installation of Surprise requires `pip install numpy cython` and `pip install --no-binary scikit-surprise recommenders[experimental]`
 - vowpalwabbit: current examples show how to use vowpal wabbit after it has been installed on the command line; using the [PyPI package](https://pypi.org/project/vowpalwabbit/) with the scikit-learn interface will facilitate easier integration into python environments
 - xlearn: on some platforms, xLearn requires pre-installation of cmake.
 
+## Other dependencies
+
+Some dependencies are not available via the recommenders PyPI package, but can be installed in the following ways:
+ - scikit-surprise: due to incompatibilities with `numpy <= 1.19`, proper installation of Surprise requires `pip install numpy cython` and `pip install --no-binary scikit-surprise "scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz"`
+ - pymanopt: this dependency is required for the RLRMC and GeoIMC algorithms; a version of this code compatible with TensorFlow 2 can be
+   installed with `pip install "pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip"`.
+
 ## NNI dependencies
 
 For NNI a more recent version can be installed but is untested.
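The "Other dependencies" section above spreads its commands across two bullets; consolidated into one session they read as follows. This sketch simply mirrors the commands quoted in that section, with `numpy` and `cython` installed first as the Surprise bullet requires.

```bash
# Build prerequisites for scikit-surprise
pip install numpy cython

# Surprise, built from source because of the numpy <= 1.19 incompatibility
pip install --no-binary scikit-surprise "scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz"

# pymanopt, pinned to the one commit known to work with TF2 (needed by RLRMC and GeoIMC)
pip install "pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip"
```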
diff --git a/setup.py b/setup.py
index 0b791cf59d..15b7a08aae 100644
--- a/setup.py
+++ b/setup.py
@@ -39,8 +39,6 @@
     "memory_profiler>=0.54.0,<1",
     "nltk>=3.4,<4",
     "pydocumentdb>=2.3.3<3",  # TODO: replace with azure-cosmos
-    # Temporary fix for pymanopt, only this commit works with TF2
-    "pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip",
     "seaborn>=0.8.1,<1",
     "transformers>=2.5.0,<5",
     "bottleneck>=1.2.1,<2",
@@ -93,9 +91,6 @@
 extras_require["experimental"] = [
     # xlearn requires cmake to be pre-installed
     "xlearn==0.40a1",
-    # Surprise needs to be built from source because of the numpy <= 1.19 incompatibility
-    # Requires pip to be run with the --no-binary option
-    "scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz",
     # VW C++ binary needs to be installed manually for some code to work
     "vowpalwabbit>=8.9.0,<9",
 ]
@@ -104,6 +99,12 @@
     "nni==1.5",
 ]
 
+# The following dependencies can be installed as shown below; however, PyPI does not allow direct URLs.
+# Surprise needs to be built from source because of the numpy <= 1.19 incompatibility
+# and requires pip to be run with the --no-binary option:
+# "scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz",
+# Temporary fix for pymanopt; only this commit works with TF2:
+# "pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip",
 
 setup(
     name="recommenders",
diff --git a/tests/ci/azure_pipeline_test/dsvm_nightly_linux_cpu.yml b/tests/ci/azure_pipeline_test/dsvm_nightly_linux_cpu.yml
index 4657e373c9..c1182a5d7e 100644
--- a/tests/ci/azure_pipeline_test/dsvm_nightly_linux_cpu.yml
+++ b/tests/ci/azure_pipeline_test/dsvm_nightly_linux_cpu.yml
@@ -33,6 +33,6 @@ extends:
     timeout: 180
     conda_env: "nightly_linux_cpu"
    conda_opts: "python=3.6"
-    pip_opts: "[examples,dev,experimental] --no-cache --no-binary scikit-surprise"
+    pip_opts: "[examples,dev,experimental] 'scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz' 'pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip' --no-cache --no-binary scikit-surprise"
     pytest_markers: "not spark and not gpu"
     pytest_params: "-x"
diff --git a/tests/ci/azure_pipeline_test/dsvm_notebook_linux_cpu.yml b/tests/ci/azure_pipeline_test/dsvm_notebook_linux_cpu.yml
index 0ea6cefeda..d1efb71754 100644
--- a/tests/ci/azure_pipeline_test/dsvm_notebook_linux_cpu.yml
+++ b/tests/ci/azure_pipeline_test/dsvm_notebook_linux_cpu.yml
@@ -60,5 +60,5 @@ extends:
     task_name: "Test - Unit Notebook Linux CPU"
     conda_env: "unit_notebook_linux_cpu"
     conda_opts: "python=3.6"
-    pip_opts: "[examples,dev,experimental] --no-cache --no-binary scikit-surprise"
+    pip_opts: "[examples,dev,experimental] 'scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz' 'pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip' --no-cache --no-binary scikit-surprise"
     pytest_markers: "notebooks and not spark and not gpu"
diff --git a/tests/ci/azure_pipeline_test/dsvm_unit_linux_cpu.yml b/tests/ci/azure_pipeline_test/dsvm_unit_linux_cpu.yml
index 9d7ea00a3e..f5c4057f29 100644
--- a/tests/ci/azure_pipeline_test/dsvm_unit_linux_cpu.yml
+++ b/tests/ci/azure_pipeline_test/dsvm_unit_linux_cpu.yml
@@ -60,5 +60,5 @@ extends:
     task_name: "Test - Unit Linux CPU"
     conda_env: "unit_linux_cpu"
     conda_opts: "python=3.6"
-    pip_opts: "[dev,experimental] --no-cache --no-binary scikit-surprise"
+    pip_opts: "[dev,experimental] 'scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz' 'pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip' --no-cache --no-binary scikit-surprise"
     pytest_markers: "not notebooks and not spark and not gpu"
diff --git a/tests/integration/examples/test_notebooks_python.py b/tests/integration/examples/test_notebooks_python.py
index 0bf3fbbdb1..331eeed9e2 100644
--- a/tests/integration/examples/test_notebooks_python.py
+++ b/tests/integration/examples/test_notebooks_python.py
@@ -236,6 +236,7 @@ def test_cornac_bpr_integration(
 
 
 @pytest.mark.integration
+@pytest.mark.experimental
 @pytest.mark.parametrize(
     "expected_values",
     [({"rmse": 0.4969, "mae": 0.4761})],
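For context on the `pip_opts` strings in the pipeline files above: these templates pass the value to a `pip install` of the local package (an assumption based on how the options read; the template that consumes them is not part of this diff). Under that assumption, the nightly CPU configuration corresponds roughly to:

```bash
pip install .[examples,dev,experimental] \
    'scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz' \
    'pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip' \
    --no-cache --no-binary scikit-surprise
```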
diff --git a/tests/unit/examples/test_notebooks_python.py b/tests/unit/examples/test_notebooks_python.py
index e80bd2b224..cb74cd5b3e 100644
--- a/tests/unit/examples/test_notebooks_python.py
+++ b/tests/unit/examples/test_notebooks_python.py
@@ -103,6 +103,7 @@ def test_wikidata_runs(notebooks, output_notebook, kernel_name, tmp):
     )
 
 
+@pytest.mark.experimental
 @pytest.mark.notebooks
 def test_rlrmc_quickstart_runs(notebooks, output_notebook, kernel_name):
     notebook_path = notebooks["rlrmc_quickstart"]
diff --git a/tests/unit/recommenders/models/test_geoimc.py b/tests/unit/recommenders/models/test_geoimc.py
index 6408b661be..d41ab8d3b1 100644
--- a/tests/unit/recommenders/models/test_geoimc.py
+++ b/tests/unit/recommenders/models/test_geoimc.py
@@ -1,20 +1,23 @@
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # Licensed under the MIT License.
 
-import collections
-import pytest
-import numpy as np
-from scipy.sparse import csr_matrix
-
-from recommenders.models.geoimc.geoimc_data import DataPtr
-from recommenders.models.geoimc.geoimc_predict import Inferer
-from recommenders.models.geoimc.geoimc_algorithm import IMCProblem
-from recommenders.models.geoimc.geoimc_utils import (
-    length_normalize,
-    mean_center,
-    reduce_dims,
-)
-from pymanopt.manifolds import Stiefel, SymmetricPositiveDefinite
+try:
+    import collections
+    import pytest
+    import numpy as np
+    from scipy.sparse import csr_matrix
+
+    from recommenders.models.geoimc.geoimc_data import DataPtr
+    from recommenders.models.geoimc.geoimc_predict import Inferer
+    from recommenders.models.geoimc.geoimc_algorithm import IMCProblem
+    from recommenders.models.geoimc.geoimc_utils import (
+        length_normalize,
+        mean_center,
+        reduce_dims,
+    )
+    from pymanopt.manifolds import Stiefel, SymmetricPositiveDefinite
+except:
+    pass  # skip if pymanopt not installed
 
 _IMC_TEST_DATA = [
     (
@@ -35,6 +38,7 @@
 
 
 # `geoimc_data` tests
+@pytest.mark.experimental
 @pytest.mark.parametrize("data, entities", _IMC_TEST_DATA)
 def test_dataptr(data, entities):
     ptr = DataPtr(data, entities)
@@ -44,6 +48,7 @@
 
 
 # `geoimc_utils` tests
+@pytest.mark.experimental
 @pytest.mark.parametrize(
     "matrix",
     [
@@ -59,6 +64,7 @@
     )
 
 
+@pytest.mark.experimental
 @pytest.mark.parametrize(
     "matrix",
     [
@@ -73,12 +79,14 @@
     )
 
 
+@pytest.mark.experimental
 def test_reduce_dims():
     matrix = np.random.rand(100, 100)
     assert reduce_dims(matrix, 50).shape[1] == 50
 
 
 # `geoimc_algorithm` tests
+@pytest.mark.experimental
 @pytest.mark.parametrize(
     "dataPtr, rank",
     [
@@ -110,10 +118,12 @@
 
 
 # `geoimc_predict` tests
+@pytest.mark.experimental
 def test_inferer_init():
     assert Inferer(method="dot").method.__name__ == "PlainScalarProduct"
 
 
+@pytest.mark.experimental
 @pytest.mark.parametrize(
     "dataPtr",
     [
diff --git a/tests/unit/recommenders/models/test_surprise_utils.py b/tests/unit/recommenders/models/test_surprise_utils.py
index 879104f7a0..b363d4da38 100644
--- a/tests/unit/recommenders/models/test_surprise_utils.py
+++ b/tests/unit/recommenders/models/test_surprise_utils.py
@@ -17,7 +17,7 @@
         compute_ranking_predictions,
     )
 except:
-    pass  # skip if experimental not installed
+    pass  # skip if surprise not installed
 
 TOL = 0.001
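The `experimental` marker added throughout the tests above lets runs include or exclude tests whose dependencies (pymanopt, Surprise) may be absent, in the same way the pipelines already select tests via `pytest_markers`. The marker expressions below follow standard pytest syntax; the exact invocations are illustrative rather than taken from this diff.

```bash
# Skip the experimental-only tests, e.g. when pymanopt and Surprise are not installed
pytest tests/unit -m "not experimental and not spark and not gpu"

# Run only the tests that require the experimental dependencies
pytest tests/unit -m "experimental"
```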