Edit READMEs based on Simon's feedback; sync dependencies in conda script #1515

Merged: 9 commits, Sep 7, 2021

20 changes: 17 additions & 3 deletions README.md
@@ -30,16 +30,22 @@ For a more detailed overview of the repository, please see the documents on the

Please see the [setup guide](SETUP.md) for more details on setting up your machine locally, on a [data science virtual machine (DSVM)](https://azure.microsoft.com/en-gb/services/virtual-machines/data-science-virtual-machines/) or on [Azure Databricks](SETUP.md#setup-guide-for-azure-databricks).

The installation of the recommenders package has been tested with Python version 3.6. It is recommended to install the package and its dependencies inside a clean environment (such as [conda](https://docs.conda.io/projects/conda/en/latest/glossary.html?highlight=environment#conda-environment) or [venv](https://docs.python.org/3/library/venv.html)).
The installation of the recommenders package has been tested with
- Python version 3.6 and [venv](https://docs.python.org/3/library/venv.html)
- Python versions 3.6, 3.7 and [conda](https://docs.conda.io/projects/conda/en/latest/glossary.html?highlight=environment#conda-environment)

and currently does not support version 3.8 and above. It is recommended to install the package and its dependencies inside a clean environment (such as [conda](https://docs.conda.io/projects/conda/en/latest/glossary.html?highlight=environment#conda-environment) or [venv](https://docs.python.org/3/library/venv.html)).

To set up on your local machine:

To install core utilities, CPU-based algorithms, and dependencies:

1. Ensure software required for compilation is installed. On Linux this can be supported by adding build-essential dependencies:
1. Ensure the software required for compilation and the Python libraries are installed. On Linux this can be supported by adding:
```bash
sudo apt-get install -y build-essential
sudo apt-get install -y build-essential libpython<version>
```
where `<version>` should be `3.6` or `3.7` as appropriate.

On Windows you will need [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/).

2. Create a conda or virtual environment. See the [setup guide](SETUP.md) for more details.
@@ -48,8 +54,14 @@ On Windows you will need [Microsoft C++ Build Tools](https://visualstudio.micros

```bash
pip install --upgrade pip
pip install --upgrade setuptools
pip install recommenders[examples]
```
In the case of conda, you also need to
```bash
conda install numpy-base
```
after the pip installation.

4. Register your (conda or virtual) environment with Jupyter:

@@ -71,6 +83,8 @@ For additional options to install the package (support for GPU, Spark etc.) see

**NOTE for DSVM Users** - Please follow the steps in the [Dependencies setup - Set PySpark environment variables on Linux or MacOS](SETUP.md#dependencies-setup) and [Troubleshooting for the DSVM](SETUP.md#troubleshooting-for-the-dsvm) sections if you encounter any issue.

**DOCKER** - Another easy way to try the recommenders repository and get started quickly is to build [docker images](tools/docker/README.md) suitable for different environments.

## Algorithms

The table below lists the recommender algorithms currently available in the repository. Notebooks are linked under the Environment column when different implementations are available.
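The README changes above limit support to Python 3.6 and 3.7 and ask for a `libpython<version>` package matching the interpreter. A minimal, stdlib-only sketch for checking this before installing; `is_supported` and `libpython_package` are hypothetical helper names, not part of the `recommenders` package:

```python
import sys

# Range taken from the README text above: tested on 3.6 and 3.7, 3.8+ unsupported.
SUPPORTED_MIN = (3, 6)
SUPPORTED_MAX_EXCLUSIVE = (3, 8)


def is_supported(version_info=sys.version_info):
    """Return True if (major, minor) lies in the tested range [3.6, 3.8)."""
    major_minor = tuple(version_info[:2])
    return SUPPORTED_MIN <= major_minor < SUPPORTED_MAX_EXCLUSIVE


def libpython_package(version_info=sys.version_info):
    """Return the apt package name for `libpython<version>`, e.g. 'libpython3.6'."""
    return "libpython{}.{}".format(version_info[0], version_info[1])


if __name__ == "__main__":
    if is_supported():
        print("sudo apt-get install -y build-essential " + libpython_package())
    else:
        print("Python {}.{} is outside the tested range 3.6-3.7".format(*sys.version_info[:2]))
```

Run with the target environment's interpreter, it prints either the `apt-get` line to use or a warning that the interpreter is outside the tested range.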
14 changes: 10 additions & 4 deletions SETUP.md
@@ -34,6 +34,12 @@ Currently, this repository supports **Python CPU**, **Python GPU** and **PySpark

## Setup guide for Local or DSVM

There are different ways one may use the recommenders utilities. The most convenient one is probably by installing the `recommenders` package from [PyPI](https://pypi.org).

Another way is to build a docker image and use the functions inside a [docker container](#setup-guide-for-docker).

Another alternative is to run all the recommender utilities directly from a local copy of the source code. This requires installing all the necessary dependencies from Anaconda and PyPI. For instructions on how to do this, see [this guide](conda.md).

### Requirements

* A machine running Linux, MacOS or Windows
@@ -52,12 +58,12 @@ conda update conda -n root
conda update anaconda # use 'conda install anaconda' if the package is not installed
```

There are different ways one may use the recommenders utilities. The most convenient one is probably by installing the `recommenders` package from [PyPI](https://pypi.org). For instructions on how to do these, see [this guide](recommenders/README.md).

An alternative is to run all the recommender utilities directly from a local copy of the source code. This requires installing all the necessary dependencies from Anaconda and PyPI. For instructions on how to do this, see [this guide](conda.md)
If using venv, see [these instructions](#using-a-virtual-environment).

**NOTE** the `xlearn` package has a dependency on `cmake`. If one uses the `xlearn` related notebooks or scripts, make sure `cmake` is installed in the system. The easiest way to install on Linux is with apt-get: `sudo apt-get install -y build-essential cmake`. Detailed instructions for installing `cmake` from source can be found [here](https://cmake.org/install/).

**NOTE** the models from Cornac require installation of `libpython` i.e. using `sudo apt-get install -y libpython3.6` or `libpython3.7`, depending on the version of Python.

**NOTE** PySpark v2.4.x requires Java version 8.

<details>
@@ -368,7 +374,7 @@ See guidelines in the Docker [README](tools/docker/README.md) for detailed instr

Example command to build and run Docker image with base CPU environment.
```{shell}
DOCKER_BUILDKIT=1 docker build -t recommenders:cpu --build-arg ENV="cpu" .
DOCKER_BUILDKIT=1 docker build -t recommenders:cpu --build-arg ENV="cpu" --build-arg VIRTUAL_ENV="conda" .
docker run -p 8888:8888 -d recommenders:cpu
```

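The SETUP.md notes above say that `xlearn` needs `cmake` and that PySpark 2.4.x needs Java 8. The sketch below checks both from Python. `java_major_version` and `check_prerequisites` are hypothetical helpers, and the parser assumes the usual `java -version` banner (`"1.8.0_292"` for Java 8, `"11.0.11"` for Java 11), which may differ between JDK vendors:

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Parse the major Java version from `java -version` output.

    Java 8 and earlier use the legacy "1.x" scheme ("1.8.0_292" -> 8);
    Java 9+ report the major version first ("11.0.11" -> 11).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def check_prerequisites():
    """Report whether cmake (for xlearn) and Java 8 (for PySpark 2.4.x) are available."""
    report = {"cmake": shutil.which("cmake") is not None}
    java = shutil.which("java")
    if java is None:
        report["java8"] = False
    else:
        # `java -version` writes its banner to stderr, not stdout.
        proc = subprocess.run(
            [java, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        report["java8"] = java_major_version(proc.stderr) == 8
    return report
```

`check_prerequisites()` returns a dict with boolean `cmake` and `java8` entries; it only runs `java -version` when a `java` executable is found on `PATH`. `subprocess.run` is called with `stdout`/`stderr` pipes and `universal_newlines` rather than `capture_output`/`text`, so it also works on Python 3.6.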
5 changes: 5 additions & 0 deletions docs/requirements.txt
@@ -32,3 +32,8 @@ nvidia-ml-py3>=7.352.0
tensorflow-gpu>=1.15.0,<2
torch==1.2.0
fastai>=1.0.46,<2
databricks_cli>=0.8.6,<1
pyarrow>=0.8.0,<1.0.0
pyspark>=2.4.5,<3.0.0
cmake>=3.18.4.post1
xlearn==0.40a1
Comment on lines +35 to +39

Collaborator:
After this is merged to staging, new docs are going to be generated. I'm not sure whether the pyspark dependency will break the docs; let's see how it goes.

Collaborator (PR author):
I tried compiling the docs and it seems to work.
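The pins added to docs/requirements.txt combine a lower and an upper bound, for example `pyspark>=2.4.5,<3.0.0`. As a rough illustration of how such a range is read, here is a stdlib-only sketch that checks a numeric version against comma-separated clauses. It supports only `>=`, `<=`, `>`, `<` and `==` with purely numeric versions; pre-release or post-release tags such as `0.40a1` or `3.18.4.post1` are rejected, so pip itself (or the `packaging` library) remains the real authority. `satisfies` and `split_requirement` are hypothetical names:

```python
import operator
import re

# Comparison operators handled by this sketch (a small subset of PEP 440).
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
}
_CLAUSE = re.compile(r"^(>=|<=|==|>|<)\s*([0-9]+(?:\.[0-9]+)*)$")


def _as_tuple(version):
    """'2.4.5' -> (2, 4, 5); numeric release segments only."""
    return tuple(int(part) for part in version.split("."))


def _pad(a, b):
    """Zero-pad two version tuples so that '3.0' compares equal to '3.0.0'."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def split_requirement(line):
    """'pyspark>=2.4.5,<3.0.0' -> ('pyspark', '>=2.4.5,<3.0.0')."""
    match = re.match(r"^([A-Za-z0-9_.\-]+)\s*(.*)$", line.strip())
    return match.group(1), match.group(2)


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    candidate = _as_tuple(version)
    for clause in spec.split(","):
        match = _CLAUSE.match(clause.strip())
        if match is None:
            raise ValueError("unsupported clause: {!r}".format(clause))
        op, bound = match.groups()
        left, right = _pad(candidate, _as_tuple(bound))
        if not _OPS[op](left, right):
            return False
    return True
```

Read this way, `pyspark>=2.4.5,<3.0.0` admits 2.4.5 and later 2.x releases but excludes 3.0.0, which is the intent of keeping the docs build on PySpark 2.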

14 changes: 10 additions & 4 deletions recommenders/README.md
@@ -7,8 +7,9 @@ This package contains functions to simplify common tasks used when developing an
## Pre-requisites
Some dependencies require compilation during pip installation. On Linux this can be supported by adding build-essential dependencies:
```bash
sudo apt-get install -y build-essential
```
sudo apt-get install -y build-essential libpython<version>
```
where `<version>` should be `3.6` or `3.7` as appropriate.

On Windows you will need [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/).

@@ -21,12 +22,17 @@ To install core utilities, CPU-based algorithms, and dependencies
pip install --upgrade pip
pip install recommenders
```
In the case of conda, you also need to
```bash
conda install numpy-base
```
after the pip installation.

## Optional Dependencies

By default `recommenders` does not install all dependencies used throughout the code and the notebook examples in this repo. Instead we require a bare minimum set of dependencies needed to execute functionality in the `recommenders` package (excluding Spark and GPU functionality). We also allow the user to specify which groups of dependencies are needed at installation time (or later if updating the pip installation). The following groups are provided:
By default `recommenders` does not install all dependencies used throughout the code and the notebook examples in this repo. Instead we require a bare minimum set of dependencies needed to execute functionality in the `recommenders` package (excluding Spark, GPU and Jupyter functionality). We also allow the user to specify which groups of dependencies are needed at installation time (or later if updating the pip installation). The following groups are provided:

- examples: dependencies needed to run [example notebooks](https://github.com/microsoft/recommenders/tree/main/examples)
- examples: dependencies related to Jupyter needed to run [example notebooks](https://github.com/microsoft/recommenders/tree/main/examples)
- gpu: dependencies to enable GPU functionality (PyTorch & TensorFlow)
- spark: dependencies to enable Apache Spark functionality used in dataset, splitting, evaluation and certain algorithms
- xlearn: xLearn package (on some platforms it requires pre-installation of cmake)
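The install instructions in both READMEs follow a fixed order: upgrade pip (and, in the top-level README, setuptools), install `recommenders` with any extras groups, then run `conda install numpy-base` when working in a conda environment. A sketch that assembles those commands as argument lists, assuming only the four extras groups visible in this excerpt (the full list in the README may be longer); `pip_install_command` and `install_steps` are hypothetical names:

```python
# Extras groups named in the README excerpt above; the complete list may be longer.
KNOWN_EXTRAS = {"examples", "gpu", "spark", "xlearn"}


def pip_install_command(extras=(), package="recommenders"):
    """Build the `pip install` argument list for the requested extras groups."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError("unknown extras: {}".format(", ".join(unknown)))
    target = package
    if extras:
        target = "{}[{}]".format(package, ",".join(sorted(set(extras))))
    return ["pip", "install", target]


def install_steps(extras=(), use_conda=False):
    """Return the install commands in the order the READMEs give them."""
    steps = [
        ["pip", "install", "--upgrade", "pip"],
        ["pip", "install", "--upgrade", "setuptools"],
        pip_install_command(extras),
    ]
    if use_conda:
        # Per the README: in a conda environment, run this after the pip installation.
        steps.append(["conda", "install", "numpy-base"])
    return steps
```

Each step can be passed to `subprocess.run(step, check=True)`. Building argument lists instead of shell strings also sidesteps quoting the square brackets in `recommenders[examples]`, which shells such as zsh would otherwise try to expand as a glob pattern.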
29 changes: 13 additions & 16 deletions tools/generate_conda_file.py
@@ -35,40 +35,39 @@
CHANNELS = ["defaults", "conda-forge", "pytorch", "fastai"]

CONDA_BASE = {
"python": "python==3.6.11",
"python": "python>=3.6,<3.8",
"bottleneck": "bottleneck==1.2.1",
"dask": "dask>=0.17.1",
"fastparquet": "fastparquet>=0.1.6",
"cornac": "cornac>=1.11.0",
"ipykernel": "ipykernel>=4.6.1",
"jupyter": "jupyter>=1.0.0",
"lightfm": "lightfm==1.15",
"lightgbm": "lightgbm==2.2.1",
"matplotlib": "matplotlib>=2.2.2",
"mock": "mock==2.0.0",
"nltk": "nltk>=3.4",
"numpy": "numpy>=1.13.3",
"pandas": "pandas>1.0.3,<=1.2.2",
"papermill": "papermill>=2.2.0",
"pip": "pip>=19.2",
"pytest": "pytest>=3.6.4",
"pytest-cov": "pytest-cov>=2.12.1",
"pytorch": "pytorch-cpu>=1.0.0",
"seaborn": "seaborn>=0.8.1",
"requests": "requests>=2.0.0,<3",
"retrying": "retrying>=1.3.3",
"scikit-learn": "scikit-learn>=0.19.1",
"scipy": "scipy>=1.0.0",
"scikit-surprise": "scikit-surprise>=1.0.6",
"swig": "swig==3.0.12",
"lightgbm": "lightgbm==2.2.1",
"cornac": "cornac>=1.11.0",
"papermill": "papermill>=2.2.0",
"scipy": "scipy>=1.0.0",
"seaborn": "seaborn>=0.8.1",
"tqdm": "tqdm>=4.31.1",
"retrying": "retrying>=1.3.3",
}

CONDA_PYSPARK = {"pyarrow": "pyarrow>=0.8.0", "pyspark": "pyspark==2.4.5"}

CONDA_GPU = {
"fastai": "fastai==1.0.46",
"numba": "numba>=0.38.1",
"pytorch": "pytorch>=1.0.0",
"pytorch": "pytorch>=1.0.0,<=1.2.0", # For cudatoolkit=10.0
"cudatoolkit": "cudatoolkit=10.0",
"cudnn": "cudnn>=7.6"
}

PIP_BASE = {
@@ -78,14 +77,12 @@
"azure-mgmt-cosmosdb": "azure-mgmt-cosmosdb==0.8.0",
"black": "black>=18.6b4",
"category_encoders": "category_encoders>=1.3.0",
"dataclasses": "dataclasses>=0.6",
"hyperopt": "hyperopt==0.1.2",
"idna": "idna==2.7",
"locustio": "locustio==0.11.0",
"locust": "locust>=1,<2",
"memory-profiler": "memory-profiler>=0.54.0",
"nbconvert": "nbconvert==5.5.0",
"pydocumentdb": "pydocumentdb>=2.3.3",
"pymanopt": "pymanopt==0.2.5",
"pyyaml": "pyyaml>=5.4.1,<6",
"xlearn": "xlearn==0.40a1",
"transformers": "transformers==2.5.0",
"tensorflow": "tensorflow==1.15.4",
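In `tools/generate_conda_file.py` the package specs are dictionaries keyed by package name. If the groups are merged with `dict.update` (the merge strategy assumed here), the `pytorch` entry in `CONDA_GPU` replaces the CPU-only `pytorch-cpu` spec from `CONDA_BASE` rather than adding a second PyTorch package. The sketch below shows that merge and a minimal environment-YAML writer using trimmed copies of a few entries from the diff. `build_environment` and the environment names are hypothetical; this is not the script's actual code, and its real options and output layout may differ:

```python
# Trimmed copies of a few entries from tools/generate_conda_file.py above.
CHANNELS = ["defaults", "conda-forge", "pytorch", "fastai"]
CONDA_BASE = {
    "python": "python>=3.6,<3.8",
    "pip": "pip>=19.2",
    "pytorch": "pytorch-cpu>=1.0.0",
}
CONDA_GPU = {
    "pytorch": "pytorch>=1.0.0,<=1.2.0",  # For cudatoolkit=10.0
    "cudatoolkit": "cudatoolkit=10.0",
}
PIP_BASE = {"locust": "locust>=1,<2", "pyyaml": "pyyaml>=5.4.1,<6"}


def build_environment(name, gpu=False):
    """Return conda environment YAML text for the selected package groups."""
    conda_packages = dict(CONDA_BASE)
    if gpu:
        # Shared keys are overwritten, so the GPU "pytorch" spec replaces "pytorch-cpu".
        conda_packages.update(CONDA_GPU)
    lines = ["name: {}".format(name), "channels:"]
    lines += ["- {}".format(channel) for channel in CHANNELS]
    lines.append("dependencies:")
    lines += ["- {}".format(spec) for spec in conda_packages.values()]
    # pip-only packages go in a nested list under a "pip:" entry.
    lines.append("- pip:")
    lines += ["  - {}".format(spec) for spec in PIP_BASE.values()]
    return "\n".join(lines) + "\n"
```

Keying by package name also explains the reordering in this diff: a repeated key in a dict literal does not raise an error, and the last value silently wins, so keeping each package listed once (here alphabetically) avoids accidental overrides.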