Allow to choose installation environment #146

Closed
krzyslom opened this issue Apr 10, 2020 · 13 comments · Fixed by #147

Comments

@krzyslom
Contributor

As mentioned in #113. Recently I tried to run keras::install_keras() inside a Docker image with precommit already installed. It looks like both keras and tensorflow are now installed into r-reticulate by default.

This results in the following error:

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Examining setuptools: 7%|▋ | 2/30 [00:00<00:00, 2603.54it/s]
Comparing specs that have this dependency: 0%| | 0/15 [00:00<?, ?it/s]
Finding conflict paths: 0%| | 0/2 [00:00<?, ?it/s]
Finding shortest conflict path for setuptools: 0%| | 0/2 [00:00<?, ?it/s]
Finding shortest conflict path for setuptools: 50%|█████ | 1/2 [00:01<00:01, 1.99s/it]
Finding shortest conflict path for setuptools: 100%|██████████| 2/2 [00:01<00:00, 1.01it/s]
...truncated...

@lorenzwalthert
Owner

lorenzwalthert commented Apr 10, 2020

No wonder everyone uses virtual and conda envs in Python if you run into trouble with incompatibilities so quickly 😄 (bashing over).

I actually considered that in #134. Problem: potentially complicated, because executables from conda environments are not on the path by default (I think). So to find an executable, one would have to look through the bin/ folder of every conda env, or allow the user to specify which conda environment to use (a rough sketch of such a lookup follows below). Both add complexity. I see two solutions:

  • install into the r-precommit env with precommit::install_precommit() instead of r-reticulate to prevent conflicts.
  • for your specific case: select another installation method that will put the executable on the path.

I think the first option is clearly preferred. What do you think?
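
A rough sketch of what such a lookup could look like (this is not {precommit}'s actual implementation; the env layout and executable name handling are my assumptions):

precommit_candidates <- function() {
  envs <- reticulate::conda_list()  # one row per conda env: name + path to its python
  exe  <- if (.Platform$OS.type == "windows") "pre-commit.exe" else "pre-commit"
  # Unix-alikes: python lives in <env>/bin, next to installed executables
  unix_style <- file.path(dirname(envs$python), exe)
  # Windows: python.exe lives in <env>, executables in <env>/Scripts
  win_style <- file.path(dirname(envs$python), "Scripts", exe)
  candidates <- c(unix_style, win_style)
  unique(candidates[file.exists(candidates)])
}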

@krzyslom
Contributor Author

I would go with the first one.

At the moment I'm setting up a Docker image with miniconda3, precommit, keras, and mlflow. When all of them are installed in their separate conda environments (r-reticulate, r-tensorflow, and r-mlflow-1.7.0 respectively) I experience no problems mixing their functionality: I was able to train a keras model with mlflow-registered parameters and use precommit when committing changes. The only additional thing that has to be done is to call reticulate::use_condaenv("r-tensorflow") inside the keras training script (see the sketch just below). Also, the mentioned mlflow environment is its default one.
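
A minimal sketch of that training-script setup (env name as described above; the model code itself is omitted):

# point reticulate at the env holding keras/tensorflow before the module is first used
reticulate::use_condaenv("r-tensorflow", required = TRUE)
library(keras)
# ... define and fit the model, logging parameters via {mlflow} ...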

However, I would expect that all of those tools could be installed inside a single conda environment, similarly to e.g. dplyr and mlr3 being installed inside one R library. I'm going to check this configuration outside of an R context and provide feedback.

@lorenzwalthert
Owner

Ok, great, then I'll implement that. Note that {precommit} does not depend on any Python module directly; it only depends on the executable. You can consider it a coincidence that pre-commit was written in Python in the first place, which in turn means that it will work independently of the activated conda environment (and reticulate::use_condaenv('r-precommit') won't ever be useful, I think).

Well, I have experienced such problems before, and there are also some very narrow version requirements for some Python libraries that easily make them incompatible.

@lorenzwalthert
Owner

lorenzwalthert commented Apr 10, 2020

Surprisingly, on Windows, the executable cannot really be run with system2 when the executable was not installed in the standard conda env and the env is not activated: conda-forge/pre-commit-feedstock#9.

I think this is a problem because I don't know how to run the equivalent of conda activate && /path/to/pre-commit through system2, or whether there is another way to make sure the env is activated.

For the time being, you could also consider alternative installation methods such as pip.
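
To make the failure mode concrete, a minimal illustration (the path below is hypothetical):

# calling the executable directly, without activating its conda env first
exe <- "C:/Miniconda3/envs/r-precommit/Scripts/pre-commit.exe"
system2(exe, "--version", stdout = TRUE, stderr = TRUE)
# on Windows this can fail as described in the conda-forge issue linked above,
# because the env's DLL/path setup only happens on activation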

@krzyslom
Contributor Author

It turns out that it is possible to install pre-commit and keras into the same conda environment from the R interface.

The following code works:

precommit::install_precommit()
keras::install_keras("conda", conda_python_version = NULL,
                     tensorflow = "2.2.0rc2")

By default the keras package forces Python 3.6, hardcoded in the tensorflow::install_tensorflow() call. In keras::install_keras() this argument can be passed through the ellipsis. After changing it to NULL, an error is thrown saying that tensorflow 2.2.0 could not be found; it is mandatory to explicitly pass 2.2.0rc2.

This results in an environment with Python 3.8.2 (since the first installation step considered only pre-commit). However, in this case most of the modules inside the conda environment are installed from pip.

The steps can also be reversed:

keras::install_keras()
precommit::install_precommit()

And then it just works, but the Python version is 3.6.10 due to the mentioned hardcoded parameter; pre-commit doesn't care about the Python version in this case. Now most of the modules are installed from conda-forge.

When running conda env create -f environment.yaml on the following file

name: r-reticulate
channels:
  - defaults
  - conda-forge
  - pypi
dependencies:
  - python=3.8.2
  - pre-commit
  - keras

there will be conflicts. This is due to the fact that even though the pypi channel is available on Anaconda Cloud, there are no packages inside it.

To use pip packages with conda, the following structure has to be set up:

name: r-reticulate
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.8.2
  - pre-commit
  - pip
  - pip:
    - keras
    - tensorflow

Strangely, tensorflow has to be listed explicitly. This is the closest setup to the very first one in this comment, with some minor changes.

Basically, when creating any kind of virtual environment we usually list the desired packages at the same time and the environment sorts itself out. This is not the case for R. It's true that we can install.packages(c("precommit", "mlflow", "keras")), but the actual environment installation steps are separate: <package>::install_<package>. Normally conda will inform the developer that some packages inside the environment had to be downgraded/upgraded if some new module relies on older/newer versions of already installed packages (and if that is not possible, a conflict error occurs). This becomes a problem when the installation function is not flexible by default, like the one in the keras package. I can easily imagine that not every R user wants to spend extra time figuring this out; they would expect things to just work.

I wonder if there could be a common R package solution for installing Python envs. Not an all-encompassing one like the reticulate package, but something that would automatically install the R packages as well.
For instance, rve::install(c("precommit", "mlflow", "keras")) would not only install the Python modules inside the virtual environment at the same time, like reticulate::conda_install, but also the corresponding R packages.
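
A hypothetical sketch of that idea (rve does not exist; the R-to-Python package name mapping below is purely illustrative):

rve_install <- function(pkgs, envname = "r-reticulate") {
  install.packages(pkgs)                                   # install the R packages
  py_pkgs <- c(precommit = "pre-commit", keras = "keras",  # assumed mapping from
               mlflow = "mlflow")[pkgs]                    # R package to Python module
  reticulate::conda_install(envname, packages = unname(py_pkgs))
}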

@lorenzwalthert
Owner

lorenzwalthert commented Apr 12, 2020

Interesting investigation. So if I got you right, installing the Python packages from outside of R works fine, but when you install into the conda env via R, the trouble starts, in particular when packages like R {keras} or {tensorflow} have fixed Python version requirements? From my experience, it's generally better to avoid installing packages into a conda env with pip.

So I think RStudio is working on a solution you might be interested in. Also, I find it quite frustrating that command line executables don't work in general if the conda env is not activated, as described above. This basically makes conda a bad solution for hosting such executables.

@krzyslom
Contributor Author

...have fixed python version requirements?

Yes, and even if someone overcomes this by setting the proper parameters, I guess it could still be dangerous, since the R package interface could be incompatible with a particular version of the Python module.

As for the solution you have mentioned - this looks promising.

Currently, automatic Python environment configuration will only happen when using the aforementioned reticulate Miniconda installation.

Maybe it will make more sense if I install Miniconda through reticulate instead of manually, adapting the Docker image mentioned in the first comment.

Getting rid of <package>::install_<package> as a mandatory step is exactly what I had in mind. Looks like I was late with my idea. Thank you for the feedback. :)

@lorenzwalthert
Owner

lorenzwalthert commented Apr 18, 2020

@krzyslom I wonder if you can't convince the people in rstudio/keras3#1014 to install into the r-tensorflow env instead of r-reticulate; then I'd have one less problem to worry about here 😄. I am not sure why they prefer the doc update instead of the change in the installation env.

@lorenzwalthert
Owner

lorenzwalthert commented Apr 18, 2020

I guess there is hope for multiple commands with https://stackoverflow.com/questions/20974317/multiple-shell-commands-in-windows. The problem is that stdout and stderr are not captured consistently in my understanding, and the use of system() is discouraged.
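
For reference, a hedged sketch of that chaining on Windows (the activate.bat location and env name are assumptions; whether output is captured reliably is exactly the open question):

activate <- "C:/Miniconda3/Scripts/activate.bat"   # assumed Miniconda location
cmd <- paste(shQuote(activate), "r-precommit", "&&", "pre-commit", "--version")
out <- system2("cmd.exe", args = c("/c", cmd), stdout = TRUE, stderr = TRUE)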

@krzyslom
Contributor Author

krzyslom commented Apr 19, 2020

@lorenzwalthert from this reticulate article I understand that the approach the community is aiming at, and which will possibly become a standard, is to add the Config/reticulate: key to the DESCRIPTION file.

Actually, tensorflow is already doing this but keras is not.
However, they are not using an .onLoad() function, so the solution is not automated yet on their side.

Should precommit, mlflow, keras, etc. have this field set up, together with an .onAttach() function in R/zzz.R (as mentioned in the article), then users could just run

library(precommit)
library(mlflow)
library(keras)
<pkg>::<fnc>

and the conda environment should be set up automatically. Or users could just put them all in Depends: inside their packages' DESCRIPTION.

This way all packages could just use the single default r-reticulate env (sketched below).
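
A sketch of what that could look like, following the reticulate vignette and applied hypothetically to a package like {precommit} (none of these packages ship this today):

# DESCRIPTION
# Config/reticulate:
#   list(
#     packages = list(
#       list(package = "pre-commit", pip = TRUE)
#     )
#   )

# R/zzz.R
.onLoad <- function(libname, pkgname) {
  # ask reticulate to install the declared Python dependencies into the
  # active env (typically r-reticulate when reticulate manages Miniconda)
  reticulate::configure_environment(pkgname)
}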

@lorenzwalthert
Owner

Thanks for the explanation. The difference from most Python packages is that {precommit} only needs the executable and no Python functions, and I'm not sure the former is supported in this approach. The automated installation described in the article you linked would also require anyone to use Miniconda in order to use {precommit}, but some consider this a heavy dependency, so we also support other installation methods. I believe I found a workaround to install into r-precommit in #147, so I won't use r-reticulate.

@krzyslom
Contributor Author

I guess that the workaround and a configured DESCRIPTION file could complement each other:
A) workaround

# can be conda, virtualenv, pip installation
precommit::install_precommit()
precommit::use_precommit()
# Installs to r-precommit

B) DESCRIPTION

# R miniconda installation
library(precommit)
use_precommit()
# if not already installed - install miniconda through reticulate package? -> Y
# installs to `r-reticulate` and figures out python packages automatically

Since precommit does not depend on any Python modules, its installation should never clash with an already existing r-reticulate environment. Please notice that the issue only occurred when precommit was installed first and keras then couldn't handle the Python version downgrade.

I can try this when the workaround PR gets merged.

@lorenzwalthert
Owner

Yeah, I think we could also use approach B) later. But I've spent so much time on this already that I'd like to move on for now. Also, maybe it's better to wait and see how the Config/reticulate approach evolves and support it when it's more mature.

Please notice that the issue only occurred when precommit was installed first, and then keras couldn't handle the Python version downgrade.

Yeah, but in my understanding it's a likely scenario anyway, and since pre-commit does not have a lot of dependencies, I think it's a good option to install it into its own environment.
