RuntimeError on cache in multi-users case #488

Closed
fcollonval opened this issue Sep 8, 2020 · 29 comments
Labels
type::bug Something isn't working

Comments

@fcollonval
Member

Fail to create a new environment using mamba.

Context:

  • JupyterHub installation
  • Shared pkgs and envs directory with rw rights at a common group level
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/conda/exceptions.py", line 1079, in __call__
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mamba/mamba.py", line 941, in exception_converter
    raise e
  File "/opt/conda/lib/python3.7/site-packages/mamba/mamba.py", line 935, in exception_converter
    exit_code = _wrapped_main(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mamba/mamba.py", line 894, in _wrapped_main
    result = do_call(args, p)
  File "/opt/conda/lib/python3.7/site-packages/mamba/mamba.py", line 782, in do_call
    exit_code = create(args, parser)
  File "/opt/conda/lib/python3.7/site-packages/mamba/mamba.py", line 661, in create
    install(args, parser, "create")
  File "/opt/conda/lib/python3.7/site-packages/mamba/mamba.py", line 418, in install
    prefix=prefix,
  File "/opt/conda/lib/python3.7/site-packages/mamba/utils.py", line 73, in get_index
    is_downloaded = dlist.download(True)
RuntimeError: Operation not permitted: '/usr/local/share/jupyter/pkgs/cache/ba887a88.json'

The cache file is owned by a different user from the one creating the environment.
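To check whether a shared cache is in this state, something like the following could list the cached JSON files that belong to someone else. This is only a minimal sketch: the helper name `foreign_cache_files` is made up for illustration, and it assumes a POSIX system where `os.geteuid()` is available.

```python
import os
from pathlib import Path


def foreign_cache_files(cache_dir, uid=None):
    """Return the *.json files in cache_dir that are not owned by uid.

    Hypothetical helper, not part of mamba or conda. When uid is None,
    the effective uid of the current process is used (POSIX only).
    """
    if uid is None:
        uid = os.geteuid()
    return sorted(
        p for p in Path(cache_dir).glob("*.json") if p.stat().st_uid != uid
    )
```

Calling `foreign_cache_files("/usr/local/share/jupyter/pkgs/cache")` as the affected user would then show which files are candidates for the error above.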

Reference:
mamba 0.5.1
conda 4.8.4

Note: Using conda create instead of mamba create avoided the error.

@wolfv
Member

wolfv commented Sep 9, 2020

Thanks for the detailed bug report!

@adriendelsalle
Member

adriendelsalle commented Oct 24, 2020

Hi @fcollonval, this reminds me of this conda issue.

About JHub

If I remember correctly, your JHub install uses PAM auth and the local process spawner?

I faced the same issue with LDAP auth and the Docker spawner, using a spawner hook to replace the user's uid/gid in the container (the default is jovyan, if I remember correctly). That way users feel at home seeing their company uid, and it allowed me to deal with private/public workspaces.
The shared package cache is mounted as a volume and the gid was mapped on the host as jhub_users.

Both situations lead to users acting with their own uid/gid on the filesystem, if I'm right.

About conda/mamba

In my case, the error was triggered on some unpacked packages (see the issue), not on a JSON file. But the initial comment of the linked issue is very similar to yours.
I guess conda (and also mamba) is missing configuration for both permissions and ownership on a shared package cache?

However, I'm surprised that using conda fixed the issue in your case!

@dandaman

We experience the same issue - Is there a solution yet?
@fcollonval how did you solve it?

@wolfv
Member

wolfv commented Oct 22, 2021

@dandaman what version of mamba? We recently pushed some improvements with version 0.17.0.

@dandaman

@wolfv Thank you for your prompt response!

mamba --version
mamba 0.15.3
conda 4.10.3

I'll have a go at the new version then and report back

@davestacionis

I have just upgraded to 0.17.0 and the issue remains.

@wolfv
Member

wolfv commented Nov 3, 2021

Would you be able to write a test that would show us your issue?

@davestacionis

Sure, it is easy to reproduce. On a multi-user HPC system, users run shared pipelines for specific processes. Each pipeline begins by downloading the required packages into the shared pkgs directory. The error appears when a second user runs the exact same pipeline and the JSON files are already present in the cache directory but owned by another user. No permission changes on the cache files have any effect. The only workarounds are to remove the cache files or to not use the shared pkgs directory. If I can provide more detail or test data, just let me know what you would like to see.

@wolfv
Member

wolfv commented Nov 3, 2021

@davestacionis are you using ACLs?
If yes, your problem might be related to this: #1123

There is a proposed fix in #1137 however, the authors haven't had time to get back to it.

@davestacionis

I am not using ACLs but I think #1123 still applies. We are using standard group permissions and have setgid at the top level of the install to ensure everyone in our users group has access.

@wolfv
Member

wolfv commented Nov 3, 2021

Any chance you could convert that into a bash script for us to try out?

@dandaman

dandaman commented Nov 3, 2021

@wolfv, for us it's the same: after upgrading to 0.17.0 the issue remains.

Every time a different user uses mamba to install things, the above error occurs the next time another user tries to install things. Permissions on pkgs/cache (g+rwxs) work at the file system level, i.e. the different users can delete cache JSON files created by others.
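For anyone wanting to confirm that a shared directory really has the g+rwxs mode mentioned here, a small check along these lines might help. It is a sketch only, and `shared_dir_problems` is a hypothetical name; note that it inspects the mode bits and nothing else, and as reported above a correct mode was not enough to avoid the error.

```python
import stat


def shared_dir_problems(mode):
    """List what is missing for a group-shared directory (g+rwxs).

    `mode` is an st_mode value, e.g. os.stat(path).st_mode.
    Hypothetical helper for illustration only.
    """
    problems = []
    if not stat.S_ISDIR(mode):
        problems.append("not a directory")
    if mode & stat.S_IRWXG != stat.S_IRWXG:
        problems.append("group lacks rwx")
    if not mode & stat.S_ISGID:
        problems.append("setgid bit not set")
    return problems
```

Usage would be `shared_dir_problems(os.stat(cache_dir).st_mode)` (with `import os`), where `cache_dir` is the pkgs/cache path on your system; an empty list means the mode looks as described.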

The problem can always be resolved by either using conda or deleting the contents of pkgs/cache.

So I think we have the identical situation as described by @davestacionis

I'll check if I can reproduce it on my local Linux client...

@davestacionis

I'm not clear on what you want me to test via script. The proposed fix looks like it requires changes to the source/C code.

@wolfv
Member

wolfv commented Nov 3, 2021

I want a bash script that replicates your setup. I believe that should only take a couple of lines of bash, no?

Ideally you could integrate that as a test in our GitHub Actions CI (it would fail for now).

@davestacionis

While working on a script to duplicate the issue, I found a fix for my environment. Our install was shared by many users and they had write access to the pkgs & cache folders. Removing this access and setting the gid back to root resolved the issue and now shared pkg updates must be done via sudo. This forces user pkgs & envs into their home/personal environment, but that is ok for now.

@wolfv
Member

wolfv commented Nov 8, 2021

Great. Thanks for letting us know.

@samtygier-stfc

I am hitting a similar issue in our setup. We build Linux VMs for users using Ansible. This installs Mambaforge in /opt and creates an environment with our software installed; this runs as root. The VM is then given to a single user, who will use it for a few days or weeks. We want the user to be able to update the environment. We don't know who the user will be at install time, so we chmod +w /opt/mambaforge/envs and /opt/mambaforge/pkgs/cache so that whichever user gets the machine has write access there.

This worked with conda and conda update. But with mamba update we get:

RuntimeError: Operation not permitted: '/mambaforge/pkgs/cache/b556ea5a.json'

I can reproduce with the following docker file:

FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget

RUN wget -nv -O Mambaforge.sh https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh
RUN bash Mambaforge.sh -b -p /mambaforge

RUN /mambaforge/bin/mamba init
SHELL ["/bin/bash", "-c"]

RUN source /mambaforge/etc/profile.d/conda.sh && mamba create --yes -n test numpy==1.21.3

RUN chmod -R o+w /mambaforge/envs
RUN chmod -R o+w /mambaforge/pkgs/cache

# Make the cache files look old so that mamba needs to update them
RUN find /mambaforge/pkgs/cache -exec touch -d "7 days ago" {} +

RUN useradd -ms /bin/bash user
USER user

Build with:

docker build --no-cache -t mamba_user_test -f dockerfile .

then in:

docker run -it --rm mamba_user_test bash

run:

source /mambaforge/etc/profile.d/conda.sh && conda activate test && mamba update --yes numpy

 >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/mambaforge/lib/python3.9/site-packages/conda/exceptions.py", line 1079, in __call__
        return func(*args, **kwargs)
      File "/mambaforge/lib/python3.9/site-packages/mamba/mamba.py", line 917, in exception_converter
        raise e
      File "/mambaforge/lib/python3.9/site-packages/mamba/mamba.py", line 911, in exception_converter
        exit_code = _wrapped_main(*args, **kwargs)
      File "/mambaforge/lib/python3.9/site-packages/mamba/mamba.py", line 869, in _wrapped_main
        result = do_call(args, p)
      File "/mambaforge/lib/python3.9/site-packages/mamba/mamba.py", line 740, in do_call
        exit_code = update(args, parser)
      File "/mambaforge/lib/python3.9/site-packages/mamba/mamba.py", line 630, in update
        return install(args, parser, "update")
      File "/mambaforge/lib/python3.9/site-packages/mamba/mamba.py", line 486, in install
        index = load_channels(pool, channels, repos)
      File "/mambaforge/lib/python3.9/site-packages/mamba/utils.py", line 122, in load_channels
        index = get_index(
      File "/mambaforge/lib/python3.9/site-packages/mamba/utils.py", line 103, in get_index
        is_downloaded = dlist.download(True)
    RuntimeError: Operation not permitted: '/mambaforge/pkgs/cache/ddfd5f96.json'

`$ /mambaforge/condabin/mamba update --yes numpy`

  environment variables:
                 CIO_TEST=<not set>
        CONDA_DEFAULT_ENV=test
                CONDA_EXE=/mambaforge/bin/conda
             CONDA_PREFIX=/mambaforge/envs/test
    CONDA_PROMPT_MODIFIER=(test)
         CONDA_PYTHON_EXE=/mambaforge/bin/python
               CONDA_ROOT=/mambaforge
              CONDA_SHLVL=1
           CURL_CA_BUNDLE=<not set>
                     PATH=/mambaforge/envs/test/bin:/mambaforge/condabin:/usr/local/sbin:/usr/lo
                          cal/bin:/usr/sbin:/usr/bin:/sbin:/bin
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>

     active environment : test
    active env location : /mambaforge/envs/test
            shell level : 1
       user config file : /home/user/.condarc
 populated config files : /mambaforge/.condarc
          conda version : 4.10.3
    conda-build version : not installed
         python version : 3.9.7.final.0
       virtual packages : __linux=5.11.0=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /mambaforge  (read only)
      conda av data dir : /mambaforge/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /mambaforge/pkgs
                          /home/user/.conda/pkgs
       envs directories : /home/user/.conda/envs
                          /mambaforge/envs
               platform : linux-64
             user-agent : conda/4.10.3 requests/2.26.0 CPython/3.9.7 Linux/5.11.0-41-generic ubuntu/20.04.1 glibc/2.31
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False

A workaround is for the user to remove the .json files.

rm /mambaforge/pkgs/cache/*.json
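The same workaround can be scripted, for example as a pre-step in a pipeline. The following is a rough sketch, with `clear_repodata_cache` being a made-up name; it relies on the fact that deleting a file needs write access to the containing directory rather than ownership of the file (assuming the directory has no sticky bit).

```python
from pathlib import Path


def clear_repodata_cache(cache_dir):
    """Delete the *.json files directly inside cache_dir.

    Returns the names of the removed files. Hypothetical helper that
    mirrors `rm <cache_dir>/*.json`; subdirectories are left alone.
    """
    removed = []
    for path in sorted(Path(cache_dir).glob("*.json")):
        path.unlink()
        removed.append(path.name)
    return removed
```

In the Docker reproducer above this would be called as `clear_repodata_cache("/mambaforge/pkgs/cache")` before running mamba update.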

@xylar
Contributor

xylar commented Feb 15, 2022

I just started seeing this issue out of nowhere as part of my Azure pipeline. During conda mambabuild ..., I see:

RuntimeError: Operation not permitted: '/usr/share/miniconda/pkgs/cache/2ce54b42.json'

I presume something must have changed about permissions in this directory in the Azure docker image for Ubuntu. Similar CI with conda build works as expected.

@jonashaag
Contributor

Just saw this here as well: pandas-dev/pandas#45902

jonashaag added a commit to jonashaag/pandas that referenced this issue Feb 22, 2022
sphuber added a commit to aiidateam/aiida-core that referenced this issue Apr 12, 2022
A few fixes were needed:

* Increase timeout for `install-with-pip` and `install-with-conda` jobs
  These were taking a longer time. The problem for conda on Python 3.8
  is a known problem

* Switch to mamba: The build using conda was failing because it was
  running out of memory. Mamba should be faster and use less memory.

* The switch to mamba required a workaround to delete any existing JSON
  files in `/usr/share/miniconda/pkgs/cache`. This seems a known problem:
  mamba-org/mamba#488
@akaszynski

akaszynski commented Aug 13, 2022

This is still an issue when using the mamba solver within conda with conda install -n base conda-libmamba-solver.

Here's a workaround that worked for us in GitHub actions:

      - name: 'Workaround for mamba-org/mamba#488'
        run: rm /usr/share/miniconda/pkgs/cache/*.json

Thanks to pandas-dev/pandas#45902 for figuring this out.

@coroa
Contributor

coroa commented Nov 30, 2022

I guess that, at least for some people, this might be identical to #1123 and thus potentially addressed by #2141.

That is, updating the mtime to a given time is denied unless the file is owned by the effective uid, whereas the special case of calling utime or utimensat to set the timestamp to the current time is allowed as long as the file is writable.
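The rule can be written down as two small predicates. This is a simplified sketch with hypothetical function names: it treats effective uid 0 as a stand-in for "appropriate privileges", and `os.access` checks against the real rather than the effective uid, so it is an approximation and not a replacement for reading man utimensat.

```python
import os


def may_set_explicit_mtime(path):
    """Explicit timestamps need ownership (or privileges)."""
    euid = os.geteuid()  # POSIX only
    # Assumption: uid 0 approximates "appropriate privileges".
    return euid == 0 or euid == os.stat(path).st_uid


def may_touch_to_now(path):
    """Setting the timestamps to "now" is also allowed with write access."""
    return may_set_explicit_mtime(path) or os.access(path, os.W_OK)
```

For a group-writable cache file owned by another user, the first predicate would be false and the second true, which matches the situation reported in this thread.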

jfrost-mo added a commit to jfrost-mo/example-python-project that referenced this issue Dec 16, 2022
@determ1ne

For anyone who wants a quick and dirty fix, it's possible to work around the problem by hooking utimensat().

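/* Must come before any #include so that <dlfcn.h> exposes RTLD_NEXT. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>

/* Override utimensat(): ignore the requested timestamps and pass NULL, which
   sets both times to "now" and only requires write access to the file. */
int utimensat(int dirfd, const char *pathname, const struct timespec times[2], int flags) {
    (void)times;
    int (*real_utimensat)(int, const char *, const struct timespec[2], int) =
        (int (*)(int, const char *, const struct timespec[2], int))dlsym(RTLD_NEXT, "utimensat");
    return real_utimensat(dirfd, pathname, NULL, flags);
}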
  1. save the file as fixmamba.c
  2. cc fixmamba.c -fPIC -shared -o fixmamba.so -ldl
  3. LD_PRELOAD="$(pwd)/fixmamba.so" mamba install something

@determ1ne

According to man utimensat, write access alone is not enough to pass a non-null times value to utimensat. The STL's std::filesystem::last_write_time behaves like touch -d rather than touch.

conda handles the behavior correctly. See https://github.com/conda/conda/blob/82fa24baedcb42a9504912f6f851aeeae2fc478f/conda/core/subdir_data.py#L320 .
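In Python terms, the difference between the two behaviours comes down to which form of `os.utime` is called. The sketch below only illustrates that distinction; the function names are invented here and this is not a copy of conda's code.

```python
import os


def touch_to_now(path):
    """Set atime and mtime to the current time (what plain `touch` does).

    Passing None lets the OS pick "now", which needs only write access.
    """
    os.utime(path, None)


def set_mtime(path, timestamp):
    """Set atime and mtime to an explicit time (what `touch -d` does).

    On a file owned by another user this is expected to raise
    PermissionError, even if the file is writable.
    """
    os.utime(path, (timestamp, timestamp))
```

On a cache file the caller merely has write access to, `touch_to_now` should succeed while `set_mtime` should fail, which is consistent with the error seen in this issue.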

@pgramme

pgramme commented Jan 23, 2023

Upgrading to mamba 1.2.0 seems to have fixed the issue for me (unless it was by chance, because caching of repodata temporarily bypasses the issue in my case). So I suppose #2141 is indeed responsible for the fix. Thanks @coroa !

My setup: multi-user install on Linux using standard group permissions and setgid on pkgs and cache. No ACLs are used.

It would be good if others could confirm the fix.

@wolfv
Member

wolfv commented Jan 23, 2023

Awesome! Thanks @coroa :)

@zhangshaos

I have the same problem when I use conda on Windows.
The error message is: error libmamba Could not open lockfile 'D:\ProgramData\miniconda3\pkgs\cache\cache.lock'.
The problem is that the current user does not have permission to write to 'D:\ProgramData\miniconda3\pkgs\cache\cache.lock'.
When I open the Anaconda PowerShell prompt as an administrator, the problem goes away.

@AngryMaciek

AngryMaciek commented Sep 23, 2023

I am working inside a Docker container on Ubuntu (the container is also based on Ubuntu, 22.04) and just noticed this exact issue happening for me too. My mamba is 1.4.2, installed from the latest Miniforge installer during the image build. I use gosu to run commands as a regular user from root; I added that user to a group that has access to the mambaforge directory.

@jlhervylcl

Hello,

We are encountering the same issue.

critical libmamba filesystem error: cannot set permissions: Operation not permitted [/app/list/ia/micromamba_cache/pkgs/cache]

The cache directory was created when one user installed environments. An install by another user then produces this error, even though that user has rwx permissions on all the directories and files inside micromamba_cache; they are just not the owner.

@JohanMabille JohanMabille added the type::bug Something isn't working label Dec 5, 2024
@jjerphan
Member

This might be related to the group ownership and permissions of the caches and of mamba.

Please see: #3740 (comment)

Feel free to reopen an issue with a reproducer if you still have this issue with 2.0.

Note that 1.x is no longer being developed, except for security fixes.
