Cache is hitting race conditions when sharing the cache directory across multiple running Docker containers #7434
Comments
We very intentionally do not lock the cache; it's designed to be concurrency safe. My guess is that the error has something to do with the file being created by a different user...? |
Are both containers running as the same user? |
They should be running as the same user. They are based on the same image. |
To give you a little more context, these Docker containers are spun up by Jenkins as part of the pipeline. When building a wheel with an extension, I build the wheel for each supported version of Python in their own Docker container (based on the same image that has all the supported versions of Python installed) running in parallel. When these containers run, they run with the exact same settings, same environment variables, same user, and same path to cache mounted on the host system. When the containers run one at a time, everything runs fine. It uses the cache without an issue. However, when multiple containers are accessing the same cache directory, I get the permission errors. |
Is the error always when trying to write the |
Other files as well. If you want I can put the stack trace for those here too |
Actually, I think it might be only CACHEDIR.TAG that I'm running into. At least when using Linux. |
Let's see if #7550 resolves this then |
Any luck after #7550? |
I have had only limited attempts to try it due to other issues that pulled my attention away. I'm hoping to update the instances of uv on my pipelines this week though. |
It doesn't look like it fixed it. uv 0.4.20
|
This seems to be running into an issue on Windows these days. |
I'm seeing the same issue. I'm trying to switch some GitLab CI pipelines from pip to uv. I have three jobs that run in parallel. The jobs use tox environments and they run on the same Windows machine (GitLab shell executor, no Docker). When all three jobs run in parallel, one of them fails most of the time (see error message below) when running
.gitlab-ci.yml
stages:
  - lint
  - docs
variables:
  PIP_INDEX_URL: "..."
  PIP_CACHE_DIR: "C:\\Cache\\pip"
  UV_DEFAULT_INDEX: "..."
  UV_CACHE_DIR: "C:\\cache\\uv"
before_script:
  - python -m venv _venv
  - .\_venv\Scripts\Activate.ps1
  - python --version
  - python -m pip install pip --upgrade
  - python -m pip install uv
  - uv pip install tox tox-uv
tc_format_check:
  stage: lint
  script: tox -e tc-format-check
  tags:
    - mytag
static_code_analysis:
  allow_failure: true
  stage: lint
  script: tox -e static-code-analysis
  tags:
    - mytag
typecheck:
  allow_failure: true
  stage: lint
  script: tox -e types
  tags:
    - mytag
tox.ini
[tox]
minversion = 3.18
envlist = codestyle, docstyle, errors, types
isolated_build = True
toxworkdir = _tox

[testenv]
; install_command = pip install --cache-dir c:\cache\pip
extras = test
passenv =
    CI
    USERNAME
    GIT_*
deps:
    sphinx
    myst-parser
    sphinx-multiversion
commands =
    test-py3{7,8,9,10, 11}: pytest {posargs:tests}

[testenv:static-code-analysis]
description = Check code and tests for PEP 8 compliance and code complexity.
skip_install = true
envdir = {toxworkdir}/static-code-analysis
deps =
    ruff
commands =
    ruff check {posargs:src/ tests/}

[testenv:tc-format-check]
description = Check test functions.
skip_install = true
envdir = {toxworkdir}/pylint
deps =
    pylint >= 3.0.2
commands =
    pylint {posargs: tests/}

[testenv:types]
description = Run static type checker.
envdir = {toxworkdir}/lint
deps =
    mypy
    types-requests
commands =
    mypy --check-untyped-defs --no-implicit-optional --follow-imports=skip --disallow-untyped-calls --ignore-missing-imports src\
error message
|
@Slarag thank you for confirming this. I was starting to think I was the only person experiencing this and/or I was taking crazy pills. |
Can you share the owner and permissions on the file (just like a |
I think it's basically expected that we can't perform proper synchronization across containers? I don't think the file locks can be properly held. |
@zanieb In my case I'm not using containers at all. The GitLab shell executor just runs different PowerShell processes on the same machine, and the jobs access the same cache directory on that machine. I'm not sure how the shell executor works internally, but there's no isolation. The jobs can in theory access each other's files and have all the access rights of the assigned Windows user. So it's just like three simultaneous processes running uv.
https://docs.gitlab.com/runner/executors/shell.html Could it be a problem that different uv executables are accessing the same cache (uv installed in each separate venv by |
Hello, we are seeing the same issue as well. We are not using Docker containers. Simply running multiple instances of uv with a clean cache will repro the issue. The problem happens when two processes decide to build the same package at the same time: one process ends up reading while another is writing. On Linux this works, but on Windows, because of the way file locking works, it causes issues. Someone on my team spent a day investigating this, but it appeared to be nontrivial to fix. |
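For readers following along, the reader/writer collision described above can be illustrated with the generic "write to a temporary file, then rename" pattern that lock-free caches commonly rely on. This is only a minimal sketch and is not taken from uv's source; `atomic_write` and `demo` are hypothetical names, and `CACHEDIR.TAG` is used purely as an example target because it is the file mentioned earlier in this thread.

```python
import os
import tempfile
import threading


def atomic_write(path: str, data: bytes) -> None:
    """Write data so readers see either the old file or the new one, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Atomic on POSIX. On Windows this call can raise PermissionError
        # when another process still has the destination open.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the leftover temp file if the write or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def demo(cache_dir: str, writers: int = 8) -> bytes:
    """Race several writers on one file and return whatever ended up on disk."""
    target = os.path.join(cache_dir, "CACHEDIR.TAG")  # illustrative file name
    payloads = [f"writer-{i}".encode() * 1000 for i in range(writers)]
    threads = [threading.Thread(target=atomic_write, args=(target, p)) for p in payloads]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(target, "rb") as f:
        return f.read()
```

On Linux the final file is always one writer's complete payload, which matches the "on Linux this works" observation. The Windows failure mode is the rename step: if a second process is reading the destination at that moment, the replace may be refused instead of succeeding silently.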
I just reproduced the "Failed to retrieve temporary file" thing on my Windows machine by building in parallel with setuptools -- is that what you're seeing @jan11011977? |
I'm going to track this in #11002 since the Windows issues are unrelated to the original report here (Docker). |
While building wheels on my CI machine, I've been running into an issue where multiple containers try to access and write to the same files in the cache directory. This only happens when I have two or more containers running with the same mounted path, the location of UV_CACHE_DIR (the cache path). It doesn't happen every time, only when uv is trying to access the same files in the cache. For this reason the CI pipeline may pass or fail depending on which stages are running at the same time.
It looks like it's possibly a race condition that wasn't a problem when I was using pip. Is there a "process locking file" located outside of the UV_CACHE_DIR that I should also be mounting so that the cache isn't written to at the same time?
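One possible stopgap, offered only as a sketch and not something confirmed by the maintainers in this thread, is to avoid sharing the directory at all by pointing each parallel job at its own subdirectory through `UV_CACHE_DIR`. The helper name `per_job_env` and the `job_id` values are hypothetical, and the obvious cost is that jobs no longer reuse each other's cached builds.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def per_job_env(base_cache: str, job_id: str,
                environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with UV_CACHE_DIR set to a per-job subdirectory."""
    env = dict(os.environ if environ is None else environ)
    # Each job gets <base_cache>/<job_id>, so no two jobs write the same files.
    env["UV_CACHE_DIR"] = str(Path(base_cache) / job_id)
    return env
```

The resulting dictionary could be passed as `env=` to `subprocess.run` when launching the build command, for example with `job_id="py38"` for the Python 3.8 build.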
uv==2.12
build==1.2.1
Note: In this example, I'm using Python 3.8 on Linux, but it happens with all versions in Linux and Windows Docker containers.
Docker Command used:
docker run -ti -v uvcache:/.cache/uv -e UV_CACHE_DIR=/.cache/uv some_docker_image_that_has_uv_installed
uv command used inside the container:
python3.8 -m build --wheel --installer=uv
Output