Last.ckpt symlink breaking on Windows #18969

Closed
Jason94 opened this issue Nov 8, 2023 · 2 comments
Labels
bug Something isn't working duplicate This issue or pull request already exists ver: 2.1.x

Comments

Jason94 commented Nov 8, 2023

Bug description

Windows doesn't seem to support the symlink behavior implemented in this PR for the last.ckpt:
#18748

Running in a non-elevated (non-administrator) command prompt, I get this error message:

OSError: [WinError 1314] A required privilege is not held by the client: 'lightning_logs\\0_5_0__feedback3_3__encoder\\checkpoints\\epoch=01---val_loss=0.9149-val_f1=0.6876.ckpt' -> 'lightning_logs\\0_5_0__feedback3_3__encoder\\checkpoints\\last.ckpt'

When I run the command prompt elevated, the symlink is created, but it doesn't function correctly: trying to use it for anything gives an error message. The target path stored in the symlink is the path to the linked file relative to the working directory of the training script, not relative to the symlink:
lightning_logs\0_5_0__feedback3_2__encoder\checkpoints\epoch=09---val_loss=0.2521-val_f1=0.9265.ckpt.
(The symlink was right next to it. The path to it from the working directory of the training script is lightning_logs\0_5_0__feedback3_2__encoder\checkpoints\last.ckpt.)
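
A relative symlink target is resolved from the directory that contains the link, not from the process's working directory, which would explain why the link above points nowhere. The following is a minimal sketch of computing a target relative to the link's own directory. The helper name `relative_link_target` and the example paths are hypothetical and are not taken from Lightning's code.

```python
import os


def relative_link_target(target_path: str, link_path: str) -> str:
    """Return target_path expressed relative to the directory holding link_path.

    Hypothetical helper for illustration; not part of Lightning.
    """
    link_dir = os.path.dirname(os.path.abspath(link_path))
    return os.path.relpath(os.path.abspath(target_path), start=link_dir)


# Both files live in the same directory, so the stored target
# should be just the file name, independent of the working directory.
ckpt_dir = os.path.join("lightning_logs", "run", "checkpoints")
print(relative_link_target(
    os.path.join(ckpt_dir, "epoch=09.ckpt"),
    os.path.join(ckpt_dir, "last.ckpt"),
))  # prints "epoch=09.ckpt"
```

With a target computed this way, the link keeps working no matter which directory the training script was launched from.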

I think the easiest fix is to skip the symlink when running on Windows and just save last.ckpt as a copy, like it used to. Even if the symlink behavior could be fixed, I don't want to require people using my code to run training scripts as an administrator. Alternatively, the PR discussion brought up the idea of a flag controlling whether the last checkpoint is symlinked, and that would also be an acceptable fix.
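
To make the copy fallback concrete, here is a rough sketch of one way it could look: try to create the symlink, and fall back to a plain copy if the OS refuses (as with WinError 1314). The function name `link_or_copy` is hypothetical, and this is not the change that was actually made in Lightning.

```python
import os
import shutil


def link_or_copy(source: str, link_path: str) -> str:
    """Point link_path at source via a symlink, or copy the file if that fails.

    Hypothetical helper for illustration; returns "symlink" or "copy".
    """
    # Remove a stale link or file left over from a previous save.
    if os.path.lexists(link_path):
        os.remove(link_path)
    try:
        # Store the target relative to the link's own directory.
        link_dir = os.path.dirname(os.path.abspath(link_path))
        os.symlink(os.path.relpath(source, start=link_dir), link_path)
        return "symlink"
    except (OSError, NotImplementedError):
        # E.g. a non-elevated Windows prompt without the symlink privilege.
        shutil.copy(source, link_path)
        return "copy"
```

Catching the error instead of checking the platform means elevated or Developer Mode Windows setups can still get a symlink, while everyone else silently gets a copy.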

What version are you seeing the problem on?

v2.1

How to reproduce the bug

No response

Error messages and logs

OSError: [WinError 1314] A required privilege is not held by the client: 'lightning_logs\\0_5_0__feedback3_3__encoder\\checkpoints\\epoch=01---val_loss=0.9149-val_f1=0.6876.ckpt' -> 'lightning_logs\\0_5_0__feedback3_3__encoder\\checkpoints\\last.ckpt'

Environment

Current environment
  • CUDA:
    - GPU:
        - NVIDIA GeForce GTX 1080
    - available: True
    - version: 11.8
  • Lightning:
    - lightning: 2.1.1
    - lightning-api-access: 0.0.5
    - lightning-cloud: 0.5.50
    - lightning-fabric: 2.1.1
    - lightning-utilities: 0.9.0
    - pytorch-lightning: 2.1.1
    - torch: 2.1.0+cu118
    - torchaudio: 2.1.0+cu118
    - torchmetrics: 1.2.0
    - torchvision: 0.16.0+cu118
  • Packages:
    - absl-py: 2.0.0
    - aiobotocore: 2.5.4
    - aiohttp: 3.8.6
    - aioitertools: 0.11.0
    - aiosignal: 1.3.1
    - annotated-types: 0.6.0
    - ansicon: 1.89.0
    - antlr4-python3-runtime: 4.9.3
    - anyio: 3.7.1
    - arrow: 1.3.0
    - async-timeout: 4.0.3
    - attrs: 23.1.0
    - backoff: 2.2.1
    - beautifulsoup4: 4.12.2
    - bitsandbytes: 0.41.2
    - blessed: 1.20.0
    - boto3: 1.28.17
    - botocore: 1.31.17
    - cachetools: 5.3.2
    - certifi: 2022.12.7
    - charset-normalizer: 2.1.1
    - click: 8.1.7
    - colorama: 0.4.6
    - contourpy: 1.2.0
    - croniter: 1.4.1
    - cycler: 0.12.1
    - dateutils: 0.6.12
    - deepdiff: 6.7.0
    - docker: 6.1.3
    - docstring-parser: 0.15
    - fastapi: 0.104.1
    - filelock: 3.9.0
    - fonttools: 4.44.0
    - frozenlist: 1.4.0
    - fsspec: 2023.4.0
    - google-auth: 2.23.4
    - google-auth-oauthlib: 1.1.0
    - grpcio: 1.59.2
    - h11: 0.14.0
    - hydra-core: 1.3.2
    - idna: 3.4
    - idrt: 0.5.1
    - importlib-resources: 6.1.1
    - inquirer: 3.1.3
    - itsdangerous: 2.1.2
    - jinja2: 3.1.2
    - jinxed: 1.2.0
    - jmespath: 1.0.1
    - joblib: 1.3.2
    - jsonargparse: 4.27.0
    - kiwisolver: 1.4.5
    - lightning: 2.1.1
    - lightning-api-access: 0.0.5
    - lightning-cloud: 0.5.50
    - lightning-fabric: 2.1.1
    - lightning-utilities: 0.9.0
    - markdown: 3.5.1
    - markdown-it-py: 3.0.0
    - markupsafe: 2.1.2
    - matplotlib: 3.8.1
    - mdurl: 0.1.2
    - mpmath: 1.3.0
    - multidict: 6.0.4
    - networkx: 3.0
    - numpy: 1.24.1
    - oauthlib: 3.2.2
    - omegaconf: 2.3.0
    - ordered-set: 4.1.0
    - packaging: 23.2
    - pandas: 2.1.2
    - pillow: 9.3.0
    - pip: 22.3.1
    - protobuf: 4.23.4
    - psutil: 5.9.6
    - pyasn1: 0.5.0
    - pyasn1-modules: 0.3.0
    - pydantic: 2.4.2
    - pydantic-core: 2.10.1
    - pygments: 2.16.1
    - pyjwt: 2.8.0
    - pyparsing: 3.1.1
    - pypika: 0.48.9
    - python-dateutil: 2.8.2
    - python-editor: 1.0.4
    - python-multipart: 0.0.6
    - pytorch-lightning: 2.1.1
    - pytz: 2023.3.post1
    - pywin32: 306
    - pyyaml: 6.0.1
    - readchar: 4.0.5
    - redis: 5.0.1
    - requests: 2.28.1
    - requests-oauthlib: 1.3.1
    - rich: 13.6.0
    - rsa: 4.9
    - s3fs: 2023.4.0
    - s3transfer: 0.6.2
    - scikit-learn: 1.3.2
    - scipy: 1.11.3
    - seaborn: 0.13.0
    - setuptools: 65.5.0
    - six: 1.16.0
    - sniffio: 1.3.0
    - soupsieve: 2.5
    - starlette: 0.27.0
    - starsessions: 1.3.0
    - sympy: 1.12
    - tensorboard: 2.15.1
    - tensorboard-data-server: 0.7.2
    - tensorboardx: 2.6.2.2
    - threadpoolctl: 3.2.0
    - torch: 2.1.0+cu118
    - torchaudio: 2.1.0+cu118
    - torchmetrics: 1.2.0
    - torchvision: 0.16.0+cu118
    - tqdm: 4.66.1
    - traitlets: 5.13.0
    - types-python-dateutil: 2.8.19.14
    - typeshed-client: 2.4.0
    - typing-extensions: 4.8.0
    - tzdata: 2023.3
    - urllib3: 1.26.13
    - uvicorn: 0.24.0.post1
    - wcwidth: 0.2.9
    - websocket-client: 1.6.4
    - websockets: 11.0.3
    - werkzeug: 3.0.1
    - wrapt: 1.15.0
    - yarl: 1.9.2
  • System:
    - OS: Windows
    - architecture:
        - 64bit
        - WindowsPE
    - processor: AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
    - python: 3.11.2
    - release: 10
    - version: 10.0.19045

More info

No response

Jason94 added the "bug" and "needs triage" labels Nov 8, 2023
awaelchli (Contributor) commented Nov 8, 2023

Fixed in #18942 😊
Thanks for the report. Sorry, we don't have many Windows developers here, so it just fell through the cracks. Our Windows CI machines were happy to make the symlinks, so we didn't notice this limitation at first.

awaelchli added the "duplicate" label and removed the "needs triage" label Nov 8, 2023
Jason94 (Author) commented Nov 8, 2023

Thanks for the quick fix! Sorry I didn't find your PR in my search :)
