FutureWarning: You are using torch.load with weights_only=False #1429

Open
mskaif opened this issue Oct 23, 2024 · 10 comments

mskaif commented Oct 23, 2024

Describe the bug
FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(filename, lambda storage, loc: storage)

This warning is triggered by every torch.load call in stanza. It does not cause any problems with data processing at the moment, but the long warnings are distracting.

To Reproduce
Steps to reproduce the behavior:

  1. Upgrade torch to 2.4.1

Expected behavior
No warnings are emitted.

Environment (please complete the following information):

  • OS: Windows
  • Python version: python 3.12.7
  • Stanza version: 1.9.2
mskaif added the bug label Oct 23, 2024

mskaif commented Oct 23, 2024

The warning can be suppressed by running the following before calling any stanza functions, but this is a workaround, not a solution:

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

source: ultralytics/ultralytics#14994 (comment)
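The blanket filter above hides every FutureWarning in the process, including unrelated ones. A narrower variant, sketched here under the assumption that the warning text keeps the wording quoted in this issue, filters on the message as well as the category. The pattern and the helper name are illustrative, not part of stanza or torch.

```python
import warnings

# Assumed pattern: matches the torch.load warning text quoted in this issue.
# The message argument is a regex matched against the start of the warning text.
TORCH_LOAD_PATTERN = r".*torch\.load.*weights_only=False"


def silence_torch_load_warning():
    """Ignore only the torch.load FutureWarning; other FutureWarnings still show."""
    warnings.filterwarnings(
        "ignore",
        message=TORCH_LOAD_PATTERN,
        category=FutureWarning,
    )
```

Calling silence_torch_load_warning() once before building the pipeline should be enough, since the filter is process-wide. It is still only a way to quiet the output and does not change how the files are loaded.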

@AngledLuffa
Collaborator

AngledLuffa commented Oct 23, 2024 via email

AngledLuffa added a commit that referenced this issue Oct 24, 2024
…re in the save files, use weights_only=True. Significantly cuts down on the number of torch warnings. #1429
@AngledLuffa
Collaborator

Some of the models can be updated to use weights_only=True right away, but others require resaving with enums or other data structures removed. Will have to investigate some more.
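As a rough illustration of what "resaving with enums removed" could involve, the sketch below converts enum members inside a nested checkpoint dictionary to their names, so that only plain types remain before the file is saved again. The enum, the helper name, and the checkpoint layout are hypothetical; this is not how stanza's own conversion is implemented.

```python
from enum import Enum


class ExampleMode(Enum):
    """Hypothetical enum of the kind that might be stored in a model config."""
    FAST = 1
    SLOW = 2


def to_plain(obj):
    """Recursively replace Enum members with their names; leave other values alone."""
    if isinstance(obj, Enum):
        return obj.name
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_plain(value) for value in obj]
    if isinstance(obj, tuple):
        return tuple(to_plain(value) for value in obj)
    return obj
```

The loading code then has to map the stored name back to the enum, for example with ExampleMode["FAST"], which is why this kind of change needs the model files to be resaved.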


mskaif commented Oct 25, 2024

> Some of the models can be updated to use weights_only=True right away, but others require resaving with enums or other data structures removed. Will have to investigate some more.

Sorry for not getting back earlier. I'm using the built-in models like so:
STANZA_PIPE = stanza.Pipeline(
    lang="en",
    dir=settings.STANZA_DATA_DIR,
    processors="tokenize,mwt,pos",
    download_method=None,
    use_gpu=False,
)

The call sites in this pipeline that trigger the warning are:
tokenization\trainer.py:82
mwt\trainer.py:201
pos\trainer.py:139
common\pretrain.py:56
common\char_model.py:271

Thank you for the commit!

AngledLuffa added a commit that referenced this issue Oct 25, 2024
…re in the save files, use weights_only=True. Significantly cuts down on the number of torch warnings. #1429
AngledLuffa added a commit that referenced this issue Oct 27, 2024
…re in the save files, use weights_only=True. Significantly cuts down on the number of torch warnings. #1429
AngledLuffa added a commit that referenced this issue Oct 28, 2024
…re in the save files, use weights_only=True. Significantly cuts down on the number of torch warnings. #1429

dvrogozh commented Dec 7, 2024

Please be aware that with pytorch 2.6 this warning will become an error. This was reported to pytorch as:

I posted more details in pytorch/pytorch#142123 (comment). In short, the huggingface/transformers#34632 PR on the pytorch side has flipped the default of weights_only from False to True in the upcoming pytorch 2.6.

You could consider adding an explicit list of allowed safe globals, following an approach similar to the one used in Huggingface Transformers and Accelerate. For reference, see:

@AngledLuffa
Collaborator

I am finishing up some model training and will be able to make a new release with the updated models soon.


dvrogozh commented Dec 7, 2024

@AngledLuffa: note that at the moment the failure reported in pytorch/pytorch#142123 is not fixed in the latest stanza from the main branch (I tried 539760c, see the log below). The repro is:

import stanza
pos_pipeline = stanza.Pipeline(lang='en', processors='tokenize,pos', use_gpu=True, device='xpu')
sentence = "Some sentence"
pos_pipeline(sentence)

#1430, previously merged in stanza, is not enough to handle this case. The failure happens on this torch.load() call:

data = torch.load(self.filename, lambda storage, loc: storage)

Full log:

2024-12-06 16:29:16 INFO: Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES
Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.9.0.json: 392kB [00:00, 70.6MB/s]
2024-12-06 16:29:17 INFO: Downloaded file to /home/dvrogozh/stanza_resources/resources.json
2024-12-06 16:29:17 WARNING: Language en package default expects mwt, which has been added
2024-12-06 16:29:17 INFO: Loading these models for language: en (English):
===============================
| Processor | Package         |
-------------------------------
| tokenize  | combined        |
| mwt       | combined        |
| pos       | combined_charlm |
===============================

2024-12-06 16:29:17 INFO: Using device: xpu
2024-12-06 16:29:17 INFO: Loading: tokenize
2024-12-06 16:29:18 INFO: Loading: mwt
2024-12-06 16:29:18 INFO: Loading: pos
/home/dvrogozh/git/pytorch/pytorch/torch/_weights_only_unpickler.py:515: UserWarning: Detected pickle protocol 3 in the checkpoint, which was not the default pickle protocol used by `torch.load` (2). The weights_only Unpickler might not support all instructions implemented by this protocol, please file an issue for adding support if you encounter this.
  warnings.warn(
Traceback (most recent call last):
  File "/home/dvrogozh/tmp/st.py", line 3, in <module>
    pos_pipeline = stanza.Pipeline(lang='en', processors='tokenize,pos', use_gpu=True, device='xpu')
  File "/home/dvrogozh/git/stanza/stanza/pipeline/core.py", line 308, in __init__
    self.processors[processor_name] = NAME_TO_PROCESSOR_CLASS[processor_name](config=curr_processor_config,
  File "/home/dvrogozh/git/stanza/stanza/pipeline/processor.py", line 193, in __init__
    self._set_up_model(config, pipeline, device)
  File "/home/dvrogozh/git/stanza/stanza/pipeline/pos_processor.py", line 32, in _set_up_model
    self._trainer = Trainer(pretrain=self.pretrain, model_file=config['model_path'], device=device, args=args, foundation_cache=pipeline.foundation_cache)
  File "/home/dvrogozh/git/stanza/stanza/models/pos/trainer.py", line 34, in __init__
    self.load(model_file, pretrain, args=args, foundation_cache=foundation_cache)
  File "/home/dvrogozh/git/stanza/stanza/models/pos/trainer.py", line 174, in load
    emb_matrix = pretrain.emb
  File "/home/dvrogozh/git/stanza/stanza/models/common/pretrain.py", line 50, in emb
    self.load()
  File "/home/dvrogozh/git/stanza/stanza/models/common/pretrain.py", line 56, in load
    data = torch.load(self.filename, lambda storage, loc: storage)
  File "/home/dvrogozh/git/pytorch/pytorch/torch/serialization.py", line 1480, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
        (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
        (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
        WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
(pytorch.xpu) dvrogozh@willow-spr03:~/tmp$ cat st.py
import stanza

pos_pipeline = stanza.Pipeline(lang='en', processors='tokenize,pos', use_gpu=True, device='xpu')
sentence = "Some sentence"
pos_pipeline(sentence)
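The "Unsupported global" error in the log above comes from an allowlist check that runs during unpickling. PyTorch's weights_only unpickler is its own implementation, so the following is only a standard-library sketch of the same idea: a pickle.Unpickler subclass whose find_class refuses any global that is not explicitly allowed. The allowlist contents and the names used here are illustrative assumptions.

```python
import io
import pickle
from collections import OrderedDict

# Illustrative allowlist of (module, name) pairs that may be resolved while unpickling.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that resolves only globals present in ALLOWED_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Unsupported global: {module}.{name}")


def restricted_loads(data):
    """Unpickle bytes while rejecting every global outside the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

Here restricted_loads(pickle.dumps(OrderedDict(a=1))) succeeds, while a payload that references any other global, such as pickle.dumps(print), raises pickle.UnpicklingError. In PyTorch the corresponding step is to extend the allowlist with torch.serialization.add_safe_globals, as the error message itself suggests, and only for classes and functions that are trusted.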

@AngledLuffa
Collaborator

Got it, but that's the main branch. The updates merged in are in the dev branch, which at that line has torch.load(... weights_only=True)
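For readers who want to see what such a call looks like in isolation, here is a small sketch that saves a checkpoint containing only tensors and plain Python values and loads it back with weights_only=True. The checkpoint contents and function names are made up for illustration and are not stanza's actual save format. map_location="cpu" is used in place of the lambda from the traceback, on the assumption that both keep the loaded storages on the CPU.

```python
import os
import tempfile

import torch


def save_plain_checkpoint(path):
    """Save a hypothetical checkpoint made of tensors and primitives only."""
    state = {"emb": torch.zeros(3, 4), "config": {"lang": "en", "dim": 4}}
    torch.save(state, path)


def load_plain_checkpoint(path):
    """Load the checkpoint with the restricted unpickler enabled explicitly."""
    return torch.load(path, map_location="cpu", weights_only=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        checkpoint_path = os.path.join(tmpdir, "example.pt")
        save_plain_checkpoint(checkpoint_path)
        loaded = load_plain_checkpoint(checkpoint_path)
        print(sorted(loaded.keys()))
```

Passing weights_only explicitly makes the behavior the same before and after the default changes in pytorch 2.6, which is why files that contain other object types have to be resaved or allowlisted first.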


dvrogozh commented Dec 7, 2024

> Got it, but that's the main branch. The updates merged in are in the dev branch, which at that line has torch.load(... weights_only=True)

Ah, sorry. I missed that.

stanfordnlp deleted a comment from huiyan2021 Dec 17, 2024
@AngledLuffa
Collaborator

This should now be pushed in v1.10.0
