Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add safetensors #7812

Closed
wants to merge 10 commits into from
Closed

Add safetensors #7812

wants to merge 10 commits into from

Conversation

coldwaterq
Copy link

What does this PR do ?

By default, state_dicts are being saved as pytorch files which contain pickles. This is a security concern which this PR attempts to start the mitigation for while preserving backwards compatability.

Collection: all, but none specifically (core functionality)

Changelog

  • add nemo.utils.secure.torch_load and nemo.utils.secure.torch_save
  • torch.load and torch.save removed from save_restore_connector and alternates of it
  • secure.torch_load and torch_save use safetensors when available, however if safe=False torch.load and torch.save are used to allow for backwards compatability
  • safe is a named parameter and passed through from save_to and restore_from. The default is safe=False to preserve backwards compatability
  • added unit tests to double check backwards compatability and secure functions work correctly.

Usage

  • When using nemo with untrusted .nemo files this will greatly reduce the potential for the user to be attacked.
# Existing usage should be preserved. although the result will include a .safetensor inside the .nemo now as well
model.save_to(save_path=model_save_path)
model.save_to(save_path=model_save_path, safe=False)

# the .nemo will only include the .safetensor.
model.save_to(save_path=model_save_path, safe=True)

# Existing usage should be preserved. although if the .nemo contains a .safetensor, it will be used instead of the pytorch version
model.restore_from(model_save_path)
model.restore_from(save_path=model_save_path, safe=False)

# The restore will fail if a .safetensor isn't available to prevent an untrusted .nemo file from resulting in an exploit
model.restore_from(save_path=model_save_path, safe=False)

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added core Changes to NeMo Core ASR NLP labels Oct 26, 2023
Zhilin123 and others added 8 commits October 26, 2023 12:23
* update text server to support compute logprobs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: coldwaterq <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: coldwaterq <[email protected]>
Still running tests to make sure nothing is negatively impacted.

Signed-off-by: coldwaterq <[email protected]>
Signed-off-by: coldwaterq <[email protected]>
@@ -464,7 +464,7 @@
)
super().__init__()

def save_to(self, model, save_path: str):
def save_to(self, model, save_path: str, safe: bool = False):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
save_path = save_path.replace(".nemo", "_XYZ.nemo")
super().save_to(model, save_path, safe=safe)

class MockModelV2(MockModel):

Check notice

Code scanning / CodeQL

Unused local variable

Variable MockModelV2 is not used.
@coldwaterq
Copy link
Author

/jenkins

@coldwaterq
Copy link
Author

/jenkins

@coldwaterq
Copy link
Author

jenkins

2 similar comments
@ericharper
Copy link
Collaborator

jenkins

@ericharper
Copy link
Collaborator

jenkins

Copy link
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Nov 14, 2023
@titu1994 titu1994 removed the stale label Nov 14, 2023
Copy link
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Nov 29, 2023
Copy link
Contributor

github-actions bot commented Dec 6, 2023

This PR was closed because it has been inactive for 7 days since being marked as stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ASR core Changes to NeMo Core NLP stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants