[Feature Request] Framework-independent DP model format #2982

Closed
24 of 37 tasks
Tracked by #3122
njzjz opened this issue Nov 8, 2023 · 1 comment
njzjz commented Nov 8, 2023

Summary

Implement a framework-independent DP model format.

Detailed Description

Background

Currently, the DP model file depends on the deep learning framework: the TensorFlow model is stored in ProtoBuf format (.pb), while the PyTorch model under development uses the .pt format. The two formats are difficult to convert between. The ONNX project attempts such conversion at the OP level, but it is of limited use here, since both TensorFlow and PyTorch have many unsupported OPs, and DP models may contain customized OPs.

DeePMD-kit needs to implement a framework-independent DP model format in order to support multiple backends, as described below. Different frameworks are expected to behave identically for the same model data.

Data structure

  1. The model data is based on the current input parameters, which are aligned across frameworks. Unimplemented parameters should also be aligned; a framework that does not support a parameter should raise a NotImplementedError at runtime.

  2. Add a @variables key, of type dict[str, np.ndarray], to each layer's dictionary to store the network parameters, corresponding to what needs to be restored in the current init_frz_model (which ensures complete restoration). Since @variables contains the special character @, names beginning with @ should be treated as reserved and avoided for regular arguments in the future. The keys of @variables should be aligned across all frameworks. The type embedding should be written out explicitly, not hidden.

{
    "argument1": ...,
    "@variables": { 
        "variable1": ..., 
    }
}
  3. Add the following meta-information at the top level: (1) the software, version, and module used to generate the model file; (2) the generation time; (3) a unified model definition version shared by all frameworks.
{
    "model": ...,
    "software": ...,
    "software_version": ...,
    "time": ...,
    "model_version": ...,
}
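The layout above can be sketched in plain Python. This is a hypothetical example of the proposed model data; key names other than `@variables` and the top-level meta fields are illustrative, not DeePMD-kit's actual schema:

```python
import datetime
import numpy as np

# A single layer's dictionary: input parameters plus the reserved
# "@variables" key holding framework-independent NumPy arrays.
layer = {
    "activation_function": "tanh",   # illustrative argument names
    "resnet_dt": False,
    "@variables": {
        # network parameters stored as plain NumPy arrays, so no
        # deep-learning framework is needed to read them back
        "w": np.zeros((10, 10), dtype=np.float64),
        "b": np.zeros((10,), dtype=np.float64),
    },
}

# Top-level document with the proposed meta-information.
model_data = {
    "model": {"descriptor": {"layers": [layer]}},
    "software": "deepmd-kit",
    "software_version": "3.0.0",                   # version that generated the file
    "time": datetime.datetime.now().isoformat(),   # generation time
    "model_version": "1.0",                        # unified across frameworks
}
```

Because every array is a plain `np.ndarray`, any backend can load the same document and reconstruct its own native parameters from it.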

Data storage

An HDF5 file is used to store the data. h5py is already a dependency of TensorFlow, PyTorch, and the existing DeePMD-kit, so this brings no extra dependencies.

  1. All variables are stored in the HDF5 file under unique paths. The json path is reserved and should not be used for variables.
  2. The JSON document is stored at the json path, with @variables converted to type dict[str, str]. Each value of the @variables dict is the HDF5 path to the corresponding variable, which may differ between platforms.
  3. Convert dict[str, np.ndarray] to dict[str, str] when saving the model, and convert it back when restoring it.
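A minimal sketch of the saving side of this scheme, assuming h5py; the function names and path layout are illustrative, not DeePMD-kit's actual implementation:

```python
import json

import h5py
import numpy as np


def traverse_save(f, data, path=""):
    """Recursively replace each "@variables" dict[str, np.ndarray] with a
    dict[str, str] of HDF5 paths, writing the arrays into the open file f."""
    out = {}
    for key, value in data.items():
        if key == "@variables":
            out[key] = {}
            for name, arr in value.items():
                hdf5_path = f"{path}/@variables/{name}"
                f[hdf5_path] = arr           # store the array itself
                out[key][name] = hdf5_path   # record its path in the JSON
        elif isinstance(value, dict):
            out[key] = traverse_save(f, value, f"{path}/{key}")
        else:
            out[key] = value
    return out


def save_model(filename, model_data):
    with h5py.File(filename, "w") as f:
        data = traverse_save(f, model_data)
        # the "json" path is reserved for the serialized JSON document
        f["json"] = json.dumps(data)
```

Restoring reverses the conversion: load the JSON document from the `json` path, then replace each path string in `@variables` with the array read from that HDF5 path.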

Binding with class

Add deserialize (a classmethod) and serialize methods to each class. The parent class's deserialize should dispatch to the appropriate subclass. The implementation should follow dpdispatcher:

https://github.com/deepmodeling/dpdispatcher/blob/065731a60be3b58979b54f1d33562ef189800158/dpdispatcher/submission.py#L97-L166

The deserialize (classmethod) and serialize of the top-level class can be called by external modules.
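The dispatch pattern borrowed from dpdispatcher can be sketched as follows; the class names, registry, and dictionary keys here are illustrative, not DeePMD-kit's actual API:

```python
import numpy as np

# Hypothetical registry mapping a "type" string in the serialized data
# to the subclass that knows how to deserialize it.
_REGISTRY = {}


class Descriptor:
    @classmethod
    def register(cls, name):
        def decorator(subclass):
            _REGISTRY[name] = subclass
            return subclass
        return decorator

    @classmethod
    def deserialize(cls, data):
        # The parent class dispatches to the subclass recorded in the
        # serialized data; the subclass overrides deserialize itself.
        return _REGISTRY[data["type"]].deserialize(data)


@Descriptor.register("se_e2_a")
class DescrptSeA(Descriptor):
    def __init__(self, w):
        self.w = w

    def serialize(self):
        return {"type": "se_e2_a", "@variables": {"w": self.w}}

    @classmethod
    def deserialize(cls, data):
        return cls(w=data["@variables"]["w"])
```

External code only ever calls the top-level pair, e.g. `Descriptor.deserialize(data)`, and gets back an instance of the right subclass without knowing which framework or class produced the data.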

Progress

Further Information, Files, and Links

No response


njzjz commented May 24, 2024

Closing, as the framework has been set up.
