
v3.0.0a0

Pre-release
@njzjz released this on 03 Mar 09:22 (commit ec32340)

DeePMD-kit v3: A multiple-backend framework for deep potentials

We are excited to announce the first alpha version of DeePMD-kit v3. DeePMD-kit v3 allows you to train and run deep potential models on top of TensorFlow or PyTorch. DeePMD-kit v3 also supports the DPA-2 model, a novel architecture for large atomic models.

Highlights

Multiple-backend framework


DeePMD-kit v3 adds a pluggable multiple-backend framework that provides consistent training and inference experiences across backends. You can:

  • Use the same training data and input script to train a deep potential model with different backends, and switch backends based on efficiency, functionality, or convenience:
# Training a model using the TensorFlow backend
dp --tf train input.json
dp --tf freeze

# Training a model using the PyTorch backend
dp --pt train input.json
dp --pt freeze
  • Use any model to perform inference via any existing interface, including dp test, the Python/C++/C interfaces, and third-party packages (dpdata, ASE, LAMMPS, AMBER, Gromacs, i-PI, CP2K, OpenMM, ABACUS, etc.). For example, in LAMMPS:
# run LAMMPS with a TensorFlow backend model
pair_style deepmd frozen_model.pb
# run LAMMPS with a PyTorch backend model
pair_style deepmd frozen_model.pth
# Calculate model deviation using both models
pair_style deepmd frozen_model.pb frozen_model.pth out_file md.out out_freq 100
  • Convert a model between backends with dp convert-backend, provided both backends support that model:
dp convert-backend frozen_model.pb frozen_model.pth
dp convert-backend frozen_model.pth frozen_model.pb
  • Add a new backend much more quickly if you want to contribute one to DeePMD-kit.
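When several models are listed in pair_style deepmd, the model deviation written to out_file compares their predictions. Below is a minimal NumPy sketch of the widely used force-deviation definition: for each atom, take the root-mean-square distance of each model's force from the ensemble-mean force, then report the maximum, minimum, and average over atoms (quantities such as max_devi_f). The function name and the (n_models, n_atoms, 3) array layout are assumptions for illustration, not DeePMD-kit's API.

```python
from __future__ import annotations

import numpy as np


def force_model_deviation(forces: np.ndarray) -> tuple[float, float, float]:
    """Return (max, min, avg) force deviation over atoms.

    forces: shape (n_models, n_atoms, 3), the force predicted by each model
    for every atom (a hypothetical layout chosen for this sketch).
    """
    mean_force = forces.mean(axis=0)                       # (n_atoms, 3)
    # Squared distance of each model's force from the ensemble mean.
    sq_dev = np.sum((forces - mean_force) ** 2, axis=-1)  # (n_models, n_atoms)
    # Per-atom deviation: square root of the mean squared distance over models.
    devi = np.sqrt(sq_dev.mean(axis=0))                    # (n_atoms,)
    return float(devi.max()), float(devi.min()), float(devi.mean())
```

Identical models give zero deviation everywhere; a large max_devi_f flags configurations where the models disagree.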

PyTorch backend: a backend designed for large atomic models and new research

We added the PyTorch backend in DeePMD-kit v3 to support the development of new models, especially for large atomic models.

DPA-2 model: Towards a universal large atomic model for molecular and material simulation

The DPA-2 model is a novel architecture for Large Atomic Models (LAM). It can accurately represent a diverse range of chemical systems and materials, enabling high-quality simulations and predictions with significantly less effort than traditional methods. The DPA-2 model is implemented only in the PyTorch backend. An example configuration is in the examples/water/dpa2 directory.

The DPA-2 descriptor includes two primary components: repinit and repformer. The detailed architecture is shown in the following figure.

[Figure: DPA-2 descriptor architecture]

Training strategies for large atomic models

The PyTorch backend supports multiple training strategies for developing large atomic models.

Parallel training: Large atomic models have many hyper-parameters and a complex architecture, so training them on multiple GPUs is necessary. Benefiting from the PyTorch ecosystem, parallel training in the PyTorch backend can be driven by torchrun, a launcher for distributed data-parallel training.

torchrun --nproc_per_node=4 --no-python dp --pt train input.json
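torchrun starts one process per GPU (4 here) and tells each process who it is through environment variables such as RANK and WORLD_SIZE; --no-python makes it execute dp directly instead of prefixing it with python. The stdlib sketch below shows how a worker could use those variables to take its own share of the batches, similar in spirit to PyTorch's DistributedSampler. The helper name is hypothetical, and this is not how DeePMD-kit necessarily splits its data.

```python
import os


def shard_for_this_worker(items: list) -> list:
    """Return the slice of `items` that this torchrun worker should process.

    torchrun sets RANK and WORLD_SIZE for every process it launches; outside
    torchrun the defaults make the script behave like a single worker.
    """
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    # Strided split: worker r takes items r, r + world_size, r + 2 * world_size, ...
    return items[rank::world_size]
```

With --nproc_per_node=4 on one node, the workers see RANK 0 to 3 and WORLD_SIZE 4, so for ten batches the worker with RANK 1 processes batches 1, 5, and 9; gradients are then averaged across workers by distributed data parallel.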

Multi-task training: Large atomic models are trained on data covering a wide scope and different DFT levels, which requires multi-task training. The PyTorch backend supports multi-task training, sharing the descriptor between different tasks. An example is given in examples/water_multi_task/pytorch_example/input_torch.json.
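Sharing the descriptor means that every task reuses the same descriptor parameters while each task keeps its own fitting network. The toy NumPy sketch below illustrates only that parameter-sharing idea; the task names, sizes, and one-layer "networks" are hypothetical and far simpler than real DeePMD-kit models, whose multi-task setup is configured through the example input file.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared descriptor parameters: every task reuses this matrix (hypothetical sizes).
N_FEAT, N_DESC = 8, 4
shared_descriptor = rng.normal(size=(N_FEAT, N_DESC))

# One fitting "head" per task, e.g. per dataset or DFT level (hypothetical names).
fitting_heads = {
    "task_pbe": rng.normal(size=N_DESC),
    "task_scan": rng.normal(size=N_DESC),
}


def predict_energy(atom_features: np.ndarray, task: str) -> float:
    """Toy energy: shared descriptor, then the task's own linear head, summed over atoms."""
    desc = np.tanh(atom_features @ shared_descriptor)  # (n_atoms, N_DESC), same for all tasks
    atomic_energy = desc @ fitting_heads[task]         # (n_atoms,), task-specific
    return float(atomic_energy.sum())
```

During multi-task training, updates to shared_descriptor driven by any task's data affect the predictions of every task, while each head is trained only on its own data.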

Finetune: Fine-tuning is useful for adapting a pre-trained large model to a smaller, task-specific dataset. The PyTorch backend supports the --finetune argument of the dp --pt train command.

Developing new models using Python and dynamic graphs

Researchers may find the static graph and the custom C++ OPs of the TensorFlow backend painful, as they sacrifice research convenience for computational performance. The PyTorch backend has a well-designed code structure built on dynamic graphs and is currently written entirely in Python, which makes new deep potential models easier to extend and debug than with a static graph.

Supporting traditional deep potential models

Users may still want to use the traditional models already supported by the TensorFlow backend in the PyTorch backend, and to compare the same model across backends. We have rewritten almost all of the traditional models in the PyTorch backend; their support status is listed below:

  • Features supported:
    • Descriptor: se_e2_a, se_e2_r, se_atten, hybrid
    • Fitting: energy, dipole, polar, fparam/aparam support
    • Model: standard, DPRc
    • Python inference interface
    • C++ inference interface for energy only
    • TensorBoard
  • Features not supported yet:
    • Descriptor: se_e3, se_atten_v2, se_e2_a_mask
    • Fitting: dos
    • Model: linear_ener, DPLR, pairtab, frozen, pairwise_dprc, ZBL, Spin
    • Model compression
    • Python inference interface for DPLR
    • C++ inference interface for tensors and DPLR
    • Parallel training using Horovod
  • Features not planned:
    • Descriptor: loc_frame, se_e2_a + type embedding, se_a_ebd_v2
    • NVNMD

Warning

As part of an alpha release, the PyTorch backend's API or user input arguments may change before the first stable version.

DP backend and format: reference backend for other backends

DP is a reference backend for development that implements models in pure NumPy, without any heavy deep-learning framework. It cannot be used for training, only for Python inference. As a reference backend, it aims at correct results rather than the best performance. The DP backend uses HDF5 to store model serialization data, which is backend-independent.
The DP backend and the serialization data are used in unit tests to ensure that different backends give consistent results and that models can be converted between them.
In the current version, the DP backend has a support status similar to that of the PyTorch backend, except that DPA-1 and DPA-2 are not supported yet.
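The consistency-testing idea, where one set of backend-independent serialized parameters is evaluated by a plain NumPy reference and by another implementation and the results are compared, can be sketched as follows. This is a toy one-layer model, not DeePMD-kit's test suite: np.savez stands in for HDF5 so the sketch needs no h5py, and the explicit-loop function merely plays the role of "another backend".

```python
import io

import numpy as np


def serialize(params: dict) -> bytes:
    """Store parameters as plain arrays; np.savez stands in for HDF5 here."""
    buf = io.BytesIO()
    np.savez(buf, **params)
    return buf.getvalue()


def deserialize(data: bytes) -> dict:
    with np.load(io.BytesIO(data)) as npz:
        return {key: npz[key] for key in npz.files}


def reference_forward(params: dict, x: np.ndarray) -> np.ndarray:
    """'Reference backend': vectorized NumPy evaluation of a one-layer toy model."""
    return np.tanh(x @ params["weight"] + params["bias"])


def loop_forward(params: dict, x: np.ndarray) -> np.ndarray:
    """Stand-in for another backend: the same math written with explicit loops."""
    w, b = params["weight"], params["bias"]
    out = np.empty((x.shape[0], w.shape[1]))
    for i in range(x.shape[0]):
        for j in range(w.shape[1]):
            out[i, j] = np.tanh(sum(x[i, k] * w[k, j] for k in range(w.shape[0])) + b[j])
    return out


rng = np.random.default_rng(42)
params = {"weight": rng.normal(size=(3, 2)), "bias": rng.normal(size=2)}
restored = deserialize(serialize(params))
x = rng.normal(size=(5, 3))
# Consistency check: both implementations agree when fed the restored parameters.
np.testing.assert_allclose(reference_forward(restored, x), loop_forward(restored, x), rtol=1e-10, atol=1e-12)
```

Keeping the reference simple and framework-free is the design point: when a backend disagrees with it, the bug is almost certainly in the backend.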

Authors

The above highlights were mainly contributed by

Breaking changes

  • Drop Python 3.7 support by @njzjz in #3185
  • All interfaces now require model files to have the correct filename extension so that the corresponding backend can load them. TensorFlow model files must end with the .pb extension.
  • The Python class DeepTensor (including DeepDipole and DeepPolar) now returns atomic tensors with dimension natoms instead of nsel_atoms by @njzjz in #3390
  • For developers: the Python module structure is fully refactored. The old deepmd module was moved to deepmd.tf without other API changes, and deepmd_utils was moved to deepmd without other API changes by @njzjz in #3177, #3178
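Because the filename extension decides which backend loads a model, the dispatch can be pictured as a suffix lookup. The sketch below is a hypothetical illustration, not DeePMD-kit's actual loader; it only knows the .pb and .pth suffixes used in these notes, and the function and table names are made up.

```python
from pathlib import Path

# Only the suffixes named in these release notes; other backends are not covered.
SUFFIX_TO_BACKEND = {
    ".pb": "TensorFlow",
    ".pth": "PyTorch",
}


def detect_backend(model_file: str) -> str:
    """Pick the backend that should load `model_file` from its filename extension."""
    suffix = Path(model_file).suffix
    try:
        return SUFFIX_TO_BACKEND[suffix]
    except KeyError:
        raise ValueError(
            f"cannot infer a backend for {model_file!r}: unsupported extension {suffix!r}"
        ) from None
```

This is why a TensorFlow model saved under a name without .pb is no longer accepted: without the suffix there is nothing to dispatch on.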

Other changes

Enhancement

  • Speed up neighbor stat for the TensorFlow backend 80x by @njzjz in #3275
  • i-PI: remove normalize_coord by @njzjz in #3257
  • LAMMPS: fix_dplr.cpp: delete redundant setup and set atom->image in pre_force by @shiruosong in #3344, #3345
  • Bump scikit-build-core to 0.8 by @njzjz in #3369
  • Bump LAMMPS to stable_2Aug2023_update3 by @njzjz in #3399
  • Add fparam/aparam support for fine-tune by @njzjz in #3313
  • TF: remove freeze warning for optional nodes by @njzjz in #3381
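Neighbor statistics scan the training data for the minimal interatomic distance and the maximal number of neighbors within the cutoff radius, which helps choose sel. The simplified NumPy sketch below (single frame, no periodic boundary conditions, no per-type breakdown) shows the computation and why evaluating all atom pairs at once with broadcasting beats per-atom Python loops. It is an assumption-laden illustration, not the implementation behind #3275.

```python
import numpy as np


def neighbor_stat(coords: np.ndarray, rcut: float) -> tuple:
    """Return (min_nbor_dist, max_nbor_size) for one frame without PBC.

    coords: (n_atoms, 3) Cartesian coordinates; rcut: cutoff radius in the same unit.
    """
    # All pairwise distance vectors at once via broadcasting: (n_atoms, n_atoms, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Exclude each atom's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    min_nbor_dist = float(dist.min())
    # Count each atom's neighbors inside the cutoff and keep the largest count.
    max_nbor_size = int((dist < rcut).sum(axis=1).max())
    return min_nbor_dist, max_nbor_size
```

For three atoms on a line at x = 0, 1, and 3 with rcut = 2.5, the closest pair is 1.0 apart and the middle atom has two neighbors, so the result is (1.0, 2).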

CI/CD

Bugfix

  • Fix TF 2.16 compatibility by @njzjz in #3343
  • Detect version in advance before building deepmd-kit-cu11 by @njzjz in #3172
  • C API: change the required shape of electric field to nloc * 3 by @njzjz in #3237

New Contributors

Full Changelog: v2.2.8...v3.0.0a0