
docs: improve multi-backend documentation #3875

Merged: 4 commits, Jun 13, 2024
4 changes: 3 additions & 1 deletion doc/freeze/freeze.md
@@ -1,6 +1,7 @@
# Freeze a model

The trained neural network is extracted from a checkpoint and dumped into a protobuf(.pb) file. This process is called "freezing" a model. The idea and part of our code are from [Morgan](https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc). To freeze a model, typically one does
The trained neural network is extracted from a checkpoint and dumped into a model file. This process is called "freezing" a model.
To freeze a model, typically one does

::::{tab-set}

@@ -11,6 +12,7 @@ $ dp freeze -o model.pb
```

in the folder where the model is trained. The output model is called `model.pb`.
The idea and part of our code are from [Morgan](https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc).

:::

18 changes: 17 additions & 1 deletion doc/model/sel.md
@@ -6,10 +6,26 @@ All descriptors require setting `sel`, which means the expected maximum number of

To determine a proper `sel`, one can calculate the neighbor stat of the training data before training:

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```sh
dp neighbor-stat -s data -r 6.0 -t O H
dp --tf neighbor-stat -s data -r 6.0 -t O H
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```sh
dp --pt neighbor-stat -s data -r 6.0 -t O H
```

:::

::::

where `data` is the directory of data, `6.0` is the cutoff radius, and `O` and `H` are the type map. The program will give the `max_nbor_size`. For example, `max_nbor_size` of the water example is `[38, 72]`, meaning an atom may have at most 38 O neighbors and 72 H neighbors in the training data.

The `sel` should be set to a higher value than that of the training data, considering there may be some extreme geometries during MD simulations. As a result, we set `sel` to `[46, 92]` in the water example.
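The rule of thumb above — pad the observed maxima to leave headroom for extreme geometries — can be sketched in a few lines. The 20% margin and rounding to an even number are illustrative choices for this sketch, not values prescribed by DeePMD-kit:

```python
import math

def suggest_sel(max_nbor_size, margin=1.2, multiple=2):
    """Pad per-type neighbor maxima into a `sel` candidate.

    `max_nbor_size` comes from `dp neighbor-stat`; the margin leaves room
    for extreme geometries during MD. Margin and rounding are illustrative,
    not an official DeePMD-kit recipe.
    """
    return [int(math.ceil(n * margin / multiple) * multiple) for n in max_nbor_size]

# Water example: observed maxima [38, 72]
print(suggest_sel([38, 72]))  # → [46, 88]
```

Whatever rule is used, verify that the final `sel` is at least the observed `max_nbor_size` for every type.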
47 changes: 41 additions & 6 deletions doc/model/train-fitting-dos.md
@@ -1,7 +1,7 @@
# Fit electronic density of states (DOS) {{ tensorflow_icon }}
# Fit electronic density of states (DOS) {{ tensorflow_icon }} {{ pytorch_icon }} {{ dpmodel_icon }}

:::{note}
**Supported backends**: TensorFlow {{ tensorflow_icon }}
**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, DP {{ dpmodel_icon }}
:::

Here we present an API to the DeepDOS model, which can be used to fit the electronic density of states (DOS), a vector quantity.
@@ -82,10 +82,26 @@ To prepare the data, we recommend shifting the DOS data by the Fermi level.

The training command is the same as `ener` mode, i.e.

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash
dp --tf train input.json
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash
dp train input.json
dp --pt train input.json
```

:::

::::

The detailed loss can be found in `lcurve.out`:

```
@@ -117,14 +133,33 @@ The detailed loss can be found in `lcurve.out`:

In this earlier version, we can use `dp test` to infer the electronic density of states for given frames.

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash

dp --tf freeze -o frozen_model.pb

dp --tf test -m frozen_model.pb -s ../data/111/$k -d ${output_prefix} -a -n 100
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash

$DP freeze -o frozen_model.pb
dp --pt freeze -o frozen_model.pth

$DP test -m frozen_model.pb -s ../data/111/$k -d ${output_prefix} -a -n 100
dp --pt test -m frozen_model.pth -s ../data/111/$k -d ${output_prefix} -a -n 100
```

if `dp test -d ${output_prefix} -a` is specified, the predicted DOS and atomic DOS for each frame is output in the working directory
:::

::::

if `dp test -d ${output_prefix} -a` is specified, the predicted DOS and atomic DOS for each frame are output in the working directory

```
${output_prefix}.ados.out.0 ${output_prefix}.ados.out.1 ${output_prefix}.ados.out.2 ${output_prefix}.ados.out.3
```
2 changes: 1 addition & 1 deletion doc/test/model-deviation.md
@@ -59,7 +59,7 @@ One can also use a subcommand to calculate the deviation of predicted forces or
dp model-devi -m graph.000.pb graph.001.pb graph.002.pb graph.003.pb -s ./data -o model_devi.out
```

where `-m` specifies graph files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results is dumped. Here is more information on this sub-command:
where `-m` specifies model files to be calculated, `-s` gives the data to be evaluated, `-o` the file to which model deviation results are dumped. Here is more information on this sub-command:

```bash
usage: dp model-devi [-h] [-v {DEBUG,3,INFO,2,WARNING,1,ERROR,0}]
```
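For intuition about what the model deviation measures, here is a self-contained sketch: the per-atom root-mean-square spread of the force predictions across the ensemble, reduced to max/min/average over atoms. This mirrors the quantities reported in `model_devi.out`, though DeePMD-kit's exact implementation may differ in its details:

```python
import math

def force_model_deviation(forces):
    """Model deviation of forces across an ensemble of models.

    `forces` has shape [n_models][n_atoms][3]. For each atom, the deviation
    is the RMS distance of the per-model force predictions from the
    ensemble mean; max/min/avg over atoms is returned. A sketch of the
    usual definition, not DeePMD-kit's exact implementation.
    """
    n_models = len(forces)
    n_atoms = len(forces[0])
    devs = []
    for a in range(n_atoms):
        mean = [sum(forces[m][a][d] for m in range(n_models)) / n_models
                for d in range(3)]
        var = sum(
            sum((forces[m][a][d] - mean[d]) ** 2 for d in range(3))
            for m in range(n_models)
        ) / n_models
        devs.append(math.sqrt(var))
    return max(devs), min(devs), sum(devs) / len(devs)

# Two toy models, two atoms: the models disagree only on atom 0's x force.
f = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
print(force_model_deviation(f))  # → (0.5, 0.0, 0.25)
```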
2 changes: 1 addition & 1 deletion doc/third-party/gromacs.md
@@ -105,7 +105,7 @@ Then, in your working directories, we have to write an `input.json` file:

Here is an explanation for these settings:

- `graph_file` : The graph file (with suffix .pb) generated by `dp freeze` command
- `graph_file` : The [model file](../backend.md) generated by `dp freeze` command
- `type_file` : File to specify DP atom types (in space-separated format). Here, `type.raw` looks like

```
1 change: 1 addition & 0 deletions doc/third-party/lammps-command.md
@@ -70,6 +70,7 @@ pair_style deepmd models ... keyword value ...
pair_style deepmd graph.pb
pair_style deepmd graph.pb fparam 1.2
pair_style deepmd graph_0.pb graph_1.pb graph_2.pb out_file md.out out_freq 10 atomic relative 1.0
pair_style deepmd graph_0.pb graph_1.pth out_file md.out out_freq 100
pair_coeff * * O H

pair_style deepmd cp.pb fparam_from_compute TEMP
4 changes: 2 additions & 2 deletions doc/train/training-advanced.md
@@ -170,9 +170,9 @@ One can set other environment variables:
| DP_AUTO_PARALLELIZATION | 0, 1 | 0 | Enable auto parallelization for CPU operators. |
| DP_JIT | 0, 1 | 0 | Enable JIT. Note that this option may either improve or decrease the performance. Requires TensorFlow supports JIT. |

## Adjust `sel` of a frozen model
## Adjust `sel` of a frozen model {{ tensorflow_icon }}

One can use `--init-frz-model` features to adjust (increase or decrease) [`sel`](../model/sel.md) of a existing model. Firstly, one needs to adjust [`sel`](./train-input.rst) in `input.json`. For example, adjust from `[46, 92]` to `[23, 46]`.
One can use `--init-frz-model` features to adjust (increase or decrease) [`sel`](../model/sel.md) of an existing model. Firstly, one needs to adjust [`sel`](./train-input.rst) in `input.json`. For example, adjust from `[46, 92]` to `[23, 46]`.

```json
"model": {
```
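The manual edit of `input.json` can also be scripted. A minimal sketch, assuming the `model.descriptor.sel` layout of the water example (`se_e2_a`); `adjust_sel` is a hypothetical helper, not a DeePMD-kit tool:

```python
import json

def adjust_sel(config, new_sel):
    """Return a copy of a training config with `sel` replaced.

    Assumes the `model.descriptor.sel` layout of the water example;
    a sketch of the manual edit described above, not an official tool.
    """
    config = json.loads(json.dumps(config))  # cheap deep copy
    config["model"]["descriptor"]["sel"] = new_sel
    return config

cfg = {"model": {"descriptor": {"type": "se_e2_a", "sel": [46, 92]}}}
print(adjust_sel(cfg, [23, 46])["model"]["descriptor"]["sel"])  # → [23, 46]
```

After rewriting the file, retrain with `--init-frz-model` as described above.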
18 changes: 17 additions & 1 deletion doc/train/training.md
@@ -8,10 +8,26 @@ $ cd $deepmd_source_dir/examples/water/se_e2_a/

After switching to that directory, the training can be invoked by

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash
$ dp train input.json
$ dp --tf train input.json
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash
$ dp --pt train input.json
```

:::

::::

where `input.json` is the name of the input script.

By default, the verbosity level of DeePMD-kit is `INFO`; one may see a lot of important information about the code and environment on the screen. Among them, two pieces of information regarding data systems are worth special notice.
21 changes: 20 additions & 1 deletion doc/troubleshooting/howtoset_num_nodes.md
@@ -72,13 +72,32 @@ There is no one general parallel configuration that works for all situations, so
Here are some empirical examples.
If you wish to use 3 cores of 2 CPUs on one node, you may set the environment variables and run DeePMD-kit as follows:

::::{tab-set}

:::{tab-item} TensorFlow {{ tensorflow_icon }}

```bash
export OMP_NUM_THREADS=3
export DP_INTRA_OP_PARALLELISM_THREADS=3
export DP_INTER_OP_PARALLELISM_THREADS=2
dp --tf train input.json
```

:::

:::{tab-item} PyTorch {{ pytorch_icon }}

```bash
export OMP_NUM_THREADS=3
export DP_INTRA_OP_PARALLELISM_THREADS=3
export DP_INTER_OP_PARALLELISM_THREADS=2
dp train input.json
dp --pt train input.json
```

:::

::::
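The example above can be generalized into a small helper that maps CPU topology to these variables: OMP and intra-op threads span one socket, inter-op threads span the sockets. This mirrors the 3-cores-of-2-CPUs case and is only a starting point to tune, not a universal rule:

```python
def parallel_settings(cores_per_socket, sockets):
    """Heuristic mapping of CPU topology to DeePMD-kit threading variables.

    Follows the example above (OMP = intra-op = cores used per socket,
    inter-op = number of sockets); an empirical starting point, not a rule.
    """
    return {
        "OMP_NUM_THREADS": cores_per_socket,
        "DP_INTRA_OP_PARALLELISM_THREADS": cores_per_socket,
        "DP_INTER_OP_PARALLELISM_THREADS": sockets,
    }

print(parallel_settings(3, 2))
```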

For a node with 128 cores, it is recommended to start with the following variables:
