Eastsheng's Wiki

DeePMD Quick Start

2024-09-24 12:18:28


This tutorial uses a gas-phase methane molecule as an example to walk through training and applying a Deep Potential (DP) model.

Manual installation (CPU version)

wget https://github.com/deepmodeling/deepmd-kit/releases/download/v3.0.0b3/deepmd-kit-3.0.0b3-cpu-Linux-x86_64.sh
bash deepmd-kit-3.0.0b3-cpu-Linux-x86_64.sh
conda activate /home/xxx/softwares/deepmd-kit

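The installation path can be customized during installation, for example /home/xxx/softwares/miniconda/miniconda3/envs/deepmd-kit.

Activate the environment with conda activate, pointing it at the path chosen during installation.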
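Before going further, it can help to confirm that the environment is really active. The following is a minimal sketch, assuming only that an activated DeePMD-kit environment puts the `dp` command on PATH and makes the `deepmd` package importable; `check_deepmd_env` is a hypothetical helper name, not part of DeePMD-kit.

```python
import importlib.util
import shutil


def check_deepmd_env():
    """Report whether the dp command and the deepmd package are visible."""
    return {
        "dp_on_path": shutil.which("dp") is not None,
        "deepmd_importable": importlib.util.find_spec("deepmd") is not None,
    }


status = check_deepmd_env()
for name, ok in status.items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

If either entry reports "no", the conda environment is probably not the active one.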

Data preparation

wget https://bohrium-api.dp.tech/ds-dl/DeePMD-kit-Tutorial-a8z5-v1.zip
unzip DeePMD-kit-Tutorial-a8z5-v1.zip
cd DeePMD-kit_Tutorial
tree -L 1

.
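├── 00.data: training and test data
├── 01.train: example scripts for training a model with DeePMD-kit
├── 01.train.finished: complete results of a finished training run
├── 02.lmp: example scripts for molecular dynamics simulation with LAMMPS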
└── 02.lmp.finished

5 directories, 0 files

tree 00.data/ -L 1

00.data/
├── abacus_md: data obtained from ab initio molecular dynamics (AIMD) simulation with ABACUS
├── training_data
└── validation_data

3 directories, 0 files

The training data for DeePMD-kit comes from first-principles calculations and includes atomic types, simulation cells, atomic coordinates, atomic forces, system energies, and virials.
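In the deepmd/npy layout these quantities are stored as one NumPy array per file inside each set directory. The sketch below lists the array shapes one would expect for a given number of frames and atoms; `expected_shapes` is a hypothetical helper written for illustration, and the frame count passed in is only an example value.

```python
def expected_shapes(n_frames, n_atoms):
    """Expected array shapes in one deepmd/npy set directory (e.g. set.000)."""
    return {
        "coord.npy": (n_frames, n_atoms * 3),   # atomic coordinates, flattened per frame
        "box.npy": (n_frames, 9),               # simulation cell vectors, flattened 3x3
        "energy.npy": (n_frames,),              # one system energy per frame
        "force.npy": (n_frames, n_atoms * 3),   # atomic forces, flattened per frame
        "virial.npy": (n_frames, 9),            # virial tensor, flattened 3x3
    }


# Methane has 5 atoms (1 C + 4 H); 161 frames is an example value.
for name, shape in expected_shapes(161, 5).items():
    print(f"{name:12s} {shape}")
```

The atomic types are not stored per frame; they live in type.raw and type_map.raw next to the set directories.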

Split the data with the dpdata tool:

pip install dpdata
python split_data.py

# split_data.py
import dpdata
import numpy as np

# load data of abacus/md format
data = dpdata.LabeledSystem("../DeePMD-kit_Tutorial/00.data/abacus_md", fmt="abacus/md")
print("# the data contains %d frames" % len(data))

# randomly choose 40 frame indices for validation_data
rng = np.random.default_rng()
index_validation = rng.choice(len(data), size=40, replace=False)

# the remaining indices are training_data
index_training = list(set(range(len(data))) - set(index_validation))
data_training = data.sub_system(index_training)
data_validation = data.sub_system(index_validation)

# all training data put into directory:"training_data"
data_training.to_deepmd_npy("../DeePMD-kit_Tutorial/00.data/training_data")

# all validation data put into directory:"validation_data"
data_validation.to_deepmd_npy("../DeePMD-kit_Tutorial/00.data/validation_data")

print("# the training data contains %d frames" % len(data_training))
print("# the validation data contains %d frames" % len(data_validation))

# the data contains 201 frames
# the training data contains 161 frames
# the validation data contains 40 frames
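Because `np.random.default_rng()` is called without a seed, each run of split_data.py picks different validation frames. If a reproducible split is wanted, the index selection can be seeded. The sketch below uses only the standard library; `split_indices` and the seed value are illustrative choices and not part of dpdata.

```python
import random


def split_indices(n_frames, n_validation, seed):
    """Return (training, validation) frame indices, disjoint and sorted."""
    rng = random.Random(seed)
    validation = sorted(rng.sample(range(n_frames), n_validation))
    held_out = set(validation)
    training = [i for i in range(n_frames) if i not in held_out]
    return training, validation


training, validation = split_indices(201, 40, seed=0)
print(len(training), len(validation))  # 161 40
```

The two lists can be passed to `data.sub_system(...)` in place of the randomly drawn indices.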

00.data/training_data/
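├── set.000: directory holding the data in compressed format (NumPy compressed arrays)
├── type_map.raw: file containing the atom type names
└── type.raw: file containing the atom types (represented as integers)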

1 directory, 2 files
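The two raw files work together: each integer in type.raw is an index into the list of names in type_map.raw. A minimal sketch of that lookup follows; the file contents shown are hypothetical values for a single methane molecule, and the real atom order in the tutorial data may differ.

```python
def atom_names(type_raw_text, type_map_text):
    """Map the integer atom types to element names via the type map."""
    type_map = type_map_text.split()
    return [type_map[int(t)] for t in type_raw_text.split()]


# Hypothetical file contents, for illustration only.
type_raw = "0 0 0 0 1"
type_map_raw = "H C"
print(atom_names(type_raw, type_map_raw))  # ['H', 'H', 'H', 'H', 'C']
```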

Preparing the input script

Once the data is ready, training can begin.

Now go to the training directory. DeePMD-kit requires a JSON file that specifies the training parameters.

pip show dargs || pip install --upgrade dargs

# input_pre.py
# Show input.json (the rendered view is intended for a Jupyter notebook)
from deepmd.utils.argcheck import gen_args
from dargs.notebook import JSON

with open("../DeePMD-kit_Tutorial/01.train/input.json") as f:
    JSON(f.read(), gen_args())

Contents of input.json. The # annotations are explanatory only and must be removed before use, because JSON does not allow comments:

{
    "_comment": " model parameters",
    "model": {
        "type_map": ["H", "C"],
        "descriptor" :{
            "type": "se_e2_a",           # descriptor type
            "sel": "auto",
            "rcut_smth": 0.50,           # where the smoothing starts
            "rcut": 6.00,
            "neuron": [25, 50, 100],     # layer sizes of the embedding network
            "resnet_dt": false,
            "axis_neuron": 16,           # size of the submatrix of G (the embedding matrix)
            "seed": 1,
            "_comment": " that's all"
        },
        "fitting_net" : {
            "neuron": [240, 240, 240],   # layer sizes of the fitting network
            "resnet_dt": true,
            "seed": 1,
            "_comment": " that's all"
        },
        "_comment": " that's all"
    },

    "learning_rate" :{
        "type": "exp",
        "decay_steps": 50,
        "start_lr": 0.001,
        "stop_lr": 3.51e-8,
        "_comment": "that's all"
    },

    "loss" :{
        "type": "ener",
        "start_pref_e": 0.02,
        "limit_pref_e": 1,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0,
        "limit_pref_v": 0,
        "_comment": " that's all"
    },

    "training" : {                       # training parameters
        "training_data": {
            "systems": ["../00.data/training_data"],
            "batch_size": "auto",
            "_comment": "that's all"
        },
        "validation_data":{
            "systems": ["../00.data/validation_data"],
            "batch_size": "auto",
            "numb_btch": 1,
            "_comment": "that's all"
        },
        "numb_steps": 10000,
        "seed": 10,
        "disp_file": "lcurve.out",
        "disp_freq": 200,
        "save_freq": 1000,
        "_comment": "that's all"
    },

    "_comment": "that's all"
}
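The learning_rate and loss sections interact: as the learning rate decays from start_lr to stop_lr, the loss prefactors move from their start values to their limit values. The sketch below works this out for the numbers in input.json. It assumes an exponential decay applied every decay_steps steps, and prefactors interpolated linearly in the ratio lr/start_lr; the function names are illustrative and not DeePMD-kit API.

```python
START_LR = 1e-3
STOP_LR = 3.51e-8
DECAY_STEPS = 50
NUMB_STEPS = 10000

# Decay rate chosen so that the learning rate reaches STOP_LR at NUMB_STEPS.
DECAY_RATE = (STOP_LR / START_LR) ** (DECAY_STEPS / NUMB_STEPS)


def learning_rate(step):
    """Exponentially decayed learning rate, updated every DECAY_STEPS steps (assumed)."""
    return START_LR * DECAY_RATE ** (step // DECAY_STEPS)


def prefactor(start, limit, step):
    """Loss prefactor that moves from `start` to `limit` as the rate decays (assumed form)."""
    ratio = learning_rate(step) / START_LR
    return start * ratio + limit * (1.0 - ratio)


for step in (0, 200, 1000, 10000):
    print(f"step {step:5d}: lr = {learning_rate(step):.2e}, "
          f"pref_e = {prefactor(0.02, 1, step):.3f}, "
          f"pref_f = {prefactor(1000, 1, step):.1f}")
```

With these settings the force term dominates the loss early in training and the energy and force terms end up weighted equally.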

Model training

Run DeePMD-kit to start training:

cd ./01.train
dp train input.json

# screen output
[2024-09-24 12:48:59,897] DEEPMD INFO batch 0: trn: rmse = 1.86e+01, rmse_e = 1.35e-01, rmse_f = 5.87e-01, lr = 1.00e-03
[2024-09-24 12:48:59,897] DEEPMD INFO batch 0: val: rmse = 1.91e+01, rmse_e = 1.34e-01, rmse_f = 6.05e-01
[2024-09-24 12:49:11,313] DEEPMD INFO batch 200: trn: rmse = 4.95e+00, rmse_e = 1.99e+00, rmse_f = 1.59e-01, lr = 8.15e-04
[2024-09-24 12:49:11,314] DEEPMD INFO batch 200: val: rmse = 5.72e+00, rmse_e = 1.99e+00, rmse_f = 1.88e-01
[2024-09-24 12:49:11,314] DEEPMD INFO batch 200: total wall time = 11.68 s
[2024-09-24 12:49:23,078] DEEPMD INFO batch 400: trn: rmse = 4.44e+00, rmse_e = 5.28e-01, rmse_f = 1.70e-01, lr = 6.63e-04
[2024-09-24 12:49:23,078] DEEPMD INFO batch 400: val: rmse = 2.95e+00, rmse_e = 5.24e-01, rmse_f = 1.11e-01
[2024-09-24 12:49:23,078] DEEPMD INFO batch 400: total wall time = 11.76 s
[2024-09-24 12:49:36,968] DEEPMD INFO batch 600: trn: rmse = 2.00e+00, rmse_e = 1.01e-01, rmse_f = 8.59e-02, lr = 5.40e-04
[2024-09-24 12:49:36,968] DEEPMD INFO batch 600: val: rmse = 2.70e+00, rmse_e = 1.03e-01, rmse_f = 1.16e-01
[2024-09-24 12:49:36,968] DEEPMD INFO batch 600: total wall time = 13.89 s
[2024-09-24 12:49:48,904] DEEPMD INFO batch 800: trn: rmse = 1.25e+00, rmse_e = 4.03e-02, rmse_f = 5.93e-02, lr = 4.40e-04
[2024-09-24 12:49:48,904] DEEPMD INFO batch 800: val: rmse = 1.85e+00, rmse_e = 4.16e-02, rmse_f = 8.83e-02
[2024-09-24 12:49:48,904] DEEPMD INFO batch 800: total wall time = 11.94 s
[2024-09-24 12:50:01,073] DEEPMD INFO batch 1000: trn: rmse = 1.62e+00, rmse_e = 1.60e-02, rmse_f = 8.56e-02, lr = 3.59e-04
[2024-09-24 12:50:01,074] DEEPMD INFO batch 1000: val: rmse = 1.56e+00, rmse_e = 1.57e-02, rmse_f = 8.23e-02
[2024-09-24 12:50:01,074] DEEPMD INFO batch 1000: total wall time = 12.17 s
[2024-09-24 12:50:01,347] DEEPMD INFO saved checkpoint model.ckpt
...
[2024-09-24 12:59:24,280] DEEPMD INFO batch 10000: total wall time = 15.22 s
[2024-09-24 12:59:24,565] DEEPMD INFO saved checkpoint model.ckpt
[2024-09-24 12:59:24,565] DEEPMD INFO average training time: 0.0616 s/batch (exclude first 200 batches)
[2024-09-24 12:59:24,565] DEEPMD INFO finished training
[2024-09-24 12:59:24,565] DEEPMD INFO wall time: 625.217 s
WARNING:tensorflow:disable_mixed_precision_graph_rewrite() called when mixed precision is already disabled.

# Output file lcurve.out
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
# If there is no available reference data, rmse_*_{val,trn} will print nan
0 1.91e+01 1.86e+01 1.34e-01 1.35e-01 6.05e-01 5.87e-01 1.0e-03
200 5.72e+00 4.95e+00 1.99e+00 1.99e+00 1.88e-01 1.59e-01 8.1e-04
400 2.95e+00 4.44e+00 5.24e-01 5.28e-01 1.11e-01 1.70e-01 6.6e-04
600 2.70e+00 2.00e+00 1.03e-01 1.01e-01 1.16e-01 8.59e-02 5.4e-04
800 1.85e+00 1.25e+00 4.16e-02 4.03e-02 8.83e-02 5.93e-02 4.4e-04
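The columns of lcurve.out are related: the total rmse appears to be the square root of a weighted sum of the energy and force terms. The sketch below parses the first rows shown above and rebuilds the total from its parts. The weighting used here (per-atom energy error scaled by the number of atoms, prefactors interpolated with lr/start_lr) is an assumption that reproduces the logged values to within rounding; it is not taken from the DeePMD-kit source. `parse_lcurve` and `combined_rmse` are hypothetical helper names.

```python
import math

LCURVE = """\
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
# If there is no available reference data, rmse_*_{val,trn} will print nan
0 1.91e+01 1.86e+01 1.34e-01 1.35e-01 6.05e-01 5.87e-01 1.0e-03
200 5.72e+00 4.95e+00 1.99e+00 1.99e+00 1.88e-01 1.59e-01 8.1e-04
400 2.95e+00 4.44e+00 5.24e-01 5.28e-01 1.11e-01 1.70e-01 6.6e-04
"""

N_ATOMS = 5       # methane: 1 C + 4 H
START_LR = 1e-3   # start_lr in input.json


def parse_lcurve(text):
    """Parse lcurve.out text into a list of {column: value} dicts."""
    lines = text.strip().splitlines()
    headers = lines[0].lstrip("#").split()
    return [dict(zip(headers, map(float, line.split())))
            for line in lines[1:] if not line.startswith("#")]


def combined_rmse(row, kind):
    """Rebuild the total rmse ('trn' or 'val') from its parts (assumed form)."""
    ratio = row["lr"] / START_LR
    pref_e = 0.02 * ratio + 1 * (1 - ratio)    # start_pref_e -> limit_pref_e
    pref_f = 1000 * ratio + 1 * (1 - ratio)    # start_pref_f -> limit_pref_f
    loss = (pref_e * N_ATOMS * row[f"rmse_e_{kind}"] ** 2
            + pref_f * row[f"rmse_f_{kind}"] ** 2)
    return math.sqrt(loss)


rows = parse_lcurve(LCURVE)
for row in rows:
    print(int(row["step"]), row["rmse_trn"], round(combined_rmse(row, "trn"), 2))
```

Small differences between the logged and rebuilt values are expected, since the file stores every column with only two or three significant digits.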

# Plotting
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Set the mathtext font; choose any suitable font
plt.rcParams['mathtext.fontset'] = 'custom'
plt.rcParams['mathtext.rm'] = 'Arial'  # use Arial as the mathtext font

# Plot after the font settings above
path = "./01.train/"
with open(f"{path}lcurve.out") as f:
    headers = f.readline().split()[1:]
lcurve = pd.DataFrame(np.loadtxt(f"{path}lcurve.out"), columns=headers)
legends = ["rmse_e_val", "rmse_e_trn", "rmse_f_val", "rmse_f_trn"]
for legend in legends:
    plt.loglog(lcurve["step"], lcurve[legend], label=legend)
plt.legend()
plt.xlabel("Training steps")
plt.ylabel("Loss")
plt.show()

[Figure: learning curves, energy and force RMSE versus training steps on log-log axes]

Freezing the model

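At the end of training, the model parameters saved in the TensorFlow checkpoint files should be frozen into a model file, which usually has the extension .pb. Simply run: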

dp freeze -o graph.pb

Model testing

Check the quality of the trained model:

dp test -m graph.pb -s ../00.data/validation_data

[2024-09-24 13:02:43,966] DEEPMD INFO    # ---------------output of dp test---------------
[2024-09-24 13:02:43,966] DEEPMD INFO # testing system : ../00.data/validation_data
[2024-09-24 13:02:44,293] DEEPMD INFO # number of test data : 40
[2024-09-24 13:02:44,293] DEEPMD INFO Energy MAE : 2.501408e-03 eV
[2024-09-24 13:02:44,293] DEEPMD INFO Energy RMSE : 3.163454e-03 eV
[2024-09-24 13:02:44,294] DEEPMD INFO Energy MAE/Natoms : 5.002815e-04 eV
[2024-09-24 13:02:44,294] DEEPMD INFO Energy RMSE/Natoms : 6.326908e-04 eV
[2024-09-24 13:02:44,294] DEEPMD INFO Force MAE : 2.953246e-02 eV/A
[2024-09-24 13:02:44,294] DEEPMD INFO Force RMSE : 3.907943e-02 eV/A
[2024-09-24 13:02:44,294] DEEPMD INFO Virial MAE : 3.898108e-02 eV
[2024-09-24 13:02:44,294] DEEPMD INFO Virial RMSE : 5.222230e-02 eV
[2024-09-24 13:02:44,294] DEEPMD INFO Virial MAE/Natoms : 7.796217e-03 eV
[2024-09-24 13:02:44,294] DEEPMD INFO Virial RMSE/Natoms : 1.044446e-02 eV
[2024-09-24 13:02:44,294] DEEPMD INFO # -----------------------------------------------
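For reference, MAE and RMSE can be computed as in the sketch below. The energies are hypothetical values chosen for illustration, not taken from the tutorial data, and the helpers are plain-Python stand-ins rather than DeePMD-kit functions. The last line checks that the reported per-atom energy MAE is the total MAE divided by the 5 atoms of methane.

```python
import math


def mae(reference, predicted):
    """Mean absolute error."""
    return sum(abs(p - r) for r, p in zip(reference, predicted)) / len(reference)


def rmse(reference, predicted):
    """Root-mean-square error."""
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / len(reference))


# Hypothetical energies in eV, for illustration only.
e_dft = [-219.10, -219.05, -219.12]
e_dp = [-219.11, -219.03, -219.12]
n_atoms = 5

print(f"Energy MAE         : {mae(e_dft, e_dp):.6e} eV")
print(f"Energy RMSE        : {rmse(e_dft, e_dp):.6e} eV")
print(f"Energy MAE/Natoms  : {mae(e_dft, e_dp) / n_atoms:.6e} eV")

# Reported values from the log above: 2.501408e-03 eV total, 5.002815e-04 eV per atom.
print(f"{2.501408e-03 / n_atoms:.6e}")
```

RMSE is never smaller than MAE, and a large gap between the two points to a few frames with unusually large errors.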

Examine the correlation between the predicted data and the original data:

import dpdata
import matplotlib.pyplot as plt
import numpy as np

# Uncomment and run the following lines once to generate train_pre.dat
# (DFT energies in column 0, DP-predicted energies in column 1).
# training_systems = dpdata.LabeledSystem("./00.data/training_data", fmt="deepmd/npy")
# predict = training_systems.predict("./01.train/graph.pb")
#
# # print(training_systems["energies"], predict["energies"])
# data = np.vstack((training_systems["energies"], predict["energies"])).T
# np.savetxt("./train_pre.dat", data)


data = np.loadtxt("./train_pre.dat")
y1 = data[:,0]
y2 = data[:,1]

print(y1.shape,y2.shape)

plt.scatter(y1,y2)

x_range = np.linspace(plt.xlim()[0], plt.xlim()[1])

plt.plot(x_range, x_range, "r--", linewidth=1)
plt.xlabel("Energy of DFT")
plt.ylabel("Energy predicted by deep potential")

plt.show()

[Figure: parity plot of DFT energies versus energies predicted by the deep potential]
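The parity plot shows the agreement visually; a single number such as the Pearson correlation coefficient can summarize it. The sketch below computes it with the standard library only. The energy values are hypothetical stand-ins for the two columns of train_pre.dat, and `pearson_r` is an illustrative helper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical energies in eV; in practice use data[:, 0] and data[:, 1].
y1 = [-219.20, -219.10, -219.00, -218.90]   # DFT
y2 = [-219.19, -219.11, -219.01, -218.89]   # deep potential
print(f"Pearson r = {pearson_r(y1, y2):.4f}")
```

A value close to 1 means the predictions track the reference closely, although it does not detect a constant offset, so the parity plot and the error metrics remain useful alongside it.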

Running MD simulations in LAMMPS

cd 02.lmp
cp ../01.train/graph.pb ./
tree -L 1

.
├── ch4.dump
├── conf.lmp: initial configuration for the gas-phase methane MD simulation
├── graph.pb
├── in.lammps: a standard LAMMPS input file for an MD simulation, except for the pair_style and pair_coeff lines
└── log.lammps

0 directories, 5 files
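The two lines that set this input apart from an ordinary LAMMPS script are the pair commands that load the frozen model. The sketch below pulls them out of a script so they can be checked before launching a run. The excerpt is hypothetical and only illustrates the usual `pair_style deepmd` form; the tutorial's in.lammps may contain different settings. `pair_commands` is an illustrative helper.

```python
def pair_commands(script_text):
    """Collect the pair_style and pair_coeff arguments from a LAMMPS input script."""
    found = {}
    for line in script_text.splitlines():
        tokens = line.split("#", 1)[0].split()   # drop trailing comments
        if tokens and tokens[0] in ("pair_style", "pair_coeff"):
            found[tokens[0]] = tokens[1:]
    return found


# Hypothetical excerpt, for illustration only.
EXCERPT = """\
units           metal
atom_style      atomic
read_data       conf.lmp
pair_style      deepmd graph.pb   # load the frozen DP model
pair_coeff      * *
"""

commands = pair_commands(EXCERPT)
print(commands)
```

Checking that the model file named after `deepmd` exists in the working directory catches the most common setup mistake, a missing graph.pb.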

In an environment with a compatible version of LAMMPS, run the deep potential molecular dynamics:

mpirun -np 8 lmp_mpi -i in.lammps

Summary of lammps deepmd module ...
>>> Info of deepmd-kit:
installed to: /home/xxx/softwares/deepmd-kit
source:
source branch: HEAD
source commit: cbf2de6
source commit at: 2024-07-27 05:11:58 +0000
support model ver.: 1.1
build variant: cpu

References

[1] https://docs.deepmodeling.com/projects/deepmd/en/r2/getting-started/quick_start.html

[2] https://hyper.ai/tutorials/26001