Eastsheng's Wiki

A complete DeepMD example: water

2024-10-10 12:18:28


As a beginner, I wrote this by following the tutorial. It may not be entirely correct, but it is a complete workflow.

Data preparation

Run a VASP calculation to obtain OUTCAR

mkdir 00.data
cd 00.data
mkdir VASP
cd VASP

Create the VASP input files

  • INCAR

    SYSTEM = Water

    ! ab initio
    PREC = Normal ! Precision level, standard precision
    ENMAX = 400 ! Plane-wave cutoff in eV; should be set manually
    ISMEAR = 0 ! Gaussian smearing; metals: 1
    SIGMA = 0.1 ! Smearing width in eV; metals: 0.2

    ! MD
    IBRION = 0 ! 0 = molecular dynamics (1 would be a quasi-Newton ionic relaxation)
    NFREE = 2 ! Only relevant for ionic relaxations; not used in MD
    NSW = 1000 ! Number of ionic (MD) steps
    EDIFFG = -0.02 ! Stop when forces < 0.02 eV/Angstrom; only used in relaxations
    POTIM = 2 ! Timestep in fs

    MDALGO = 2 ! Nosé-Hoover thermostat
    SMASS = 0 ! -3 = microcanonical (NVE); >= 0 = canonical (NVT), sets the Nosé mass

    TEBEG = 40 ! Start temperature in K
    TEEND = 400 ! End temperature in K

  • KPOINTS

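    Gamma-point only
    0
    Monkhorst Pack
    1 1 1
    0 0 0

    Gamma-point only: the first line is a free-text comment. Here it records that only the Γ point (k = 0, the center of the Brillouin zone) is used. This is common for small or simple systems, such as a single molecule in a large box, because sampling only Γ greatly reduces the cost.

    0: setting the number of k-points to 0 switches on automatic mesh generation; the mesh is then defined by the following lines instead of being listed point by point.

    Monkhorst Pack: selects how the k-point grid is generated. The Monkhorst-Pack scheme distributes k-points uniformly over the Brillouin zone and is suited to periodic systems.

    1 1 1: the number of subdivisions along each of the three reciprocal lattice vectors. With one point in each direction, the grid contains only the Γ point.

    0 0 0: an optional shift of the grid. "0 0 0" means no shift, so the single point stays at Γ.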

  • POSCAR

    H2O_2
    0.52918 ! scaling parameter
    15 0 0
    0 15 0
    0 0 15
    1 2
    select
    cart
    0.00 0.00 0.00 F F F
    1.10 -1.43 0.00 T T F
    1.10 1.43 0.00 T T F
  • POTCAR

    cat ~/softwares/VASP/PAW_PBE/O/POTCAR ~/softwares/VASP/PAW_PBE/H/POTCAR >POTCAR
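The scaling factor 0.52918 in the POSCAR is the Bohr-to-Angstrom conversion, so the lattice vectors and Cartesian coordinates are effectively written in Bohr. A minimal sketch that applies the factor and checks the resulting geometry; the coordinates are copied from the POSCAR, and the helper names (`scaled`, `angle_deg`) are illustrative only:

```python
import math

# Values copied from the POSCAR; 0.52918 converts Bohr to Angstrom
SCALE = 0.52918
LATTICE = 15.0  # cubic box edge before scaling
O = (0.00, 0.00, 0.00)
H1 = (1.10, -1.43, 0.00)
H2 = (1.10, 1.43, 0.00)


def scaled(v):
    """Apply the POSCAR scaling factor to a Cartesian vector."""
    return tuple(SCALE * x for x in v)


def angle_deg(center, a, b):
    """Angle a-center-b in degrees."""
    u = [x - c for x, c in zip(a, center)]
    w = [x - c for x, c in zip(b, center)]
    dot = sum(p * q for p, q in zip(u, w))
    return math.degrees(math.acos(dot / (math.hypot(*u) * math.hypot(*w))))


box = SCALE * LATTICE
d_oh = math.dist(scaled(O), scaled(H1))
hoh = angle_deg(scaled(O), scaled(H1), scaled(H2))
print(f"box edge = {box:.4f} A, O-H = {d_oh:.4f} A, H-O-H = {hoh:.2f} deg")
```

The result (a box of about 7.94 Angstrom, O-H of about 0.95 Angstrom, H-O-H of about 105 degrees) is a reasonable water molecule, which is a quick way to confirm that the coordinates were meant to be read in Bohr.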
tree

├── INCAR
├── KPOINTS
├── POSCAR
└── POTCAR
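Before launching VASP it can help to double-check the MD-related tags. A minimal sketch, assuming a simple INCAR with `TAG = value` assignments, `!` or `#` comments and optional `;` separators (it does not cover every INCAR feature); `parse_incar` and `md_summary` are hypothetical helpers:

```python
def parse_incar(text):
    """Return a dict of INCAR tags -> raw string values (minimal parser)."""
    tags = {}
    for line in text.splitlines():
        # Strip comments introduced by '!' or '#'
        for marker in ("!", "#"):
            line = line.split(marker, 1)[0]
        # Several assignments may share one line, separated by ';'
        for stmt in line.split(";"):
            if "=" in stmt:
                key, value = stmt.split("=", 1)
                tags[key.strip().upper()] = value.strip()
    return tags


def md_summary(tags):
    """Report whether the tags describe an MD run and how long it is."""
    is_md = tags.get("IBRION") == "0"  # only an explicit IBRION = 0 is treated as MD here
    steps = int(tags.get("NSW", "0"))  # number of ionic (MD) steps
    total_fs = steps * float(tags["POTIM"]) if is_md and "POTIM" in tags else None
    return {"is_md": is_md, "steps": steps, "total_fs": total_fs}


incar = """SYSTEM = Water
IBRION = 0 ! molecular dynamics
NSW = 1000
POTIM = 2
MDALGO = 2 ; SMASS = 0
"""
print(md_summary(parse_incar(incar)))
```

With IBRION = 0, NSW = 1000 and POTIM = 2, this corresponds to 1000 × 2 fs = 2 ps of simulated time.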

mpirun -np 16 vasp_std
...
tree

.
├── CHG
├── CHGCAR
├── CONTCAR
├── DOSCAR
├── EIGENVAL
├── HILLSPOT
├── IBZKPT
├── ICONST
├── INCAR
├── KPOINTS
├── OSZICAR
├── OUTCAR
├── PCDAT
├── POSCAR
├── POTCAR
├── REPORT
├── vasprun.xml
├── WAVECAR
└── XDATCAR

cp OUTCAR ../
cd ..
tree -L 1

.
├── OUTCAR
└── VASP
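The number of frames in OUTCAR determines how the data can be split later (for example, which set_size makes sense). A minimal sketch that counts ionic steps, assuming each step prints one `TOTAL-FORCE` header line as in standard VASP OUTCAR files; `count_ionic_steps` is a hypothetical helper, and `len()` of the dpdata `LabeledSystem` remains the authoritative frame count:

```python
from pathlib import Path


def count_ionic_steps(outcar_text):
    """Count ionic steps by the force-block header printed once per step (assumed layout)."""
    return sum(1 for line in outcar_text.splitlines() if "TOTAL-FORCE" in line)


def count_ionic_steps_in_file(path="OUTCAR"):
    # errors="replace" guards against stray non-UTF-8 bytes in large OUTCAR files
    return count_ionic_steps(Path(path).read_text(errors="replace"))


outcar_sample = """ POSITION                                       TOTAL-FORCE (eV/Angst)
 ...
 POSITION                                       TOTAL-FORCE (eV/Angst)
 ...
"""
print(count_ionic_steps(outcar_sample))
```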

Data conversion and splitting

Standard method

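# split.py
import dpdata

# Read all labeled frames (coordinates, energies, forces) from the VASP OUTCAR
data = dpdata.LabeledSystem("OUTCAR", fmt="vasp/outcar")
data.to("deepmd/npy", "data", set_size=3)
# set_size=3 is deliberately small because there are only a few frames; for demonstration only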
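With `set_size`, the deepmd/npy writer cuts the frames into consecutive chunks, so a small number of frames gives several `set.*` directories. A minimal sketch of that arithmetic, assuming consecutive chunks with a shorter last set (my reading of dpdata's behaviour, not its code); `set_layout` is a hypothetical helper:

```python
def set_layout(n_frames, set_size):
    """Frames per set.NNN directory when n_frames are cut into chunks of set_size."""
    if n_frames <= 0 or set_size <= 0:
        raise ValueError("n_frames and set_size must be positive")
    n_sets = -(-n_frames // set_size)  # ceiling division
    return {f"set.{i:03d}": min(set_size, n_frames - i * set_size) for i in range(n_sets)}


print(set_layout(11, 3))
```

Under this assumption, any count from 10 to 12 frames yields four sets with set_size=3, i.e. set.000 to set.003.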
cd data
tree -L 1

.
├── set.000
├── set.001
├── set.002
├── set.003
├── type_map.raw
└── type.raw

This produces four sets. The first three are used as the training set and the last one as the validation set.

mkdir train_data && mkdir val_data
mkdir train_data/data_0 && mkdir train_data/data_1 && mkdir train_data/data_2
mkdir val_data/data_3

mv set.000 ./train_data/data_0/
mv set.001 ./train_data/data_1/
mv set.002 ./train_data/data_2/
mv set.003 ./val_data/data_3/
cp type_map.raw type.raw ./train_data/data_0/
cp type_map.raw type.raw ./train_data/data_1/
cp type_map.raw type.raw ./train_data/data_2/
mv type_map.raw type.raw ./val_data/data_3/

Here the training and validation sets have to be distributed manually.

tree -L 3

.
├── train_data
│   ├── data_0
│   │   ├── set.000
│   │   ├── type_map.raw
│   │   └── type.raw
│   ├── data_1
│   │   ├── set.001
│   │   ├── type_map.raw
│   │   └── type.raw
│   └── data_2
│       ├── set.002
│       ├── type_map.raw
│       └── type.raw
└── val_data
    └── data_3
        ├── set.003
        ├── type_map.raw
        └── type.raw
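The manual mkdir/mv/cp steps can also be scripted. A minimal standard-library sketch, assuming the layout produced by `data.to("deepmd/npy", "data", set_size=3)` (one system containing `set.*`, `type.raw` and `type_map.raw`); `split_sets` is a hypothetical helper, and the demo runs on a throw-away temporary directory:

```python
import shutil
import tempfile
from pathlib import Path


def split_sets(data_dir, n_val=1):
    """Give each set.* its own system dir: the last n_val go to val_data, the rest to train_data."""
    data_dir = Path(data_dir)
    sets = sorted(data_dir.glob("set.*"))
    type_files = [data_dir / n for n in ("type.raw", "type_map.raw") if (data_dir / n).exists()]
    for i, set_dir in enumerate(sets):
        group = "val_data" if i >= len(sets) - n_val else "train_data"
        target = data_dir / group / f"data_{i}"
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(set_dir), str(target / set_dir.name))
        # Every DeePMD system directory needs its own copy of the type files
        for f in type_files:
            shutil.copy(f, target / f.name)
    for f in type_files:
        f.unlink()  # the top-level originals are no longer needed
    return sorted(p.relative_to(data_dir).as_posix() for p in data_dir.glob("*/data_*"))


# Demo on a temporary directory that mimics the dpdata output
demo = Path(tempfile.mkdtemp())
for i in range(4):
    (demo / f"set.{i:03d}").mkdir()
(demo / "type.raw").write_text("0\n1\n1\n")
(demo / "type_map.raw").write_text("O\nH\n")
layout = split_sets(demo)
print(layout)
```

To use it on the real data, call `split_sets("00.data/data")` once, right after the conversion step.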

One-step method (random split)

# split.py
import dpdata
import numpy as np

data = dpdata.LabeledSystem("OUTCAR", fmt="vasp/outcar")
# data.to("deepmd/npy", "data", set_size=200)

n_frames = len(data)
# Hold out about 20% of the frames for validation
n_validation = max(1, int(0.2 * n_frames))

rng = np.random.default_rng()
index_validation = rng.choice(n_frames, size=n_validation, replace=False)
# All remaining indexes are training data
index_training = np.setdiff1d(np.arange(n_frames), index_validation)
data_training = data.sub_system(index_training)
data_validation = data.sub_system(index_validation)

data_training.to_deepmd_npy("./data/training_data")
data_validation.to_deepmd_npy("./data/validation_data")
print("# the training data contains %d frames" % len(data_training))
print("# the validation data contains %d frames" % len(data_validation))

The training_data and validation_data system paths in input.json then need to be changed accordingly:

"systems": ["../00.data/data/training_data/"],

"systems": ["../00.data/data/validation_data/"],
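The index bookkeeping in the one-step script (pick validation frames at random, train on the rest) can be checked without dpdata. A minimal sketch that swaps NumPy's generator for Python's `random` module; `split_indices` and the 20% fraction are illustrative assumptions:

```python
import random


def split_indices(n_frames, val_fraction=0.2, seed=None):
    """Randomly split frame indices 0..n_frames-1 into disjoint training and validation lists."""
    rng = random.Random(seed)
    n_val = max(1, int(val_fraction * n_frames))
    validation = sorted(rng.sample(range(n_frames), n_val))
    chosen = set(validation)
    training = [i for i in range(n_frames) if i not in chosen]
    return training, validation


train_idx, val_idx = split_indices(1001, seed=0)
print(len(train_idx), len(val_idx))
```

Passing a fixed seed makes the split reproducible, which is convenient when comparing models trained on the same data.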

Preparing input.json

Specify the training parameters

cd ../../
mkdir 01.train
cd 01.train/
touch input.json

Enter the following content. Remember to adjust type_map and the system paths of training_data and validation_data.

{
    "_comment": " model parameters",
    "model": {
        "type_map": ["O", "H"],
        "descriptor": {
            "type": "se_e2_a",
            "sel": "auto",
            "rcut_smth": 0.50,
            "rcut": 6.00,
            "neuron": [25, 50, 100],
            "resnet_dt": false,
            "axis_neuron": 16,
            "seed": 1,
            "_comment": " that's all"
        },
        "fitting_net": {
            "neuron": [240, 240, 240],
            "resnet_dt": true,
            "seed": 1,
            "_comment": " that's all"
        },
        "_comment": " that's all"
    },

    "learning_rate": {
        "type": "exp",
        "decay_steps": 50,
        "start_lr": 0.001,
        "stop_lr": 3.51e-8,
        "_comment": "that's all"
    },

    "loss": {
        "type": "ener",
        "start_pref_e": 0.02,
        "limit_pref_e": 1,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0,
        "limit_pref_v": 0,
        "_comment": " that's all"
    },

    "training": {
        "training_data": {
            "systems": ["../00.data/data/train_data/data_0/", "../00.data/data/train_data/data_1/", "../00.data/data/train_data/data_2/"],
            "batch_size": "auto",
            "_comment": "that's all"
        },
        "validation_data": {
            "systems": ["../00.data/data/val_data/data_3/"],
            "batch_size": "auto",
            "numb_btch": 1,
            "_comment": "that's all"
        },
        "numb_steps": 10000,
        "seed": 10,
        "disp_file": "lcurve.out",
        "disp_freq": 200,
        "save_freq": 1000,
        "_comment": "that's all"
    },

    "_comment": "that's all"
}
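The `learning_rate` and `loss` blocks interact: as I understand the DeePMD-kit documentation, the learning rate decays exponentially from `start_lr` so that it reaches `stop_lr` at `numb_steps`, and each loss prefactor moves from its `start_pref_*` to its `limit_pref_*` value in proportion to the current learning rate. A minimal sketch of those formulas with the values from input.json; applying the decay in stairs of `decay_steps` is an assumption to check against the docs of your version:

```python
import math

# Values copied from input.json
START_LR, STOP_LR = 1e-3, 3.51e-8
DECAY_STEPS, NUMB_STEPS = 50, 10000
PREFS = {"e": (0.02, 1.0), "f": (1000.0, 1.0), "v": (0.0, 0.0)}  # (start_pref, limit_pref)

# Decay rate chosen so that the learning rate reaches STOP_LR after NUMB_STEPS
DECAY_RATE = math.exp(math.log(STOP_LR / START_LR) * DECAY_STEPS / NUMB_STEPS)


def learning_rate(step):
    """Exponential decay applied in stairs of DECAY_STEPS (assumed staircase behaviour)."""
    return START_LR * DECAY_RATE ** (step // DECAY_STEPS)


def prefactor(step, kind):
    """Loss prefactor interpolated between start and limit values in proportion to lr / start_lr."""
    start, limit = PREFS[kind]
    ratio = learning_rate(step) / START_LR
    return start * ratio + limit * (1.0 - ratio)


for step in (0, 1000, 5000, 10000):
    print(step, f"lr={learning_rate(step):.3e}",
          f"pref_e={prefactor(step, 'e'):.4f}", f"pref_f={prefactor(step, 'f'):.4f}")
```

With these settings the force prefactor starts at 1000 and ends near 1, so early training is dominated by the force error and the energy term gains relative weight as the learning rate drops.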

Training and testing

dp train input.json

[2024-10-10 12:30:17,515] DEEPMD INFO batch 10000: val: rmse = 8.50e-03, rmse_e = 3.00e-03, rmse_f = 6.62e-03
[2024-10-10 12:30:17,515] DEEPMD INFO batch 10000: total wall time = 12.63 s
[2024-10-10 12:30:17,774] DEEPMD INFO saved checkpoint model.ckpt
[2024-10-10 12:30:17,774] DEEPMD INFO average training time: 0.0600 s/batch (exclude first 200 batches)
[2024-10-10 12:30:17,774] DEEPMD INFO finished training
[2024-10-10 12:30:17,774] DEEPMD INFO wall time: 610.835 s
WARNING:tensorflow:disable_mixed_precision_graph_rewrite() called when mixed precision is already disabled.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Set the mathtext font (choose any suitable font)
plt.rcParams['mathtext.fontset'] = 'custom'
plt.rcParams['mathtext.rm'] = 'Arial'  # use Arial as the mathtext font

# Plot the learning curve written by dp train
path = "./01.train/"
with open(f"{path}lcurve.out") as f:
    headers = f.readline().split()[1:]  # drop the leading "#"
lcurve = pd.DataFrame(np.loadtxt(f"{path}lcurve.out"), columns=headers)
legends = ["rmse_e_val", "rmse_e_trn", "rmse_f_val", "rmse_f_trn"]
for legend in legends:
    plt.loglog(lcurve["step"], lcurve[legend], label=legend)
plt.legend()
plt.xlabel("Training steps")
plt.ylabel("RMSE")
plt.show()

(Figure: learning curves, energy and force RMSE on the training and validation sets vs. training steps)
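If pandas or matplotlib is not available, the final errors can also be read straight from lcurve.out. A minimal standard-library sketch, assuming the layout the plotting script relies on (a header line starting with `#` that names the columns, then whitespace-separated numbers); `read_lcurve` is a hypothetical helper and the sample numbers are made up:

```python
import io


def read_lcurve(fh):
    """Parse an lcurve.out-style file into {column name: list of floats}."""
    header = fh.readline().lstrip("#").split()
    columns = {name: [] for name in header}
    for line in fh:
        if not line.strip() or line.startswith("#"):
            continue
        for name, value in zip(header, line.split()):
            columns[name].append(float(value))
    return columns


# Made-up numbers in the lcurve.out layout, for demonstration only
lcurve_sample = io.StringIO(
    "#  step  rmse_val  rmse_trn  rmse_e_val  rmse_e_trn  rmse_f_val  rmse_f_trn  lr\n"
    "0  1.2e+01  1.1e+01  3.0e-01  2.9e-01  3.8e-01  3.5e-01  1.0e-03\n"
    "200  9.0e-01  8.5e-01  5.0e-02  4.8e-02  2.8e-02  2.6e-02  8.9e-04\n"
)
curve = read_lcurve(lcurve_sample)
for key in ("rmse_e_val", "rmse_f_val"):
    print(key, "final:", curve[key][-1])
```

For the real file, use `with open("./01.train/lcurve.out") as fh: curve = read_lcurve(fh)`.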

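  • Prediction (this uses the frozen model graph.pb from the freezing step below; the dataset is small, so this is only a demonstration)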
import dpdata
import numpy as np

# training_systems = dpdata.LabeledSystem(
#     "./00.data/data/train_data/data_0/",
#     fmt="deepmd/npy")

# Reference (DFT) frames read directly from the VASP OUTCAR
training_systems = dpdata.LabeledSystem("./00.data/OUTCAR", fmt="vasp/outcar")

# Evaluate the same frames with the trained deep potential
predict = training_systems.predict("./01.train/graph.pb")

# print(training_systems["energies"], predict["energies"])
# Column 0: DFT energy, column 1: predicted energy
data = np.vstack((training_systems["energies"], predict["energies"])).T
np.savetxt("./01.train/train_pre.dat", data)
import numpy as np
import matplotlib.pyplot as plt
import fastdataing as fd

data = np.loadtxt("./01.train/train_pre.dat")
y1 = data[:, 0]  # DFT energies
y2 = data[:, 1]  # energies predicted by the deep potential

print(y1.shape, y2.shape)
ax = fd.add_ax(fd.add_fig(figsize=(8, 6)))
plt.subplots_adjust(left=0.2)
ax.scatter(y1, y2)

# Diagonal y = x over the current x range: perfect predictions lie on this line
x_range = np.linspace(*ax.get_xlim())

ax.plot(x_range, x_range, "r--", linewidth=1)
ax.set_xlabel("Energy of DFT (eV)")
ax.set_ylabel("Energy predicted by deep potential (eV)")

plt.show()

(Figure: parity plot of DFT energies vs. energies predicted by the deep potential)
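The parity plot can be summarised by a single number. A minimal sketch that computes the energy RMSE from the two columns written to train_pre.dat (column 0: DFT, column 1: deep potential), optionally per atom; `energy_rmse` and `load_pairs` are hypothetical helpers, the demo energies are made up, and `natoms=3` corresponds to the single water molecule in the POSCAR:

```python
import math


def energy_rmse(pairs, natoms=1):
    """RMSE between reference and predicted energies; divide by natoms for a per-atom value."""
    if not pairs:
        raise ValueError("no data")
    sq = [((pred - ref) / natoms) ** 2 for ref, pred in pairs]
    return math.sqrt(sum(sq) / len(sq))


def load_pairs(path="./01.train/train_pre.dat"):
    """Read the two-column file written by np.savetxt in the prediction script."""
    with open(path) as fh:
        return [tuple(map(float, line.split()[:2])) for line in fh if line.strip()]


# Made-up energies in eV, for demonstration only
demo_energies = [(-14.20, -14.21), (-14.25, -14.23), (-14.30, -14.30)]
print(f"RMSE = {energy_rmse(demo_energies):.4f} eV, "
      f"per atom = {energy_rmse(demo_energies, natoms=3):.4f} eV/atom")
```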

Freezing and compression

dp freeze -o graph.pb
dp compress -i graph.pb -o graph-compress.pb
ls graph*

graph-compress.pb graph.pb

Running the DeepMD simulation

Preparing the LAMMPS input files

  • DeePMD potential: graph.pb
cd ..
mkdir 02.lmp && cd 02.lmp
cp ../01.train/graph.pb .
  • Input file: in.lammps
# liquid water

units metal
boundary p p p
atom_style full

neighbor 1.0 bin
neigh_modify every 10 delay 0 check no

read_data system.data

# DeePMD potential; LAMMPS atom types 1 (O) and 2 (H) follow the model's type_map ["O","H"]
pair_style deepmd graph.pb
pair_coeff * *

velocity all create 300.0 23456789
fix 1 all npt temp 300.0 300.0 0.1 iso 1.0 1.0 1.0
timestep 0.001  # metal units: ps, i.e. 1 fs

thermo_style custom step pe ke etotal temp press vol
thermo 100
dump 1 all custom 100 traj_relax.lammpstrj id type element x y z
dump_modify 1 element O H
run 5000

  • LAMMPS data file for water: system.data

    No bond or angle information is needed.

LAMMPS Description

1536 atoms

2 atom types

0.0 30 xlo xhi
0.0 30 ylo yhi
0.0 30 zlo zhi

Masses

1 15.9994 # O
2 1.008 # H

Atoms # full

1 1 1 -1.1794 0.0 0.0 0.0
2 1 2 0.5897 0.8164904 0.0 0.577359
3 1 2 0.5897 -0.8164904 0.0 0.577359
4 2 1 -1.1794 0.0 0.0 3.5
5 2 2 0.5897 0.8164904 0.0 4.0773589999999995
6 2 2 0.5897 -0.8164904 0.0 4.0773589999999995
7 3 1 -1.1794 0.0 0.0 7.0
8 3 2 0.5897 0.8164904 0.0 7.5773589999999995
9 3 2 0.5897 -0.8164904 0.0 7.5773589999999995
10 4 1 -1.1794 0.0 0.0 10.5
...
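The listing is consistent with water molecules (O at the local origin, H at ±0.8164904 Angstrom in x and +0.577359 Angstrom in z, i.e. O-H = 1 Angstrom and H-O-H ≈ 109.47 degrees) placed on a regular grid with 3.5 Angstrom spacing; 8 × 8 × 8 = 512 molecules gives the 1536 atoms. A minimal sketch that writes such a file, assuming exactly that grid (inferred from the visible lines, not stated by the author); `write_water_data` is a hypothetical helper:

```python
import itertools

# Geometry and charges copied from the system.data listing
O_CHARGE, H_CHARGE = -1.1794, 0.5897
H_OFFSETS = [(0.8164904, 0.0, 0.577359), (-0.8164904, 0.0, 0.577359)]


def write_water_data(n_per_side=8, spacing=3.5, box=30.0):
    """Return a LAMMPS data file (atom_style full) with water molecules on a cubic grid."""
    atoms = []
    grid = itertools.product(range(n_per_side), repeat=3)  # z index varies fastest
    for mol_id, (ix, iy, iz) in enumerate(grid, start=1):
        ox, oy, oz = ix * spacing, iy * spacing, iz * spacing
        atoms.append((mol_id, 1, O_CHARGE, ox, oy, oz))  # type 1 = O
        for dx, dy, dz in H_OFFSETS:
            atoms.append((mol_id, 2, H_CHARGE, ox + dx, oy + dy, oz + dz))  # type 2 = H
    lines = [
        "LAMMPS Description", "",
        f"{len(atoms)} atoms", "",
        "2 atom types", "",
        f"0.0 {box} xlo xhi",
        f"0.0 {box} ylo yhi",
        f"0.0 {box} zlo zhi", "",
        "Masses", "",
        "1 15.9994 # O",
        "2 1.008 # H", "",
        "Atoms # full", "",
    ]
    # atom_style full columns: atom-ID molecule-ID atom-type charge x y z
    for atom_id, (mol, typ, q, x, y, z) in enumerate(atoms, start=1):
        lines.append(f"{atom_id} {mol} {typ} {q} {x:.7f} {y:.7f} {z:.7f}")
    return "\n".join(lines) + "\n"


text = write_water_data()
print(text.splitlines()[2])  # number of atoms
# with open("system.data", "w") as fh:
#     fh.write(text)
```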

In a LAMMPS build compatible with the installed DeePMD-kit version, run the deep potential molecular dynamics:

mpirun -np 8 lmp_mpi -i in.lammps

LAMMPS (2 Aug 2023)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
Loaded 1 plugins from /home/xxx/softwares/deepmd-kit/lib/deepmd_lmp
Reading data file …

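Why does the run feel so slow? One possible factor is that the uncompressed graph.pb is used here, while the compressed model graph-compress.pb produced in the freezing and compression step is intended to run faster.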
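A useful first check when a run feels slow is how much simulated time it covers and what wall time a given throughput implies. A minimal sketch for LAMMPS `units metal` (timestep in ps), using `timestep 0.001` and `run 5000` from in.lammps; the 1 ns/day throughput is a made-up placeholder, to be replaced by the `Performance:` value LAMMPS prints at the end of a run:

```python
def simulated_ps(n_steps, timestep_ps):
    """Simulated time in ps for units metal, where the timestep is given in ps."""
    return n_steps * timestep_ps


def wall_hours(sim_ps, ns_per_day):
    """Wall-clock hours needed to cover sim_ps at a throughput of ns_per_day."""
    return sim_ps / 1000.0 / ns_per_day * 24.0


sim = simulated_ps(5000, 0.001)          # run 5000 with timestep 0.001 -> 5 ps
hours = wall_hours(sim, ns_per_day=1.0)  # hypothetical throughput
print(f"{sim:g} ps of MD, about {hours:.2f} h at 1 ns/day")
```

The same conversion shows that Tdamp = 0.1 ps in the npt fix corresponds to 100 timesteps and Pdamp = 1.0 ps to 1000 timesteps, in line with the LAMMPS documentation's suggestion of roughly 100 and 1000 timesteps.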

All files are stored in water_deepmd.

References

[1] https://docs.deepmodeling.com/projects/deepmd/en/r2/getting-started/index.html

Tags: DeepMD