Paper Reproduction: Machine Learning Prediction of Methane Hydrate Equilibrium Conditions

[toc]

Renewable and Sustainable Energy Reviews 173 (2023) 113103

Title	A reliable model to predict the methane-hydrate equilibrium: An updated database and machine learning approach
Data	The database includes 987 data samples for pure water, 15 different single-salt brines, and 6 different multi-salt systems.
Models	Multilayer Perceptron (MLP) ANN, Decision Tree (DT), and Extremely Randomized Trees (ET)
Implementation	Scikit-Learn library
Prediction target	Equilibrium temperature

The gathered database includes 987 data samples for pure water, 15 different single-salt brines, and 6 different multi-salt systems. Covering 15 different salts, it is the largest database gathered on the topic of interest so far.

Almost all experimental works in the literature report the salt concentration C in wt%. However, the ion mole fraction (x) is used for modeling instead, for several reasons:
- With C, multi-salt systems cannot be addressed correctly.
- With ion concentrations, there are 12 characteristic features for brines, fewer than the 15 different salts. Fewer features reduce both the processing load and the risk of overfitting.
- More importantly, ion concentrations make it possible to address salts that are not present in the reported experimental data.

In addition to the brine characteristics, the pressure P is used as the operational parameter to predict the equilibrium temperature T of the CH4 hydrate.
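
The conversion from salt wt% to ion mole fractions can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes complete dissociation, a 100 g solution basis, and a mole fraction taken over water plus all ions. The `SALTS` table and the function name `ion_mole_fractions` are hypothetical and cover only two salts.

```python
# Molar masses in g/mol. The salt table is an illustrative subset.
M_WATER = 18.015
SALTS = {
    # salt: (molar mass, {ion: ions released per formula unit})
    "NaCl": (58.44, {"Na+": 1, "Cl-": 1}),
    "MgCl2": (95.21, {"Mg2+": 1, "Cl-": 2}),
}


def ion_mole_fractions(salt_wt_percent):
    """Convert salt concentrations in wt% to ion mole fractions.

    Assumption: complete dissociation, 100 g of solution as the basis,
    and mole fractions computed over water plus all ions.
    """
    water_mass = 100.0 - sum(salt_wt_percent.values())
    moles = {"H2O": water_mass / M_WATER}
    for salt, wt in salt_wt_percent.items():
        molar_mass, ions = SALTS[salt]
        n_salt = wt / molar_mass
        for ion, count in ions.items():
            # Ions shared by several salts (here Cl-) accumulate in one feature.
            moles[ion] = moles.get(ion, 0.0) + count * n_salt
    total = sum(moles.values())
    return {sp: n / total for sp, n in moles.items() if sp != "H2O"}


# A multi-salt brine: 3 wt% NaCl plus 2 wt% MgCl2.
x = ion_mole_fractions({"NaCl": 3.0, "MgCl2": 2.0})
print(x)
```

Because shared ions collapse into a single feature, a multi-salt brine is described by the same feature vector as a single-salt one, which is what lets an ion-based model address salt combinations that a per-salt wt% encoding cannot.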

After removing duplicates, the gathered database contains 977 data samples.
Before splitting, 40 data samples belonging to MgBr2 were set aside for additional testing. Although no other reference reports experimental data for this salt, the database does contain Mg²⁺ and Br⁻ ions through other salts.
If the developed model gives an acceptable prediction for MgBr2, it should be usable for any hypothetical salt whose ions are available in the database.
The remaining 937 data samples were split into training and testing subsets in the ratio of 80/20.
A 5-fold cross-validation (CV) was used to validate the developed models.
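
The hold-out and split procedure described above might look like the sketch below. The `records` list is a placeholder that only mimics the sample counts (40 MgBr2 rows and 937 others). The real rows would carry ion mole fractions, P, and T. The `random_state` value is an arbitrary assumption.

```python
from sklearn.model_selection import train_test_split

# Placeholder records standing in for the 977 deduplicated samples.
# Each record is (salt label, sample id); these are not real data.
records = [("MgBr2", i) for i in range(40)] + [("other", i) for i in range(937)]

# Set the MgBr2 samples aside first so they never influence training.
holdout = [r for r in records if r[0] == "MgBr2"]
remaining = [r for r in records if r[0] != "MgBr2"]

# 80/20 split of the remaining samples; random_state=0 is an arbitrary choice.
train, test = train_test_split(remaining, test_size=0.2, random_state=0)

print(len(holdout), len(train), len(test))
```

Filtering out the held-out salt before calling `train_test_split` keeps the MgBr2 samples out of both the training and the regular testing subsets, so they act as a genuinely unseen salt.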

The 5-fold CV RMSE was obtained from 20 trials for each of the MLP, DT, and ET models.
Each trial uses a different random state, so the results differ between trials. The closer the error values are to each other across random states, the more stable the model.
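
A stability check of this kind could be sketched with Scikit-Learn as below. The data are synthetic stand-ins, not the hydrate database, and the hyperparameters are arbitrary. Whether the paper varied the random state of the model, of the fold assignment, or of both is not stated here, so varying both is an assumption.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (NOT the hydrate database): 200 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 270.0 + 20.0 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(scale=0.5, size=200)


def cv_rmse(seed):
    """Mean 5-fold CV RMSE for one random state (model and folds share it)."""
    model = ExtraTreesRegressor(n_estimators=50, random_state=seed)
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(
        model, X, y, cv=folds, scoring="neg_root_mean_squared_error"
    )
    # The scorer returns negated RMSE, so flip the sign back.
    return -scores.mean()


# 20 trials, each with a different random state.
rmse_per_trial = np.array([cv_rmse(seed) for seed in range(20)])
print(f"mean RMSE = {rmse_per_trial.mean():.3f}, "
      f"std across trials = {rmse_per_trial.std():.3f}")
```

A small standard deviation across the 20 trials indicates that the model's error depends little on the random state. The same loop can be repeated with `MLPRegressor` or `DecisionTreeRegressor` to compare the three model families.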

The evaluation metrics are the residual ($Res_i$), the coefficient of determination ($R^2$), $RMSE$, the Absolute Relative Deviation ($ARD$), and the Average Absolute Relative Deviation ($AARD$).
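
For reference, these metrics are commonly defined as follows. This is a sketch of the usual forms. The sign convention of the residual and the percentage scaling of $ARD$ are assumptions and may differ from the paper. Here $T_i^{exp}$ and $T_i^{pred}$ denote the experimental and predicted equilibrium temperatures of sample $i$, $\bar{T}^{exp}$ is the mean experimental temperature, and $N$ is the number of samples.

$$
\begin{aligned}
Res_i &= T_i^{exp} - T_i^{pred} \\
R^2 &= 1 - \frac{\sum_{i=1}^{N} Res_i^2}{\sum_{i=1}^{N} \left(T_i^{exp} - \bar{T}^{exp}\right)^2} \\
RMSE &= \sqrt{\frac{1}{N} \sum_{i=1}^{N} Res_i^2} \\
ARD_i &= 100 \times \frac{\left| Res_i \right|}{T_i^{exp}} \\
AARD &= \frac{1}{N} \sum_{i=1}^{N} ARD_i
\end{aligned}
$$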

Tags: Machine learning, paper reproduction