Indexed in SCI and EI | Official journal of the Chemical Industry and Engineering Society of China

Chinese Journal of Chemical Engineering, 2021, Vol. 39, Issue 11: 286-296. DOI: 10.1016/j.cjche.2021.03.002

Prediction of methane storage in covalent organic frameworks using big-data-mining approach

Huan Zhang1,2, Peisong Yang2, Duli Yu2,3, Kunfeng Wang3, Qingyuan Yang1,2   

    1 State Key Laboratory of Organic-Inorganic Composites, Beijing University of Chemical Technology, Beijing 100029, China;
    2 Beijing Advanced Innovation Center for Soft Matter Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China;
    3 College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  • Received: 2021-01-05 Revised: 2021-02-10 Online: 2021-12-27 Published: 2021-11-28
  • Contact: Kunfeng Wang, Qingyuan Yang
  • Supported by:
    This work was financially supported by the National Natural Science Foundation of China (22078004), the Fundamental Research Funds for the Central Universities (buctrc201727) and the Big Science Project from BUCT (XK180301).

Abstract: Combining computational materials screening with machine learning (ML) techniques has become a popular approach for studying materials for a target application. In this work, we began with high-throughput molecular simulations to calculate the methane storage (6.5 MPa) and deliverable (6.5-0.58 MPa) capacities of 404,460 covalent organic frameworks (COFs) at 298 K. The full data set, with 23 features, was then randomly split into training and test sets in a ratio of 20:80 and used to evaluate the predictive ability of several ML algorithms: gradient boosting decision tree (GBDT), neural network (NN), support vector machine (SVM), random forest (RF) and decision tree (DT). The results indicate that the RF model has the highest prediction accuracy; the RF model was further employed to reduce the dimension of the feature space and to quantitatively analyze the relative importance of each feature. Binary classification predictors built using the most influential features successfully identified top-performing candidates from the test set containing 323,168 COFs with an accuracy exceeding 96%. The deliverable capacities of the identified COFs were found to outperform those reported so far for various adsorbents. These findings may provide useful guidance for the design and synthesis of new high-performance materials for methane storage.

Key words: Covalent organic framework, Monte Carlo simulation, Methane, Machine learning, Model
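
The data handling described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it only shows how a deliverable capacity is obtained as the difference between uptakes at the storage and discharge pressures, how a random 20:80 training/test split can be drawn, and how binary "top-performer" labels and a classification accuracy are computed. All function names, the toy uptake values and the cutoff of 150 are hypothetical, chosen for illustration only; the ML models themselves (RF, GBDT, etc.) are not reproduced here.

```python
import random


def deliverable_capacity(uptake_storage, uptake_discharge):
    """Uptake at the storage pressure minus uptake at the discharge pressure."""
    return uptake_storage - uptake_discharge


def split_train_test(records, train_fraction=0.2, seed=0):
    """Randomly split records; the default gives a 20:80 train:test ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def label_top_performers(capacities, threshold):
    """Binary labels: 1 if capacity >= threshold (a hypothetical cutoff), else 0."""
    return [1 if c >= threshold else 0 for c in capacities]


def accuracy(y_true, y_pred):
    """Fraction of labels that were predicted correctly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up (storage, discharge) uptakes in arbitrary units; not data from the paper.
toy_uptakes = [(200.0, 40.0), (150.0, 60.0), (260.0, 50.0), (180.0, 30.0), (220.0, 90.0)]
capacities = [deliverable_capacity(high, low) for high, low in toy_uptakes]

train, test = split_train_test(capacities)                   # 1 training, 4 test records
labels = label_top_performers(capacities, threshold=150.0)   # hypothetical cutoff
```

In a real workflow the labels for the test set would be predicted by a trained classifier and then compared with the simulated labels via `accuracy`; here the functions are kept separate so each step can be checked on its own.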
