Indexed in SCI and EI | Journal of the Chemical Industry and Engineering Society of China

Chinese Journal of Chemical Engineering ›› 2023, Vol. 61 ›› Issue (9): 68-81. DOI: 10.1016/j.cjche.2023.02.026

• Full Length Article •

Extraction and analysis of risk factors from Chinese chemical accident reports

Xi Luo, Xiayuan Feng, Xu Ji, Yagu Dang, Li Zhou, Kexin Bi, Yiyang Dai   

  1. School of Chemical Engineering, Sichuan University, Chengdu 610065, China
  • Received: 2022-09-13 Revised: 2023-02-16 Online: 2023-09-28 Published: 2023-12-14
  • Contact: Xu Ji, E-mail: jxhhpb@163.com; Yiyang Dai, E-mail: daiyy@scu.edu.cn
  • Supported by:
    The authors are grateful for the support of the National Key Research and Development Program of China (2021YFB4000505) and Sichuan Science and Technology Program (2021YFS0301).

Abstract: Accidents in chemical production usually result in fatal injury, economic loss, and negative social impact. Chemical accident reports, which record information on past accidents, contain a large amount of expert knowledge. However, manually identifying the key factors that cause accidents requires reading and analyzing numerous accident reports, which is time-consuming and labor-intensive. In this paper, a semi-automatic method based on natural language processing (NLP) technology is developed to construct a knowledge graph of chemical accidents. Firstly, we build a named entity recognition (NER) model using SoftLexicon (a method that simplifies the use of a lexicon) + BERT-Transformer-CRF (conditional random field) to automatically extract accident information and risk factors. The risk factors leading to accidents in chemical accident reports are divided into five categories: human, machine, material, management, and environment. By analyzing the extraction results for different chemical industries and different accident types, corresponding accident prevention suggestions are given. Secondly, based on the definition of classes and hierarchies of information in chemical accident reports, the seven-step method developed at Stanford University is used to construct an ontology-based knowledge description model for chemical accidents. Finally, the ontology knowledge description model is imported into the graph database Neo4j, and the knowledge graph is constructed to provide structured storage of chemical accident knowledge. In a case study of information extraction from 290 Chinese chemical accident reports, SoftLexicon + BERT-Transformer-CRF shows the best extraction performance among nine experimental models. These results demonstrate that the method developed in the current work is a promising tool for identifying the factors that cause accidents, which contributes to intelligent accident analysis and supports accident prevention.

Key words: Chemical processes, Chemical process safety, Natural language process, Knowledge graph, Neural networks, Algorithm
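
As a rough illustration of the pipeline summarized in the abstract (NER output, grouping into the five risk-factor categories, then storage in Neo4j), the following minimal Python sketch decodes BIO-tagged characters into entities and prepares parameterised Cypher `MERGE` statements. It is not the authors' implementation: the tag names (`HUMAN`, `MACHINE`, and so on), the node labels `Accident` and `RiskFactor`, the relationship type `HAS_RISK_FACTOR`, and the sample sentence are all assumptions made for illustration, and the NER model itself is replaced by hand-written tags.

```python
from collections import defaultdict

# Hypothetical tag names for the five risk-factor categories in the abstract.
CATEGORIES = {"HUMAN", "MACHINE", "MATERIAL", "MANAGEMENT", "ENVIRONMENT"}


def decode_bio(tokens, tags):
    """Collect (category, text) spans from a BIO-tagged character sequence."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "B" and name in CATEGORIES:
            if current:  # close the previous entity before starting a new one
                spans.append((label, "".join(current)))
            current, label = [token], name
        elif prefix == "I" and current and name == label:
            current.append(token)  # continue the open entity
        else:
            if current:  # an "O" tag or an inconsistent tag closes the entity
                spans.append((label, "".join(current)))
            current, label = [], None
    if current:
        spans.append((label, "".join(current)))
    return spans


def group_by_category(spans):
    """Group entity texts under their risk-factor category."""
    grouped = defaultdict(list)
    for label, text in spans:
        grouped[label].append(text)
    return dict(grouped)


# Assumed graph schema: (Accident)-[:HAS_RISK_FACTOR]->(RiskFactor).
CYPHER = (
    "MERGE (a:Accident {id: $accident_id}) "
    "MERGE (r:RiskFactor {name: $name, category: $category}) "
    "MERGE (a)-[:HAS_RISK_FACTOR]->(r)"
)


def to_cypher(accident_id, spans):
    """Pair the Cypher statement with one parameter dict per extracted entity."""
    return [
        (CYPHER, {"accident_id": accident_id, "name": text, "category": label})
        for label, text in spans
    ]


# Made-up sentence ("the operator did not close the valve"), tagged by hand.
tokens = list("操作工未关闭阀门")
tags = ["B-HUMAN", "I-HUMAN", "I-HUMAN", "O", "O", "O", "B-MACHINE", "I-MACHINE"]
spans = decode_bio(tokens, tags)
statements = to_cypher("report-001", spans)
```

Each `(statement, parameters)` pair could then be passed to a Neo4j session; using `MERGE` rather than `CREATE` keeps a risk factor that recurs across reports as a single node, which is what makes cross-report analysis by category straightforward.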
