Indexed in SCI and EI | Journal of the Chemical Industry and Engineering Society of China

Chinese Journal of Chemical Engineering, 2023, Vol. 59, Issue (7): 72-84. DOI: 10.1016/j.cjche.2022.12.013

Data cleaning method for the process of acid production with flue gas based on improved random forest

Xiaoli Li1,2, Minghua Liu1, Kang Wang1, Zhiqiang Liu3, Guihai Li4   

    1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China;
    2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Engineering Research Center of Digital Community, Ministry of Education, Beijing 100124, China;
    3. Guixi Smelter, Jiangxi Copper Co., Ltd, Guixi 335400, China;
    4. Beijing RTlink Technology Co., Ltd, Beijing 100102, China
  • Received: 2022-05-17 Revised: 2022-12-24 Online: 2023-10-14 Published: 2023-07-28
  • Contact: Minghua Liu, E-mail: liumh829@163.com
  • Supported by:
    This study is supported by the National Natural Science Foundation of China (61873006) and Beijing Natural Science Foundation (4204087, 4212040).

Abstract: Acid production with flue gas is a complex nonlinear process with multiple, strongly coupled variables. Its operating data are an important basis for state monitoring, optimal control, and fault diagnosis. However, the operating environment is complex and involves a large amount of equipment, so the data obtained by the detection equipment are heavily contaminated and prone to anomalies such as missing values and outliers. To solve this problem, a data cleaning method based on an improved random forest is proposed. First, an outlier recognition model based on isolation forest is designed to identify and eliminate the outliers in the dataset. Second, an improved random forest regression model is established, in which a genetic algorithm optimizes the hyperparameters of the random forest regression model: the optimal parameter combination is found in the search space, and the trend of the data is then predicted. Finally, the improved random forest is used to compensate for the data that are missing after the abnormal data have been eliminated, which completes the data cleaning. Results show that the proposed method can accurately eliminate and compensate for abnormal data in the process of acid production with flue gas, and that it improves the accuracy of compensation for missing data. With the cleaned data, a more accurate model can be established, which is significant for the subsequent temperature control. The conversion rate of SO2 can then be further improved, thereby improving the yield of sulfuric acid and the economic benefits.

Key words: Acid production, Data cleaning, Isolation forest, Random forest, Data compensation
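
The abstract states that a genetic algorithm searches a hyperparameter space for the random forest regression model, but this page gives no further detail. The following is a minimal sketch of such a search, not the authors' implementation. The search bounds, the selection, crossover and mutation operators, the population settings, and the toy_fitness function are all illustrative assumptions; toy_fitness is a stand-in for a real validation error so that the example runs with the Python standard library alone.

```python
import random

# Hypothetical search space: hyperparameter name -> (low, high), inclusive integers.
SEARCH_SPACE = {"n_estimators": (10, 300), "max_depth": (2, 30)}


def random_individual(rng, space):
    """Draw one candidate hyperparameter set uniformly from the search space."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in space.items()}


def crossover(rng, a, b):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in a}


def mutate(rng, individual, space, rate=0.2):
    """With probability `rate` per gene, resample that gene within its bounds."""
    child = dict(individual)
    for name, (lo, hi) in space.items():
        if rng.random() < rate:
            child[name] = rng.randint(lo, hi)
    return child


def genetic_search(fitness, space, pop_size=20, generations=30, elite=2, seed=0):
    """Minimise `fitness` over `space`; returns (best_individual, best_score)."""
    rng = random.Random(seed)
    population = [random_individual(rng, space) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = ranked[:elite]           # elitism: best candidates survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(rng, crossover(rng, a, b), space))
        population = children
    best = min(population, key=fitness)
    return best, fitness(best)


def toy_fitness(params):
    """Stand-in for a validation error; smallest at n_estimators=120, max_depth=12."""
    return (params["n_estimators"] - 120) ** 2 + 10 * (params["max_depth"] - 12) ** 2


best_params, best_score = genetic_search(toy_fitness, SEARCH_SPACE)
print(best_params, best_score)
```

In a real cleaning pipeline, toy_fitness would be replaced by a function that trains a random forest regressor with the candidate hyperparameters on the data remaining after outlier removal and returns its validation error. The tuned model would then predict values for the removed and missing samples. Because elitism carries the best candidates forward unchanged, the best score can never get worse from one generation to the next.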
