Indexed in SCI and EI | Journal of the Chemical Industry and Engineering Society of China

Chinese Journal of Chemical Engineering, 2025, Vol. 81, Issue (5): 182-199. DOI: 10.1016/j.cjche.2025.02.013


A systematic data-driven modelling framework for nonlinear distillation processes incorporating data intervals clustering and new integrated learning algorithm

Zhe Wang1, Renchu He2, Jian Long1   

  1. Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China;
  2. Department of Automation, China University of Petroleum, Beijing 102249, China
  • Received: 2024-11-11 Revised: 2025-02-17 Accepted: 2025-02-20 Online: 2025-03-08 Published: 2025-05-28
  • Contact: Renchu He, E-mail: rche@cup.edu.cn; Jian Long, E-mail: longjian@ecust.edu.cn
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2023YFB3307801), the National Natural Science Foundation of China (62394343, 62373155, 62073142), the Major Science and Technology Project of Xinjiang (No. 2022A01006-4), the Programme of Introducing Talents of Discipline to Universities (the 111 Project) under Grant B17017, the Fundamental Research Funds for the Central Universities, the Science Foundation of China University of Petroleum, Beijing (No. 2462024YJRC011) and the Open Research Project of the State Key Laboratory of Industrial Control Technology, China (Grant No. ICT2024B70).

Abstract: Distillation is an important chemical process. Compared with mechanistic modelling, a data-driven modelling approach has the potential to reduce model complexity and thus improve the efficiency of process optimization and monitoring studies. However, distillation processes are highly nonlinear and contain multiple intervals of uncertain perturbation, which makes accurate data-driven modelling challenging. This paper proposes a systematic data-driven modelling framework to address these problems. First, data segment variance is introduced into the K-means algorithm to form K-means data interval (KMDI) clustering, which groups the data into perturbed and steady-state intervals so that steady-state data can be extracted. Second, the maximal information coefficient (MIC) is used to quantify the nonlinear correlation between variables and remove redundant features. Finally, extreme gradient boosting (XGBoost) is integrated as the base learner into adaptive boosting (AdaBoost), with an error threshold (ET) set to improve the weight update strategy, yielding a new integrated learning algorithm, XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying it to a real industrial propylene distillation process.

Key words: Integrated learning algorithm, Data intervals clustering, Feature selection, Application of artificial intelligence in distillation industry, Data-driven modelling
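
The first step of the framework, separating perturbed from steady-state intervals by clustering on segment variance, can be illustrated with a small sketch. This is not the paper's KMDI algorithm, whose exact formulation the abstract does not give. It assumes fixed-length, non-overlapping segments, a log-variance feature, and a plain two-cluster K-means; the function names, window length and synthetic signal are all hypothetical.

```python
import numpy as np


def segment_variances(signal, window):
    """Split a 1-D signal into non-overlapping windows; return each window's variance."""
    n_segments = len(signal) // window
    trimmed = np.asarray(signal[: n_segments * window], dtype=float)
    return trimmed.reshape(n_segments, window).var(axis=1)


def two_means_1d(values, n_iter=50):
    """Plain 1-D K-means with k=2, initialised at the data minimum and maximum."""
    centers = np.array([values.min(), values.max()], dtype=float)
    labels = np.zeros(len(values), dtype=int)
    for _ in range(n_iter):
        # Assign every value to its nearest centre.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = centers.copy()
        for k in range(2):
            if np.any(labels == k):
                new_centers[k] = values[labels == k].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def steady_state_mask(signal, window=20):
    """Return one boolean per segment: True where the segment is labelled steady."""
    variances = segment_variances(signal, window)
    # Assumption: cluster on log-variance so that the wide dynamic range
    # between quiet and disturbed segments does not dominate the distance.
    features = np.log(variances + 1e-12)
    labels, centers = two_means_1d(features)
    steady_cluster = int(np.argmin(centers))  # the lower-variance cluster
    return labels == steady_cluster


if __name__ == "__main__":
    # Synthetic stand-in for a process variable: steady, disturbed, steady.
    rng = np.random.default_rng(0)
    signal = np.concatenate([
        50.0 + 0.05 * rng.standard_normal(200),
        50.0 + 2.0 * rng.standard_normal(100),
        52.0 + 0.05 * rng.standard_normal(200),
    ])
    print(steady_state_mask(signal, window=20).astype(int))
```

Segments flagged as steady could then be kept as training data for the later feature-selection and regression steps. The window length is a tuning choice in this sketch: shorter windows localise disturbances better but give noisier variance estimates.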
