
Chinese Journal of Chemical Engineering, 2024, Vol. 71, Issue (7): 183-192. DOI: 10.1016/j.cjche.2024.03.023



A deep reinforcement learning approach to gasoline blending real-time optimization under uncertainty

Zhiwei Zhu, Minglei Yang, Wangli He, Renchu He, Yunmeng Zhao, Feng Qian   

  1. Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2023-10-17  Revised: 2024-03-13  Online: 2024-07-28  Published: 2024-08-30
  • Contact: Yunmeng Zhao, E-mail: yunmeng.zhao@ecust.edu.cn; Feng Qian, E-mail: fqian@ecust.edu.cn
  • Supported by:
    This work was supported by the National Key Research & Development Program – Intergovernmental International Science and Technology Innovation Cooperation Project (2021YFE0112800), the National Natural Science Foundation of China (Key Program: 62136003), the National Natural Science Foundation of China (62073142), the Fundamental Research Funds for the Central Universities (222202417006), and Shanghai AI Lab.


Abstract: Real-time optimization techniques are widely used in the gasoline inline blending process to achieve objectives such as minimizing production cost. However, the effectiveness of real-time optimization in gasoline blending depends on accurate blending models and is challenged by stochastic disturbances. We therefore propose a real-time optimization algorithm based on the soft actor-critic (SAC) deep reinforcement learning strategy that optimizes gasoline blending without relying on a single blending model and is robust to disturbances. Our approach constructs the environment from nonlinear blending models and feedstocks subject to disturbances. The algorithm incorporates Lagrange multipliers and path constraints into the reward design to handle sparse product constraints. Carefully abstracted states facilitate convergence, and normalizing the action vector in each optimization period allows the agent to generalize, to some extent, across different target production scenarios. With these well-designed components, the SAC-based algorithm outperforms real-time optimization methods based on either nonlinear or linear programming. It even performs comparably to the time-horizon-based real-time optimization method, which requires knowledge of uncertainty models, confirming its ability to handle uncertainty without accurate models. Our simulations point to a promising way to free real-time optimization of the gasoline blending process from uncertainty models that are difficult to obtain in practice.
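
Note: To make the reward design sketched in the abstract concrete, the following minimal Python illustration (our own construction, not the authors' implementation; the variable names, the simplified linear blend rule, and all numbers are assumptions) shows how an unconstrained action vector can be normalized into blend fractions and how Lagrange-multiplier penalties on quality specifications can be folded into a scalar reward:

import numpy as np

def blend_action_to_fractions(action):
    # Softmax maps an unconstrained policy output to non-negative
    # fractions summing to 1, so every action is a feasible recipe.
    z = np.exp(action - np.max(action))  # shift for numerical stability
    return z / z.sum()

def reward(fractions, costs, qualities, spec_min, multipliers):
    # Negative blend cost, minus Lagrangian penalties for any
    # violated minimum-quality specifications.
    cost = float(fractions @ costs)
    blended = qualities.T @ fractions   # simplified linear blend rule
    violation = np.maximum(spec_min - blended, 0.0)
    return -cost - float(multipliers @ violation)

# Toy scenario with three feedstocks and two quality indices.
rng = np.random.default_rng(0)
action = rng.normal(size=3)                       # raw policy output
fractions = blend_action_to_fractions(action)
costs = np.array([5.0, 6.5, 8.0])                 # assumed feedstock prices
qualities = np.array([[88.0, 40.0],               # e.g. octane number and a
                      [92.0, 45.0],               # second quality index
                      [95.0, 50.0]])
spec_min = np.array([91.0, 42.0])                 # assumed minimum blended specs
multipliers = np.array([10.0, 10.0])              # assumed Lagrange multipliers
print(fractions, reward(fractions, costs, qualities, spec_min, multipliers))

In the paper's setting the blending models are nonlinear and the multipliers and path constraints are part of the SAC reward design for sparse product constraints; the linear rule above only keeps the sketch short.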

Key words: Deep reinforcement learning, Gasoline blending, Real-time optimization, Petroleum, Computer simulation, Neural networks