[1] J. Hong, Optimal substrate feeding policy for a fed batch fermentation with substrate and product inhibition kinetics, Biotechnol. Bioeng. 28(9) (1986) 1421-1431.
[2] M.S. Iyer, D.C. Wunsch, Dynamic re-optimization of a fed-batch fermentor using adaptive critic designs, IEEE Trans. Neural Netw. 12(6) (2001) 1433-1444.
[3] U. Yüzgeç, M. Türker, A. Hocalar, On-line evolutionary optimization of an industrial fed-batch yeast fermentation process, ISA Trans. 48(1) (2009) 79-92.
[4] A. Ashoori, B. Moshiri, A. Khaki-Sedigh, M.R. Bakhtiari, Optimal control of a nonlinear fed-batch fermentation process using model predictive approach, J. Process Control 19(7) (2009) 1162-1173.
[5] C.V. Peroni, N.S. Kaisare, J.H. Lee, Optimal control of a fed-batch bioreactor using simulation-based approximate dynamic programming, IEEE Trans. Control Syst. Technol. 13(5) (2005) 786-790.
[6] S. Syafiie, F. Tadeo, M. Villafín, A. Alonso, Learning control for batch thermal sterilization of canned foods, ISA Trans. 50(1) (2011) 82-90.
[7] C. Valencia, G. Espinosa, J. Giralt, F. Giralt, Optimization of invertase production in a fed-batch bioreactor using simulation based dynamic programming coupled with a neural classifier, Comput. Chem. Eng. 31(9) (2007) 1131-1140.
[8] D.Z. Li, L. Qian, Q.B. Jin, T.W. Tan, Reinforcement learning control with adaptive gain for a Saccharomyces cerevisiae fermentation process, Appl. Soft Comput. J. 11(8) (2011) 4488-4495.
[9] D.V. Prokhorov, D.C. Wunsch, Adaptive critic designs, IEEE Trans. Neural Netw. 8(5) (1997) 997-1007.
[10] C.Q. Lian, X. Xu, L. Zuo, Learning control of a bioreactor system using kernel-based heuristic dynamic programming, Proc. World Congr. Intell. Control Autom., WCICA 2012, pp. 316-321.
[11] B. Wang, D.B. Zhao, C. Alippi, D.R. Liu, Dual heuristic dynamic programming for nonlinear discrete-time uncertain systems with state delay, Neurocomputing 134 (2014) 222-229.
[12] C.Q. Lian, X. Xu, L. Zuo, Z.H. Huang, Adaptive critic design with graph Laplacian for online learning control of nonlinear systems, Int. J. Adapt. Control Signal Process. 28 (2014) 290-304.
[13] M. Fairbank, E. Alonso, D. Prokhorov, Simple and fast calculation of the second-order gradients for globalized dual heuristic dynamic programming in neural networks, IEEE Trans. Neural Netw. Learn. Syst. 23(10) (2012) 1671-1676.
[14] J. Fu, H.B. He, X.M. Zhou, Adaptive learning and control for MIMO system based on adaptive dynamic programming, IEEE Trans. Neural Netw. 22(7) (2011) 1133-1148.
[15] Z. Ni, H.B. He, J.Y. Wen, Adaptive learning in tracking control based on the dual critic network design, IEEE Trans. Neural Netw. Learn. Syst. 24(6) (2013) 913-928.
[16] F.X. Tan, D.R. Liu, X.P. Guan, Online optimal control for VTOL aircraft system based on DHP algorithm, Proc. 33rd Chinese Control Conf., CCC 2014, pp. 2882-2886.
[17] X. Xu, Z.S. Hou, C.Q. Lian, H.B. He, Online learning control using adaptive critic designs with sparse kernel machines, IEEE Trans. Neural Netw. Learn. Syst. 24(5) (2013) 762-775.
[18] X. Xu, C.Q. Lian, L. Zuo, H.B. He, Kernel-based approximate dynamic programming for real-time online learning control: An experimental study, IEEE Trans. Control Syst. Technol. 22(1) (2014) 146-156.
[19] T.H. Song, D.Z. Li, L.L. Cao, K. Hirasawa, Kernel-based least squares temporal difference with gradient correction, IEEE Trans. Neural Netw. Learn. Syst. 27(4) (2016) 771-782.
[20] Z.H. Xiong, J. Zhang, Neural network model-based on-line re-optimisation control of fed-batch processes using a modified iterative dynamic programming algorithm, Chem. Eng. Process. 44(4) (2005) 477-484.
[21] R.S. Sutton, Learning to predict by the methods of temporal differences, Mach. Learn. 3(1) (1988) 9-44.
[22] S.J. Bradtke, A.G. Barto, Linear least-squares algorithms for temporal difference learning, Mach. Learn. 22(1-3) (1996) 33-57.
[23] X. Xu, H.G. He, D.W. Hu, Efficient reinforcement learning using recursive least-squares methods, J. Artif. Intell. Res. 16 (2002) 259-292.
[24] S. Bhatnagar, D. Precup, D. Silver, Convergent temporal-difference learning with arbitrary smooth function approximation, Adv. Neural Inf. Process. Syst., NIPS 2009, pp. 1204-1212.
[25] R.S. Sutton, H.R. Maei, D. Precup, Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proc. Int. Conf. Mach. Learn., ICML 2009, pp. 993-1000.
[26] M. Geist, O. Pietquin, Algorithmic survey of parametric value function approximation, IEEE Trans. Neural Netw. Learn. Syst. 24(6) (2013) 845-867.
[27] H.R. Maei, C. Szepesvári, S. Bhatnagar, R.S. Sutton, Toward off-policy learning control with function approximation, Proc. Int. Conf. Mach. Learn., ICML 2010, pp. 719-726.