Prediction of ionic liquid toxicity by interpretable machine learning
Haijun Feng, Li Jiajia, Zhou Jian
2025, 84(8): 201-210.
doi:10.1016/j.cjche.2025.04.018
The potential toxicity of ionic liquids (ILs) affects their applications, and controlling that toxicity is one of the key issues in their use. To understand the structure-toxicity relationship of ILs and promote their greener application, six machine learning algorithms, namely Bagging, Adaptive Boosting (AdaBoost), Gradient Boosting (GBoost), Stacking, Voting and Categorical Boosting (CatBoost), are used to model the toxicity of ILs on four distinct datasets: leukemia rat cell line IPC-81 (IPC-81), acetylcholinesterase (AChE), Escherichia coli (E. coli) and Vibrio fischeri. Molecular descriptors derived from simplified molecular input line entry system (SMILES) strings are used to characterize the ILs. All models are assessed by the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and correlation coefficient (R²). Additionally, an interpretation model based on SHapley Additive exPlanations (SHAP) is built to determine the positive and negative effects of each molecular feature on toxicity. With its additional parameters and complexity, the CatBoost model outperforms the other models, making it the more reliable model for predicting IL toxicity. The interpretation results indicate that the most significant positive features, SMR_VSA5, PEOE_VSA8, Kappa2, PEOE_VSA6 and EState_VSA1, increase the toxicity of ILs as their values rise, while the most significant negative features, VSA_EState7, EState_VSA8, PEOE_VSA9 and FpDensityMorgan1, decrease the toxicity as their values rise. In addition, an IL's toxicity grows as its average molecular weight and number of pyridine rings increase, whereas it decreases as the number of hydrogen bond acceptors increases. These findings offer a theoretical foundation for the rapid screening and synthesis of environmentally benign ILs.
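The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, and the sketch assumes that R² is computed as the coefficient of determination, which the abstract does not specify.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for two equal-length sequences.

    Assumption: R^2 is the coefficient of determination,
    1 - SS_res / SS_tot; the abstract does not state the exact formula.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)       # residual sum of squares
    mse = ss_res / n                             # mean square error
    rmse = math.sqrt(mse)                        # root mean square error
    mae = sum(abs(r) for r in residuals) / n     # mean absolute error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when all observed values are identical
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


# Illustrative values only (not data from the paper)
observed = [3.0, 2.0, 4.0, 5.0]
predicted = [2.5, 2.0, 4.5, 5.0]
metrics = regression_metrics(observed, predicted)
```

Lower MSE, RMSE and MAE and an R² closer to 1 indicate a better fit, which is the sense in which one model can be said to outperform another on a given dataset.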