138 / 2024-09-09 16:41:04
Machine Learning Models for Predicting Photosensitized Degradation Rate Constants of Emerging Pollutants
excited triplet-state dissolved organic matter, photosensitized degradation rate constants, machine learning prediction model, emerging pollutants
Session 57 - Contaminants across the marine continuum: behavior, fate and ecological risk assessment
Abstract Accepted
Siyu Zhang / Chinese Academy of Sciences;Institute of Applied Ecology
Research Background: Photosensitized degradation rate constants (k3DOM*) of emerging pollutants are critical for assessing the half-lives of chemicals in aquatic environments. Because it is difficult to extract dissolved organic matter (DOM) from various waters and to separate the multiple reaction mechanisms between excited triplet-state DOM (3DOM*) and emerging pollutants, experimental k3DOM* data are extremely limited, especially for seawater DOM. The two existing models for predicting k3DOM* are limited to DOM extracted from rivers in Beijing and lack applicability across different water samples, especially seawater. Therefore, this study aims to develop a model that can predict k3DOM* for various types of DOM.
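
To illustrate how k3DOM* feeds into a half-life estimate, the sketch below assumes pseudo-first-order kinetics with a constant steady-state 3DOM* concentration. That concentration is an assumption introduced here for illustration and is not reported in this abstract; the function and parameter names are hypothetical.

```python
import math


def half_life_seconds(k_3dom: float, triplet_ss: float) -> float:
    """Half-life (s) under assumed pseudo-first-order kinetics.

    k_3dom:     second-order rate constant k3DOM* (M^-1 s^-1)
    triplet_ss: assumed steady-state 3DOM* concentration (M); illustrative only
    """
    if k_3dom <= 0 or triplet_ss <= 0:
        raise ValueError("rate constant and concentration must be positive")
    # The pseudo-first-order rate constant is k3DOM* * [3DOM*]ss,
    # and t1/2 = ln(2) / k_obs for first-order decay.
    return math.log(2) / (k_3dom * triplet_ss)
```

For a fixed 3DOM* concentration, a tenfold error in k3DOM* translates into a tenfold error in the estimated half-life.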

Scientific Problem or Hypothesis: This study employs various sensitizers (Sens) with chromophores similar to those of DOM as DOM analogs. The second-order reaction rate constants (k3Sens*) between Sens and pollutants are expected to cover the k3DOM* ranges of emerging pollutants in natural waters, including freshwater and seawater.

Main Methods: First, experimental data (n=178) covering 81 organic compounds and 21 different Sens were collected. A prediction model for the quenching rate constants (kq3Sens*) between Sens and emerging pollutants was built with machine learning (ML) algorithms, using chemical descriptors, Sens descriptors, and experimental conditions as inputs. Five ML algorithms were compared: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), GradientBoost (GBDT), Light Gradient Boosting Machine (LGBM), and Categorical Boosting (CatBoost). Subsequently, a linear relationship between kq3Sens* and k3Sens* was obtained using SPSS. The bi-model system was then applied to predict k3Sens* values of emerging pollutants detected in various aquatic environments. k3DOM* values of emerging pollutants determined with DOM extracted from freshwater and seawater were used to verify the predictions.
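
The two-stage structure described above (a quenching-rate model followed by a linear relationship) can be sketched as follows. This is a minimal illustration, not the study's implementation: the first stage is passed in as an arbitrary callable standing in for the trained ML model, the line is fitted by ordinary least squares rather than with SPSS, and working in log10 units is an assumption. All names are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_bi_model(
    stage_one: Callable[[Sequence[float]], float],
    slope: float,
    intercept: float,
) -> Callable[[Sequence[float]], float]:
    """Chain a stage-one predictor (assumed to return log10 kq3Sens*)
    with the fitted line to obtain an estimate of log10 k3Sens*."""

    def predict(features: Sequence[float]) -> float:
        return slope * stage_one(features) + intercept

    return predict
```

In use, `stage_one` would be the prediction function of whichever trained model performed best, and `fit_line` would be run on paired quenching and reaction rate constants measured for the same Sens and pollutant pairs.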

Main Results and Conclusion: The CatBoost model achieved a strong fit and robust predictive performance, with an adjusted coefficient of determination R2 of 0.6 and an external prediction coefficient of determination R2test of 0.6. Additionally, the deviations between the 286 experimental k3DOM* values and the model predictions were within 1 log unit.
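
For reference, the two kinds of metric reported above can be computed as in the sketch below. It uses the standard definitions of R2 and adjusted R2; the number of model features is not stated in this abstract, so `n_features` is left as a parameter. The log-unit check assumes deviations are measured as absolute differences in log10 of the rate constants. Function names are hypothetical.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R2, penalizing the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


def fraction_within_log_units(
    k_exp: Sequence[float], k_pred: Sequence[float], tolerance: float = 1.0
) -> float:
    """Share of predictions whose |log10(pred) - log10(exp)| <= tolerance."""
    hits = sum(
        1
        for e, p in zip(k_exp, k_pred)
        if abs(math.log10(p) - math.log10(e)) <= tolerance
    )
    return hits / len(k_exp)
```

A deviation of 1 log unit corresponds to a factor of 10 between the predicted and experimental rate constants.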

Significance: This model successfully predicted k3DOM* values of emerging pollutants in various water samples, providing valuable information for risk assessment.