A cascade mechanism driven quantitative toxic mode of action evaluation framework for evaluating toxicity of organic compounds and a construction method thereof
By constructing a quantitative toxicity model framework based on compound structural features and cascade mechanisms, this approach solves the problem that QSAR methods cannot capture complete causal chains in complex toxic pathways of compounds, enabling accurate prediction and risk assessment of compound toxicity. It is applicable to the toxicity assessment of a variety of compounds.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- QINGDAO UNIV
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing QSAR methods are insufficient to comprehensively and accurately reflect the multi-level and multi-stage toxic mechanisms of compounds in vivo. Especially under low-dose exposure conditions, traditional QSAR models are inadequate in terms of sensitivity, accuracy and interpretability, and cannot capture the complete causal chain of complex toxic pathways of compounds. Furthermore, they lack a unified model that quantitatively correlates the structural features of compounds with each stage of the action pathway.
A quantitative toxicity model (qMOA) framework based on compound structural features and cascade mechanisms was constructed. Through machine learning and high-throughput experimental data, a quantitative relationship model of compound-target protein binding was established. Combined with molecular dynamics simulations and surface plasmon resonance experiments, quantitative response datasets for MIE, KE1, and KE2 were constructed to form a multi-level qMOA framework.
It enables accurate and systematic prediction and risk assessment of compound toxicity, and can quickly and effectively predict the toxicity of organic compounds. It is applicable to the toxicity assessment of different compounds and provides the necessary basic data for risk assessment.
Smart Images

Figure FT_1 
Figure FT_2 
Figure FT_3
Abstract
Description
Technical Field
[0001] This invention belongs to the field of organic compound toxicity prediction technology, and relates to a quantitative toxicity mode assessment framework and its construction method for predicting the toxicity of organic compounds. It combines machine learning and high-throughput experimental data of cascade mechanisms to construct a quantitative toxicity mode model to quantitatively predict the toxicity of organic compounds. Background Technology
[0002] With the explosive growth of the global chemical species, chemical safety assessment has become a major challenge. As of the end of 2023, Chemical Abstracts Service (CISA) registered 204 million chemicals, a large proportion of which are organic compounds, posing a significant threat to human health and ecosystem sustainability. Therefore, obtaining hazardous properties of organic compounds, such as acute toxicity and carcinogenicity, is of great theoretical and practical significance for conducting risk assessments of hazardous substances, implementing management controls, guiding scientific applications, and producing safe alternatives. However, faced with a vast number of existing and soon-to-be-produced and used organic compounds, traditional toxicological methods, primarily relying on whole-animal testing, are insufficient to meet this challenge due to their time-consuming, labor-intensive, and costly nature. Completely relying on experiments to accumulate data and conduct toxicity evaluations is not only difficult to implement in practice but also inevitably involves time lags. Against this backdrop, quantitative structure-activity relationship (QSAR) has become an important branch of computer-aided drug and chemical design and has become one of the most widely used methods for environmental toxicology assessment. It establishes quantitative relationships between the physicochemical properties (such as molecular size, lipophilicity, polarity, etc.) or structural parameters (two-dimensional / three-dimensional molecular structural features, etc.) of compounds and their biological activities or specific properties (such as toxicity, environmental mobility, metabolic kinetic parameters, etc.) through reasonable mathematical statistics and machine learning algorithms. This allows it to predict the potential hazards of new compounds, guide structural optimization and risk screening, and is widely used in toxicological research and ecological risk assessment of chemicals.
[0003] Despite this, traditional QSAR methods still have significant limitations. They primarily rely on single or limited global toxicity endpoints (such as LC50), receptor binding simulations, or parallel indicators, making it difficult to comprehensively and accurately reflect the multi-level, multi-stage toxic mechanisms of compounds in vivo. Particularly in complex toxic pathways, isolated or parallel indicators often fail to capture the complete causal chain from the molecular initiation event (MIE) to the critical event (KE) and then to the adverse outcome (AO). This deficiency limits the sensitivity, accuracy, and interpretability of traditional QSAR models under low-dose exposure conditions and weakens the reliability of causal inferences. Organizations such as the International Chemical Safety Agency (ICSA) and the U.S. Environmental Protection Agency (EPA) have pointed out that Models of Toxicity (MOAs), as a systematic framework for describing cascade toxicity mechanisms, can more comprehensively characterize the cascade mechanisms by which compounds induce adverse health effects and exhibits higher sensitivity in low-dose exposure assessments. However, existing MOA studies mostly remain at the stage of qualitative classification or single-stage mechanism analysis, lacking a unified model that quantitatively correlates compound structural features with each stage of the pathway. Meanwhile, in the field of environmental toxicology, the development of qMOA models that cover multi-cyclic causal chains for chemicals with complex structures to assess compound toxicity remains a technological gap.
[0004] Organophosphorus flame retardants (OPFRs) are emerging alternatives to traditional brominated flame retardants. Due to their excellent flame-retardant properties and applications in sustainable development-related industries, OPFR production has been steadily increasing—reaching 492,800 tons and 511,200 tons globally in 2020. Existing research confirms that OPFRs have potential neuroendocrine disrupting effects, and the contradiction between the human exposure issues and health risks arising from their widespread use is becoming increasingly significant. Due to their high lipid solubility, they persist in the environment and enter the human body through inhalation, skin contact, and ingestion, and have been frequently detected in various human biological samples, including urine and blood (detection rate >80% in US urine samples and nearly 100% in Chinese blood samples). Numerous studies have confirmed that mitochondria are a key target organelle for OPFRs. The lipophilic nature of OPFRs allows them to penetrate the phospholipid bilayer and accumulate within mitochondria, significantly impairing their function. Given that mitochondrial dysfunction is an important pathogenic mechanism for many diseases and a key model for evaluating the toxicity of environmental pollutants, in-depth research into the mitochondrial toxicity mechanisms of OPFRs is crucial. Research has clarified the cascade pathway of OPFR-induced mitochondrial toxicity: OPFRs bind to mitochondrial enzymes, constituting a molecular initiation event; subsequent enzyme activity inhibition and mitochondrial dysfunction due to mitochondrial membrane potential collapse constitute a series of key events. This complete toxicity chain from MIE to KEs is a complex mechanism that traditional QSAR models struggle to capture and quantify.
[0005] In summary, existing QSAR technology has significant limitations in dealing with emerging organic pollutants with complex structures and diverse mechanisms. There is an urgent need for a quantitative toxicity model (qMOA) framework that combines compound structural characteristics with key biological event cascade mechanisms to achieve accurate and mechanistic prediction and risk assessment of chemical toxicity. Summary of the Invention
[0006] The purpose of this invention is to overcome the shortcomings of existing technologies and provide a method for predicting the toxicity of organic compounds based on their structural characteristics and cascade mechanisms of toxic action. This method can predict the toxicity of compounds based on their structure and concentration, thereby providing necessary basic data for chemical risk assessment and environmental regulation.
[0007] To achieve the above objectives, this invention provides a method for constructing a quantitative toxicity mode assessment framework for evaluating the toxicity of organic compounds, comprising the following steps:
[0008] (1) Constructing a target organic compound dataset: Basic information of multiple organic compounds, including their full name, abbreviation, and CAS number, was obtained by consulting relevant literature;
[0009] (2) Obtaining a dataset of organic compound structure parameter descriptors: The molecular structure parameters of the organic compounds to be studied were organized using the online database Pubchem (https: / / www.ncbi.nlm.nih.gov / pccompound), resulting in the following four aspects of structure descriptors: molecular size, molecular complexity, molecular lipophilicity, and molecular polarity; among which, molecular size includes relative molecular mass and number of heavy atoms; molecular complexity includes the number of rotatable bonds, number of localized π bonds, number of delocalized π bonds, number of single bonds (σ bonds), bond energy, number of hydrogen atoms, number of covalent bonds, and structural symmetry; molecular lipophilicity includes the hydrophobicity (oil affinity) of organic compounds XLogP3 and the hydrophobicity parameter logP; molecular polarity includes polar surface area (PSA). Topological polar surface area (TPSA, representing the polarity, solubility, permeability, etc. of a molecule), the number of hydrogen bond donors, and the number of hydrogen bond acceptors; these descriptors need to be added or removed according to the specific structural characteristics of the target compound. For example, for compounds like OPFRs, based on their special chemical bonds and elements, structural descriptors in molecular complexity can be added: the number of P=O double bonds, the number of PO single bonds, the number of halogen atoms, the number of F atoms, and the number of Cl atoms; in order to construct a high-quality compound dataset, when constructing the organic compound structural parameter descriptor dataset, it is necessary to remove some compounds with undesirable chemical characteristics, including known toxic groups, metabolically unstable compounds, and polymers;
[0010] (3) Constructing a dataset of binding affinity of key target proteins in the interaction mode of organic compounds with toxicity: Two-dimensional structures of organic compounds were obtained from the online database Pubchem and converted into ligand structures for docking using the Molecular Manipulation Environment (MOE) software package; the target protein information of the target compounds was determined, and the X-ray crystal structures of the target proteins of organic compounds were downloaded from the RCSB protein database (http: / / www.rcsb.org / pdb) and processed using AutoDock 4.2 software to remove water molecules and co-crystallized ligands from the crystal proteins; subsequently, the active site residues of the target proteins were identified from the UniProt database (https: / / www.uniprot.org), and the binding pocket region for molecular docking was defined accordingly, using Vina The 1.1.0 program performs semi-flexible docking between organic compounds and the active sites of target proteins. After molecular docking, the binding affinity of each organic compound to the target protein is obtained; a lower binding affinity indicates a stronger binding ability. To overcome the randomness of the docking program, avoid the misleading effect of a single run, and ensure the reliability of the results, each organic compound is docked with the target protein 20 times, generating 20 binding affinities. The result with the lowest binding affinity is selected as the binding affinity between that organic compound and that target protein. When there are multiple target proteins, the sum of the binding affinities of each organic compound with all target proteins is taken as the binding affinity between that organic compound and all target proteins.
[0011] (4) Training and selection of the best ML-QSAR model: Eight existing machine learning models suitable for quantitative structure-activity relationship model were selected for training and selection. The eight machine learning models are linear regression (LR), support vector regression (SVR), decision tree (Trees), random forest (RF), XGBoost, LightGBM, Stacking, and neural network (ANN). On the one hand, a five-fold resampling and cross-validation strategy was used to evaluate the models. All organic compound structure descriptors and corresponding binding affinity data were randomly divided into training and test datasets in a 4:1 ratio. By using the structure descriptors of each organic compound in the training dataset as independent variables and the binding affinity with the target protein as dependent variables, the hyperparameter tuning method of "random search" was used to optimize the parameters of each ML-QSAR model to establish eight ML-QSAR models. Then, the organic compounds in the test dataset were used to optimize the parameters of each ML-QSAR model. The structural descriptors are input into the ML-QSAR model to obtain the predicted binding affinity of organic compounds to target proteins. These values are then compared with the values obtained from molecular docking, and the root mean square error (RMSE) is calculated to evaluate the performance of each model. A lower RMSE value indicates better model performance. The model with the lowest RMSE value and its parameters are used to construct the optimal model and for subsequent analysis. On the other hand, to determine whether the model is overfitting or underfitting, the structural descriptors of all organic compounds are input into the ML-QSAR model to obtain the binding affinity predicted by the ML-QSAR model using the structural descriptors. The root mean square error (RMSE) between the model's predicted values and the values obtained from molecular docking is further calculated. The RMSE values of the eight models are compared. If a model has the lowest RMSE value among all organic compound results and also the lowest RMSE value in cross-validation, the consistency of their rankings indicates that the model has the best overall performance among the eight candidate models and does not exhibit significant underfitting.
[0012] (5) Structure-activity relationship analysis: Calculate the importance weight of each structural descriptor to the binding affinity, screen out the key structural descriptors that determine the binding affinity, and use them as key structural factors that determine the toxicity of organic compounds. Specifically, the importance of all structural descriptors in the binding affinity of organic compounds to target proteins is ranked by the glmm.hp package in R language, which is used to calculate the contribution of hierarchical division (the output is a percentage, and the larger the percentage, the more important it is). The key structural factors that affect the binding affinity of organic compounds to target proteins are obtained, which can be used as the key structures that may affect the toxicity of organic compounds.
[0013] (6) Verification of the accuracy of structure-activity relationship: The accuracy of the structure-activity relationship predicted by computer simulation in step (5) still needs further experimental verification; For the binding affinity obtained by molecular docking, representative organic compounds containing different key structural factors are screened, and the dynamic binding of organic compounds and target proteins is analyzed by molecular dynamics (MD) simulation and surface plasmon resonance (SPR) experiment to obtain more accurate binding capacity data. The binding of organic compounds and target proteins is reflected by the binding capacity results of MD simulation and the KD value results of SPR experiment. The analysis is conducted to see if it conforms to the structure-activity relationship described in step (5) and to verify the accuracy of the structure-activity relationship; Since the binding site of organic compounds and target proteins is their catalytic active site, the binding of organic compounds and target proteins can mediate the changes in subsequent cascade mechanism events, thus proving the accuracy of the structure-activity relationship predicted by the ML-QSAR model;
[0014] (7) Constructing a dataset of organic compounds with key structural features acting on MOA: Based on the representative organic compounds in step (6), using the optimal ML-QSAR model established in step (4), inputting the structural parameters of the organic compounds obtained in step (2), the binding affinity of the above compounds is predicted, which serves as the molecular initiation event (MIE) dataset in acting on MOA; further, based on the key structural factors verified in steps (5) and (6), an exposure group concentration gradient experiment is designed, exposing cells to solutions of organic compounds of different concentrations, and measuring the fold change of representative organic compounds relative to the control group in subsequent MIE cascade key events KE1 and KE2, respectively. The calculation rule is: fold change of the exposure group relative to the control group = value of the exposure group ÷ value of the control group. The results obtained represent the quantitative effect of toxicity. The control group consisted of cells not exposed to organic compounds. This allowed for the construction of a quantitative response dataset between the MIE (Mean Interaction) and the key events KE1 and KE2 of the cascade. The specific measurement method involved measuring the values of key events KE1 and KE2 in the MOA cascade for representative organic compounds with key structural features in the exposed group, as well as the values in the control group (cells not exposed to organic compounds). The fold change in key events KE1 and KE2 between the exposed group and the control group was used as the KE1 and KE2 datasets. The calculation rule was: fold change between the exposed group and the control group = exposed group value ÷ control group value (e.g., if the control group value is 1 and the KE group value is 1.5, then the fold change is 1.5 ÷ 1 = 1.5). The results represent the quantitative impact of toxicity and were used as the KE1 and KE2 datasets.
[0015] (8) Constructing the qMOA framework: Based on the dataset obtained in step (7), a three-level quantitative toxicity model qMOA framework is constructed: The first level is the quantitative calculation of the molecular initiation event (MIE) using the ML-QSAR model based on structural parameters; the second level is the quantitative calculation of the cascade key event (KE1) based on the molecular initiation event; and the third level is the quantitative calculation of the cascade key event (KE2) based on the KE1 key event. The specific process of establishing the qMOA framework is as follows:
[0016] 1) Calculate molecular initiation events based on the best ML-QSAR model: Using the ML-QSAR model established in step (4), input the organic compound structure parameter descriptor in step (2) to calculate the MIE dataset of representative organic compounds with key structural features in the MOA.
[0017] 2) Establish the quantitative relationship between MIE and cascaded KE1 events: Based on the MIE dataset obtained in 1) and the KE1 event dataset measured at different dose gradients in step (7), MIE and concentration are used as independent variables 1 and 2 respectively, and KE1 events are used as dependent variables to construct the quantitative relationship equation between MIE, concentration and cascaded KE1 events.
[0018] 3) Establish the quantitative relationship between KE1 and cascaded KE2 events: Based on the KE1 and KE2 event datasets with specific concentration gradient correspondence in step (7), use KE1 event as independent variable and KE2 event as dependent variable to construct the quantitative relationship equation between KE1 and cascaded KE2 events.
[0019] (9) Toxicity prediction and application: Input the structure descriptor of the organic compound to be tested and the set concentration into the qMOA framework, and calculate the MIE prediction value, KE1 prediction value and KE2 prediction value in sequence. Based on the KE2 prediction value, the toxicity of the organic compound to be tested is quantitatively evaluated.
[0020] In practical applications, the qMOA framework established in this invention only requires inputting the structural characteristics and concentration of the organic compound to be tested to sequentially calculate the quantitative data of MIE, KE1, and KE2 events. The final calculated KE2 data represents the toxicity of the organic compound based on this MOA toxicity mechanism.
[0021] The organic compounds described in this invention include various organic compounds that produce biological effects by binding to cellular molecular targets. For different types of compounds, the cascade events (including MIEs and KEs) in their quantitative toxicity mode of action will depend on the compound's structural features, target properties, and specific toxic mechanism pathways. Although the molecular targets and cascade mechanisms vary among different compounds, the quantitative toxicity mode of action framework constructed in this invention is universal: that is, by establishing a quantitative mapping relationship of "structural features-MIE-cascade KEs," toxicity assessment can be performed on any organic compound with a clearly defined target.
[0022] This invention uses OPFRs as example organic compounds and incorporates the mitochondrial damage cascade mechanism into the qMOA framework, proposing a method for constructing a cascade mechanism-driven quantitative toxicity assessment framework for evaluating the toxicity of organic compounds. The method specifically includes the following steps:
[0023] (1) Constructing a target organic compound dataset: Basic information of multiple organic compounds, including their full name, abbreviation, and CAS number, was obtained by consulting relevant literature;
[0024] (2) Obtaining a dataset of organic compound structure parameter descriptors: The molecular structure parameters of the organic compounds to be studied were organized using the online database Pubchem, and the following four aspects of structure descriptors were obtained: molecular size, molecular complexity, molecular lipophilicity, and molecular polarity. Among them, molecular size includes relative molecular mass and number of heavy atoms; molecular complexity includes number of rotatable bonds, number of localized π bonds, number of delocalized π bonds, number of single bonds (σ bonds), bond energy, number of hydrogen atoms, number of covalent bonds, and structural symmetry. For compounds such as OPFRs, molecular complexity structure descriptors of the number of P=O double bonds, number of PO single bonds, number of halogen atoms, number of F atoms, and number of Cl atoms were added according to their special chemical bonds and elements; molecular lipophilicity includes organic compound hydrophobicity (oil affinity) XLogP3 and hydrophobicity parameter logP; molecular polarity includes polar surface area (PSA), topological polar surface area (TPSA, representing molecular polarity, solubility, permeability, etc.), number of hydrogen bond donors, and number of hydrogen bond acceptors.
[0025] (3) Constructing a dataset of structural parameters of the target organic compound and its binding affinity to key target proteins in the toxic action mode: The two-dimensional structure of the organic compound was obtained from the online database Pubchem and converted into a ligand structure using the Molecular Manipulation Environment (MOE) software package for docking; the target protein information of the target compound was determined, and the X-ray crystal structure of the target protein of the organic compound was downloaded from the RCSB protein database (http: / / www.rcsb.org / pdb) and processed using AutoDock 4.2 software to remove water molecules and co-crystallized ligands from the crystal protein; subsequently, the active site residues of the target protein were identified from the UniProt database (https: / / www.uniprot.org), and the binding pocket region for molecular docking was defined accordingly, using Vina The 1.1.0 program performs semi-flexible docking between organic compounds and the active sites of target proteins. After molecular docking, the binding affinity of each organic compound to the target protein is obtained; lower binding affinity indicates stronger binding ability. To overcome the randomness of the docking program, avoid the misleading effect of a single run, and ensure the reliability of the results, each OPFR is docked with the target protein 20 times, generating 20 binding affinities. The result with the lowest binding affinity is selected as the binding affinity between the organic compound and the target protein. When there are multiple target proteins, the sum of the binding affinities of each OPFR with all target proteins is taken as the binding affinity between the OPFR and the target proteins.
[0026] (4) Training and selection of the best ML-QSAR model: Eight existing machine learning models suitable for quantitative structure-activity relationship (QSAR) models were selected for training and selection. The eight machine learning models were linear regression (LR), support vector regression (SVR), decision tree (Trees), random forest (RF), XGBoost, LightGBM, Stacking, and neural network (ANN). On the one hand, a five-fold resampling and cross-validation strategy was used to evaluate the models. All organic compound structure descriptors and their corresponding binding affinity data were randomly divided into training and test datasets at a ratio of 4:1. By using the structure descriptors of each organic compound in the training dataset as independent variables and the binding affinity with the target protein as dependent variables, the hyperparameter tuning method of "random search" was used to optimize the parameters of each model to establish eight ML-QSAR models. Then, the structure descriptors of each organic compound in the test dataset were substituted into the ML-QSAR model to obtain The predicted binding affinity to the target protein was compared with the value obtained from molecular docking, and the root mean square error (RMSE) was calculated to evaluate the performance of each model. The lower the RMSE value, the better the model performance. The model with the lowest RMSE value and its parameters were used to construct the optimal model and for subsequent analysis. On the other hand, in order to determine whether the model is overfitting or underfitting, the structural descriptors of all organic compounds were input into the ML-QSAR model to obtain the binding affinity predicted by the ML-QSAR model through the structural descriptors. The root mean square error (RMSE) between the model prediction value and the value obtained from molecular docking was further calculated. The RMSE values of 8 models were compared. If a model has the lowest RMSE value in the comparison of all organic compound results, and its RMSE value in cross-validation is also the lowest, the consistency of the ranking of the two indicates that the model has the best overall performance among the 8 candidate models and does not show obvious underfitting. For organophosphorus flame retardants (OPFRs) and key enzymes in mitochondrial metabolism, the optimal model is XGBoost.
[0027] (5) Structure-activity relationship analysis: The importance of all structural descriptors in the binding affinity of OPFRs to key enzymes of mitochondrial metabolism was ranked by “glmm.hp” in R language (the output is a percentage, and the larger the percentage, the more important it is). It was found that the key structural factors affecting the binding affinity of OPFRs to key enzymes of mitochondrial metabolism are the type of substituent and the number of benzene rings, which are the possible key structures that affect the toxicity of OPFRs.
[0028] (6) For representative OPFRs with different substituent types and benzene ring numbers, the structure-activity relationship of their binding affinity to target proteins was verified by experiments;
[0029] (7) For the representative OPFRs selected in step (6), based on the XGBoost-QSAR model established above, the structural parameters of the organic compounds obtained in step (2) are input to predict the binding affinity of the above representative OPFRs to the target protein, which is used as the MIE event dataset in the mitochondrial toxicity MOA of OPFRs; an exposure group concentration gradient experiment is designed to expose cells to different concentrations of representative OPFRs solution, and the cascade mitochondrial metabolic activity in the toxic mode of the above representative OPFRs and the mitochondrial metabolic activity of the control group (cells not exposed to OPFRs solution) are measured. The fold change of mitochondrial metabolic activity of the exposure group relative to the control group is used as the KE1 event number. The fold change in mitochondrial metabolic activity in the exposed group relative to the control group was calculated as follows: fold change = mitochondrial metabolic activity in the exposed group ÷ mitochondrial metabolic activity in the control group. This fold change represents the quantitative toxic effect of OPFRs on mitochondrial metabolic activity and is used as the KE1 event dataset. The mitochondrial membrane potential in the cascade of OPFRs in the exposed group and the mitochondrial membrane potential in the control group were measured. The fold change in mitochondrial membrane potential in the exposed group relative to the control group was used as the KE2 event dataset. The fold change in mitochondrial membrane potential in the exposed group relative to the control group was calculated as follows: fold change = mitochondrial membrane potential in the exposed group ÷ mitochondrial membrane potential in the control group. This result represents the quantitative toxic effect of OPFRs on mitochondrial metabolic activity and is used as the KE2 event dataset.
[0030] (8) Constructing the qMOA framework: Based on the binding affinity (MIE) of OPFRs to key enzymes of mitochondrial metabolism predicted by the XGBoost model, and the KE1 event dataset measured at different concentration gradients in step (7), MIE and concentration are used as independent variables 1 and 2, respectively, and KE1 events are used as dependent variables to construct a quantitative relationship equation between MIE, concentration and cascaded KE1 events; then, based on the KE1 and KE2 event datasets corresponding to specific concentration gradients in step (7), KE1 events are used as independent variables and KE2 events are used as dependent variables to construct a quantitative relationship equation between KE1 and cascaded KE2 events; thus forming the qMOA assessment framework of "OPFRs structure-key enzyme binding affinity-mitochondrial metabolic activity-mitochondrial membrane potential" for OPFRs-induced mitochondrial toxicity.
[0031] The present invention also provides a quantitative toxicity mode assessment framework for evaluating the toxicity of organophosphorus flame retardants constructed by the aforementioned construction method. By simply inputting the structural characteristics and concentration of the organophosphorus flame retardant to be tested, quantitative data of MIE, KE1 and KE2 events can be calculated sequentially. The final calculated KE2 data is the toxicity level of the organophosphorus flame retardant.
[0032] Compared with existing technologies, this invention incorporates the perspective of the targeted toxic mechanism (MIE-KE1-KE2) of organic compounds. Based on a quantitative structure-MIE relationship model constructed using machine learning methods, it further combines experimental data of KE1 and KE2 cascade events to establish a qMOA framework to predict the toxicity of organic compounds to their key toxic mechanisms. This invention can quickly and effectively predict the toxicity of organic compounds based on their structure. The established qMOA framework is applicable to the toxicity assessment of other compounds in the same class used to establish the framework, thus providing necessary basic structural data for organic compound risk assessment and environmental monitoring, and has significant social and economic value. Attached Figure Description
[0033] Figure 1 This is a flowchart illustrating the process of establishing the qMOA framework for this invention.
[0034] Figure 2 This invention relates to a performance comparison of eight machine learning models, where a is 5-fold cross-validation and b is the model fit evaluation.
[0035] Figure 3 This invention relates to a comparison of the importance of structural parameters in structure-property relationships.
[0036] Figure 4 The results of the linear correlation analysis between the number of benzene rings in the OPFRs structure involved in this invention and the binding affinity to the target protein are shown. A negative β indicates a negative correlation between the two, and a p value less than 0.01 indicates that the result is statistically significant.
[0037] Figure 5 This invention relates to the binding affinity of different substituent types in the OPFRs structure to target proteins.
[0038] Figure 6 The present invention relates to the correlation analysis results of QSAR model predictions, MD simulations, and SPR experimental values, where a represents the correlation analysis between QSAR model predictions and SPR experimental values, with an r value of 0.97 indicating a strong correlation and a p value less than 0.01 indicating statistical significance; b represents the correlation analysis between QSAR model predictions and MD simulations, with an r value of 0.91 indicating a strong correlation and a p value less than 0.05 indicating statistical significance.
[0039] Figure 7 The results of mitochondrial metabolic activity assays under exposure to OPFRs with typical structural features as described in this invention are shown in the figure. ** indicates that the changes in the exposed group are statistically significant compared to the control group.
[0040] Figure 8The effects of exposure to OPFRs with typical structural features as described in this invention on mitochondrial function are as follows: a. Changes in mitochondrial membrane potential under exposure to OPFRs with different substituent types; b. Changes in mitochondrial membrane potential under exposure to OPFRs with different numbers of benzene rings. A negative β value indicates a negative correlation, and a p value less than 0.01 indicates statistical significance.
[0041] Figure 9 This invention relates to a framework for a quantitative toxic model of the effects of exposure to OPFRs with typical structural features on mitochondrial function.
[0042] Figure 10 This invention relates to the accuracy evaluation of the calculated values of the quantitative toxicity model framework, where a is the binding affinity of OPFRs to target proteins, b is the predicted toxicity of OPFRs to mitochondrial metabolic activity, and c is the model-predicted toxicity of OPFRs to mitochondrial membrane potential. Detailed Implementation
[0043] The present invention will be further described in detail below with reference to specific embodiments and accompanying drawings.
[0044] Example 1:
[0045] This embodiment relates to a framework construction method for predicting the quantitative toxicity mode of organic compounds. Organophosphorus flame retardants are selected as example compounds, and known key toxicity modes of OPFRs (OPFRs binding to mitochondrial enzymes – enzyme activity inhibition – mitochondrial dysfunction due to mitochondrial membrane potential collapse) are incorporated to establish the qMOA framework. The specific steps are as follows:
[0046] (1) Through relevant literature search and condition screening (with registered CAS numbers and available two-dimensional structures and basic physicochemical structural parameters), 59 OPFRs were finally obtained for the construction of structure-property relationship model;
[0047] (2) The molecular structure parameters of 59 OPFRs were sorted out using the online database Pubchem, and the following structural descriptors were obtained: 2 molecular size-related structural descriptors (relative molecular mass and number of heavy atoms), 13 molecular complexity-related structural descriptors (number of rotatable bonds, number of P=O double bonds, number of PO single bonds, number of localized π bonds, number of delocalized π bonds, number of single bonds (σ bonds), bond energy, number of halogen atoms, number of F atoms, number of Cl atoms, number of hydrogen atoms, number of covalent bonds, and structural symmetry), 2 molecular lipophilicity-related structural descriptors (hydrophobicity (lipophilicity) of organic compounds XLogP3 and hydrophobicity parameter logP), and 4 molecular polarity-related structural descriptors (polar surface area (PSA), topological polar surface area (TPSA, representing the polarity, solubility, permeability, etc. of molecules), number of hydrogen bond donors, and number of hydrogen bond acceptors), which were used to construct a database for machine learning model training;
[0048] (3) X-ray crystal structures of seven key mitochondrial enzymes were downloaded from the RCSB protein database. These seven enzymes include the three rate-limiting enzymes in the tricarboxylic acid cycle: citrate synthase (CS), isocitrate dehydrogenase (ICDH), and α-ketoglutarate dehydrogenase (α-KGDH) complex, as well as mitochondrial complexes I-IV in oxidative phosphorylation. These seven enzymes play a crucial role in mitochondrial aerobic energy metabolism. Subsequently, the active site residues of the target proteins were identified using the UniProt database to assist in locating the binding bag during the docking process. After molecular docking, the binding affinity of OPFRs to the seven key mitochondrial enzymes was obtained. The lower the binding affinity, the stronger the binding ability.
[0049] (4) Training and selection of the best ML-QSAR model: Eight existing machine learning models suitable for quantitative structure-activity relationship models were selected for training and selection. The eight machine learning models are linear regression (LR), support vector regression (SVR), decision tree (Trees), random forest (RF), XGBoost, LightGBM, Stacking, and neural network (ANN). On the one hand, a five-fold resampling and cross-validation strategy was used to evaluate the models. All organic compound structure descriptors and corresponding binding affinity data were randomly divided into training dataset and test dataset in a 4:1 ratio. In the test dataset, the structural descriptors of each organic compound in the training dataset were used as independent variables, and the binding affinity with the target protein was used as the dependent variable. A "random search" hyperparameter tuning method was employed to optimize the parameters of each ML-QSAR model, resulting in eight ML-QSAR models. Furthermore, the structural descriptors of each organic compound in the test dataset were input into the ML-QSAR models to obtain predicted binding affinity values with the target protein. These values were compared with the values obtained from molecular docking, and the root mean square error (RMSE) was calculated to evaluate the performance of each model. A lower RMSE value indicates better model performance. Cross-validation results showed that the XGBoost model had the lowest RMSE value. Figure 2 XGBoost, with the best performance, was chosen as the optimal model and its parameters for subsequent analysis. To determine if overfitting or underfitting was present, the structural descriptors of all organic compounds were input into the ML-QSAR model to obtain the binding affinity predicted by the ML-QSAR model using these descriptors. The root mean square error (RMSE) between the model predictions and the molecular docking results was then calculated. Comparing the RMSE values of the eight models, XGBoost had the lowest RMSE value when comparing all organic compound results, and it also had the lowest RMSE value in cross-validation. This consistency in ranking indicates that XGBoost has the best overall performance among the eight candidate models and does not exhibit significant underfitting. Figure 2 Therefore, the best machine learning model is the XGBoost model;
[0050] (5) Structure-activity relationship analysis: The importance of all structural descriptors in the binding affinity of OPFRs to key mitochondrial metabolic enzymes was ranked using the glmm.hp package in R language, which is used to calculate the contribution of hierarchical division (the output is a percentage, and the higher the percentage, the more important it is). The results showed that the key structural factors affecting the binding affinity of OPFRs to key mitochondrial metabolic enzymes were substituent type (39.14%) and number of benzene rings (55.34%), as shown in the attached figure. Figure 3As shown, this is a possible key structure for OPFRs to influence toxicity; subsequent linear analysis revealed a significant negative correlation between the number of benzene rings in OPFRs and their binding affinity to target proteins, with OPFRs having multiple benzene ring structures exhibiting higher binding affinity, as shown in the attached figure. Figure 4 As shown in the attached figure; for different substituent types, when the substituent is a benzene ring structure, the binding affinity of OPFRs to the target protein is significantly higher than that of other substituent types. Figure 5 As shown; therefore, when the substituents of OPFRs are benzene ring structures, especially polybenzene ring structures, OPFRs can lead to stronger mitochondrial toxicity;
[0051] (6) Verification of the accuracy of QSAR model prediction of OPFRs binding affinity to target proteins: The accuracy of the binding affinity of OPFRs to target proteins predicted by computer simulation in step (4) still needs further experimental verification; Four representative OPFRs were selected for subsequent analysis based on the typical structural features affecting the binding affinity of OPFRs to target proteins, namely the number of benzene rings and the type of substituents, including diethyl phenylphosphonate (DPP), diphenyl toluene phosphate (CDP), resorcinol tetraphenyl diphosphate (RDP), and bisphenol A bis(diphenyl phosphate) (BDP). The compounds included OPFRs with 1, 3, 5, and 6 benzene rings, and triethyl phosphate (TEP), trichloroethyl phosphate (TCEP), tri(2,2,2-trifluoroethyl) phosphate (TFEP), and triphenyl phosphate (TPP), with substituents of alkyl, Cl, F, and benzene rings, respectively. For the binding affinity predicted by computer simulation, the dynamic binding of organic compounds to target proteins was analyzed using MD simulation and SPR experiments to obtain more accurate binding capacity and demonstrate the accuracy of computer simulation predictions of the binding affinity between OPFRs and target proteins. The specific steps are as follows:
[0052] 1) MD simulations were performed using Amber software (University of California, San Francisco, CA version). The dynamic binding of OPFRs with typical structural features to ICDH enzyme (isocitrate dehydrogenase), the target protein with the strongest binding ability in the tricarboxylic acid cycle and oxidative phosphorylation pathway, and mitochondrial complex I was simulated, and the binding of OPFRs to target proteins was evaluated. Since the binding site of OPFRs to mitochondrial complex I is located in the transmembrane region of mitochondria and is surrounded by a phospholipid bilayer, the environment is complex and there are many influencing factors. Therefore, the influence of different structural features on the binding of OPFRs to ICDH was analyzed through MD simulation results.
[0053] 2) The SPR experiment was conducted using the Biacore T200 SPR biosensor system (Cytiva, catalog no. 28975001), with the CM7 sensor chip used. The results were analyzed using Biacore evaluation software, and the equilibrium dissociation constant KD value of OPFRs and ICDH was calculated to evaluate the binding ability. When the substituent is a benzene ring structure, especially a polybenzene ring structure, the KD value is the lowest, which is 0.152 mM. The smaller the KD value, the stronger the binding ability of OPFRs to ICDH enzyme.
[0054] The equilibrium dissociation constant KD is calculated as shown in the formula:
[0055] Equilibrium dissociation constant KD = Binding constant Ka ÷ Dissociation constant Kd
[0056] 3) The accuracy of the computer simulation results was evaluated using the KD values of OPFRs obtained from SPR experiments and the binding affinity obtained from MD simulations. Among them, the correlation analysis results of the binding affinity predicted by machine learning and the KD values of OPFRs with typical structural features measured by SPR experiments and the binding affinity obtained from MD simulations were used to evaluate the accuracy of the computer simulation results. Figure 6 The results showed that the correlation coefficients r were 0.91 and 0.97, respectively, and the p values were both less than 0.05, which proved the accuracy of the XGBoost-QSAR model in simulating and predicting the binding affinity of OPFRs to target proteins.
[0057] (7) Dataset establishment of representative OPFRs for mitochondrial toxic MOA: For the 8 representative OPFRs selected in step (6), based on the XGBoost-QSAR model established in step (4), the organic compound structure parameters obtained in step (2) were input to predict the MIE events (binding affinity with 7 key mitochondrial enzymes) dataset of the above 8 OPFRs for mitochondrial toxic MOA; further, the exposure concentrations of OPFRs in the exposure group were set as 1, 10, 50, 100, 200, 400, 600 and 800 μg / ml, and the mitochondrial metabolic activities of the 8 representative OPFRs on human macrophage cell line (THP-1 cells) and control were measured respectively. Mitochondrial metabolic activity in THP-1 cells (cells not exposed to OPFRs) was measured. Quantitative data on the fold change in mitochondrial metabolic activity relative to the control group were calculated for each exposed group. These fold change data were then compiled into the KE1 event dataset. For each OPFR, key exposure concentrations of 100 and 200 μg / ml were selected. Mitochondrial membrane potential in THP-1 cells of the exposed group and the control group were further measured. Quantitative data on the fold change in mitochondrial membrane potential relative to the control group (unexposed to OPFRs) was calculated for each exposed group. These fold change data were then compiled into the KE2 event dataset. The details are as follows:
[0058] 1) Calculation of the binding affinity of OPFRs to 7 key mitochondrial enzymes: Based on the XGBoost-QSAR model established in step (4), the structural parameters of organic compounds obtained in step (2) are input to obtain quantitative data of the binding affinity of 8 OPFRs to 7 key mitochondrial enzymes. The sum of the binding affinity of the same OPFR to 7 key mitochondrial enzymes is taken as the binding affinity of the OPFR to the mitochondrial enzyme, and the MIE event dataset in the mitochondrial toxic MOA chain of OPFRs is obtained.
[0059] 2) Mitochondrial metabolic activity assay: Mitochondrial metabolic activity was measured using a mitochondrial dehydrogenase activity assay kit (MA0218, Meilunbio, Dalian, China). THP-1 cells were treated with eight OPFRs at concentrations of 1, 10, 50, 100, 200, 400, 600, and 800 μg / mL. The mitochondrial dehydrogenase activity of the treated cells was measured according to the kit instructions, and the mitochondrial dehydrogenase activity of the control group was also measured. The fold change in THP-1 mitochondrial dehydrogenase activity of each exposed group relative to the control group was calculated as: mitochondrial dehydrogenase activity of exposed group THP-1 cells ÷ mitochondrial dehydrogenase activity of control group THP-1 cells (the smaller the value, the greater the toxicity to mitochondrial metabolic activity). The KE1 event dataset in the MOA chain of OPFR mitochondrial toxicity was obtained. Results are shown below. Figure 7 As shown, when the substituent type is a benzene ring structure, especially a polybenzene ring structure, OPFRs have a stronger inhibitory effect on mitochondrial metabolic activity.
[0060] 3) Measurement of Mitochondrial Dysfunction: Mitochondrial membrane potential was used to evaluate mitochondrial dysfunction. The Red CMXRos mitochondrial tracker (MB6046, Meilunbio, Dalian, China) method was used to analyze mitochondrial membrane potential. THP-1 cells were exposed to OPFRs at concentrations of 100 μg / mL and 200 μg / mL, and then co-incubated with 500 nM Red CMXRos in 6-well plates at 37°C for 30 min. After washing the cells twice with PBS, they were resuspended in 600 μL of PBS containing 1% fetal bovine serum. The fluorescence intensity of the Red CMXRos mitochondrial tracker was measured by flow cytometry (CytoFLEX, Beckman Coulter, CA, USA) at excitation (579 nm) and emission (599 nm) wavelengths. Following the manufacturer's instructions, the Red CMXRos mitochondrial tracker fluorescence intensity of each sample from the exposed and control groups was analyzed using the FlowJo_v10.6.22 method. The average fluorescence intensity of CMXRos was quantified to determine changes in mitochondrial membrane potential; the fold change in mitochondrial membrane potential for each exposure group relative to the control group was calculated as follows: average fluorescence intensity of Red CMXRos in THP-1 cells of the exposed group ÷ average fluorescence intensity of Red CMXRos in THP-1 cells of the control group (a larger value indicates greater toxicity to mitochondrial membrane potential). A dataset of KE2 events in the MOA chain of OPFRs-induced mitochondrial toxicity was constructed; the measurement results are as follows. Figure 8As shown, when the substituent type is a benzene ring structure, especially a polybenzene ring structure, the mitochondrial membrane potential is the lowest. The decrease in mitochondrial membrane potential means that mitochondrial function is impaired, indicating that OPFRs have a stronger inhibitory effect on mitochondrial function at this time.
[0061] (8) Constructing a qMOA framework for the mitochondrial toxicity of OPFRs:
[0062] 1) Calculate the binding affinity of OPFRs to seven key mitochondrial enzymes based on the XGBoost-QSAR model: such as Figure 9 As shown, the XGBoost-QSAR model established based on step (4) is the first part of the qMOA framework for the mitochondrial toxicity of OPFRs. This part... Pass By inputting the OPFRs structural parameters obtained in step (2), quantitative data on the binding affinity of 8 OPFRs to 7 key mitochondrial enzymes can be quantitatively calculated. Then, the sum of the binding affinity of the same OPFR to the 7 key mitochondrial enzymes is taken as the binding affinity of that OPFR to the mitochondrial enzyme. For TEP, TCEP, TFEP, TPP, DPP, CDP, RDP, and BDP, the sums of their binding affinity to the 7 key mitochondrial enzymes are -31.4024, -29.8893, -42.6813, -55.1524, -40.2580, -52.2072, -65.4574, and -67.6362, respectively. Figure 10 As shown in Figure .a, in this invention, the correlation coefficient r between the XGBoost-QSAR model's predicted binding affinity and the molecular docking binding affinity is 0.88, with a p-value less than 0.05. 2 The value of 0.8005 indicates that the model's prediction accuracy is good;
[0063] 2) Establishment of the quantitative relationship between the binding affinity of OPFRs to 7 key mitochondrial enzymes and the cascade mitochondrial metabolic activities: Considering that there is often a nonlinear mapping relationship between upstream and downstream mechanisms in the cascade of toxic action, a simple linear model is difficult to accurately describe the correlation between upstream and downstream. Therefore, in this embodiment, a nonlinear fitting model is selected to establish the quantitative relationship between the binding affinity of OPFRs to 7 key mitochondrial enzymes and the cascade mitochondrial metabolic activities. The MIE dataset based on the sum of the binding affinity of each of the 8 OPFRs to the 7 key mitochondrial enzymes in step (7) 1) is used as independent variable 1, the exposure concentration in step (8) 2) is used as independent variable 2, and the KE1 event data of the effect of OPFRs exposure on the mitochondrial dehydrogenase activity of THP-1 cells obtained from the experiment is used as dependent variable. A nonlinear fitting Poly2D model is used to establish the quantitative relationship between binding affinity and OPFRs exposure concentration and mitochondrial metabolic activities, specifically as follows: Figure 9As shown, the quantitative relationship equations established in this embodiment between the binding affinity of OPFRs to seven key mitochondrial enzymes, the exposure concentration of OPFRs, and the cascade mitochondrial metabolic activities are as follows: OPFR-induced fold change in mitochondrial metabolic activity relative to the control group (KE1 event) = 1.17485 - 0.00128 × (sum of binding affinities of OPFRs to seven key mitochondrial enzymes - 5.26352 × 10⁻¹⁰ −4 × OPFRs exposure concentration -3.99271 × 10 −5 × (Sum of the binding affinity of OPFRs to 7 key mitochondrial enzymes) 2 + 3.1399 × 10 −7 × (OPFRs exposure concentration) 2 + 7.95049 × 10 −6 × Sum of OPFRs' binding affinities to 7 key mitochondrial enzymes × OPFRs exposure concentration, where concentration units are μg / ml; Figure 10 As shown in Figure .b, the mitochondrial metabolic activity calculated based on the binding affinity of OPFRs to seven key mitochondrial enzymes in this invention, compared with the experimentally measured values, has a correlation coefficient r of 0.97 and a p-value less than 0.05. 2 The value is 0.9728, indicating that the quantitative relationship prediction accuracy is good;
[0064] 3) Establishment of the quantitative relationship between OPFRs-induced changes in mitochondrial metabolic activity and cascaded changes in mitochondrial membrane potential: Using the KE1 event dataset (based on the fold change in mitochondrial dehydrogenase activity in the exposed group relative to the control group obtained in step (7)) as the independent variable, and the KE2 event dataset (based on the fold change in mitochondrial membrane potential in the exposed group relative to the control group) as the dependent variable, a logistic regression model was used to establish the quantitative relationship between mitochondrial dehydrogenase activity and mitochondrial membrane potential. Specifically, as follows... Figure 9 As shown, the quantitative relationship equation established in this embodiment regarding the effects of OPFRs exposure on mitochondrial dehydrogenase activity and cascade mitochondrial membrane potential is: OPFRs-induced fold change in mitochondrial membrane potential relative to the control group (KE2 event) = 1.00767 + 0.53210 ÷ (1 + (OPFRs-induced fold change in mitochondrial metabolic activity relative to the control group (KE1 event) ÷ 0.98042) 17.2367 );like Figure 10 As shown in .c, the correlation coefficient r between the mitochondrial membrane potential change calculated based on the changes in mitochondrial metabolic activity perturbed by OPFRs and the experimentally measured value in this invention is 0.87, and the p-value is less than 0.05. 2 The value of 0.8661 indicates that the quantitative relationship prediction accuracy is good;
[0065] The correlation analysis between the binding affinity obtained from molecular docking, the experimentally measured mitochondrial metabolic activity and mitochondrial function values, and the various indicators calculated using the quantitative toxicity model framework, as well as the R... 2 Ultimately, the reliability of the cascade mechanism-driven qMOA assessment framework for organic compound toxicity, which incorporates OPFRs as example compounds and incorporates the mechanistic perspective of organic compound toxicity modes from molecular initiation events to critical events, has been demonstrated.
Claims
1. A method for constructing a quantitative toxicity mode assessment framework for evaluating the toxicity of organic compounds, characterized in that, The steps are as follows: (1) Construct a target organic compound dataset: Obtain the identification information of the organic compounds to be studied, including the full name, abbreviation and CAS number of the compound; (2) Obtaining organic compound structure descriptor dataset: Using the molecular structure parameters of the organic compounds to be studied in the online database Pubchem, the following four aspects of structure descriptors are obtained: molecular size, molecular complexity, molecular lipophilicity and molecular polarity; (3) Construct a dataset of binding affinity of target proteins in the interaction mode of organic compounds with toxicity: obtain two-dimensional structures of organic compounds through the online database Pubchem, and preprocess them into ligand structures for docking using the molecular manipulation environment software package; The target protein information of the target compound was determined, the X-ray crystal structure of the target protein was downloaded from the RCSB protein database, and water molecules and co-crystallization ligands in the crystal protein were removed using AutoDock 4.2 software; then, the active site residues of the target protein were identified from the UniProt database, and the binding pocket region for molecular docking was defined accordingly. The Vina 1.1.0 program was used to perform semi-flexible docking of organic compounds with the active sites of target proteins, and the binding affinity values of each organic compound to key target proteins were calculated. (4) Training and screening of the best ML-QSAR model: Eight machine learning models (linear regression model, support vector regression model, decision tree model, random forest model, XGBoost model, LightGBM model, Stacking model and neural network model) suitable for quantitative structure-activity relationship model were selected for training and screening; the structural descriptors and corresponding binding affinity data were randomly divided into training set and test set in a ratio of 4:1; the structural descriptors in the training set were used as independent variables and the binding affinity with the target protein was used as dependent variables. The above eight machine learning models were trained and the parameters were optimized using a five-fold resampling and cross-validation strategy and a "random search" hyperparameter tuning method to establish eight ML-QSAR models; then the structural descriptors in the test set were input into the ML-QSAR model to obtain the predicted value of binding affinity, which was compared with the value obtained by molecular docking, and the root mean square error (RMSE) value was calculated. By comparing the RMSE values of each model, the model with the lowest RMSE value is selected as the optimal ML-QSAR model for subsequent analysis. (5) Structure-activity relationship analysis: Calculate the importance weight of each structural descriptor to binding affinity, screen out the key structural descriptors that determine binding affinity, and use them as key structural factors that determine the toxicity of organic compounds; (6) Verification of the accuracy of structure-activity relationship: Select representative organic compounds containing different key structural factors, obtain binding capacity data through molecular dynamics simulation and surface plasmon resonance experiment, analyze whether they conform to the structure-activity relationship described in step (5), and verify the accuracy of the structure-activity relationship; (7) Constructing a dataset of modes of toxic action (MOA): Based on the representative organic compounds in step (6), using the optimal ML-QSAR model established in step (4), inputting the structural parameters obtained in step (2), predicting the binding affinity as a molecular initiation event (MIE) dataset in the MOA; designing a concentration gradient experiment, exposing cells to solutions of organic compounds of different concentrations, and measuring the fold change of representative organic compounds relative to the control group for subsequent key events KE1 and KE2 of the MIE cascade; the calculation rule is: fold change of the exposed group relative to the control group = value of the exposed group ÷ value of the control group; the result obtained represents the quantitative effect of toxicity, where the control group is cells not exposed to organic compounds; thus constructing a quantitative response dataset between MIE and key events KE1 and KE2 of the cascade. (8) Constructing the qMOA framework: Based on the dataset obtained in step (7), construct a quantitative toxicity model qMOA framework with three levels: The first level is to quantitatively calculate MIE using the ML-QSAR model based on structural parameters; the second level is to quantitatively calculate the cascade key event KE1 based on MIE combined with concentration. The third level is based on the quantitative calculation of cascaded critical events KE2 using KE1; (9) Toxicity prediction and application: Input the structure descriptor of the organic compound to be tested and the set concentration into the qMOA framework, and calculate the MIE prediction value, KE1 prediction value and KE2 prediction value in sequence. Based on the KE2 prediction value, the toxicity of the organic compound to be tested is quantitatively evaluated.
2. The method according to claim 1, characterized in that, In the structural descriptor described in step (2), molecular size includes relative molecular mass and number of heavy atoms; molecular complexity includes number of rotatable bonds, number of localized π bonds, number of delocalized π bonds, number of single bonds, bond energy, number of hydrogen atoms, number of covalent bonds, and structural symmetry; molecular lipophilicity includes the hydrophobicity of organic compounds XLogP3 and the hydrophobic parameter logP; molecular polarity includes polar surface area, topological polar surface area, number of hydrogen bond donors, and number of hydrogen bond acceptors; the above descriptors need to be added or removed according to the specific structural characteristics of the target compound.
3. The method according to claim 1, characterized in that, In step (3), in order to ensure the reliability of the results, overcome the randomness of the docking procedure, and avoid the misleading effect of a single run, each organic compound docks with the target protein 20 times, generating a total of 20 binding affinities, and the result with the lowest binding affinity is selected as the final data.
4. The method according to claim 1, characterized in that, In step (4), the machine learning model includes linear regression model, support vector regression model, decision tree model, random forest model, XGBoost model, LightGBM model, Stacking model and neural network model; the cross-validation strategy is five-fold cross-validation, and the hyperparameter tuning method is random search method.
5. The method according to claim 1, characterized in that, The specific process of establishing the qMOA framework in step (8) is as follows: 1) Calculate molecular initiation events based on the best ML-QSAR model: Using the ML-QSAR model established in step (4), input the organic compound structure parameter descriptor in step (2) to calculate the MIE dataset of representative organic compounds with key structural features in the MOA. 2) Establish the quantitative relationship between MIE and cascaded KE1 events: Based on the MIE dataset obtained in 1) and the KE1 event dataset measured at different dose gradients in step (7), MIE and concentration are used as independent variables 1 and 2 respectively, and KE1 events are used as dependent variables to construct the quantitative relationship equation between MIE and concentration and cascaded KE1 events. 3) Establish the quantitative relationship between KE1 and cascaded KE2 events: Based on the KE1 and KE2 event datasets with specific concentration gradient correspondence in step (7), use KE1 event as independent variable and KE2 event as dependent variable to construct the quantitative relationship equation between KE1 and cascaded KE2 events.
6. The quantitative toxicity mode assessment framework for evaluating the toxicity of organic compounds, constructed by any of the construction methods of claims 1-5.
7. The quantitative toxicity mode assessment framework for evaluating the toxicity of organic compounds according to claim 1, characterized in that, In practical applications, only the structural characteristics and concentration of the organic compound to be tested need to be input, and the quantitative data of MIE, KE1 and KE2 events can be calculated in sequence. The final calculated KE2 data is the toxicity of the organic compound based on this MOA toxicity mechanism.
8. A method for constructing a quantitative toxicity mode assessment framework for evaluating the toxicity of organophosphorus flame retardants, specifically including the following steps: (1) Constructing a target organic compound dataset: Basic information on multiple organophosphorus flame retardants, including their full name, abbreviation, and CAS number, was obtained by consulting relevant literature; (2) Obtaining structural parameter descriptors for organic compounds: The molecular structural parameters of organophosphorus flame retardants were organized using the online database Pubchem, and the following four aspects of structural descriptors were obtained: molecular size, molecular complexity, molecular lipophilicity, and molecular polarity; among them, molecular size includes relative molecular mass and number of heavy atoms; molecular complexity includes number of rotatable bonds, number of localized π bonds, number of delocalized π bonds, number of single bonds, bond energy, number of hydrogen atoms, number of covalent bonds, and structural symmetry. For compounds such as OPFRs, based on their special chemical bonds and elements, molecular complexity structural descriptors of the number of P=O double bonds, number of PO single bonds, number of halogen atoms, number of F atoms, and number of Cl atoms were added; molecular lipophilicity includes the hydrophobicity of organic compounds XLogP3 and the hydrophobic parameter logP; molecular polarity includes polar surface area, topological polar surface area, number of hydrogen bond donors, and number of hydrogen bond acceptors; (3) Constructing a dataset of structural parameters of target organic compounds and their binding affinity to key target proteins in the toxic action mode: Two-dimensional structures of organophosphorus flame retardants were obtained from the online database Pubchem and preprocessed into ligand structures for docking using a molecular manipulation environment software package; the target protein information of organophosphorus flame retardants, namely key enzymes of mitochondrial metabolism, was determined, and the X-ray crystal structure of the target protein was downloaded from the RCSB protein database, and water molecules and co-crystallized ligands in the crystal protein were removed using AutoDock 4.2 software; subsequently, the active site residues of key enzymes of mitochondrial metabolism were identified from the UniProt database, and the binding pocket region for molecular docking was defined accordingly; the organophosphorus flame retardant was semi-flexibly docked with the active site of the target protein using the Vina 1.1.0 program, and the binding affinity values of each organophosphorus flame retardant to key enzymes of mitochondrial metabolism were calculated. The lower the binding affinity, the stronger the binding ability. (4) Training and selection of the best ML-QSAR model: Eight existing machine learning models suitable for quantitative structure-activity relationship model were selected for training and selection: linear regression model, support vector regression model, decision tree model, random forest model, XGBoost model, LightGBM model, Stacking model and neural network model; all organophosphorus flame retardant structure descriptors and corresponding binding affinity data were randomly divided into training set and test set in a 4:1 ratio; the structure descriptors in the training set were used as independent variables, and the binding affinity with key enzymes of mitochondrial metabolism was used as dependent variables, and a five-fold resampling method was adopted. Eight machine learning models were trained and their parameters optimized using a sampling and cross-validation strategy and a "random search" hyperparameter tuning method to establish eight ML-QSAR models. Then, the structure descriptors of each organic compound in the test set were input into the ML-QSAR models to obtain predicted binding affinity values. These values were compared with those obtained from molecular docking, and the root mean square error (RMSE) was calculated. The model with the lowest RMSE value and its corresponding parameters were selected as the optimal ML-QSAR model for subsequent analysis. For organophosphorus flame retardants and key enzymes in mitochondrial metabolism, the optimal model was XGBoost. (5) Structure-activity relationship analysis: The importance of all structural descriptors in the binding affinity of OPFRs to key enzymes of mitochondrial metabolism was ranked by R language. It was found that the key structural factors affecting the binding affinity of OPFRs to key enzymes of mitochondrial metabolism are the substituent type and the number of benzene rings, which are the key structures affecting the toxicity of OPFRs. (6) For representative OPFRs with different substituent types and benzene ring numbers, the structure-activity relationship of their binding affinity to target proteins was verified by MD simulation and SPR experiment; (7) For the representative OPFRs selected in step (6), based on the aforementioned XGBoost-QSAR model, input the organic compound structural parameters obtained in step (2) to predict the binding affinity of the above representative OPFRs to the target protein, which serves as the MIE event dataset in the mitochondrial toxicity MOA of OPFRs; design an exposure group concentration gradient experiment, expose cells to solutions of different concentrations of representative OPFRs, measure the cascade mitochondrial metabolic activity in the toxic mode of the above representative OPFRs and the mitochondrial metabolic activity of the control group, and use the fold change of mitochondrial metabolic activity of the exposure group relative to the control group as the KE1 event dataset, and the fold change of mitochondrial metabolic activity of the exposure group relative to the control group. The fold change was calculated as follows: fold change = mitochondrial metabolic activity in the exposed group ÷ mitochondrial metabolic activity in the control group. This fold change represents the quantitative toxic effect of OPFRs on mitochondrial metabolic activity and is used as the KE1 event dataset. The mitochondrial membrane potential in the cascade of OPFR toxicity in the exposed group and the mitochondrial membrane potential in the control group were measured. The fold change in mitochondrial membrane potential in the exposed group relative to the control group was used as the KE2 event dataset. The fold change in mitochondrial membrane potential in the exposed group relative to the control group was calculated as follows: fold change = mitochondrial membrane potential in the exposed group ÷ mitochondrial membrane potential in the control group. This fold change represents the quantitative toxic effect of OPFRs on mitochondrial metabolic activity and is used as the KE2 event dataset. The control group consisted of cells not exposed to the OPFRs solution. (8) Constructing the qMOA framework: Based on the binding affinity (MIE) of OPFRs to key enzymes of mitochondrial metabolism predicted by the XGBoost model, and the KE1 event dataset measured at different concentration gradients in step (7), MIE and concentration are used as independent variables 1 and 2, respectively, and KE1 events are used as dependent variables to construct a quantitative relationship equation between MIE, concentration and cascaded KE1 events; then, based on the KE1 and KE2 event datasets corresponding to specific concentration gradients in step (7), KE1 events are used as independent variables and KE2 events are used as dependent variables to construct a quantitative relationship equation between KE1 and cascaded KE2 events; thus forming the "OPFRs structure-key enzyme binding affinity-mitochondrial metabolic activity-mitochondrial membrane potential" qMOA assessment framework for OPFRs-induced mitochondrial toxicity.
9. The quantitative toxicity mode assessment framework for evaluating the toxicity of organophosphorus flame retardants constructed by the method of claim 8.
10. The quantitative toxicity mode assessment framework for evaluating the toxicity of organophosphorus flame retardants according to claim 9, characterized in that, By simply inputting the structural characteristics and concentration of the organophosphorus flame retardant to be tested, quantitative data for MIE, KE1, and KE2 events can be calculated sequentially. The final calculated KE2 data is the toxicity of the organophosphorus flame retardant predicted by the quantitative toxicity mode based on the cascade mechanism.