Method for screening antioxidant small molecules from wampee based on machine learning and molecular simulation
The method combines multi-model ensemble machine learning, dual-principle molecular docking, and hierarchical molecular dynamics simulation to address the instability and low efficiency of existing approaches to screening natural product antioxidants. It screens for small molecules with potential antioxidant activity efficiently and accurately.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHANGCHUN UNIV OF SCI & TECH
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for screening natural product antioxidants suffer from problems such as insufficient model generalization ability, unstable results, low screening efficiency, and insufficient molecular dynamics simulation, making it difficult to efficiently screen out small molecules with potential antioxidant activity.
Antioxidant activity is predicted with a multi-model ensemble that combines traditional machine learning and deep learning models. A dual-principle molecular docking strategy and hierarchical molecular dynamics simulation are then used to screen out small molecules with potential antioxidant mechanisms.
It improves the stability and accuracy of screening results, reduces structural redundancy, enhances high-throughput screening efficiency, and provides theoretical support for biological experiments.
Figure CN122245524A_ABST
Description
Technical Field
[0001] This invention relates to the field of big data analysis technology, specifically a method for screening antioxidant small molecules from wampee based on machine learning and molecular simulation.
Background Technology
[0002] Numerous studies have demonstrated a strong correlation between the abnormal accumulation of reactive oxygen species (ROS) and the development of various chronic diseases, leading to a sustained increase in demand for safe and effective antioxidants in food additives, cosmetics, and pharmaceuticals. Currently, commonly used synthetic antioxidants include butylated hydroxyanisole (BHA), butylated hydroxytoluene (BHT), and tert-butylhydroquinone (TBHQ). However, recent research suggests that these antioxidants may cause DNA damage and genotoxic alterations, raising concerns about their safety. In contrast, natural antioxidants derived from edible plants and animals offer advantages such as low toxicity and easy availability. Therefore, identifying natural antioxidants from plants has become a promising avenue for developing new antioxidants.
[0003] Wampee is a tropical plant belonging to the Rutaceae family, mainly distributed in the tropical and subtropical regions of southern China. Existing literature reports that the pulp, seeds, peel, stems, and leaves of wampee are rich in various bioactive compounds such as flavonoids, coumarins, and alkaloids, and have the potential to be developed into natural antioxidants.
[0004] However, existing schemes for computational virtual screening of active ingredients in natural products have several limitations. Many studies construct the training set from a single antioxidant assay and predict activity with a single machine learning model. As a result, the model generalizes poorly across different antioxidant evaluation systems, and prediction accuracy across evaluation scales needs improvement. Furthermore, natural products are structurally diverse; some structures are highly complex and flexible, many are hydroxylated compounds or glycosides, and they frequently adopt multiple conformations. Many schemes rely on a single molecular docking method for virtual screening. They neglect the differences in scoring functions and search strategies between docking programs, as well as the variation in docking poses and results that the same program can produce under different random seeds or initial conformations. This affects the stability and reproducibility of the screening results. In addition, in practical high-throughput screening, some schemes lack a unified standard for processing machine learning predictions and molecular docking results, making it difficult to efficiently select a batch of potential candidates from a large number of small molecules. Finally, existing studies employ short molecular dynamics simulations and lack fixed screening metrics. They typically simulate a small number of small molecules already known from previous studies to be potentially active, and focus primarily on the trajectory stability of the complex or the binding energy of the ligand in the docking region. They neglect the contribution of key residues and mechanistic characterization at the electronic structure level, making it difficult to provide theoretical support for the design of subsequent biological experiments. In summary, there is still an urgent need for a systematic method for screening antioxidant active ingredients in natural products that rationally combines machine learning with multi-scale molecular simulation while balancing screening efficiency, structural diversity, and reproducibility.
Summary of the Invention
[0005] The purpose of this invention is to provide a method for screening antioxidant small molecules from wampee based on machine learning and molecular simulation, so as to solve the problems raised in the background above.
[0006] To achieve the above objective, the present invention provides the following technical solution: a method for screening antioxidant small molecules from wampee based on machine learning and molecular simulation, comprising the following steps: establishing a wampee small molecule database and generating a comprehensive predicted activity index from the structural features of the wampee small molecules in the database; performing dual-principle molecular docking between the wampee small molecules and antioxidant-related target proteins to obtain, for each small molecule, a final first-type docking score and a final second-type docking score; selecting candidate small molecules based on the comprehensive predicted activity index, the final first-type docking score, and the final second-type docking score; and screening the candidate small molecules by hierarchical molecular dynamics simulation to identify small molecules with potential antioxidant mechanisms.
[0007] The wampee small molecule database is constructed as follows: small molecule chemical constituents of wampee are collected by searching literature databases and public chemical databases, their molecular structure representations are recorded, and the structure representation of each small molecule is standardized and deduplicated. The antioxidant activity includes direct scavenging of reactive oxygen free radicals and indirect antioxidant activity produced by regulating antioxidant-related signaling pathways. Wampee here refers to plants of the genus *Clausena* in the family Rutaceae.
[0008] The comprehensive predicted activity index is generated as follows: molecular structure features of the wampee small molecules are extracted from the wampee small molecule database and used as input variables for machine learning models. Several types of machine learning models are used to predict antioxidant activity, and the average of their predictions is used as the comprehensive predicted activity index.
[0009] The machine learning model includes traditional machine learning models and deep learning models. The traditional machine learning models include multilayer perceptrons, random forests, and support vector machines. The deep learning models include graph neural networks and knowledge graph pre-trained transformer models. The traditional machine learning model uses Morgan fingerprints as molecular fingerprint feature inputs, while the deep learning model uses molecular graph structures with atoms as nodes and chemical bonds as edges as molecular graph feature inputs. All models were evaluated using five-fold cross-validation, and ROC-AUC was used to characterize model performance.
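The evaluation protocol named above (five-fold cross-validation scored by ROC-AUC) can be illustrated with a minimal standard-library sketch. The function names and the toy labels and scores are hypothetical and are not taken from the patent; the ROC-AUC is computed as the fraction of active/inactive pairs that the scores rank correctly, which is one standard way to obtain it.

```python
import random


def five_fold_indices(n_samples, seed=0):
    """Shuffle sample indices and deal them into five roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::5] for i in range(5)]


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random active outranks a random inactive."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute ROC-AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = antioxidant-active, 0 = inactive (illustrative values only).
folds = five_fold_indices(10)
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

In a real run each fold would serve once as the held-out set while the model is trained on the other four, and the five ROC-AUC values would be summarized per model.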
[0010] The dual-principle molecular docking is defined as follows: the first type of molecular docking is binding-energy-based docking that relies on physical force fields or empirical potential functions; the second type is score-based docking that relies on binding pocket hotspots or spatial fit principles. The structural information of the antioxidant-related target proteins is obtained by searching literature databases and public protein databases. The first type evaluates ligand-protein binding strength from an energy perspective, while the second type evaluates how well the ligand fits spatially, in terms of binding pocket hotspots and conformational adaptation. Using these two docking methods, multiple docking runs are performed between the wampee small molecules and the antioxidant-related target proteins. The evaluation indicators for screening are the median binding energy of the best pose over N = 10 docking runs for the first type, and the highest score among the conformations produced in a single docking run for the second type. Final first-type docking score: at least N independent docking runs are performed with different random seeds and different initial ligand conformations, and the median of the N results is taken. Final second-type docking score: the highest score among the conformations generated by the second type of docking is taken. Under identical input and fixed parameters, the second-type docking score is highly repeatable, and the differences between repeated calculations are negligible. Therefore, the highest score among the conformations generated in a single docking run is preferably used as the second-type docking score. If necessary, multiple runs can be performed and the highest or average value taken. The conformations referred to here are all the poses output by the second type of molecular docking.
[0011] The candidate small molecules are selected as follows: histograms and scatter plots are drawn for the comprehensive predicted activity index, the final first-type docking score, and the final second-type docking score, and the distribution of each index is analyzed. Wampee small molecules that simultaneously meet the preset thresholds of all three indices are retained to form a preliminary candidate set. The preset thresholds are determined from the data distribution and industry experience. Chemical similarity clustering is then performed based on molecular fingerprints and a similarity measure: among molecules whose similarity exceeds a threshold, at most K are retained and the rest are removed. All retained wampee small molecules are collected and labeled as candidate small molecules.
[0012] The chemical similarity clustering uses Morgan fingerprints and Tanimoto similarity, with the similarity threshold preset based on industry experience; the number of small molecules entering the candidate set is limited during clustering.
[0013] The hierarchical molecular dynamics simulation screening includes the following. Unified force field assignment, solvation, and temperature and pressure equilibration are applied to the candidate small molecules. The simulation is divided into three stages of increasing duration, T1, T2, and T3, and the thresholds of the dynamics and hydrogen-bond screening indicators are raised as the duration increases. The screening indicators include classical dynamics indicators, hydrogen bond occupancy, and an end-of-trajectory criterion of sustained ligand occupancy. The sustained occupancy criterion is evaluated by calculating the centroid distance between the ligand and the binding pocket during the last time period T of the long simulation. The short simulation duration T1 satisfies 10 ns ≤ T1 ≤ 50 ns, the medium simulation duration T2 satisfies 50 ns ≤ T2 ≤ 200 ns, and the long simulation duration T3 satisfies 200 ns ≤ T3 ≤ 500 ns. The final trajectories of the candidate small molecules from the three stages are analyzed, and sustained occupancy of the binding pocket by the ligand is used as the criterion for selecting the final stably bound small molecules.
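As a bookkeeping sketch only, the stage durations can be checked against the stated ranges and the total simulation cost tallied. The function and variable names are hypothetical. The tally assumes that each stage is run as a separate simulation of every complex entering it; the example values (30, 12, and 5 complexes at 20, 100, and 500 ns, plus a 500 ns unbound-protein run) are those reported for the embodiment in this document.

```python
# Allowed duration ranges in ns for the three stages, as stated in the text.
STAGE_BOUNDS_NS = {"T1": (10, 50), "T2": (50, 200), "T3": (200, 500)}
STAGES = ("T1", "T2", "T3")


def check_durations(durations_ns):
    """Raise ValueError unless each duration is in range and durations increase."""
    for stage in STAGES:
        low, high = STAGE_BOUNDS_NS[stage]
        if not low <= durations_ns[stage] <= high:
            raise ValueError(f"{stage} must lie in [{low}, {high}] ns")
    if not durations_ns["T1"] < durations_ns["T2"] < durations_ns["T3"]:
        raise ValueError("durations must increase from T1 to T3")


def total_simulation_ns(entrants, durations_ns, apo_ns=0):
    """Sum (complexes entering a stage) x (stage duration), plus the apo run."""
    return sum(entrants[s] * durations_ns[s] for s in STAGES) + apo_ns


durations = {"T1": 20, "T2": 100, "T3": 500}
entrants = {"T1": 30, "T2": 12, "T3": 5}
check_durations(durations)
total = total_simulation_ns(entrants, durations, apo_ns=500)
```

Under that assumption the tally is 30 × 20 + 12 × 100 + 5 × 500 + 500 ns, which reproduces the total simulation time stated for the embodiment.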
[0014] The centroid distance between the ligand and the protein binding pocket is calculated within a specified time interval, and wampee small molecules whose centroid distance stays below a preset value are marked as passing. For wampee small molecules that meet the sustained occupancy criterion, free energy calculation is used to evaluate ligand-protein binding strength, key residue energy decomposition to identify core interaction sites, and principal component analysis with free energy landscape construction to verify the conformational stability of the complex; quantum chemical electronic structure parameters are used to analyze the free radical scavenging potential of the small molecules. These molecules are then labeled as wampee small molecules with potential antioxidant mechanism characteristics, and their potential antioxidant molecular mechanisms are elucidated.
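The sustained occupancy check can be sketched as follows. This is a minimal illustration under stated assumptions: the centroid is taken as the unweighted mean of atomic coordinates (the text does not say whether mass weighting is used), the function names are hypothetical, and the toy distances are invented. The 50 ns window and 4.5 Å cutoff in the example are the values used in the embodiment described in this document.

```python
import math


def centroid(points):
    """Unweighted geometric centre of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def centroid_distance(ligand_atoms, pocket_atoms):
    """Distance between the ligand centroid and the binding pocket centroid."""
    return math.dist(centroid(ligand_atoms), centroid(pocket_atoms))


def passes_occupancy(distances, frame_times_ns, window_ns, max_distance):
    """True if every frame in the final window_ns stays below max_distance."""
    start = frame_times_ns[-1] - window_ns
    window = [d for t, d in zip(frame_times_ns, distances) if t > start]
    return bool(window) and all(d < max_distance for d in window)


# One frame: ligand centroid (1, 0, 0), pocket centroid (1, 3, 4).
d = centroid_distance([(0, 0, 0), (2, 0, 0)], [(1, 3, 4)])

# Toy per-frame distances (Å) at the given times (ns); illustrative only.
times = [440.0, 460.0, 480.0, 500.0]
stable = passes_occupancy([6.0, 3.9, 4.1, 4.4], times, 50.0, 4.5)
leaving = passes_occupancy([3.0, 3.9, 4.1, 4.6], times, 50.0, 4.5)
```

Only frames inside the final window count, so an early excursion (the 6.0 Å frame at 440 ns) does not fail a ligand, while a late one (4.6 Å at 500 ns) does.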
[0015] The free energy calculation uses the MM-PBSA binding energy method, and the key residue energy decomposition uses the MM-GBSA per-residue decomposition method; the quantum chemical electronic structure parameters are obtained through geometry optimization and frequency analysis, and the solvent effect of an aqueous environment is modeled with a polarizable continuum model.
[0016] The activity of the wampee small molecules with potential antioxidant mechanisms is verified by in vitro antioxidant experiments, including chemical assays of direct scavenging of reactive oxygen free radicals and antioxidant assays based on cell models.
[0017] Compared with the prior art, the beneficial effects of the present invention are: 1. This invention uses a multi-model integrated machine learning approach to predict antioxidant activity. It combines traditional machine learning and deep learning models to cover different molecular structure feature input types. At the same time, it uses five-fold cross-validation to ensure model performance, effectively eliminate the bias of a single algorithm, improve the generalization and robustness of the prediction results, and solve the problem of insufficient prediction accuracy of a single model in the prior art.
[0018] 2. This invention designs a dual-principle molecular docking strategy that evaluates the binding potential of wampee small molecules to target proteins along two dimensions: energetic binding strength and spatial conformational fit. For the binding-energy-based docking, multiple independent runs are performed and the median is taken, which reduces the bias caused by a single stochastic docking run. The two docking results complement each other and improve the stability and reproducibility of the screening results.
[0019] 3. This invention applies a unified standard, combining in-library percentiles with empirical thresholds, to the machine learning predictions and the dual-principle docking results. Intersection screening quickly narrows the candidate range, and chemical similarity clustering with a cap on the number of candidates per cluster then reduces structural redundancy while preserving the chemical diversity of the candidate small molecules. This greatly improves the efficiency of high-throughput screening and reduces subsequent computational and experimental costs.
[0020] 4. This invention employs a hierarchical molecular dynamics simulation strategy that gradually increases the simulation duration and raises the screening thresholds from coarse to fine, so that kinetically unstable small molecules are eliminated quickly. It uses sustained ligand occupancy at the end of the long simulation as the core criterion and combines it with free energy calculation, key residue decomposition, and quantum chemical analysis to analyze the antioxidant mechanism of the small molecules at the levels of dynamics, energy, and electronic structure. This reduces the false positive rate of static docking and provides theoretical support for the design of subsequent biological experiments.
Attached Figure Description
[0021] Figure 1 is a schematic diagram of the process for screening antioxidant small molecules from wampee based on machine learning and molecular simulation, as described in this invention.
Detailed Implementation
[0022] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0023] Example 1: As shown in Figure 1, this invention provides a technical solution: a method for screening antioxidant small molecules from wampee based on machine learning and molecular simulation. The method includes the following steps. A wampee small molecule database is established, and a comprehensive predicted activity index is generated from the structural features of the wampee small molecules in the database. The database is constructed as follows: small molecule chemical constituents of wampee are collected by searching literature databases and public chemical databases, their molecular structure representations are recorded, and the structure representation of each small molecule is standardized and deduplicated. The antioxidant activity includes direct scavenging of reactive oxygen free radicals and indirect antioxidant activity produced by regulating antioxidant-related signaling pathways. Wampee here refers to plants of the genus *Clausena* in the family Rutaceae.
[0024] The comprehensive predicted activity index is generated as follows: molecular structure features of the wampee small molecules are extracted from the wampee small molecule database and used as input variables for machine learning models. Several types of machine learning models are used to predict antioxidant activity, and the average of their predictions is used as the comprehensive predicted activity index. The machine learning models include traditional machine learning models and deep learning models. The traditional machine learning models include multilayer perceptrons, random forests, and support vector machines. The deep learning models include graph neural networks and knowledge graph pre-trained transformer models. The traditional machine learning models use Morgan fingerprints as molecular fingerprint feature inputs, while the deep learning models use molecular graphs, with atoms as nodes and chemical bonds as edges, as molecular graph feature inputs.
[0025] For example, high-quality research literature on the chemical constituents of Clausena lansium (Lour.) Skeels was systematically collected by searching databases such as PubMed, ScienceDirect, and CNKI, and the reported constituents of the fruit, kernel, peel, stem, and leaves were manually extracted. The SMILES representations of the relevant compounds were obtained from the PubChem database and recorded, the structures were standardized with RDKit, and duplicate small molecules were removed. The resulting database of 474 Clausena lansium small molecules was used for subsequent activity prediction and virtual screening. Based on this database and an open-source antioxidant database, the traditional machine learning models Multilayer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM), together with the Graph Neural Network (WeaveGNN) and Knowledge Graph Pre-trained Transformer (KPGT) models, were used to obtain predictions for the wampee small molecules. For the traditional models, molecular structure features were extracted with RDKit and Morgan fingerprints (radius = 2, 1024 bits) were used as input; the WeaveGNN and KPGT models directly used molecular graphs, with atoms as nodes and chemical bonds as edges, as input. All models were trained and evaluated using five-fold cross-validation, and final model performance was characterized mainly by ROC-AUC.
[0026] In this embodiment, the average score (LBP_avg) of the predicted values of the above five models is used as the comprehensive predicted activity index for each molecule, which is one of the important input parameters for subsequent virtual screening steps and cluster analysis.
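The averaging step can be written out in a few lines. This is a minimal sketch: the model names follow the embodiment, but the function name, the data layout, and the predicted values are hypothetical placeholders rather than results from the patent.

```python
from statistics import mean

MODELS = ("MLP", "RF", "SVM", "WeaveGNN", "KPGT")


def lbp_avg(predictions):
    """Average the five models' predicted activity scores per molecule.

    predictions maps model name -> {molecule id -> predicted score}.
    """
    molecule_ids = set(predictions[MODELS[0]])
    for model in MODELS:
        if set(predictions[model]) != molecule_ids:
            raise ValueError(f"{model} does not cover the same molecules")
    return {
        mol: mean(predictions[model][mol] for model in MODELS)
        for mol in sorted(molecule_ids)
    }


# Placeholder predictions for two hypothetical molecules.
predictions = {
    "MLP": {"mol_A": 0.6, "mol_B": 0.2},
    "RF": {"mol_A": 0.5, "mol_B": 0.3},
    "SVM": {"mol_A": 0.4, "mol_B": 0.1},
    "WeaveGNN": {"mol_A": 0.7, "mol_B": 0.2},
    "KPGT": {"mol_A": 0.3, "mol_B": 0.2},
}
index = lbp_avg(predictions)
```

An unweighted mean gives every model the same vote, which matches the description here; weighting models by their cross-validated ROC-AUC would be a different design and is not what the text describes.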
[0027] The five machine learning models (MLP, RF, SVM, WeaveGNN, and KPGT) all maintained high ROC-AUC scores on different antioxidant datasets. To guard against inflated scores caused by small datasets, the predictions of the models were ensembled by averaging, which balanced the fluctuations of individual models on specific datasets and made the predictions more robust.
[0028] Dual-principle molecular docking is performed between the wampee small molecules and antioxidant-related target proteins to obtain, for each small molecule, a final first-type docking score and a final second-type docking score. The dual-principle molecular docking is defined as follows: the first type of molecular docking is binding-energy-based docking that relies on physical force fields or empirical potential functions; the second type is score-based docking that relies on binding pocket hotspots or spatial fit principles. The structural information of the antioxidant-related target proteins is obtained by searching literature databases and public protein databases. Final first-type docking score: at least N independent docking runs are performed with different random seeds and different initial ligand conformations, and the median of the N results is taken. Final second-type docking score: the highest score among the conformations generated by the second type of docking is taken. The conformations referred to here are all the poses output by the second type of molecular docking.
[0029] For example, the three-dimensional structures of antioxidant-related target proteins were obtained by searching public protein databases. In this embodiment, Keap1 protein (PDB ID: 4IQK) was selected as the key target. To ensure the reliability of the results, two programs with different docking principles (AutoDock Vina and LibDock) were used: AutoDock Vina was used to perform N = 10 independent docking runs for each ligand in the wampee small molecule database, the binding energy of the best binding conformation in each run was recorded, and the median (Vina_med) was calculated; LibDock was used to dock the same set of ligands, and the highest score was extracted. These docking energies and scores serve as important reference indicators for subsequent screening and comparison.
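The two aggregation rules can be sketched as below. The sketch assumes the per-run best-pose energies and per-pose scores have already been parsed from the docking output; it does not call AutoDock Vina or LibDock, and the function names and numeric values are hypothetical.

```python
from statistics import median


def vina_med(best_pose_energies, min_runs=10):
    """Median best-pose binding energy (kcal/mol) over independent runs."""
    if len(best_pose_energies) < min_runs:
        raise ValueError(f"at least {min_runs} independent runs are required")
    return median(best_pose_energies)


def libdock_best(pose_scores):
    """Highest score among the poses returned by a single docking run."""
    return max(pose_scores)


# Illustrative values only: ten runs with different seeds and start conformers.
energies = [-7.2, -7.0, -6.8, -7.4, -7.1, -6.9, -7.3, -7.0, -7.2, -6.5]
first_type_score = vina_med(energies)
second_type_score = libdock_best([85.2, 91.7, 78.4])
```

The median is used rather than the minimum so that one unusually favourable stochastic run does not decide the score, which is the stability argument made in the text. Lower (more negative) energies are better for the first type, while higher scores are better for the second.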
[0030] Candidate small molecules are selected based on the comprehensive predicted activity index, the final first-type docking score, and the final second-type docking score. The selection proceeds as follows: histograms and scatter plots are drawn for the three indices, and the distribution of each index is analyzed. Wampee small molecules that simultaneously meet the preset thresholds of all three indices are retained to form a preliminary candidate set. The preset thresholds are determined from the data distribution and industry experience. Chemical similarity clustering is then performed based on molecular fingerprints and a similarity measure: among molecules whose similarity exceeds the threshold, at most K are retained and the rest are removed. All retained wampee small molecules are collected and labeled as candidate small molecules. The chemical similarity clustering uses Morgan fingerprints and Tanimoto similarity, with the similarity threshold preset based on industry experience; the number of small molecules entering the candidate set is limited during clustering.
[0031] For example, in this embodiment, histograms and scatter plots of the three indices are plotted. The LBP_avg and LibDock scores are approximately normally distributed, while the Vina_med binding energy is clearly bimodal, with the first peak at approximately -7.5 kcal/mol and the second at approximately -6.0 kcal/mol.
[0032] A strategy of "in-library percentile + empirical cutoff" was adopted to set the screening criteria: the 70th percentile of LBP_avg (LBP_avg ≥ 0.45), the 60th percentile of LibDock (LibDock ≥ 80), and an empirical binding energy cutoff of -7.0 kcal/mol (Vina_med ≤ -7.0 kcal/mol). Molecules were required to meet all three criteria, which quickly narrowed the library to a batch of candidates. After this preliminary screening, 47 wampee small molecules met the criteria.
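The intersection screen amounts to a three-way conjunction, sketched here with the thresholds stated above. The record layout, the function names, and the molecule values are hypothetical placeholders.

```python
# Thresholds from the embodiment; note the energy cutoff is an upper bound.
LBP_AVG_MIN = 0.45   # 70th percentile of the library
LIBDOCK_MIN = 80.0   # 60th percentile of the library
VINA_MED_MAX = -7.0  # empirical cutoff, kcal/mol


def passes_all(record):
    """True only if the molecule meets all three criteria simultaneously."""
    return (
        record["LBP_avg"] >= LBP_AVG_MIN
        and record["LibDock"] >= LIBDOCK_MIN
        and record["Vina_med"] <= VINA_MED_MAX
    )


def preliminary_candidates(records):
    """Return the ids of molecules in the intersection of the three filters."""
    return [r["id"] for r in records if passes_all(r)]


# Placeholder records, not data from the patent.
records = [
    {"id": "mol_A", "LBP_avg": 0.52, "LibDock": 95.0, "Vina_med": -7.8},
    {"id": "mol_B", "LBP_avg": 0.61, "LibDock": 88.0, "Vina_med": -6.4},
    {"id": "mol_C", "LBP_avg": 0.40, "LibDock": 102.0, "Vina_med": -8.1},
    {"id": "mol_D", "LBP_avg": 0.45, "LibDock": 80.0, "Vina_med": -7.0},
]
kept = preliminary_candidates(records)
```

Because binding energies are negative, the third test uses `<=` while the other two use `>=`; mixing up that direction is the easiest mistake to make when implementing this filter.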
[0033] The scatter plots show that the machine learning predictions correlate only weakly with the binding energies and scores from the two types of molecular docking. This indicates that the molecular properties captured at the different evaluation scales are complementary, and it is consistent with the fact that the results of the two types of docking themselves correlate only weakly because of their different docking principles.
[0034] On this basis, the 47 molecules that passed the multi-index screen were clustered by chemical similarity using Morgan fingerprints (radius = 2, 2048 bits) and a Tanimoto similarity threshold (Tc = 0.7), and the number of molecules selected from each cluster was limited to at most 2 (K = 2) to control structural redundancy and maintain chemical diversity. Finally, 30 candidate small molecules with potential antioxidant activity were selected for hierarchical molecular dynamics simulation.
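The text does not name the clustering algorithm, so the following sketch substitutes a simple greedy leader clustering as an assumption: molecules are visited in a given priority order, each joins the first cluster whose leader it resembles with Tanimoto similarity above Tc, and at most K members are kept per cluster. Fingerprints are represented as sets of on-bit indices; the function names, the priority order, and the toy fingerprints are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def select_diverse(fingerprints, ranking, threshold=0.7, k=2):
    """Greedy leader clustering that keeps at most k molecules per cluster."""
    clusters = []  # list of (leader fingerprint, kept member ids)
    for mol in ranking:
        fp = fingerprints[mol]
        for leader_fp, members in clusters:
            if tanimoto(fp, leader_fp) > threshold:
                if len(members) < k:
                    members.append(mol)  # cluster still has room
                break  # otherwise the molecule is dropped as redundant
        else:
            clusters.append((fp, [mol]))  # no similar leader: new cluster
    return [mol for _, members in clusters for mol in members]


# Toy fingerprints: A, B and C are near-duplicates, D is unrelated.
fingerprints = {
    "A": set(range(1, 11)),
    "B": set(range(1, 10)) | {11},
    "C": set(range(1, 11)) | {12},
    "D": {20, 21, 22},
}
kept = select_diverse(fingerprints, ranking=["A", "B", "C", "D"])
```

The outcome depends on the visiting order, so in practice the ranking would be chosen deliberately, for example by one of the three screening indices; that choice is not specified in the text.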
[0035] Hierarchical molecular dynamics simulations were used to screen candidate small molecules, and small molecules with potential antioxidant mechanisms were identified.
[0036] The hierarchical molecular dynamics simulation screening includes the following. Unified force field assignment, solvation, and temperature and pressure equilibration are applied to the candidate small molecules. The simulation is divided into three stages of increasing duration, T1, T2, and T3, and the thresholds of the dynamics and hydrogen-bond screening indicators are raised as the duration increases. The screening indicators include classical dynamics indicators, hydrogen bond occupancy, and an end-of-trajectory criterion of sustained ligand occupancy. The sustained occupancy criterion is evaluated by calculating the centroid distance between the ligand and the binding pocket during the last time period T of the long simulation. The centroid distance between the ligand and the protein binding pocket is calculated within a specified time interval, and wampee small molecules whose centroid distance stays below a preset value are marked as passing. For wampee small molecules that meet the sustained occupancy criterion, free energy calculation is used to evaluate ligand-protein binding strength, key residue energy decomposition to identify core interaction sites, and principal component analysis with free energy landscape construction to verify the conformational stability of the complex; quantum chemical electronic structure parameters are used to analyze the free radical scavenging potential of the small molecules. These molecules are then labeled as wampee small molecules with potential antioxidant mechanism characteristics, and their potential antioxidant molecular mechanisms are elucidated.
[0037] The free energy calculation uses the MM-PBSA binding energy method, and the key residue energy decomposition uses the MM-GBSA per-residue decomposition method; the quantum chemical electronic structure parameters are obtained through geometry optimization and frequency analysis, and the solvent effect of an aqueous environment is modeled with a polarizable continuum model.
[0038] For example, in this embodiment, the AMBER24 software package was used to perform molecular dynamics simulations of the complexes of the 30 candidate small molecules with the antioxidant-related target protein. The total simulation time was 4800 ns, comprising a 500 ns simulation of the unbound protein and the hierarchical simulations of the complexes. After the 20 ns and 100 ns stages, the 30 candidate complexes were reduced to 12 and then 5 small molecules, respectively, and these 5 small molecules entered the 500 ns long simulation. The screening criteria for the 20 ns stage are: average Cα-RMSD ≤ 2.5 Å, Rg variation within ±3%, SASA variation within ±10%, average hydrogen bond count ≥ 0.7 (per frame), major hydrogen bond occupancy ≥ 20%, and ≥ 1 hydrogen bond with Frac ≥ 0.1. The screening criteria for the 100 ns stage are: average Cα-RMSD ≤ 2.5 Å, Rg variation within ±3%, SASA variation within ±10%, total average hydrogen bond count (ligand-to-protein and protein-to-ligand, per frame) ≥ 1.0, major hydrogen bond occupancy ≥ 30%, and ≥ 2 hydrogen bonds with Frac ≥ 0.1. The screening criteria for the 500 ns stage are: average Cα-RMSD ≤ 2.5 Å, Rg variation within ±3%, SASA relative variation within ±10%, total average hydrogen bond count (ligand-to-protein and protein-to-ligand, per frame) ≥ 1.0, major hydrogen bond occupancy ≥ 30%, and ≥ 2 hydrogen bonds with Frac ≥ 0.1. These thresholds are the preferred screening conditions of this embodiment. The protein was modeled with the ff19SB force field and the ligands with the GAFF2 force field. Before the production simulation, each system underwent 6000 steps of energy minimization (the first 3000 steps by steepest descent and the last 3000 steps by conjugate gradient) to ensure system stability. The systems were solvated with the TIP3P water model. The atomic partial charges of each ligand were calculated using the AM1-BCC method.
Subsequently, the system was slowly heated from 0 K to 310 K in the NVT ensemble, with a Langevin thermostat maintaining temperature stability. Equilibration and production simulations were then performed in the NPT ensemble, with the pressure maintained at 1 atm using a Monte Carlo barostat and the system volume allowed to vary freely to achieve full equilibration. For the complex systems, the production-stage simulation lengths were 20 ns, 100 ns, and 500 ns, with a time step of 2 fs. The trajectory was saved every 0.1 ns, yielding 200, 1000, and 5000 frames, respectively. In the long-timescale simulation, long-range electrostatic interactions were handled using the particle mesh Ewald (PME) method with a cutoff distance of 8 Å. Based on the stability conditions described above, the last 50 ns of the 500 ns trajectory was selected as the analysis interval. The centroid distance between each candidate ligand and the binding pocket was calculated to assess whether the distance showed a sustained upward trend, that is, whether the ligand was drifting out of the pocket. Only ligands showing no significant detachment trend within this interval were considered stably bound and used for subsequent energy calculations and experimental verification. Where two candidate ligands both keep a ligand-to-pocket centroid distance below 4.5 Å throughout the analysis interval, their MM-PBSA binding free energies are calculated separately. In this embodiment, based on the 500 ns simulation, MM-PBSA energy decomposition analysis was performed on the last 50 ns of each candidate complex. The MM-PBSA results showed that, for the first ligand in this interval, the van der Waals interaction energy ΔE_vdw was -45.44 ± 3.27 kcal/mol, the electrostatic interaction energy ΔE_ele was -67.88 ± 8.02 kcal/mol, and the gas-phase interaction energy ΔG_gas was -113.32 kcal/mol.
The polar solvation energy ΔG_polar was +88.54 ± 5.71 kcal/mol, the nonpolar solvation energy ΔG_nonpolar was -4.70 ± 0.09 kcal/mol, and the total solvation energy ΔG_solv was +83.84 kcal/mol, giving a total binding free energy ΔG_bind of -29.48 ± 4.70 kcal/mol. For the other candidate ligand in the same interval, the van der Waals interaction energy ΔE_vdw was -36.54 ± 3.07 kcal/mol, the electrostatic interaction energy ΔE_ele was -24.82 ± 6.54 kcal/mol, and the gas-phase interaction energy ΔG_gas was -61.36 kcal/mol; the polar solvation energy ΔG_polar was +52.41 ± 6.05 kcal/mol, the nonpolar solvation energy ΔG_nonpolar was -3.95 ± 0.18 kcal/mol, and the total solvation energy ΔG_solv was +48.46 kcal/mol, giving a total binding free energy ΔG_bind of -12.90 ± 4.69 kcal/mol. In this embodiment, the ligand with the lower (more negative) total binding free energy was selected as the representative ligand for subsequent mechanism analysis and experimental verification. Based on the stable final segment of the complex trajectory, MM-GBSA per-residue energy decomposition was then performed to identify key residues that contribute significantly to ligand binding. The decomposition results showed that several residues, including ARG-415, can serve as key reference residues for ligand-protein interactions in this complex. Subsequently, principal component analysis was performed on the protein Cα atoms, the trajectory was projected onto the first two principal components, and a free energy landscape was constructed from the projection. Within the analysis interval, the trajectory of the candidate complex formed a distinct cluster in the PC1–PC2 projection space, corresponding to a single, continuous low free energy region in the free energy landscape.
This result indicates that, provided the terminal sustained-occupancy criterion is satisfied, the complex mainly occupies a stable conformational state in the later stage of the long-timescale simulation, which corroborates the screening results based on the binding free energy described above.
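The stage-wise thresholds and the terminal sustained-occupancy check described in this embodiment can be illustrated with a minimal Python sketch. The numeric thresholds (2.5 Å, ±3%, ±10%, the hydrogen bond limits, the 50 ns window, and the 4.5 Å distance) are taken from the text; everything else is an assumption made for illustration: the `StageMetrics` container, the function names, the use of an unweighted geometric centroid, and the slope tolerance of 0.01 Å/ns used to express "no sustained upward trend", which the text does not quantify. The per-stage metrics are assumed to have been computed beforehand by a trajectory analysis tool.

```python
import math
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class StageCriteria:
    """Thresholds for one simulation stage (values from this embodiment)."""
    max_mean_rmsd: float    # mean C-alpha RMSD, in angstroms
    max_rg_var: float       # allowed absolute relative change in Rg
    max_sasa_var: float     # allowed absolute relative change in SASA
    min_mean_hbonds: float  # mean hydrogen bonds per frame
    min_major_occ: float    # occupancy of the most persistent hydrogen bond
    min_persistent: int     # number of hydrogen bonds with Frac >= 0.1


CRITERIA = {
    20: StageCriteria(2.5, 0.03, 0.10, 0.7, 0.20, 1),
    100: StageCriteria(2.5, 0.03, 0.10, 1.0, 0.30, 2),
    500: StageCriteria(2.5, 0.03, 0.10, 1.0, 0.30, 2),
}


@dataclass(frozen=True)
class StageMetrics:
    """Per-complex summary statistics (hypothetical container)."""
    mean_rmsd: float
    rg_var: float
    sasa_var: float
    mean_hbonds: float
    major_occ: float
    n_persistent: int


def passes_stage(m: StageMetrics, stage_ns: int) -> bool:
    """Return True if a complex meets every threshold of the given stage."""
    c = CRITERIA[stage_ns]
    return (
        m.mean_rmsd <= c.max_mean_rmsd
        and abs(m.rg_var) <= c.max_rg_var
        and abs(m.sasa_var) <= c.max_sasa_var
        and m.mean_hbonds >= c.min_mean_hbonds
        and m.major_occ >= c.min_major_occ
        and m.n_persistent >= c.min_persistent
    )


def centroid(points):
    """Unweighted geometric centre of (x, y, z) tuples (an assumption)."""
    return tuple(fmean(axis) for axis in zip(*points))


def centroid_distance(ligand_atoms, pocket_atoms) -> float:
    """Distance between ligand and pocket centroids for one frame."""
    return math.dist(centroid(ligand_atoms), centroid(pocket_atoms))


def _slope(xs, ys) -> float:
    """Least-squares slope of ys against xs."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def sustained_occupancy(times_ns, distances, window_ns=50.0,
                        max_dist=4.5, max_slope=0.01) -> bool:
    """Check the final window: always below max_dist and no upward drift.

    max_slope (angstrom per ns) is a hypothetical tolerance.
    """
    start = times_ns[-1] - window_ns
    tail = [(t, d) for t, d in zip(times_ns, distances) if t > start]
    ts, ds = zip(*tail)
    return max(ds) < max_dist and _slope(ts, ds) <= max_slope


def total_simulation_ns(complexes_per_stage, apo_ns) -> int:
    """Sum stage length times number of complexes, plus the Apo run."""
    return sum(ns * n for ns, n in complexes_per_stage.items()) + apo_ns
```

With the counts given in this embodiment, `total_simulation_ns({20: 30, 100: 12, 500: 5}, apo_ns=500)` reproduces the stated 4800 ns budget. Keeping the thresholds in a single table makes it easy to see that only the hydrogen bond criteria tighten between the 20 ns stage and the later stages.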
[0039] In this embodiment, the mean and standard deviation of Cα-RMSD, radius of gyration (Rg), and solvent-accessible surface area (SASA) were computed for the Apo protein and the complex over the 500 ns simulation period to evaluate whether the overall conformational stability of the protein changed significantly upon complex formation. The Cα-RMSD of the complex was slightly lower than that of the Apo protein and was mainly distributed in the range of 1.00–1.25 Å, indicating that ligand binding did not significantly perturb the overall conformation. The Rg of both systems remained stable over 500 ns, fluctuating within a narrow range, indicating that ligand binding did not significantly affect the overall compactness of the protein structure. The SASA distributions of the Apo protein and the complex remained within a similar range, with the SASA of the complex slightly lower overall, indicating that ligand binding may have reduced the solvent-exposed surface area of the protein to some extent. The RMSF results of the two systems were also compared: the flexibility distributions were generally similar, with larger fluctuations mainly concentrated at the protein termini and flexible loop regions. The complex exhibited lower RMSF values at some residues near the binding pocket, while no significant differences were observed in the main secondary structure regions. Finally, quantum chemical calculations were performed on the small molecule to characterize its electronic structure.
The candidate ligand was subjected to geometry optimization and frequency analysis at the B3LYP/6-31G(d) level, and the solvent effect of an aqueous environment was simulated using a polarizable continuum model (PCM). The HOMO energy of the ligand was -0.2109 au (-5.74 eV) and the LUMO energy was -0.0620 au (-1.69 eV), corresponding to a HOMO–LUMO gap of 0.1489 au (4.05 eV). The highest occupied molecular orbital (HOMO) of the candidate ligand was mainly distributed over the phenolic hydroxyl region, indicating that this region readily donates electrons and has strong free radical scavenging potential. These results indicate that the candidate small molecule exhibits stable binding characteristics at both the dynamic and energetic levels and possesses a potential antioxidant mechanism, providing a theoretical basis for subsequent experimental design and verification.
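The unit conversion behind the reported frontier orbital energies can be checked with a short Python sketch. The orbital energies are those stated in this embodiment; the function name and the returned dictionary layout are illustrative assumptions, and the conversion factor is the standard Hartree-to-electronvolt value.

```python
HARTREE_TO_EV = 27.211386  # 1 Hartree (atomic unit of energy) in eV


def frontier_orbital_summary(homo_au: float, lumo_au: float) -> dict:
    """Convert HOMO/LUMO energies from atomic units to eV and compute the gap."""
    gap_au = lumo_au - homo_au
    return {
        "homo_ev": homo_au * HARTREE_TO_EV,
        "lumo_ev": lumo_au * HARTREE_TO_EV,
        "gap_au": gap_au,
        "gap_ev": gap_au * HARTREE_TO_EV,
    }


# Values reported for the candidate ligand in this embodiment.
summary = frontier_orbital_summary(homo_au=-0.2109, lumo_au=-0.0620)
print({key: round(value, 4) for key, value in summary.items()})
```

Rounded to two decimals, the converted values agree with the figures given in the text (-5.74 eV, -1.69 eV, and a 4.05 eV gap).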
[0040] The activity of the small molecules of yellow skin with potential antioxidant mechanisms was verified by in vitro antioxidant experiments, which included chemical experiments to directly scavenge reactive oxygen free radicals and antioxidant experiments based on cell models.
[0041] For example, in this embodiment, the virtual screening results can be used to design and conduct in vitro antioxidant experiments that verify the selected small molecules. First, chemical methods for directly scavenging reactive oxygen species (ROS) are employed: the DPPH and ABTS free radical scavenging assays. Second, at the cellular level, hydrogen peroxide-induced oxidative damage models are constructed in two cell lines, HepG2 and Caco-2, for cellular antioxidant experiments; after cytotoxicity is evaluated with the CCK-8 assay, intracellular ROS levels are measured.
[0042] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be considered in all respects as exemplary and non-limiting, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the present invention. No reference numerals in the claims should be construed as limiting the scope of the claims.
Claims
1. A method for screening small antioxidant molecules in yellow skin based on machine learning and molecular simulation, characterized in that the small molecule screening method includes the following steps: establishing a yellow skin small molecule database, and generating a comprehensive predicted activity index based on the structural features of the yellow skin small molecules in the database; performing dual-principle molecular docking between the yellow skin small molecules and antioxidant-related target proteins to obtain the final docking score of the first type of small molecules and the final docking score of the second type of small molecules; selecting candidate small molecules based on the comprehensive predicted activity index, the final docking score of the first type of small molecules, and the final docking score of the second type of small molecules; and screening the candidate small molecules by hierarchical molecular dynamics simulation to identify small molecules with potential antioxidant mechanisms.
2. The method for screening small antioxidant molecules in yellow skin based on machine learning and molecular simulation according to claim 1, characterized in that: the yellow skin small molecule database is constructed as follows: small molecule chemical components of yellow skin are collected by searching literature databases and public chemical databases, their molecular structure representation information is recorded, and the structural representation information of each small molecule is standardized and deduplicated; the antioxidant activity includes direct scavenging of reactive oxygen free radicals and indirect antioxidant activity produced by regulating antioxidant-related signaling pathways; the yellow skin refers to plants of the genus *Pyrantelia* in the family Rutaceae.
3. The method for screening small antioxidant molecules in yellow skin based on machine learning and molecular simulation according to claim 1, characterized in that: the comprehensive predicted activity index is generated as follows: the molecular structure features of the yellow skin small molecules are extracted from the yellow skin small molecule database and used as input variables for machine learning models; multiple types of machine learning models are used to predict antioxidant activity, and the average of their prediction results is used as the comprehensive predicted activity index.
4. The method for screening small antioxidant molecules in yellow skin based on machine learning and molecular simulation according to claim 3, characterized in that: The machine learning model includes traditional machine learning models and deep learning models. The traditional machine learning models include multilayer perceptrons, random forests, and support vector machines. The deep learning models include graph neural networks and knowledge graph pre-trained transformer models. The traditional machine learning model uses Morgan fingerprints as molecular fingerprint feature inputs, while the deep learning model uses molecular graph structures with atoms as nodes and chemical bonds as edges as molecular graph feature inputs.
5. The method for screening small antioxidant molecules in yellow skin based on machine learning and molecular simulation according to claim 1, characterized in that: the dual-principle molecular docking specifically refers to: the first type of molecular docking is binding energy calculation-based docking that relies on physical force fields or empirical potential functions; the second type of molecular docking is scoring-based docking that relies on binding pocket hotspots or spatial fit principles; and the structural information of the antioxidant-related target proteins is obtained by searching literature databases and public protein databases; the final docking score of the first type of small molecules: the first type of molecular docking comprises no fewer than N independent docking runs, each using a different random seed and a different initial ligand conformation, and the median of the N docking results is taken; the final docking score of the second type of small molecules: the highest score among the multiple conformations generated in the second type of molecular docking is taken as the final docking score of the second type of small molecules, where the conformations comprise all results of the second type of molecular docking.
6. The method for screening small antioxidant molecules in yellow skin based on machine learning and molecular simulation according to claim 1, characterized in that: the candidate small molecules are screened as follows: histograms and scatter plots are plotted for the comprehensive predicted activity index, the final docking score of the first type of small molecules, and the final docking score of the second type of small molecules, and the data distribution of each index is analyzed; yellow skin small molecules that simultaneously meet the preset thresholds of all three indexes are retained to form a preliminary candidate set, the preset thresholds being determined from the data distribution and industry experience; chemical similarity clustering is then performed based on molecular fingerprints and similarity, wherein among molecules whose similarity exceeds a threshold the redundant ones are removed and only K are retained; all retained yellow skin small molecules are collected and labeled as candidate small molecules.
7. The method for screening small antioxidant molecules in yellow skin based on machine learning and molecular simulation according to claim 6, characterized in that: The chemical similarity clustering is performed using Morgan molecular fingerprinting and Tanimoto similarity, with the similarity threshold preset based on industry experience; the number of small molecules entering the candidate set is limited during the clustering process.
8. The method for screening small antioxidant molecules in yellow skin based on machine learning and molecular simulation according to claim 1, characterized in that: the hierarchical molecular dynamics simulation screening includes: applying unified force field assignment, solvation treatment, and temperature and pressure equilibration control to the candidate small molecules; setting three simulation stages with duration gradients T1, T2, and T3, wherein the simulation duration is increased step by step and the thresholds of the dynamics and hydrogen bond-related screening indicators are raised accordingly; the screening indicators include classical dynamics indicators, hydrogen bond occupancy, and a terminal sustained-occupancy criterion for the ligand; the terminal sustained-occupancy criterion includes calculating the centroid distance between the ligand and the protein binding pocket during the last time period T of the long-timescale simulation, and marking yellow skin small molecules whose centroid distance is smaller than a preset centroid distance as passed; for yellow skin small molecules that satisfy the terminal sustained-occupancy criterion, free energy calculation is used to evaluate the ligand-protein binding strength, key residue energy decomposition is used to identify core interaction sites, principal component analysis and free energy landscape construction are used to verify the conformational stability of the complex, and quantum chemical electronic structure parameters are used to analyze the antioxidant potential of the yellow skin small molecules in free radical scavenging; these molecules are then labeled as yellow skin small molecules with potential antioxidant mechanism characteristics, and their potential antioxidant molecular mechanism is elucidated.
9. The method for screening small antioxidant molecules in yellow skin based on machine learning and molecular simulation according to claim 8, characterized in that: the free energy calculation uses the MM-PBSA binding energy method, and the key residue energy decomposition uses the MM-GBSA per-residue decomposition method; the quantum chemical electronic structure parameters are obtained through geometry optimization and frequency analysis, with the solvent effect of an aqueous environment simulated using a polarizable continuum model.
10. The method for screening small antioxidant molecules in yellow skin based on machine learning and molecular simulation according to claim 1, characterized in that: The activity of the small molecules of yellow skin with potential antioxidant mechanisms was verified by in vitro antioxidant experiments, which included chemical experiments to directly scavenge reactive oxygen free radicals and antioxidant experiments based on cell models.