Method and system for generating broad-spectrum antibacterial peptide based on conditional feedback generative adversarial network
By constructing a conditional feedback generative adversarial network, utilizing composite conditional information vectors and reinforcement feedback loops, and combining them with a Brownian motion controller, the problems of broad-spectrum activity control and training stability of the antimicrobial peptide generation model were solved, achieving efficient and stable antimicrobial peptide generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ANHUI UNIV
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-19
AI Technical Summary
Existing antimicrobial peptide generation models struggle to achieve broad-spectrum activity control, suffer from unstable training processes, lack feedback mechanisms, and have weak attribute control, making it difficult to meet the needs of customized drug design.
A conditional feedback generative adversarial network is constructed to achieve precise control and stable training of the antimicrobial peptide generation process through composite conditional information vectors, reinforced feedback loops, and Brownian motion controllers. Multiple activity analysis models are introduced for real-time evaluation and reward mechanisms to optimize the generation strategy.
It achieves precise control over broad-spectrum activity, enhances the functionality of the generated sequence and the stability of the training process, significantly increases the proportion of multi-target peptides generated, and improves the diversity and structural stability of generated antimicrobial peptides.
Figure CN122245507A_ABST
Description
Technical Field
[0001] This invention relates to the fields of bioinformatics and artificial intelligence-assisted drug design, and in particular to a method and system for generating broad-spectrum antimicrobial peptides based on conditional feedback generative adversarial networks.
Background Technology
[0002] Antimicrobial peptides (AMPs) are widely present in the innate immune systems of many plants and animals, and are a key defense mechanism against pathogenic microorganisms. With the increasing severity of antibiotic resistance, AMPs are considered highly promising candidates for replacing traditional antibiotics and developing novel anti-infective drugs due to their broad-spectrum antibacterial activity, high selectivity, low toxicity, and low tendency to induce resistance.
[0003] However, despite the enormous therapeutic potential of AMPs, their widespread application faces numerous challenges. On the one hand, natural AMP resources are limited, and large-scale peptide synthesis is economically expensive and time-consuming, restricting the progress of peptide-based drug development. On the other hand, AMPs typically consist of amino acid sequences of variable length, resulting in an extremely large theoretical combinatorial space (e.g., a peptide of only 50 amino acids has approximately 1.1 × 10⁶⁵ possible combinations), making it difficult for traditional experimental screening methods to effectively cover this vast sequence space to discover novel functional AMPs.
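As a quick check of the combinatorial figure quoted above, the following sketch counts sequences over the 20 natural amino acids. The function names are illustrative only, and the variable-length count (lengths 1 to 50) is an added comparison that is not stated in the text.

```python
ALPHABET_SIZE = 20  # the 20 natural amino acids

def sequences_of_length(n: int) -> int:
    """Number of distinct peptides with exactly n residues."""
    return ALPHABET_SIZE ** n

def sequences_up_to_length(n: int) -> int:
    """Number of distinct peptides with 1..n residues (variable length)."""
    return sum(ALPHABET_SIZE ** k for k in range(1, n + 1))

exact_50 = sequences_of_length(50)
print(f"{exact_50:.3e}")  # 1.126e+65, matching the ~1.1 x 10^65 in the text
print(f"{sequences_up_to_length(50):.3e}")
```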
[0004] To address these challenges, researchers have developed various computational methods based on machine learning and deep learning. Early predictive models (such as AmPEP and iAMP-Attenpred) were primarily used to screen potential AMPs from existing databases, improving screening efficiency. In recent years, deep generative models (such as neural language models, variational autoencoders (VAEs), generative adversarial networks (GANs), and diffusion models) have been introduced into the de novo design of AMPs. Among these, while diffusion models offer high sample quality, they suffer from slow generation speed, high computational requirements, and complex architecture. In contrast, GANs have attracted attention due to their fast generation speed and high computational efficiency.
[0005] Despite the progress made by existing GAN-based AMP generative models, the following significant technical bottlenecks and shortcomings still exist:
Lack of precise control over broad-spectrum activity: Existing models often struggle to precisely control antimicrobial activity against specific pathogens during the generation process, resulting in peptides that often lack broad-spectrum efficacy and are ill-suited to dealing with multiple pathogen infections.
[0006] Lack of optimized feedback mechanism: Most existing generative models lack iterative feedback loops, making it impossible to adjust the generative strategy in real time based on the functional quality of the peptide (such as the strength of antibacterial activity) during the generative process, resulting in insufficient functional optimization of the generated sequence.
[0007] Training instability: The inherent training instability issues of GAN architecture (such as mode collapse and convergence difficulties) still exist in peptide sequence generation tasks, affecting the robustness of the model and the diversity of generated peptides.
[0008] Weak attribute control: Existing methods are weak in encoding complex biological attributes (such as specific activities for multiple bacteria, MIC value gradients, and precise length control), making it difficult to meet the needs of customized drug design.
Summary of the Invention
[0009] To address the shortcomings of existing technologies, this invention provides a method and system for generating broad-spectrum antimicrobial peptides based on conditional feedback generative adversarial networks. This solves the problems in existing antimicrobial peptide generation models, such as difficulty in balancing broad-spectrum activity and generation efficiency, unstable training processes, and lack of precise control over the activity of specific pathogens. It achieves the goals of precise control over broad-spectrum activity, optimized functionality of generated sequences, and improved stability of the training process.
[0010] To address the aforementioned technical problems, this invention provides the following technical solution: a method for generating broad-spectrum antimicrobial peptides based on conditional feedback generative adversarial networks, comprising the following steps:
S1. Construct a training dataset and perform preprocessing. The preprocessing includes constructing a composite conditional information vector, which contains at least target bacterial species information, minimum inhibitory concentration level information, and peptide chain length information.
S2. Construct a generative adversarial network model containing a generator and a discriminator, and a reinforcement feedback loop containing multiple pre-trained activity analysis models targeting different specific pathogens.
S3. Iterative adversarial training is performed on the generative adversarial network model. During the training process, the composite conditional information vector is used as a constraint input to the generator and discriminator. The reinforcement feedback loop is used to evaluate the broad-spectrum activity of the peptide sequence generated by the generator and calculate the reward value. The reward value is fed back to the generator to guide its parameter update. At the same time, a Brownian motion controller is introduced to dynamically adjust the learning rate during the training process.
S4. Receive the target composite conditional information vector input by the user, combine it with random noise input to the trained generator, and output a broad-spectrum antimicrobial peptide sequence that meets the target conditions.
[0011] Furthermore, the specific steps for constructing the training dataset are as follows: extract initial peptide sequences with confirmed antibacterial activity; select the initial peptide sequences whose amino acid sequence length is less than or equal to a preset length threshold to obtain standard peptide sequences; and use a sequence clustering tool to remove redundant standard peptide sequences whose sequence identity (consistency) is higher than a preset consistency threshold, yielding the antimicrobial peptide dataset used for training.
[0012] Furthermore, the specific steps for constructing the composite conditional information vector are as follows: construct target bacterial feature bits, using binary encoding to represent the inhibitory activity of a peptide sequence against each bacterium; construct MIC level feature bits, discretizing the peptide's MIC value into normalized intervals and using binary bit encoding to represent the antibacterial strength level to which the sequence belongs; construct peptide chain length feature bits, establishing a fixed-length binary vector in which a mask marks the positions of valid amino acids; and concatenate the target bacterial feature bits, MIC level feature bits, and peptide chain length feature bits into a composite conditional information vector of a preset dimension.
[0013] Furthermore, the specific steps for constructing the reinforcement feedback loop are as follows: acquire peptide sequence sample data with known antibacterial activity against various pathogens, and train independent deep learning prediction models on the sample data to obtain multiple activity analysis models; integrate the activity analysis models into the training loop of the generative adversarial network; during training, each activity analysis model receives the peptide sequences generated by the generator and outputs the predicted probability that a peptide sequence is active against the corresponding pathogen.
[0014] Furthermore, the specific steps for using the reward signal to guide generator optimization are as follows: count the total number of pathogens against which a peptide sequence generated by the generator is predicted to be active in the reinforcement feedback loop; based on this total, calculate a basic reward value using a segmented reward function, wherein the segmented reward function is configured to assign a stepwise increasing basic reward value as the total number of pathogens reaches different preset quantity thresholds; introduce a baseline term to calculate the advantage function, which transforms the basic reward value into an advantage value; and combine this advantage value with the adversarial loss of the generative adversarial network to update the generator's parameters.
[0015] Furthermore, the specific steps for introducing a Brownian motion controller to dynamically adjust the learning rate are as follows: A stochastic differential equation containing multiplicative noise terms is constructed, and the Brownian motion term in the stochastic differential equation is used as a perturbation variable to simulate the dynamic process of model parameter optimization. During the training iterations of the generative adversarial network, a learning rate adjustment factor is calculated in real time based on a preset first control parameter and a second control parameter. Based on the learning rate adjustment factor, the learning rates of the generator and the discriminator are dynamically adjusted to synchronize their convergence speed.
[0016] Furthermore, the iterative adversarial training process for the generative adversarial network model also includes an adaptive sample replacement step, specifically: the reinforcement feedback loop is used to compute a broad-spectrum activity score for the batch of peptide sequences generated by the generator in the current iteration cycle; based on the scoring results, highly active generated peptide sequences whose broad-spectrum activity scores meet a first preset condition are selected; and the real peptide sequences in the training dataset whose broad-spectrum activity scores meet a second preset condition are replaced with the highly active generated peptide sequences, dynamically constructing an enhanced training set for the discriminator to use in subsequent iterative learning.
[0017] Furthermore, inputting the target composite conditional information vector provided by the user, combined with random noise, into the trained generator to output a broad-spectrum antimicrobial peptide sequence that meets the target conditions specifically includes: receiving a target composite conditional information vector containing information on the target bacterial species, minimum inhibitory concentration level, and peptide chain length; fusing the target composite conditional information vector with a random noise vector and inputting the result into the generator whose parameters have been updated through iterative adversarial training; and having the generator output an amino acid sequence corresponding to the target composite conditional information vector, which serves as the broad-spectrum antimicrobial peptide sequence that meets the target conditions.
[0018] A broad-spectrum antimicrobial peptide generation system, characterized by comprising: a data processing module for collecting data and encoding composite conditional information vectors; a model building module for constructing a generative adversarial network and a reinforcement feedback loop containing multiple analyzers; a training control module for integrating a Brownian motion controller to adjust the learning rate and calculating feedback rewards to update model parameters; and a generation application module for outputting broad-spectrum antimicrobial peptide sequences based on user input conditions. A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements a method for generating broad-spectrum antimicrobial peptides.
[0019] By employing the above technical solution, the present invention provides a method and system for generating broad-spectrum antimicrobial peptides based on conditional feedback generative adversarial networks, which has at least the following beneficial effects: 1. This invention utilizes composite conditional information vectors and, through effective data collection and encoding methods, transforms target bacterial species, MIC levels, and peptide chain lengths into computer-processable structured constraints. This enables precise control over the broad-spectrum activity and biological properties of generated antimicrobial peptides, providing technical support for customized drug design.
[0020] 2. This invention introduces a reinforced feedback loop and a segmented reward mechanism to evaluate the activity of generated sequences against multiple pathogens in real time during training, guiding the model to optimize towards broad-spectrum activity. This significantly increases the proportion of multi-target peptides effective against more than three bacteria, verifying the effectiveness of the feedback mechanism in improving broad-spectrum activity.
[0021] 3. This invention utilizes a Brownian motion controller (BMC) to dynamically adjust the learning rate, effectively solving the gradient oscillation and mode collapse problems commonly encountered in traditional GAN training. After introducing BMC, the convergence speed of the model is accelerated, the entropy value of the generated sequence remains at a high level, and the diversity and structural stability of the generated antimicrobial peptides are significantly improved.
Attached Figure Description
[0022] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
Figure 1 is a schematic flowchart of the broad-spectrum antimicrobial peptide generation method based on conditional feedback generative adversarial networks of the present invention;
Figure 2 is a schematic diagram of the overall principle structure of the CFGAN model of the present invention;
Figure 3 is a schematic diagram of the encoding structure of the composite conditional information vector (CI) of the present invention;
Figure 4 is a flowchart illustrating the enhanced feedback loop and reward mechanism of the present invention;
Figure 5 is a schematic diagram illustrating the stability analysis of the antimicrobial peptide generated in this invention during molecular dynamics simulation.
Detailed Implementation
[0023] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. This will allow for a full understanding of how the present application uses technical means to solve technical problems and achieve technical effects, and to facilitate its implementation.
[0024] Existing technologies struggle to simultaneously achieve broad-spectrum antimicrobial activity, structural stability, and generation efficiency when generating antimicrobial peptides. To achieve precise control over broad-spectrum activity, optimize the functionality of the generated sequences, and improve the stability of the training process, this invention proposes a broad-spectrum antimicrobial peptide generation method based on conditional feedback generative adversarial networks. The method constructs a closed-loop generation system that not only uses adversarial training to learn the distribution characteristics of peptide sequences but also introduces explicit biological constraints and a reinforcement learning-based feedback mechanism. As shown in Figure 1, the method includes the following steps: In the data collection phase, this invention primarily uses publicly available antimicrobial peptide databases, such as DBAASP (Database of Antimicrobial Activity and Structure of Peptides) v3. The screening criterion retains peptides whose amino acid sequence length is less than or equal to a preset length threshold (which can be 50), as shorter peptides typically offer advantages in synthesis cost and pharmacokinetic properties. To eliminate data redundancy and prevent model overfitting, a sequence clustering tool (e.g., CD-HIT) is used to cluster the original sequences and remove redundant sequences whose sequence identity exceeds a preset consistency threshold (which can be 0.9, i.e., 90%). The resulting training dataset (denoted as Dataset_1) contains thousands of unique peptide sequences with confirmed antimicrobial activity. In addition to the sequence information, the activity label of each peptide against specific bacteria and its minimum inhibitory concentration (MIC) value are extracted at the same time.
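The length filter and redundancy removal described above can be sketched as follows. This is a minimal illustration, not the CD-HIT algorithm: `approx_identity` uses Python's `difflib.SequenceMatcher` as a crude stand-in for sequence identity, the greedy longest-first ordering is an assumption, and the input list is toy data.

```python
from difflib import SequenceMatcher

MAX_LENGTH = 50            # preset length threshold from the text
IDENTITY_THRESHOLD = 0.9   # preset identity threshold from the text

def approx_identity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in, NOT CD-HIT's identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def filter_and_deduplicate(sequences):
    """Keep peptides of length <= MAX_LENGTH, then greedily drop any
    sequence that is too similar to an already accepted representative."""
    candidates = [s for s in sequences if 0 < len(s) <= MAX_LENGTH]
    # Longest first, so each cluster is represented by its longest member.
    candidates.sort(key=len, reverse=True)
    representatives = []
    for seq in candidates:
        if all(approx_identity(seq, rep) <= IDENTITY_THRESHOLD
               for rep in representatives):
            representatives.append(seq)
    return representatives

raw = [
    "KWKKWLKCIWKRVAKKIL",
    "KWKKWLKCIWKRVAKKIL",        # exact duplicate, removed as redundant
    "GIGKFLHSAKKFGKAFVGEIMNS",
    "A" * 60,                    # longer than 50 residues, filtered out
]
dataset_1 = filter_and_deduplicate(raw)
print(dataset_1)
```

In practice a dedicated clustering tool would replace `approx_identity`, since pairwise comparison scales quadratically with the number of sequences.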
[0025] To support subsequent broad-spectrum predictions, a prediction dataset (Dataset_2) was also constructed. This dataset, based on sources such as AMPActiPred, contains binary classification data for ten representative pathogenic bacteria, including E. coli, S. aureus, and Pseudomonas aeruginosa, which is used to train the activity analyzer in the feedback loop.
[0026] In traditional generative adversarial networks (GANs), the generation process is often random or controlled only by simple class labels. To achieve precise "customization" of antimicrobial peptide function, this invention designs a high-dimensional composite conditional information (CI) vector, as shown in Figure 3. This vector is a 65-bit binary vector (10 target bacterial feature bits, 5 MIC level feature bits, and 50 peptide chain length feature bits) whose structure is designed based on the quantitative description of biological properties.
[0027] The first step is the construction of the target bacterial feature bits. This invention selects ten clinically representative pathogenic bacteria, including Gram-positive and Gram-negative bacteria. In the first 10 bits of the composite conditional information vector, each bit corresponds to one bacterium. If the target peptide needs to have inhibitory activity against a certain bacterium, the corresponding position is set to 1; otherwise, it is 0. This encoding method allows the model to simultaneously receive generation instructions for a single bacterium or a combination of multiple bacteria.
[0028] Secondly, the MIC level feature bits are constructed. The MIC value is a core quantitative indicator for measuring antibacterial activity. To integrate continuous MIC values into discrete composite conditional information vectors, this invention employs a standardized discrete coding process. The specific steps are as follows: First, the log-median of the MIC values for each peptide in the database records is calculated to eliminate fluctuations caused by experimental errors; then, based on the distribution range of MIC values in the dataset, it is divided into several standardized intervals; finally, these intervals are one-hot or interval-coded using 5-bit binary codes. For example, a high-activity interval might correspond to the code "10000", while a low-activity interval might correspond to "00001". This approach enables the model to perceive gradient changes in antibacterial intensity.
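The MIC encoding steps above (log-median, interval division, 5-bit code) might look like the following sketch. The cut points in `EDGES`, the use of base-10 logarithms, and the MIC unit are assumptions chosen for illustration; the text does not specify them.

```python
import math
from statistics import median

NUM_LEVELS = 5  # 5 MIC level bits, as in the text

def log_median_mic(mic_values):
    """Median of log10(MIC) over repeated measurements of one peptide."""
    return median(math.log10(v) for v in mic_values)

def mic_level_bits(log_mic, edges):
    """One-hot encode log_mic into NUM_LEVELS intervals.

    `edges` holds NUM_LEVELS - 1 ascending cut points (hypothetical values).
    Level 0 is the lowest MIC, i.e. the highest activity.
    """
    level = sum(log_mic > e for e in edges)
    bits = [0] * NUM_LEVELS
    bits[level] = 1
    return bits

# Hypothetical cut points on log10(MIC): 1, 10, 100, and 1000 units.
EDGES = [0.0, 1.0, 2.0, 3.0]

high_activity = mic_level_bits(log_median_mic([0.5, 0.8, 0.4]), EDGES)
low_activity = mic_level_bits(log_median_mic([2000, 4000, 8000]), EDGES)
print(high_activity)  # [1, 0, 0, 0, 0], i.e. "10000"
print(low_activity)   # [0, 0, 0, 0, 1], i.e. "00001"
```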
[0029] Finally, the peptide chain length feature bits are constructed. To precisely control the physical length of the peptide chain during generation and prevent the generation of invalid padding characters, this invention employs a bitmask mechanism for the last 50 bits of the composite conditional information vector. For a target peptide of length $L$, the first $L$ bits of this mask are set to 1 and the remaining $50 - L$ bits are set to 0. This not only tells the generator the number of effective amino acids required, but also provides the discriminator with a clear basis for length determination.
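Putting the three parts together, the 65-bit composite conditional information vector can be assembled as in the sketch below. The function name, the species index order, and the convention that MIC level 0 means highest activity are illustrative assumptions.

```python
NUM_SPECIES = 10     # target bacterial feature bits
NUM_MIC_LEVELS = 5   # MIC level feature bits
MAX_LENGTH = 50      # peptide chain length feature bits

def build_condition_vector(target_species, mic_level, length):
    """Return the 65-bit composite conditional information vector.

    target_species: indices (0..9) of bacteria the peptide should inhibit
    mic_level:      index (0..4) of the MIC interval (0 = highest activity)
    length:         desired number of residues (1..50)
    """
    targets = set(target_species)
    if not targets <= set(range(NUM_SPECIES)):
        raise ValueError("species indices must be in 0..9")
    if not 0 <= mic_level < NUM_MIC_LEVELS:
        raise ValueError("mic_level must be in 0..4")
    if not 1 <= length <= MAX_LENGTH:
        raise ValueError("length must be in 1..50")
    species_bits = [int(i in targets) for i in range(NUM_SPECIES)]
    mic_bits = [int(i == mic_level) for i in range(NUM_MIC_LEVELS)]
    # Bitmask: first `length` positions are valid residues, the rest padding.
    length_bits = [1] * length + [0] * (MAX_LENGTH - length)
    return species_bits + mic_bits + length_bits

ci = build_condition_vector(target_species=[0, 1, 4], mic_level=0, length=18)
print(len(ci))                       # 65
print("".join(map(str, ci[:15])))    # 110010000010000
```

Because several species bits can be set at once, the same encoding covers both single-target and multi-target generation requests.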
[0030] As shown in Figure 2, the CFGAN model constructed in this invention contains two core neural networks: a generator (G) and a discriminator (D).
[0031] Generator $G$: the input consists of two parts, a latent noise vector $z$ sampled from a prior distribution (usually a Gaussian distribution) and the composite conditional information vector $c$ constructed above. These two vectors are concatenated at the input layer and then undergo preliminary feature transformation and dimensionality reshaping through a fully connected layer. To capture the local dependencies of the amino acid sequence, the generator core employs a multi-layer one-dimensional convolutional neural network (Conv1D). For positional information processing, a coordinate channel (Coordinate-Channel1D) is introduced to enhance the model's ability to perceive sequence positions. The generator's output layer uses the Tanh activation function and outputs a matrix whose two dimensions correspond to the maximum sequence length (50) and the distribution over the 20 natural amino acids plus special markers (such as padding and start symbols).
[0032] Discriminator $D$: its function is to distinguish whether an input sequence is a natural antimicrobial peptide from the real dataset or a synthetic peptide produced by the generator. The discriminator also receives the composite conditional information vector $c$ as input, which is fused with the features of the peptide sequence to be discriminated. The discriminator's network architecture includes multiple one-dimensional convolutional layers for extracting sequence features, followed by global average pooling to compress the features, and finally a multilayer perceptron (MLP) that outputs a probability value between 0 and 1, representing the probability that the input sample is "real".
[0033] During training, the generator and discriminator engage in a minimax game. To address potential gradient vanishing or mode collapse issues during training, this invention introduces a gradient penalty term and employs the Adam optimizer for parameter updates. The objective function $V(D, G)$ is expressed as follows:
$$\min_G \max_D V(D, G) = \mathbb{E}_{(x, c) \sim p_{\text{data}}}\left[\log D(x \mid c)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z \mid c) \mid c)\right)\right]$$
where $V(D, G)$ is the objective function of the game between the generator $G$ and the discriminator $D$; $\mathbb{E}$ denotes the mathematical expectation, that is, the average performance under the corresponding probability distribution; the first term is the score given by the discriminator to genuine natural antimicrobial peptides; $(x, c) \sim p_{\text{data}}$ denotes a peptide sequence and its composite conditional information vector sampled from the real data distribution; $D(x \mid c)$ is the probability, between 0 and 1, that the discriminator judges the input sequence $x$ to be a true natural sequence given condition $c$; the second term is the expected score for the synthetic (fake) data produced by the generator; and $G(z \mid c)$ denotes the fake samples generated by the generator. The generator attempts to minimize $\log\left(1 - D(G(z \mid c) \mid c)\right)$, that is, to deceive the discriminator, while the discriminator tries to maximize the probability of correct classification.
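To make the minimax game concrete, the sketch below estimates a value function from batches of discriminator outputs. It assumes the standard conditional GAN form, the sum of the mean of log D(x | c) on real pairs and the mean of log(1 - D(G(z | c) | c)) on generated pairs, and omits the gradient penalty term mentioned above. The function name and the `eps` guard are illustrative.

```python
import math

def cgan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of the conditional GAN value function.

    d_real: discriminator outputs D(x | c) on real (sequence, condition) pairs
    d_fake: discriminator outputs D(G(z | c) | c) on generated pairs
    eps:    small constant that keeps log() finite at 0 or 1
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator at chance level (0.5 everywhere) gives 2 * log(0.5).
at_chance = cgan_value([0.5] * 4, [0.5] * 4)
# A confident, correct discriminator pushes the value toward 0 from below.
confident = cgan_value([0.9, 0.95], [0.1, 0.05])
print(round(at_chance, 4))  # -1.3863
print(confident > at_chance)
```

The discriminator's updates raise this value, while the generator's updates lower the second term by making generated peptides score closer to 1.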
[0034] Training stability of generative adversarial networks (GANs) has always been a challenge, especially when dealing with discrete sequence data. To further stabilize training dynamics, this invention innovatively introduces a Brownian Motion Controller (BMC) during the optimization process. As a noise-driven controller, the core function of the BMC is to dynamically adjust the learning rates of the generator and discriminator, thereby introducing controlled random perturbations into the parameter space. This helps the model escape local optima and matches the convergence speed of both.
[0035] The dynamic control behavior of the BMC is described by a stochastic differential equation whose terms are as follows: the control input is the amount of adjustment applied to the learning rate; the system state is a quantity such as the difference in loss between the generator and the discriminator, or the gradient norm; two independent Brownian motion terms represent the introduced multiplicative noise; the control gains are non-negative constants; and the remaining quantities are control parameters. Under this equation, when the system state deviates from the equilibrium point, the noise intensity increases nonlinearly with the amplitude of the state, generating a strong restoring or exploratory force; when the system tends to converge, the noise term automatically decays, ensuring fine convergence.
[0036] After introducing BMC, the system's state evolution equation becomes:
$$dx(t) = f(x(t))\,dt + u(t)\,dt$$
where $dx(t)$ represents the differential change of the state variable within an infinitesimal time step; $f(x(t))$ is the system dynamics brought about by the original gradient descent; $dt$ is the differential of the time variable; and $u(t)$ is the control signal or noise regulation term output by the introduced Brownian motion controller (BMC) at time $t$. Theoretical analysis shows that, provided the controller satisfies a condition involving a system-related constant, it can guarantee the global exponential stability of the system in a probabilistic sense. In practical deployment, this invention uses BMC to calculate the adjustment factor in real time, making the loss function curves of the generator and discriminator smoother and effectively avoiding mode collapse.
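The following sketch simulates a controlled state with the Euler-Maruyama scheme. The controller form is an assumption, since the text does not reproduce its equation: the noise increment is taken as rho1 * x * dB1 + rho2 * |x|^beta * x * dB2, which is multiplicative and grows nonlinearly with the state as described. The gains, the drift `-x`, and the clamped mapping from control signal to learning rate are all hypothetical.

```python
import math
import random

def bmc_increment(x, rho1, rho2, beta, dt, rng):
    """One noise increment of an ASSUMED multiplicative-noise controller:
    rho1 * x * dB1 + rho2 * |x|**beta * x * dB2, with dB ~ N(0, dt)."""
    db1 = rng.gauss(0.0, math.sqrt(dt))
    db2 = rng.gauss(0.0, math.sqrt(dt))
    return rho1 * x * db1 + rho2 * abs(x) ** beta * x * db2

def noise_scale(x, rho1, rho2, beta):
    """Standard deviation of the increment per unit sqrt(dt)."""
    return math.hypot(rho1 * x, rho2 * abs(x) ** beta * x)

def simulate(x0, drift, rho1, rho2, beta, dt=1e-3, steps=1000, seed=0):
    """Euler-Maruyama integration of dx = drift(x) dt + controller noise."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + drift(x) * dt + bmc_increment(x, rho1, rho2, beta, dt, rng))
    return xs

def adjusted_learning_rate(base_lr, u, lo=0.5, hi=2.0):
    """Hypothetical mapping: scale base_lr by a clamped factor 1 + u."""
    return base_lr * min(hi, max(lo, 1.0 + u))

# Toy run: the drift -x stands in for gradient-descent dynamics.
trajectory = simulate(x0=1.0, drift=lambda x: -x, rho1=0.5, rho2=0.1, beta=1.5)
print(len(trajectory))
print(noise_scale(0.0, 0.5, 0.1, 1.5))  # 0.0: noise vanishes at equilibrium
```

Because both noise terms are proportional to the state, the perturbation fades as the state approaches zero, which mirrors the automatic decay near convergence described above.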
[0037] To endow the generated antimicrobial peptides with broad-spectrum properties, relying solely on the true / false discrimination of a discriminator is insufficient. This invention constructs a reinforcement feedback loop, directly incorporating functional evaluation into the generator's optimization objective.
[0038] The core components of this feedback loop are ten pre-trained deep learning classifiers, called "analyzers." Each analyzer is specifically designed to perform binary classification predictions for a particular type of bacteria (such as Escherichia coli, Staphylococcus aureus, etc.). In each training iteration, the batch of sequences produced by the generator is fed into this set of analyzers. For each sequence $s$, the number of bacterial species against which it is predicted to be active is counted and denoted $n$.
[0039] As shown in Figure 4, to guide the model to evolve toward broad-spectrum activity, this invention designs a step-like segmented reward function. This function not only rewards activity against a single target but also provides a higher, non-linear reward for activity against multiple targets. The reward function is defined in terms of the following quantities: $s$ denotes a specific synthetic antimicrobial peptide sequence generated by the generator in the current iteration; $R(s)$ denotes the piecewise reward function; $N$ denotes the maximum number of target bacteria (can be 10); and $n$ denotes the number of pathogens, among the given set of target bacteria, against which the activity analysis models predict the sequence to have inhibitory activity.
[0040] The design logic of this reward mechanism is as follows: when the sequence is effective against at least one bacterium, a base reward of 0.2 is given to help the model break through the bottleneck of "zero activity"; when the activity covers more than three bacteria, the base reward jumps to 0.5; when it covers more than five bacteria, the reward reaches 0.8. This hierarchical design decomposes the complex broad-spectrum optimization problem into progressive sub-objectives.
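The tiers described above translate directly into a small function. In this sketch the 0.5 probability cutoff used to count active species and the zero reward for a sequence with no predicted activity are assumptions; the three tier values (0.2, 0.5, 0.8) come from the text.

```python
def count_active_species(probabilities, threshold=0.5):
    """Count analyzers whose predicted activity probability meets the cutoff.
    The 0.5 cutoff is an assumption, not a value given in the text."""
    return sum(p >= threshold for p in probabilities)

def segmented_reward(n_active):
    """Step-like base reward as a function of the number of active species."""
    if n_active > 5:
        return 0.8   # active against more than five bacteria
    if n_active > 3:
        return 0.5   # active against more than three bacteria
    if n_active >= 1:
        return 0.2   # active against at least one bacterium
    return 0.0       # assumed: no reward without any predicted activity

# Predicted probabilities from ten analyzers for one generated sequence.
probs = [0.91, 0.12, 0.77, 0.64, 0.05, 0.58, 0.33, 0.49, 0.81, 0.20]
n = count_active_species(probs)
print(n, segmented_reward(n))  # 5 0.5
```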
[0041] Because the reward function $R(s)$ is non-differentiable, the generator cannot be updated directly via backpropagation. Therefore, this invention employs the policy gradient algorithm to transform the reward signal into an advantage function:
$$A(s) = R(s) - b$$
where $A(s)$ is the advantage function, which indicates how much better the generator performs than the "system average" when generating candidate sequence $s$ in the current state (how great its advantage is), and $b$ is the exponential moving average baseline of historical rewards.
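A minimal sketch of the baseline-corrected advantage follows, taking the advantage as the reward minus an exponential moving average of past rewards. The decay of 0.9, the zero initial baseline, and the choice to compute the advantage before folding the new reward into the baseline are assumptions.

```python
class EmaBaseline:
    """Exponential moving average of historical rewards."""

    def __init__(self, decay=0.9):
        self.decay = decay   # hypothetical smoothing factor
        self.value = 0.0     # assumed initial baseline

    def advantage(self, reward):
        """Return reward minus the current baseline, then update the baseline."""
        adv = reward - self.value
        self.value = self.decay * self.value + (1.0 - self.decay) * reward
        return adv

baseline = EmaBaseline(decay=0.9)
advantages = [baseline.advantage(r) for r in (0.2, 0.5, 0.8)]
print([round(a, 3) for a in advantages])  # [0.2, 0.48, 0.732]
```

Subtracting the baseline leaves the expected policy gradient unchanged while reducing its variance, which is why sequences are judged relative to the running average rather than by raw reward.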
[0042] The total loss function of the generator, $L_G$, is a weighted combination of the adversarial loss $L_{\text{adv}}$ and the adjusted reinforcement learning loss $L_{\text{RL}}$:
$$L_G = L_{\text{adv}} + \lambda L_{\text{RL}}$$
where $L_G$ represents the total loss function of the generator in a single training iteration; $L_{\text{adv}}$ is the adversarial loss, derived from the standard conditional generative adversarial network (GAN) mechanism, which represents the degree to which the generator failed to deceive the discriminator; $\lambda$ is the hyperparameter for the reinforcement learning weight, used to balance the generator between two optimization objectives, resemblance to natural antimicrobial peptides (authenticity) and broad-spectrum antimicrobial ability (functionality); and $L_{\text{RL}}$ is the adjusted reinforcement learning loss, computed as an expectation $\mathbb{E}$ over candidate sequences $s$ that the generator $G$ produces by sampling under given random noise $z$ and control conditions $c$.
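The weighted combination can be sketched numerically as below. The exact form of the reinforcement learning loss is not given in the text; this example assumes the common REINFORCE surrogate, the negative mean of advantage times log-probability of the sampled sequence, and all input numbers are made up for illustration.

```python
def generator_loss(adv_loss, advantages, log_probs, rl_weight):
    """Total generator loss: adversarial loss plus weighted RL loss.

    adv_loss:   scalar adversarial loss for the batch
    advantages: advantage value A(s) for each sampled sequence
    log_probs:  log-probability the generator assigned to each sequence
    rl_weight:  the weighting hyperparameter (lambda)
    """
    # ASSUMED surrogate: L_RL = -mean(A(s) * log p_G(s | z, c)).
    rl_loss = -sum(a * lp for a, lp in zip(advantages, log_probs)) / len(advantages)
    return adv_loss + rl_weight * rl_loss

total = generator_loss(
    adv_loss=0.7,
    advantages=[0.5, -0.2],
    log_probs=[-1.0, -2.0],
    rl_weight=0.5,
)
print(round(total, 3))  # 0.725
```

A larger `rl_weight` pushes the generator toward sequences the analyzers rate as broadly active, at the cost of weaker pressure to resemble natural peptides.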
[0043] In this way, the generator is forced to maximize broad-spectrum antibacterial activity (to obtain a high reward) while maintaining the authenticity of the generated sequences (deceiving the discriminator). In addition, to prevent the discriminator from falling behind, this invention also employs an adaptive sample replacement strategy: periodically replacing the low-activity real peptides in the training set with the generated highly active peptides as "false positive" samples, forcing the discriminator to learn to distinguish more subtle functional features.
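The adaptive sample replacement strategy can be sketched as follows. The two score cutoffs stand in for the first and second preset conditions, whose actual values the text does not give; the peptide strings and scores are toy data, and `score` stands in for the broad-spectrum activity score from the feedback loop.

```python
def adaptive_replace(real_set, generated, score, high=0.8, low=0.2):
    """Swap low-activity real peptides for high-activity generated ones.

    score: callable returning a broad-spectrum activity score in [0, 1]
    high:  hypothetical cutoff for the first preset condition (generated)
    low:   hypothetical cutoff for the second preset condition (real)
    """
    # Best generated peptides first.
    winners = sorted((g for g in generated if score(g) >= high),
                     key=score, reverse=True)
    # Indices of the weakest real peptides first.
    weak = sorted((i for i, r in enumerate(real_set) if score(r) <= low),
                  key=lambda i: score(real_set[i]))
    enhanced = list(real_set)
    for idx, peptide in zip(weak, winners):
        enhanced[idx] = peptide
    return enhanced

toy_scores = {"KKLL": 0.9, "GGGG": 0.1, "AAAA": 0.0, "KWKW": 0.95, "GAGA": 0.5}
enhanced_set = adaptive_replace(
    real_set=["KKLL", "GGGG", "AAAA"],
    generated=["KWKW", "GAGA"],
    score=toy_scores.get,
)
print(enhanced_set)  # ['KKLL', 'GGGG', 'KWKW']
```

The training set keeps its size, so the discriminator sees the same batch shape while facing harder examples.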
[0044] To verify the effectiveness of the method of the present invention, extensive comparative experiments and molecular dynamics simulations were conducted.
[0045] For quality assessment, this invention uses three key indicators: PepFID (Peptide Fréchet Inception Distance), Instability Index, and Antimicrobial Score. PepFID measures the distance between the distribution of generated peptides and the distribution of real peptides; a lower value is better. The Instability Index is calculated from the chemical properties of the dipeptides in a sequence; a lower value indicates greater stability. The Antimicrobial Score is the proportion of generated peptides predicted to be antimicrobial (positive).
[0046] The table below compares the performance of this invention (CFGAN) with existing mainstream methods (such as LSTM-RNN, AMPGAN, HydrAMP, and Diff-AMP). The data show that the antimicrobial peptides generated by this invention outperform those of the comparative methods in structural authenticity (PepFID = 3.97) and structural stability (Instability Index = 32.64), and that the Antimicrobial Score reaches 90.33%, which is significantly higher than that of the other models.
[0047] Further ablation experiments validated the role of the reinforcement feedback loop (RFM). When the RFM was removed, the model's Antimicrobial Score decreased from 92.97% to 81.93%, and the number of generated multi-target peptides (effective against more than three types of bacteria) was significantly reduced. This demonstrates the decisive role of the feedback mechanism in generating broad-spectrum activity.
[0048] As shown in Figure 5, to verify the mechanism of action of the generated peptides at the microscopic level, this embodiment also selected representative generated sequences (such as KWKKWLKCIWKRVAKKIL) for molecular dynamics (MD) simulation. The simulation system used a bilayer membrane model containing POPE and POPG lipid molecules to mimic the cell membrane environments of *E. coli* and *Staphylococcus aureus*. The simulation duration was set to 500 nanoseconds (ns). The results showed that the generated peptides rapidly adsorbed onto the membrane surface and inserted into it during the initial stage of the simulation (0-200 ns). During the subsequent 200-500 ns, their root mean square deviation (RMSD) curves stabilized, indicating that the peptide molecules reached a stable conformation in the membrane environment. In addition, radius of gyration (Rg) analysis showed that peptides targeting the *E. coli* membrane adopted a more compact structure (lower Rg value), which favors penetration into the membrane core and membrane disruption, while peptides targeting the *Staphylococcus aureus* membrane adopted an adaptive surface-binding conformation. These physicochemical behaviors are highly consistent with the bactericidal mechanism of natural antimicrobial peptides, supporting the practical pharmaceutical potential of the peptides generated by this invention.
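For readers unfamiliar with the two trajectory metrics, the following sketch shows how RMSD and Rg are computed for a single frame. It assumes Cartesian coordinates given as `(x, y, z)` tuples and that the frame has already been aligned to the reference structure; production analyses would normally use an MD analysis package rather than this hand-written code.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted root mean square distance from the centre of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # assume equal masses if none are given
    total_mass = sum(masses)
    centre = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[k] - centre[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


def rmsd(coords, reference):
    """Root mean square deviation between a frame and a reference.

    Assumes both structures list the same atoms in the same order and
    have already been superimposed.
    """
    squared = sum(
        sum((a[k] - b[k]) ** 2 for k in range(3))
        for a, b in zip(coords, reference)
    )
    return math.sqrt(squared / len(coords))
```

A lower Rg corresponds to a more compact peptide, and an RMSD curve that flattens over time indicates that the conformation has stopped drifting from the reference.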
[0049] This invention also provides a broad-spectrum antimicrobial peptide generation system, including: a data processing module for collecting data and encoding composite conditional information vectors; a model building module for constructing a generative adversarial network and a reinforcement feedback loop containing multiple analyzers; a training control module for integrating a Brownian motion controller to adjust the learning rate and for calculating feedback rewards to update model parameters; and an application generation module for outputting broad-spectrum antimicrobial peptide sequences based on user input conditions.
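As an illustration of what the training control module's learning-rate adjustment might look like, the sketch below drives a shared adjustment factor with geometric Brownian motion, a stochastic differential equation with multiplicative noise. This is an assumption, not the disclosed controller: the drift `a` and noise scale `b` stand in for the two control parameters, the clipping bounds are arbitrary, and applying the same factor to both networks is one possible way to keep their learning rates in step.

```python
import math
import random


def brownian_lr_factor(a, b, dt, rng):
    """One exact step of d(eta) = a*eta*dt + b*eta*dW, as a ratio.

    a: drift (assumed first control parameter)
    b: noise scale (assumed second control parameter)
    """
    d_w = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
    return math.exp((a - 0.5 * b * b) * dt + b * d_w)


def adjust_learning_rates(lr_g, lr_d, a=-0.01, b=0.05, dt=1.0,
                          rng=None, lower=1e-6, upper=1e-2):
    """Scale generator and discriminator rates by the same random factor."""
    if rng is None:
        rng = random.Random()
    factor = brownian_lr_factor(a, b, dt, rng)

    def clip(value):
        # Keep the learning rate inside assumed safe bounds.
        return min(upper, max(lower, value))

    return clip(lr_g * factor), clip(lr_d * factor)
```

Because the factor is an exponential, the learning rates stay positive, and setting `b=0` recovers a plain deterministic exponential decay, which makes the effect of the noise term easy to isolate during debugging.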
[0050] Those skilled in the art will understand that all or part of the steps in the methods of the above embodiments can be implemented by a program instructing related hardware. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0051] The embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the others; for identical or similar parts, the embodiments may be referred to one another. Since the system embodiment is substantially similar to the method embodiments, its description is relatively brief; for relevant details, refer to the description of the method embodiments.
[0052] The above embodiments provide a detailed description of the present invention. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. A method for generating broad-spectrum antimicrobial peptides based on a conditional feedback generative adversarial network, characterized in that the method comprises the following steps: S1. constructing a training dataset and performing preprocessing, the preprocessing including constructing a composite conditional information vector containing at least target bacterial species information, minimum inhibitory concentration (MIC) level information, and peptide chain length information; S2. constructing a generative adversarial network model containing a generator and a discriminator, and a reinforcement feedback loop containing multiple pre-trained activity analysis models, each of which analyzes a different specific pathogen; S3. performing iterative adversarial training of the generative adversarial network model based on the training dataset, wherein, during training, the composite conditional information vector is used as a constraint input to the generator and the discriminator, the reinforcement feedback loop evaluates the broad-spectrum activity of the peptide sequences generated by the generator and calculates a reward value, the reward value is fed back to the generator to guide its parameter updates, and a Brownian motion controller is introduced to dynamically adjust the learning rate; S4. receiving a target composite conditional information vector input by a user, inputting it together with random noise into the trained generator, and outputting a broad-spectrum antimicrobial peptide sequence that meets the target conditions.
2. The method for generating broad-spectrum antimicrobial peptides according to claim 1, characterized in that the specific steps for constructing the training dataset are as follows: extracting initial peptide sequences with confirmed antibacterial activity; selecting the initial peptide sequences whose amino acid sequence length is less than or equal to a preset length threshold to obtain standard peptide sequences; and using a sequence clustering tool to remove redundant sequences among the standard peptide sequences whose sequence identity is higher than a preset identity threshold, to obtain an antimicrobial peptide dataset for training.
3. The method for generating broad-spectrum antimicrobial peptides according to claim 1, characterized in that the specific steps for constructing the composite conditional information vector are as follows: constructing target bacterial feature bits, using binary encoding to represent the inhibitory activity of a peptide sequence against each of various bacteria; constructing MIC level feature bits, in which the MIC value of the peptide sequence is normalized and discretized into intervals, and binary bit encoding is used to represent the antibacterial strength level to which the sequence belongs; constructing peptide chain length feature bits, in which a fixed-length binary vector is established and a mask is used to represent the positions of effective amino acids; and concatenating the target bacterial feature bits, MIC level feature bits, and peptide chain length feature bits to form a composite conditional information vector of a preset dimension.
4. The method for generating broad-spectrum antimicrobial peptides according to claim 1, characterized in that the specific steps for constructing the reinforcement feedback loop are as follows: acquiring peptide sequence sample data with known antibacterial activity against various pathogens, and training independent deep learning prediction models based on the sample data to obtain multiple activity analysis models; and integrating the multiple activity analysis models into the training loop of the generative adversarial network, wherein, during training, each activity analysis model receives the peptide sequences generated by the generator and outputs the predicted probability that a peptide sequence is active against the corresponding pathogen.
5. The method for generating broad-spectrum antimicrobial peptides according to claim 1, characterized in that the specific steps for using the reward value to guide generator optimization are as follows: counting the total number of pathogens against which a peptide sequence generated by the generator is predicted to be active by the reinforcement feedback loop; calculating a basic reward value from the total number of pathogens using a segmented reward function, wherein the segmented reward function is configured to assign a stepwise increasing basic reward value as the total number of pathogens reaches different preset quantity thresholds; and introducing a baseline term to calculate an advantage function that transforms the basic reward value into an advantage value, the advantage value then being combined with the adversarial loss of the generative adversarial network to update the generator's parameters.
6. The method for generating broad-spectrum antimicrobial peptides according to claim 1, characterized in that the specific steps for introducing a Brownian motion controller to dynamically adjust the learning rate are as follows: constructing a stochastic differential equation containing a multiplicative noise term, the Brownian motion term in the stochastic differential equation being used as a perturbation variable to simulate the dynamic process of model parameter optimization; during the training iterations of the generative adversarial network, calculating a learning rate adjustment factor in real time based on a preset first control parameter and a preset second control parameter; and dynamically adjusting the learning rates of the generator and the discriminator based on the learning rate adjustment factor to synchronize their convergence speeds.
7. The method for generating broad-spectrum antimicrobial peptides according to claim 1, characterized in that the iterative adversarial training of the generative adversarial network model further includes an adaptive sample replacement step, specifically: using the reinforcement feedback loop to score the broad-spectrum activity of the batch of peptide sequences generated by the generator in the current iteration cycle; selecting, based on the scoring results, highly active generated peptide sequences whose broad-spectrum activity scores meet a first preset condition; and replacing the real peptide sequences in the training dataset whose broad-spectrum activity scores meet a second preset condition with the highly active generated peptide sequences, to dynamically construct an enhanced training set for the discriminator to use in subsequent iterative learning.
8. The method for generating broad-spectrum antimicrobial peptides according to claim 1, characterized in that receiving the target composite conditional information vector input by the user, inputting it together with random noise into the trained generator, and outputting a broad-spectrum antimicrobial peptide sequence that meets the target conditions specifically includes: receiving a target composite conditional information vector containing target bacterial species information, minimum inhibitory concentration level information, and peptide chain length information; fusing the target composite conditional information vector with a random noise vector and inputting the result into the generator whose parameters have been updated through iterative adversarial training; and outputting, by the generator, an amino acid sequence corresponding to the target composite conditional information vector, which serves as the broad-spectrum antimicrobial peptide sequence that meets the target conditions.
9. A broad-spectrum antimicrobial peptide generation system for implementing the method according to any one of claims 1 to 8, characterized in that the system comprises: a data processing module for collecting data and encoding composite conditional information vectors; a model building module for building a generative adversarial network and a reinforcement feedback loop containing multiple analyzers; a training control module for integrating a Brownian motion controller to adjust the learning rate and for calculating feedback rewards to update model parameters; and an application generation module for outputting broad-spectrum antimicrobial peptide sequences based on user input conditions.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method for generating broad-spectrum antimicrobial peptides according to any one of claims 1 to 8.
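As an illustration of the composite conditional information vector recited in claims 3 and 8, the following minimal sketch concatenates species bits, MIC level bits, and a length mask. The species list, the MIC bin edges, the maximum length, and the function name `encode_condition` are hypothetical values chosen for the example. For brevity the sketch bins the raw MIC value directly rather than normalizing it first.

```python
# Hypothetical configuration; the actual species, bins, and length differ.
SPECIES = ["E. coli", "S. aureus", "P. aeruginosa", "A. baumannii"]
MIC_BIN_EDGES = [4.0, 16.0, 64.0]  # assumed edges in ug/mL, giving 4 levels
MAX_LEN = 32                       # assumed maximum peptide length


def encode_condition(targets, mic, length):
    """Build the composite condition vector as a flat list of 0/1 bits."""
    if not 0 < length <= MAX_LEN:
        raise ValueError("length must be between 1 and MAX_LEN")
    # Target bacterial feature bits: one bit per species (multi-hot).
    species_bits = [1 if name in targets else 0 for name in SPECIES]
    # MIC level feature bits: one-hot over the discretized intervals.
    level = sum(1 for edge in MIC_BIN_EDGES if mic > edge)
    mic_bits = [1 if i == level else 0 for i in range(len(MIC_BIN_EDGES) + 1)]
    # Length feature bits: mask with 1 at each valid residue position.
    length_bits = [1 if i < length else 0 for i in range(MAX_LEN)]
    return species_bits + mic_bits + length_bits
```

Calling `encode_condition({"E. coli", "S. aureus"}, mic=8.0, length=18)` yields a 40-bit vector: `[1, 1, 0, 0]` for the species, `[0, 1, 0, 0]` for the MIC level, and 18 ones followed by 14 zeros for the length mask. At generation time this vector would be combined with a random noise vector before being passed to the generator.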