An anti-hypertensive peptide prediction model, a construction method thereof and an application thereof

An antihypertensive peptide prediction model is built by selecting the optimal feature combination through multi-model fusion and feeding it to a BiLSTM network. This addresses the time-consuming and labor-intensive screening of antihypertensive peptides by traditional methods and enables efficient and accurate peptide prediction.

CN122201803A · Pending · Publication Date: 2026-06-12 · XIAMEN YUANZHIDAO BIOTECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XIAMEN YUANZHIDAO BIOTECHNOLOGY CO LTD
Filing Date
2026-01-29
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Traditional experimental techniques for identifying antihypertensive peptides are time-consuming, expensive, and have low throughput, making it difficult to quickly screen for bioactive peptides with blood pressure regulation functions.

Method used

A predictive model for antihypertensive peptides is constructed. Feature importance is evaluated jointly by random forest, gradient boosting decision tree, light gradient boosting machine (LightGBM) and XGBoost models to screen out the optimal feature combination. The predictive model is then built by combining a BiLSTM network with an aggregation layer, and amino acid sequence features are used for rapid screening.

Benefits of technology

The prediction accuracy for antihypertensive peptides reaches 0.872, exceeding that of existing models such as AHTpin and StackAHTPs. The model is suitable for large-scale rapid screening of antihypertensive peptides.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses an anti-hypertensive peptide prediction model and a construction method and application thereof. The construction method comprises the following steps: obtaining a training data set containing positive and negative samples; extracting multiple initial features from a peptide sequence; screening an optimal feature combination composed of amino acid composition (AAC), k-spaced amino acid pairs (CKSAAGP), AAindex physicochemical properties and tertiary structure features based on weighted fusion evaluation and a greedy search strategy of four models of random forest, gradient boosting decision tree, light gradient boosting machine and XGBoost; and taking the optimal feature combination as input, constructing and training a neural network model combined with a bidirectional long short-term memory network (BiLSTM) and an aggregation layer, wherein the aggregation layer is used for fusing sequence features extracted by the BiLSTM and static manual features such as the number of charges and hydrophobicity. Through the optimized feature combination and model structure, high-precision and high-efficiency prediction of the anti-hypertensive peptide is realized.

Description

Technical Field

[0001] This invention relates to the field of bioinformatics, and in particular to an antihypertensive peptide prediction model, its construction method, and its application.

Background Technology

[0002] Hypertension is a common cardiovascular disease characterized by persistently elevated arterial blood pressure at rest. Long-term hypertension can damage vital organs such as the heart, brain, and kidneys, significantly increasing the risk of serious illnesses such as heart disease, stroke, and kidney failure. Epidemiological surveys show that the prevalence of hypertension among adults in both China and the United States is relatively high.

[0003] Antihypertensive peptides (AHPs) are bioactive peptides derived from the enzymatic hydrolysis of food proteins. They can regulate blood pressure through multiple mechanisms, including inhibition of angiotensin-converting enzyme (ACE), and therefore have significant application value in the development of functional foods and drugs. However, identifying AHPs with traditional experimental techniques is typically time-consuming, expensive, and low-throughput. Developing efficient predictive models using artificial intelligence technology is therefore crucial for the rapid screening of AHPs.

Summary of the Invention

[0004] To achieve the above objectives, the solution of the present invention is: a method for constructing an antihypertensive peptide prediction model, comprising the following steps:

[0005] Obtaining the training dataset: the dataset contains positive samples of antihypertensive peptides and negative samples of non-antihypertensive peptides.
Feature extraction: an initial feature set is extracted from the amino acid sequence of the peptide, including amino acid composition (AAC), k-spaced amino acid pair composition (CKSAAGP), k-mer, AAindex, PC6, BLOSUM62, PAAC, charge number, hydrophobicity, atomic and bond composition (ATC), secondary structure features, and tertiary structure features.
Optimal feature combination selection: based on the combined feature importance evaluation of four models, namely Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine (LGBM), and XGBoost, the feature combination with the strongest ability to discriminate between antihypertensive and non-antihypertensive peptides is determined from the initial feature set as the optimal feature combination. The optimal feature combination consists of amino acid composition (AAC), k-spaced amino acid pair composition (CKSAAGP), AAindex physicochemical properties, and tertiary structure features.
Prediction model training: using the optimal feature combination as input, a prediction model combining a BiLSTM network and an aggregation layer is constructed; the aggregation layer fuses the sequence features extracted by the BiLSTM network with additional static handcrafted features. The constructed prediction model is trained on the training dataset to obtain the antihypertensive peptide prediction model.

[0006] Preferably, 914 positive AHPs were collected from a public database; 914 non-AHP peptide sequences were selected from the UniProt database as negative samples; the negative samples were consistent with the positive samples in terms of length and amino acid frequency; the collected 914 AHPs and 914 non-AHPs constituted a dataset containing 1828 peptides, and the dataset was randomly divided into a training set and a test set in a 4:1 ratio.

[0007] Preferably, the optimal feature combination screening specifically includes: evaluating the initial feature set with the RF, GBDT, LGBM, and XGBoost models respectively to obtain the importance score of each sub-feature under each model; assigning a weight to each model based on its accuracy (ACC) on the validation set, and fusing the importance scores from the four models by weighted sum to obtain the fused importance score of each sub-feature; sorting all sub-features by fused importance score and selecting the top N sub-features; and applying a greedy search strategy to combine and optimize the selected features by category, with cross-validation performance as the indicator for determining the optimal feature combination.

[0008] Preferably, before using the four models to evaluate feature importance, the initial feature set is first standardized using Z-score standardization based on the mean and standard deviation of the training set data.

[0009] Preferably, the prediction model sequentially includes an embedding layer, a BiLSTM layer, the aggregation layer, and a fully connected layer. The embedding layer transforms the input feature vectors into dense vectors; the BiLSTM layer analyzes the sequence data in both the forward and reverse directions, capturing the contextual dependencies in the sequence data; the fully connected layer processes the integrated features and outputs the predicted probability.

[0010] Preferably, the fully connected layer includes a Dropout layer, a Linear layer, a ReLU layer, and a Sigmoid layer.

[0011] Preferably, the static handcrafted features incorporated in the aggregation layer include charge number and hydrophobicity.

[0012] An antihypertensive peptide prediction model, constructed using the method described above.

[0013] The application of the above antihypertensive peptide prediction model to predicting and screening antihypertensive peptides includes the following steps: obtaining the amino acid sequence of the peptide to be predicted; extracting from the amino acid sequence the feature vector of the optimal feature combination, which consists of amino acid composition (AAC), k-spaced amino acid pair composition (CKSAAGP), AAindex physicochemical property features, and tertiary structure features; and inputting the feature vector into the prediction model to obtain the predicted probability that the peptide is an antihypertensive peptide.

[0014] With the above solution, the beneficial effects of the present invention are as follows. The invention uses a feature importance evaluation that integrates four models (Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine (LGBM), and XGBoost) together with a greedy search strategy to screen the most effective feature combination for characterizing antihypertensive peptides, overcoming the redundancy and noise inherent in high-dimensional features. By constructing a prediction model that combines a BiLSTM with an aggregation layer, and by using both the dynamic contextual information and the key static physicochemical properties of the sequence, the model's discriminative ability is significantly improved. Testing shows that the prediction accuracy of this method reaches 0.872, outperforming existing mainstream models such as AHTpin and StackAHTPs, which makes it suitable for large-scale, rapid, and accurate screening of antihypertensive peptides.

Attached Figure Description

[0015] Figure 1 is a partial flowchart (I) of the method for constructing the antihypertensive peptide prediction model of the present invention.

[0016] Figure 2 is a partial flowchart (II) of the method for constructing the antihypertensive peptide prediction model of the present invention; Figure 2 continues from Figure 1.

[0017] Figure 3 is a flowchart of the antihypertensive peptide prediction method of the present invention.

Detailed Implementation

[0018] The present invention is described in detail below with reference to Figures 1 to 3 and specific embodiments.

[0019] This invention provides an antihypertensive peptide prediction model, its construction method, and its application. The construction method includes the following steps: I. Dataset Construction and Preprocessing: We construct a high-quality, balanced, and challenging dataset for training and evaluating the proposed BiLSTM prediction model.

[0020] Antihypertensive peptides (AHPs) were collected from public databases including AHTPDB, ACEPEPDB, EROP-Moscow, FermFooDb, SATPdb, and UniDL4BioPep. To ensure data quality, only AHP sequences consisting of natural amino acids were retained, and duplicate sequences were removed to prevent data leakage and model overfitting, ultimately yielding 914 positive AHP samples.

[0021] To keep the data balanced, 914 non-AHP peptide sequences were selected from the UniProt database as negative samples. The negative samples follow the same length and amino acid frequency distributions as the positive samples, which makes the non-AHPs harder to distinguish from AHPs.

[0022] The collected 914 AHPs and 914 non-AHPs constitute a dataset containing 1828 peptides. This dataset is randomly divided into a training set (1462 peptides) and a test set (366 peptides) in a 4:1 ratio. This process is repeated 10 times to evaluate robustness.
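The repeated 4:1 split can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_indices` and the seeds are assumptions, and the split here is a plain random shuffle because the text does not say whether it is stratified by class.

```python
import random

def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train_fraction)  # 1828 * 0.8 -> 1462
    return indices[:n_train], indices[n_train:]

# Ten independent random splits of the 1828-peptide dataset
splits = [split_indices(1828, seed=s) for s in range(10)]
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx))  # 1462 366
```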

[0023] II. Basic Prediction Model: The basic prediction model described in this invention is a bidirectional long short-term memory (BiLSTM) model, which includes an embedding layer, a BiLSTM layer, and a fully connected layer.

[0024] First, the model uses an embedding layer to convert the input peptide sequence into a dense vector representation, which facilitates processing by the neural network.

[0025] Then, the BiLSTM layer processes these vectors and concatenates the final hidden states of both directions into a unified representation. The BiLSTM layer processes sequence features in both the forward and reverse directions, capturing bidirectional contextual dependencies (such as the effect that the order of amino acids in a peptide sequence has on its properties).

[0026] Structurally, the BiLSTM layer contains a forward LSTM and a reverse LSTM. Each LSTM unit contains an input gate, a forget gate, an output gate, and a cell state, which helps it capture long-range dependencies in the sequence.

[0027] This model sets the hidden layer dimension of the BiLSTM to 64 and the number of layers to 2. To avoid overfitting, a dropout mechanism is introduced between the input and hidden state of each LSTM layer, with a dropout rate of 0.3. The BiLSTM layers extract the hidden states of the forward and backward LSTMs at the last time step, and concatenate them to form a 128-dimensional vector (64-dimensional forward + 64-dimensional backward), which serves as the global contextual feature representation of the entire peptide sequence.

[0028] Next comes a series of fully connected layers that refine and integrate the sequence and physicochemical information extracted from the peptide sequence. These layers include ReLU activation and dropout regularization to prevent overfitting and improve generalization, and Linear layers for further integration and processing of the features. The final layer applies a sigmoid activation function to predict the peptide's activity, outputting a probability between 0 and 1. A threshold (e.g., 0.5) can be set: if the output probability is greater than the threshold, the input peptide is considered an AHP (antihypertensive peptide); otherwise, it is considered a non-AHP.
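A minimal PyTorch sketch of the basic model described in [0023] to [0028] follows. The hidden size (64), number of layers (2), dropout rate (0.3), and the 0.5 threshold come from the text; the vocabulary size, embedding width, intermediate layer width, and the class name `BaseBiLSTM` are illustrative assumptions, and the batch is assumed to hold unpadded sequences of equal length.

```python
import torch
import torch.nn as nn

class BaseBiLSTM(nn.Module):
    """Embedding -> 2-layer BiLSTM -> Dropout/Linear/ReLU/Linear/Sigmoid."""

    def __init__(self, vocab_size=21, embed_dim=32, hidden_dim=64):
        super().__init__()
        # Assumption: index 0 is padding, indices 1..20 encode amino acids.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                              dropout=0.3, bidirectional=True,
                              batch_first=True)
        self.head = nn.Sequential(
            nn.Dropout(0.3),
            nn.Linear(2 * hidden_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid(),
        )

    def forward(self, tokens):
        embedded = self.embedding(tokens)        # (batch, length, embed_dim)
        _, (h_n, _) = self.bilstm(embedded)      # h_n: (layers * 2, batch, hidden)
        # Final forward and backward hidden states of the last layer -> 128 dims
        context = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.head(context).squeeze(1)     # one probability per peptide

model = BaseBiLSTM().eval()
tokens = torch.randint(1, 21, (4, 10))           # 4 peptides of length 10
with torch.no_grad():
    probs = model(tokens)
is_ahp = probs > 0.5                             # threshold from the text
```

In a real pipeline, variable-length peptides would need padding plus packed sequences (or masking) so that the final hidden states are not taken over padding positions.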

[0029] III. Selection of the Optimal Feature Combination: This section details how the optimal feature combination of "AAC, CKSAAGP, AAindex, and tertiary structure" is obtained. Here, "optimal" refers to the feature combination in the initial feature set that has the strongest ability to discriminate between antihypertensive peptides and non-antihypertensive peptides in the classification task. Its optimality is determined by validation-set performance under the multi-model fusion evaluation and greedy search strategy.
(1) Feature extraction and standardization: More than 1000 handcrafted features, grouped into 12 main categories, are extracted from each peptide's amino acid sequence. These features cover the compositional, evolutionary, physicochemical, and structural properties of the sequence, specifically:
① Amino acid composition (AAC): The frequency of each of the 20 natural amino acids in the peptide sequence (e.g., the proportion of alanine, lysine, etc.) is computed to reflect the basic compositional characteristics of the sequence. The single-letter codes for the 20 natural amino acids are as follows: alanine (A), arginine (R), asparagine (N), aspartic acid (D), cysteine (C), glutamine (Q), glutamic acid (E), glycine (G), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), phenylalanine (F), proline (P), serine (S), threonine (T), tryptophan (W), tyrosine (Y), and valine (V).

[0030] For example, the peptide sequence "AACD" contains two alanines (2 / 4 = 0.5), one cysteine (1 / 4 = 0.25), and one aspartic acid (1 / 4 = 0.25), with all other frequencies being 0. This peptide sequence is therefore converted into a vector with 20 elements: [0.5, 0, 0, 0.25, 0.25, 0, ...].
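The AAC calculation in this example can be reproduced with a few lines of Python. This is a sketch; the function name `aac` is an assumption, and the residue order follows the list of single-letter codes given above.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # order of the single-letter codes above

def aac(sequence):
    """Return the 20-dimensional amino acid composition of a peptide."""
    length = len(sequence)
    return [sequence.count(aa) / length for aa in AMINO_ACIDS]

print(aac("AACD")[:6])  # [0.5, 0.0, 0.0, 0.25, 0.25, 0.0]
```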

[0031] ② k-spaced amino acid pair composition (CKSAAGP): Calculate the frequency of occurrence of amino acid pairs separated by k residues (k = 0, 1, 2), such as "AA" or "AB", where the two letters stand for any amino acids, to capture short-range sequence patterns (k = 0 gives adjacent amino acid pairs; k = 1 gives pairs separated by one amino acid).
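As an illustration of the pair counting described here, the sketch below computes the frequency of residue pairs separated by exactly k residues. It follows the pair-level description in the text only (it does not group amino acids into classes), and the function name `k_spaced_pair_freq` is an assumption.

```python
from collections import Counter

def k_spaced_pair_freq(sequence, k):
    """Frequency of residue pairs separated by exactly k residues."""
    pairs = [sequence[i] + sequence[i + k + 1]
             for i in range(len(sequence) - k - 1)]
    if not pairs:
        return {}  # sequence too short for this spacing
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}

# In "AACD" with k = 1 the pairs are A_C and A_D
print(k_spaced_pair_freq("AACD", 1))  # {'AC': 0.5, 'AD': 0.5}
```

A fixed-length feature vector would be obtained by reading these frequencies out in a fixed order over all possible pairs for each k.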

[0032] ③k-mer feature: The sequence is split into continuous subsequences of length k (e.g., k=2,3) (e.g., "ABC" is split into "AB" and "BC"), and the occurrence frequency of each subsequence is counted to reflect local sequence preference.

[0033] ④PAAC embedding features: Combining the correlation between amino acid composition and the physicochemical properties of adjacent amino acids, while considering the evolutionary conservation of the sequence, features containing global sequence information are generated.

[0034] ⑤AAindex features: Based on the amino acid physicochemical property database (AAindex), each amino acid is mapped to a numerical vector (such as hydrophobicity, molecular weight, isoelectric point, etc.), and then the peptide-level feature vector is obtained by sequence averaging or weighting.

[0035] ⑥PC6 features: High-dimensional physicochemical features are reduced in dimensionality by principal component analysis (PCA), retaining the first 6 principal components and reducing feature redundancy.

[0036] ⑦ BLOSUM62 embedding features: Based on the BLOSUM62 matrix (amino acid substitution score matrix), each amino acid is converted into a 20-dimensional vector (representing the probability of substitution with other amino acids), and then concatenated into a matrix representation of the sequence.

[0037] ⑧ Charge number: Calculate the total charge of the peptide sequence at physiological pH (based on the dissociation characteristics of amino acid side chains), reflecting its hydrophilicity / hydrophobicity balance.

[0038] ⑨ Hydrophobicity: Based on the Kyte-Doolittle scale, calculate the average hydrophobicity or the hydrophobicity distribution of the sequence, which is associated with the structural stability of the peptide.
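The two static descriptors above (charge number and hydrophobicity) can be sketched as follows. The hydrophobicity values are the standard Kyte-Doolittle scale. The charge calculation is a deliberately simplified assumption that counts +1 for K and R and -1 for D and E and ignores histidine and the termini, since the text does not specify the exact dissociation model; both function names are illustrative.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydrophobicity(sequence):
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

def net_charge(sequence):
    """Simplified net charge: +1 per K/R, -1 per D/E (assumption)."""
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative

print(net_charge("AACD"))  # -1
```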

[0039] ⑩ Atomic and bond composition (ATC): Count the proportions of atom types (e.g., C, H, O, N) and chemical bond types (e.g., peptide bond, hydrophobic bond) in the sequence, reflecting the characteristics of the molecular structure.

[0040] ⑪ Secondary structure features: The secondary structure (α-helix, β-sheet, random coil) of peptides is predicted using tools such as PSIPRED, and the structure type is encoded as a numerical feature.

[0041] ⑫ Tertiary structure features: Based on homology modeling or fold recognition tools (such as I-TASSER), the three-dimensional structure is predicted, and features such as solvent accessible surface area (SASA) and number of hydrogen bonds are extracted to reflect spatial conformation information.

[0042] For each sub-feature in the 12 feature categories above, the mean (μ) and standard deviation (σ) are calculated using only the training set data, where x is the original feature value, μ is the mean of this feature over all training samples, and σ is its standard deviation over all training samples (if σ = 0, it can be set to 1 to avoid division by zero).

[0043] Z-score normalization is then applied: all sub-features in both the training and validation sets are standardized using the μ and σ values from the training set, with the formula x′ = (x - μ) / σ. The normalized data are saved as input for all subsequent models.
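The standardization step can be sketched with NumPy as below. The function names and the toy arrays are illustrative assumptions; the population standard deviation (NumPy's default) is also an assumption, because the text does not say which estimator is used.

```python
import numpy as np

def fit_zscore(train):
    """Compute per-feature mean and standard deviation on the training set."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant feature: avoid division by zero
    return mu, sigma

def apply_zscore(data, mu, sigma):
    """Standardize any split with the training-set statistics."""
    return (data - mu) / sigma

train = np.array([[1.0, 5.0], [3.0, 5.0]])  # second feature is constant
valid = np.array([[2.0, 7.0]])
mu, sigma = fit_zscore(train)
train_z = apply_zscore(train, mu, sigma)
valid_z = apply_zscore(valid, mu, sigma)
```

Reusing the training-set μ and σ on the validation data keeps information from the validation set out of the preprocessing.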

[0044] (2) Feature importance assessment based on multi-model fusion: To obtain a robust feature importance assessment, we used four different tree models for independent evaluation and weighted fusion.

[0045] Random Forest Regressor (RF): The RF regressor is initialized based on the standardized training set data. The feature importance built into the RF model is extracted (based on the Gini coefficient reduction, i.e., the total contribution of each feature to reducing impurity during all decision tree splits). The original importance scores are normalized: the score of each feature is divided by the sum of all feature scores to obtain a standardized score of 0 to 1.

[0046] Gradient Boosting Decision Tree (GBDT): Initialize the GBDT regressor based on standardized training data. Extract the total gain of each feature in the GBDT during all tree splits (reflecting the degree to which the feature reduces the model loss). Directly use the total gain as the feature importance score under this model.

[0047] Light Gradient Boosting Machine (LGBM): The LGBM regressor is initialized based on standardized training data. The total gain of each feature in the LGBM is extracted (reflecting the degree to which the feature reduces the model loss) and used directly as the feature importance score for this model.

[0048] XGBoost model: The XGBoost regressor is initialized based on standardized training data. Two native XGBoost metrics are extracted: weight (the total number of times a feature is used in splits) and total gain (the total loss reduction contributed by a feature across all splits). The two metrics are fused for each feature: first, weight and total gain are each normalized to the range 0 to 1; then the weighted score is calculated as Feature Importance = 0.7 × Normalized Weight + 0.3 × Normalized Total Gain.
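The 0.7 / 0.3 fusion of the two XGBoost metrics can be sketched as follows. The function takes the raw per-feature values as plain arrays; in the XGBoost Python API they can be read from a trained booster with `get_score(importance_type="weight")` and `get_score(importance_type="total_gain")`. Normalizing each metric by its sum is an assumption, since the text only states that the values are scaled to the range 0 to 1; the function name and toy numbers are illustrative.

```python
import numpy as np

def fuse_xgb_importance(weight, total_gain, w_weight=0.7, w_gain=0.3):
    """Combine split counts and total gain into one importance score."""
    weight = np.asarray(weight, dtype=float)
    total_gain = np.asarray(total_gain, dtype=float)
    norm_weight = weight / weight.sum()        # each metric now sums to 1
    norm_gain = total_gain / total_gain.sum()
    return w_weight * norm_weight + w_gain * norm_gain

# Toy values for three features (not taken from the patent)
scores = fuse_xgb_importance([6, 3, 1], [10.0, 30.0, 60.0])
```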

[0049] (3) Model performance evaluation (determining fusion weights): Using the validation set data, calculate the performance metrics for the four models mentioned above. Record the ACC value for each model, denoted as ACC_RF, ACC_GBDT, ACC_LGBM, and ACC_XGBoost.

[0050] The weight of each model is calculated from its ACC value on the validation set: w_model = ACC_model / ACC_total, where ACC_total is the sum of the four models' ACC values. This yields four weights, w_RF, w_GBDT, w_LGBM, and w_XGBoost, which sum to 1.

[0051] (4) Feature importance weighted fusion and preliminary screening: The feature importance scores output by the four models are normalized (each model's feature score is divided by the sum of all feature scores within that model, ensuring that the sum of scores within the same model is 1). For each sub-feature, the fused total score is calculated as (w_RF×RF score + w_GBDT×GBDT score + w_LGBM×LGBM score + w_XGBoost×XGBoost score). All sub-features are sorted from highest to lowest total score, and the top 50 sub-features are selected.
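Steps (3) and (4) can be sketched together: model weights are derived from validation accuracy, each model's importance scores are normalized to sum to 1, and the weighted sum gives the fused ranking. The function name and all numbers below are toy assumptions for illustration and are not results from the patent.

```python
import numpy as np

def fuse_importances(importances, accuracies, top_n=50):
    """Weighted fusion of per-model importance scores, then top-N selection."""
    names = list(importances)
    acc = np.array([accuracies[m] for m in names], dtype=float)
    weights = acc / acc.sum()                  # w_model = ACC_model / ACC_total
    scores = np.array([importances[m] for m in names], dtype=float)
    scores = scores / scores.sum(axis=1, keepdims=True)  # per-model sum = 1
    fused = weights @ scores                   # one fused score per sub-feature
    ranking = np.argsort(-fused)[:top_n]       # indices, highest score first
    return fused, ranking

importances = {
    "RF": [0.5, 0.3, 0.2],
    "GBDT": [4.0, 4.0, 2.0],
    "LGBM": [10.0, 30.0, 60.0],
    "XGBoost": [0.45, 0.30, 0.25],
}
accuracies = {"RF": 0.78, "GBDT": 0.80, "LGBM": 0.81, "XGBoost": 0.81}
fused, top = fuse_importances(importances, accuracies, top_n=2)
```

Because both the model weights and each model's scores sum to 1, the fused scores also sum to 1, which makes them directly comparable across sub-features.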

[0052] A greedy strategy is used to combine the selected sub-features. The resulting feature combinations are then ranked from highest to lowest accuracy (ACC) to find the optimal feature combination.
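One common way to realize such a greedy search is forward selection over feature categories, sketched below. The text does not spell out the exact procedure, so the forward-selection form, the stopping rule, the function name, and the toy scoring function are all assumptions; in practice `evaluate` would return a cross-validated ACC for the given categories.

```python
def greedy_group_search(groups, evaluate):
    """Forward greedy selection over feature categories.

    groups: candidate category names.
    evaluate: callable returning a validation score for a list of categories.
    """
    selected, best_score = [], float("-inf")
    remaining = list(groups)
    while remaining:
        scored = [(evaluate(selected + [g]), g) for g in remaining]
        score, group = max(scored)
        if score <= best_score:  # no candidate improves the score: stop
            break
        selected.append(group)
        remaining.remove(group)
        best_score = score
    return selected, best_score

# Toy scoring function with additive gains (illustrative numbers only)
toy_gain = {"AAC": 0.6, "CKSAAGP": 0.1, "AAindex": 0.05, "k-mer": -0.02}
selected, best = greedy_group_search(
    list(toy_gain), lambda cats: sum(toy_gain[c] for c in cats))
print(selected)  # ['AAC', 'CKSAAGP', 'AAindex']
```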

[0053]

[0054] As shown in the table above, the combination "AAC + CKSAAGP + AAindex + tertiary structure" achieved the highest average ACC (0.796) on the validation set, and subsequent feature additions either failed to provide significant improvements or led to performance degradation. Therefore, this combination was determined to be the final optimal feature combination.

[0055] IV. Model Optimization: This section details the specific scheme, experimental process, and results for optimizing the basic bidirectional long short-term memory network (BiLSTM) model. Through comparative verification, the optimal model structure of this invention is determined.

[0056] (1) Model optimization scheme: On top of a basic BiLSTM model containing an embedding layer, a BiLSTM layer, and a fully connected layer, different functional layers are integrated to construct a series of comparative models and evaluate the performance improvement contributed by each component. All models use the optimal feature combination obtained above (AAC, CKSAAGP, AAindex, and tertiary structure features) as input. The comparative models constructed are:
No. 0: Basic prediction model (BiLSTM): serves as the performance benchmark. This model consists of an embedding layer, a BiLSTM layer, and a fully connected layer composed of a Dropout layer, a Linear layer, and a Sigmoid activation function.

[0057] No. 1: BiLSTM + Gaussian noise layer: A Gaussian noise layer is added after the embedding layer of the basic model. During training, random noise with a mean of 0 and a standard deviation of 0.1 is added to the data.

[0058] No. 2: BiLSTM + convolutional layer: A one-dimensional convolutional layer is added after the embedding layer of the basic model to extract local patterns in the input feature sequence. The convolutional output is passed through a ReLU activation function and then input into the BiLSTM layer.

[0059] No. 3: BiLSTM + normalization layer: A layer normalization layer is added after the BiLSTM layer of the basic model to normalize the output of the BiLSTM.

[0060] No. 4: BiLSTM + attention mechanism: An attention layer is added after the BiLSTM layer of the basic model. This layer learns and assigns weights to features at different time steps to highlight key information.

[0061] No. 5: BiLSTM + pooling layer: A max pooling layer is added after the BiLSTM layer of the basic model to downsample along the sequence dimension and retain the most significant features.

[0062] No. 6: BiLSTM + residual connection: In the basic model, a shortcut connection is established from the output of the embedding layer to the output of the BiLSTM layer, and the two output vectors are summed.

[0063] No. 7: BiLSTM + aggregation layer (optimal model of this invention): An aggregation layer is added after the BiLSTM layer of the basic model. This aggregation layer concatenates the high-level sequence features output by the BiLSTM layer with static handcrafted features extracted directly from the original peptide sequence, forming a comprehensive feature vector that integrates dynamic contextual information and physicochemical properties, which is then input into the fully connected layer.

[0064] No. 8: BiLSTM + global predictor: After the fully connected layer of the basic model, an additional simple classifier is added as a global predictor to perform a secondary adjustment and calibration of the predicted probability output by the fully connected layer.
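The aggregation layer of the BiLSTM + aggregation model can be illustrated with the PyTorch sketch below. It shows only the concatenation of a 128-dimensional BiLSTM output with two static features (charge number and hydrophobicity) followed by a fully connected head; the class name `AggregationHead`, the intermediate width of 32, and the random stand-in for the BiLSTM output are assumptions.

```python
import torch
import torch.nn as nn

class AggregationHead(nn.Module):
    """Concatenate BiLSTM features with static features, then classify."""

    def __init__(self, seq_dim=128, n_static=2, hidden=32, dropout=0.3):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(seq_dim + n_static, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, sequence_features, static_features):
        # Aggregation step: join both feature sources along the feature axis
        fused = torch.cat([sequence_features, static_features], dim=1)
        return self.classifier(fused).squeeze(1)

head = AggregationHead().eval()
sequence_features = torch.randn(4, 128)             # stand-in for BiLSTM output
static_features = torch.tensor([[-1.0, 0.65]] * 4)  # [charge, hydrophobicity]
with torch.no_grad():
    probs = head(sequence_features, static_features)
```

Concatenation keeps the static descriptors as separate inputs to the classifier, so the head can weigh them independently of what the BiLSTM learned from the sequence.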

[0065] (2) Model training and evaluation methods: To ensure a fair comparison of all models, a unified training and evaluation standard is adopted. Dataset: the training set (1462 data points) and the independent test set (366 data points) prepared above.

[0066] Training process: During each training session, 1169 data points are randomly selected from the training set for model parameter updates, and the remaining 293 data points are used as a validation set to monitor the training process. This training process is repeated 10 times for each model.

[0067] Performance Evaluation: The trained model is then evaluated using the test set. Evaluation metrics include: Accuracy (ACC), Matthews Correlation Coefficient (MCC), Area Under the Curve (AUC), Sensitivity (Sn), and Specificity (Sp).
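Four of the listed metrics follow directly from the confusion matrix, as the sketch below shows (AUC is omitted because it needs predicted probabilities rather than labels). The function name and the toy labels are assumptions, and both classes are assumed to be present in `y_true`.

```python
import math

def binary_metrics(y_true, y_pred):
    """ACC, Sn, Sp and MCC from binary labels (1 = AHP, 0 = non-AHP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / len(pairs),
        "Sn": tp / (tp + fn),   # sensitivity: recall on positives
        "Sp": tn / (tn + fp),   # specificity: recall on negatives
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                         [1, 1, 1, 0, 0, 0, 0, 1])
print(metrics)  # {'ACC': 0.75, 'Sn': 0.75, 'Sp': 0.75, 'MCC': 0.5}
```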

[0068] (3) Performance comparison results: The table below lists the average performance metrics obtained by evaluating all models on the test set.

[0069]

[0070] As shown in Table 2, the systematic comparison indicates that the BiLSTM + aggregation layer model (No. 7) significantly outperforms the other comparative models on multiple core performance indicators. In particular, its accuracy (ACC) reaches 0.872 and its Matthews correlation coefficient (MCC) reaches 0.863, demonstrating excellent overall predictive ability and robustness. This model is the optimal model for antihypertensive peptides (AHPs) in this invention, and is named BiLSTM-AHPs.

[0071] V. Performance Comparison of the BiLSTM-AHPs Model with Existing Models: To further verify the performance of the present invention, the optimized prediction model (BiLSTM-AHPs) was compared with previously published models, including AHTpin, StackAHTPs and UniDL4BioPep.

[0072] The results are shown in Tables 3 and 4. The model of the present invention shows higher accuracy and generalization ability compared with existing models.

[0073]

[0074]

[0075] The above description is only a preferred embodiment of the present invention and is not intended to limit the design of this case. All equivalent changes made based on the key design features of this case shall fall within the protection scope of this case.

Claims

1. A method for constructing an antihypertensive peptide prediction model, characterized in that it includes the following steps: Obtaining the training dataset: the dataset contains positive samples of antihypertensive peptides and negative samples of non-antihypertensive peptides; Feature extraction: an initial feature set is extracted from the amino acid sequence of the peptide, including amino acid composition (AAC), k-spaced amino acid pair composition (CKSAAGP), k-mer, AAindex, PC6, BLOSUM62, PAAC, charge number, hydrophobicity, atomic and bond composition (ATC), secondary structure features, and tertiary structure features; Optimal feature combination selection: based on the combined feature importance evaluation of four models, namely Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine (LGBM), and XGBoost, the feature combination with the strongest ability to discriminate between antihypertensive and non-antihypertensive peptides is determined from the initial feature set as the optimal feature combination, the optimal feature combination consisting of amino acid composition (AAC), k-spaced amino acid pair composition (CKSAAGP), AAindex physicochemical properties, and tertiary structure features; Prediction model training: using the optimal feature combination as input, a prediction model combining a BiLSTM network and an aggregation layer is constructed, the aggregation layer being used to fuse the sequence features extracted by the BiLSTM network with additional static handcrafted features; the constructed prediction model is trained on the training dataset to obtain the antihypertensive peptide prediction model.

2. The method for constructing the antihypertensive peptide prediction model as described in claim 1, characterized in that: 914 positive AHPs were collected from public databases; 914 non-AHP peptide sequences were selected from the UniProt database as negative samples; the negative samples were consistent with the positive samples in terms of length and amino acid frequency; the collected 914 AHPs and 914 non-AHPs constituted a dataset containing 1828 peptides, and the dataset was randomly divided into training and test sets in a 4:1 ratio.

3. The method for constructing the antihypertensive peptide prediction model as described in claim 1, characterized in that: The optimal feature combination screening specifically includes: The initial feature set was evaluated using the RF, GBDT, LGBM, and XGBoost models respectively, and the importance score of each sub-feature under each model was obtained. Based on the accuracy (ACC) value of each model on the validation set, weights are assigned to them, and the importance scores obtained from the four models are weighted and fused to obtain the fused importance score of each sub-feature. All sub-features are sorted according to the fusion importance score, and the top N feature subsets are selected; A greedy search strategy is adopted to combine and optimize the feature subsets according to categories, and the optimal feature combination is determined by using cross-validation performance as an indicator.

4. The method for constructing the antihypertensive peptide prediction model as described in claim 3, characterized in that: Before using the four models to evaluate feature importance, the initial feature set is first standardized using Z-score standardization based on the mean and standard deviation of the training set data.

5. The method for constructing the antihypertensive peptide prediction model as described in claim 1, characterized in that: The prediction model sequentially includes an embedding layer, a BiLSTM layer, the aggregation layer, and a fully connected layer: Embedding layers are used to transform the input feature vectors into dense vectors; BiLSTM layers are used to analyze sequence data from both forward and reverse directions, capturing the contextual dependencies in the sequence data. Fully connected layers are used to process the integrated features and output predicted probabilities.

6. The method for constructing the antihypertensive peptide prediction model as described in claim 5, characterized in that: The fully connected layer includes a Dropout layer, a Linear layer, a ReLU layer, and a Sigmoid layer.

7. The method for constructing the antihypertensive peptide prediction model as described in claim 5, characterized in that: The static handcrafted features incorporated in the aggregation layer include charge number and hydrophobicity.

8. A predictive model for antihypertensive peptides, characterized in that: It is constructed using the method described in any one of claims 1 to 7.

9. The application of the antihypertensive peptide prediction model as described in claim 8, characterized in that the steps for predicting and screening antihypertensive peptides include: obtaining the amino acid sequence of the peptide to be predicted; extracting from the amino acid sequence the feature vector of the optimal feature combination, which consists of amino acid composition (AAC), k-spaced amino acid pair composition (CKSAAGP), AAindex physicochemical property features, and tertiary structure features; and inputting the feature vector into the prediction model to obtain the predicted probability that the peptide is an antihypertensive peptide.