Lycium ruthenicum Murr. polysaccharide immune activity prediction method and system

By using a multi-dimensional data fusion prediction model, the predicted value and confidence interval of the immune activity of black goji berry polysaccharide are directly output, which solves the problems of long testing cycle, high cost and low throughput in the existing technology and realizes efficient immune activity assessment.

CN122245445A · Pending · Publication Date: 2026-06-19 · LANZHOU UNIVERSITY OF TECHNOLOGY +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
LANZHOU UNIVERSITY OF TECHNOLOGY
Filing Date
2026-05-20
Publication Date
2026-06-19

Abstract

This application relates to the field of computational biology and discloses a method and system for predicting the immune activity of black goji berry polysaccharides. The method includes: acquiring sample data of the black goji berry polysaccharide to be tested; preprocessing the sample data to obtain first basic multimodal features; if the sample data is determined to contain first nuclear magnetic resonance (NMR) spectroscopy data, analyzing the first NMR spectroscopy data to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site; constructing an atomic-level polysaccharide molecular graph and generating microstructural features based on the glycosidic bond type, anomeric carbon configuration, and linkage site; fusing the first basic multimodal features and the microstructural features to obtain a first multimodal fusion feature; and inputting the first multimodal fusion feature into a trained prediction model to obtain a first predicted immune activity value and a confidence interval characterizing the reliability of the prediction result.

Description

Technical Field

[0001] This application relates to the field of computational biology, and in particular, but not exclusively, to a method and system for predicting the immune activity of black goji berry polysaccharides.

Background Technology

[0002] Black goji berries (Lycium ruthenicum Murr.) are a unique medicinal and edible resource from the Qinghai-Tibet Plateau. Their active polysaccharides have been proven to have significant immunomodulatory functions, and they have broad prospects in the development of functional foods, special medical purpose formula foods, and immune adjuvants.

[0003] However, the immunomodulatory activity of polysaccharides is influenced by multiple coupled factors (microstructure, biosynthetic background, extraction process parameters, and associated components), resulting in a highly nonlinear structure-activity relationship that is difficult to analyze. Current systems for assessing the immunomodulatory activity of black goji berry polysaccharides rely heavily on in vitro cell experiments, which suffer from long testing cycles, high testing costs, and low throughput, making it difficult to support high-throughput screening and process optimization.

Summary of the Invention

[0004] In view of this, embodiments of this application provide a method and system for predicting the immune activity of polysaccharides from black goji berries, which at least solves the problems of long testing cycle, high testing cost, and low detection throughput in the evaluation of the immune activity of polysaccharides from black goji berries.

[0005] The technical solution of the embodiments of this application is implemented as follows. In a first aspect, embodiments of this application provide a method for predicting the immune activity of black goji berry polysaccharides, the method comprising:

Obtaining sample data of the black goji berry polysaccharide to be tested, the sample data including monosaccharide composition, molecular weight parameters, basic physicochemical data, spectral data, transcriptome data, metabolome data, and process parameters; and preprocessing the sample data to obtain first basic multimodal features, which include macroscopic structural features, transcriptome features, metabolome features, and process parameter features.

If the sample data of the black goji berry polysaccharide to be tested contains first nuclear magnetic resonance spectroscopy data, analyzing the first nuclear magnetic resonance spectroscopy data to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site; and, based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site, constructing an atomic-level polysaccharide molecular graph and generating microstructural features.

Fusing the first basic multimodal features and the microstructural features to obtain a first multimodal fusion feature; and inputting the first multimodal fusion feature into a trained prediction model to obtain a first predicted immune activity value output by the prediction model and a confidence interval characterizing the reliability of the prediction result.

[0006] In a second aspect, embodiments of this application provide a black goji berry polysaccharide immune activity prediction system, the system comprising:

A data acquisition and processing module, used to acquire sample data of the black goji berry polysaccharide to be tested, the sample data including monosaccharide composition, molecular weight parameters, basic physicochemical data, spectral data, transcriptome data, metabolome data, and process parameters; and to preprocess the sample data to obtain first basic multimodal features, which include macroscopic structural features, transcriptome features, metabolome features, and process parameter features.

An analysis module, used to analyze the first nuclear magnetic resonance spectroscopy data, when the sample data is determined to contain it, to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site; and, based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site, to construct an atomic-level polysaccharide molecular graph and generate microstructural features.

A first prediction module, used to fuse the first basic multimodal features and the microstructural features to obtain a first multimodal fusion feature; and to input the first multimodal fusion feature into the trained prediction model to obtain the first predicted immune activity value output by the prediction model and the confidence interval characterizing the reliability of the prediction result.

[0007] The beneficial effects of the technical solutions provided in this application include at least the following. Multi-dimensional sample data covering monosaccharide composition, molecular weight, physicochemical properties, spectra, transcriptome, metabolome, and process parameters are collected to form the first basic multimodal features (macroscopic structural, transcriptome, metabolome, and process parameter features). When first nuclear magnetic resonance spectroscopy data is available, an atomic-level polysaccharide molecular graph is additionally constructed to obtain microstructural features. The first basic multimodal features and the microstructural features are fused and input into a trained prediction model, which directly outputs the predicted immune activity value and its confidence interval. This removes the need for time-consuming and labor-intensive in vitro cell experiments, shortens the immune activity assessment cycle, reduces testing costs, and increases testing throughput, so that high-throughput screening and process optimization of black goji berry polysaccharides can be supported efficiently.

Description of the Drawings

[0008] To illustrate the technical solutions in the embodiments of this application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application; those skilled in the art can obtain other drawings from them without creative effort.

Figure 1 is a flowchart of a method for predicting the immune activity of black goji berry polysaccharides provided in an embodiment of this application.

Figure 2 is a schematic diagram of the composition and structure of a black goji berry polysaccharide immune activity prediction system provided in an embodiment of this application.

Detailed Description

[0009] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. The following embodiments are used to illustrate this application, but are not intended to limit the scope of this application. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0010] In the following description, references are made to “some embodiments,” which describe a subset of all possible embodiments. However, it is understood that “some embodiments” may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.

[0011] It should be noted that the terms "first, second, and third" used in the embodiments of this application are merely to distinguish similar objects and do not represent a specific ordering of objects. It is understood that "first, second, and third" can be interchanged in a specific order or sequence where permitted, so that the embodiments of this application described herein can be implemented in an order other than that illustrated or described herein.

[0012] It will be understood by those skilled in the art that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments of this application pertain. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art, and should not be interpreted in an idealized or overly formal sense unless specifically defined as herein.

[0013] This application provides a method for predicting the immune activity of black goji berry polysaccharides, applied to electronic devices. These electronic devices include, but are not limited to, mobile phones, laptops, tablets, handheld internet devices, multimedia devices, streaming media devices, mobile internet devices, wearable devices, or other types of electronic devices. The function implemented by this method can be achieved by a processor in the electronic device calling program code. The program code can be stored in a computer storage medium; therefore, the electronic device includes at least a processor and a storage medium. The processor can be used to process the prediction of the immune activity of black goji berry polysaccharides, and the storage medium can be used to store the data required and generated during the prediction process.

[0014] Figure 1 is a flowchart of a method for predicting the immune activity of black goji berry polysaccharides provided in an embodiment of this application. As shown in Figure 1, the method includes at least the following steps:

Step S110: Obtain sample data of the black goji berry polysaccharide to be tested, which includes monosaccharide composition, molecular weight parameters, basic physicochemical data, spectral data, transcriptome data, metabolome data, and process parameters. Preprocess the sample data to obtain the first basic multimodal features, which include macroscopic structural features, transcriptome features, metabolome features, and process parameter features.

The sample data of the black goji berry polysaccharide to be tested is a multi-source heterogeneous dataset that describes a single black goji berry polysaccharide sample and is used for immune activity prediction. The monosaccharide composition can be the molar ratio of eight monosaccharides determined by high-performance liquid chromatography (HPLC). The molecular weight parameters can be the weight-average molecular weight, number-average molecular weight, and molecular weight distribution index determined by gel permeation chromatography (GPC). The transcriptome data can be the FPKM values of 50 key synthesis genes determined by RNA-seq; these genes are screened based on the KEGG ko00500 and ko00520 pathways and include UGPase, CSLA, SUS, and INV. The metabolome data can be obtained by liquid chromatography-mass spectrometry (LC-MS/MS) and cover 20 associated components: 6 anthocyanins, 8 flavonoids, 4 phenolic acids, and 2 others. The process parameters include an ultrasonic power of 100 to 500 W, a time of 10 to 60 min, an enzymatic hydrolysis pH of 4.0 to 7.0, and the extraction method, represented by a 4-dimensional code.

[0015] In addition, basic physicochemical measurements and full-band spectral scans are performed on the black goji berry polysaccharide sample to build the basic physicochemical data and the spectral data. The basic physicochemical data include uronic acid content, sulfate content, and protein residue. The spectral data include full-band ultraviolet-visible (UV-Vis) and Fourier transform infrared (FT-IR) spectra. Specifically, uronic acid content is determined by the m-hydroxybiphenyl method, sulfate content by the barium chloride-gelatin turbidimetric method, and protein residue by the Coomassie brilliant blue method. After freeze-drying, the sample undergoes full-band UV-Vis and FT-IR scanning, from which peak areas and peak intensities are extracted. The extraction is implemented with batch scripts and requires no additional instrument measurements.

[0016] Each type of data is preprocessed by normalization. Monosaccharide composition is normalized using Min-Max scaling, mapping the values to the interval [0,1]. Molecular weight parameters are normalized using the Z-score. Transcriptome and metabolome data are first log2-transformed and then normalized using the Z-score. Process parameters are normalized linearly, mapping the values to the interval [0,1]. Missing values in the data are imputed using the K-Nearest Neighbors (KNN) method, with K set to 5.
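The preprocessing rules above can be illustrated with a minimal NumPy sketch. It is not the implementation of this application: the function names, the pseudocount added before the log2 transform, the distance used to find neighbors, and the order in which imputation and normalization are applied are all assumptions made for illustration.

```python
import numpy as np

def min_max(x):
    """Map each column to [0, 1]; constant columns map to 0."""
    lo, hi = np.nanmin(x, axis=0), np.nanmax(x, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span

def z_score(x):
    """Centre each column and scale it to unit standard deviation."""
    mu, sd = np.nanmean(x, axis=0), np.nanstd(x, axis=0)
    return (x - mu) / np.where(sd > 0, sd, 1.0)

def log2_z_score(x, pseudocount=1.0):
    """log2 transform followed by Z-score; the pseudocount is an assumption
    that keeps zero-valued entries finite."""
    return z_score(np.log2(x + pseudocount))

def knn_impute(x, k=5):
    """Fill each NaN with the mean of that column over the k nearest rows.
    Distance is the RMS difference over columns observed in both rows
    (an assumed metric; the text only fixes K = 5)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for i in range(x.shape[0]):
        missing = np.isnan(x[i])
        if not missing.any():
            continue
        neighbours = []
        for j in range(x.shape[0]):
            if j == i:
                continue
            shared = ~missing & ~np.isnan(x[j])
            if shared.any():
                d = np.sqrt(np.mean((x[i, shared] - x[j, shared]) ** 2))
                neighbours.append((d, j))
        neighbours.sort()
        for col in np.where(missing)[0]:
            vals = [x[j, col] for _, j in neighbours if not np.isnan(x[j, col])][:k]
            if vals:
                out[i, col] = np.mean(vals)
    return out
```

In this sketch, `min_max` would be applied to the monosaccharide composition and process parameters, `z_score` to the molecular weight parameters, and `log2_z_score` to the transcriptome and metabolome data, with `knn_impute` run on a sample-by-feature matrix.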

[0017] Step S120: If it is determined that the sample data of the black goji berry polysaccharide to be tested contains first nuclear magnetic resonance (NMR) spectroscopy data, the first NMR spectroscopy data is analyzed to obtain structural information comprising the glycosidic bond type, anomeric carbon configuration, and linkage site; based on this structural information, an atomic-level polysaccharide molecular graph is constructed and microstructural features are generated.

[0018] When the data of the black goji berry polysaccharide sample to be tested does not include the first nuclear magnetic resonance spectroscopy data, all NMR-related processing procedures are skipped. The monosaccharide composition, molecular weight parameters, transcriptome data, metabolome data, and process parameters are preprocessed according to the same rules. The 32-dimensional macroscopic structural features can be directly calculated and generated from the 8-dimensional monosaccharide molar ratio, 3-dimensional molecular weight parameters, 1-dimensional uronic acid content, 1-dimensional sulfate content, 1-dimensional protein residue, 3-dimensional ultraviolet characteristic peak area, and 15-dimensional infrared characteristic peak intensity.
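As a small illustration of how the 32 dimensions add up (8 + 3 + 1 + 1 + 1 + 3 + 15), the sketch below concatenates the seven measurement blocks in the order listed above. The dictionary keys and the function name are hypothetical; only the block sizes come from the text.

```python
import numpy as np

# Block sizes as listed in the text; the key names are illustrative only.
MACRO_LAYOUT = [
    ("monosaccharide_molar_ratio", 8),
    ("molecular_weight", 3),
    ("uronic_acid", 1),
    ("sulfate", 1),
    ("protein_residue", 1),
    ("uv_peak_area", 3),
    ("ir_peak_intensity", 15),
]

def build_macro_features(blocks):
    """Concatenate the preprocessed blocks into one 32-dimensional vector."""
    parts = []
    for name, dim in MACRO_LAYOUT:
        v = np.atleast_1d(np.asarray(blocks[name], dtype=float))
        if v.shape != (dim,):
            raise ValueError(f"{name}: expected {dim} values, got {v.shape}")
        parts.append(v)
    return np.concatenate(parts)
```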

[0019] Step S130: The first basic multimodal feature and the microstructure feature are fused to obtain the first multimodal fusion feature; the first multimodal fusion feature is input into the trained prediction model to obtain the first immune activity prediction value and the confidence interval characterizing the reliability of the prediction result output by the prediction model.

[0020] Feature fusion can be performed using formula (1). The formula itself and its symbols are not reproduced in this text; its terms are described as follows. The output is the first multimodal fusion feature. The structural inputs are the macroscopic structural features and the microstructural features, and mask is a binary control mask. When the sample data of the black goji berry polysaccharide to be tested includes the first nuclear magnetic resonance spectroscopy data, mask is 1 and the macroscopic structural features and the microstructural features are concatenated. When the sample data does not include the first nuclear magnetic resonance spectroscopy data, mask is 0 and the microstructural features are masked out, so the first multimodal fusion feature is calculated from the first basic multimodal features alone. FC is a fully connected layer that maps the concatenated features to a specified dimension in the space of real numbers. A learnable weighting parameter with an initial value of 0.5 is constrained to the range [0,1] by a sigmoid function and can be fixed at 0.6 after optimization on the validation set to simplify inference. The remaining terms are the transcriptome features, the metabolome features, and the process parameter features, each paired with its own attention weight; all three are mapped to 64 dimensions through independent fully connected layers. A residual term ensures that the core structural information is not diluted, and the mask mechanism guarantees that, when the first nuclear magnetic resonance spectroscopy data is absent, the model relies only on the first basic multimodal features.
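The mask behavior described above can be sketched in isolation. The snippet covers only the masked concatenation followed by a fully connected mapping; it does not reproduce formula (1), and it leaves out the attention-weighted omics terms, the learnable weighting parameter, and the residual term. The function name and argument layout are assumptions, and the 64-dimensional output used in the usage note is likewise assumed.

```python
import numpy as np

def masked_concat_fc(f_macro, f_micro, has_nmr, weight, bias):
    """Concatenate macroscopic features with mask-gated microstructural
    features, then apply one fully connected mapping.

    weight has shape (out_dim, len(f_macro) + micro_dim); bias has shape (out_dim,).
    """
    mask = 1.0 if has_nmr else 0.0
    if f_micro is None:
        # No NMR-derived features exist: use a zero block of the expected width.
        f_micro = np.zeros(weight.shape[1] - f_macro.shape[0])
    spliced = np.concatenate([f_macro, mask * f_micro])
    return weight @ spliced + bias
```

With a 32-dimensional macroscopic vector and a 128-dimensional microstructural vector, `weight` would have shape `(64, 160)` under the assumed output size. When `has_nmr` is false, the result depends only on the macroscopic block, which is the property the mask is meant to guarantee.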

[0021] The attention weights for the transcriptome features, metabolome features, and process parameter features can be generated as follows (formulas (2) and (3) and several of their symbols are not reproduced in this text). First, as shown in formula (2), a query vector q is generated from a base feature through a learnable weight matrix, and a key/value matrix is defined correspondingly for each feature type. The attention weights for the transcriptome features, metabolome features, and process parameter features are then calculated using formula (3) by substituting the key matrix K of each feature type in turn, where the superscript T denotes the transpose of q.

[0022] The prediction head of the trained prediction model is constructed using a two-layer fully connected network with a specific structure of 64→32→1. The activation function is ReLU, and the dropout probability is set to 0.3 to output a standardized first immune activity prediction value.
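A forward pass through a 64→32→1 head can be sketched as follows. The layer sizes, ReLU activation, and dropout probability of 0.3 come from the text; the weight initialization, the placement of dropout after the hidden layer, and the function names are assumptions.

```python
import numpy as np

def init_head(rng, dims=(64, 32, 1)):
    """Random weights for a 64 -> 32 -> 1 head (the initialisation scheme is assumed)."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(dims[:-1], dims[1:])
    ]

def head_forward(x, params, drop_p=0.3, training=False, rng=None):
    """Return the scalar prediction for one 64-dimensional fused feature vector."""
    (w1, b1), (w2, b2) = params
    h = np.maximum(w1 @ x + b1, 0.0)           # ReLU on the 32-unit hidden layer
    if training:
        keep = rng.random(h.shape) >= drop_p   # inverted dropout, active in training only
        h = h * keep / (1.0 - drop_p)
    return (w2 @ h + b2).item()
```

Dropout is disabled at inference in this sketch, so repeated calls with the same input return the same value.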

[0023] The confidence interval is a range of plausible values for the true immune activity and quantifies the reliability and uncertainty of the prediction. It can be a 95% confidence interval, meaning that, at a 95% confidence level, the true immune activity of the black goji berry polysaccharide falls within a range centered on the first predicted immune activity value.

[0024] In this way, multi-dimensional sample data covering monosaccharide composition, molecular weight, physicochemical properties, spectra, transcriptome, metabolome, and process parameters are collected to form the first basic multimodal features (macroscopic structural, transcriptome, metabolome, and process parameter features). When first nuclear magnetic resonance spectroscopy data is available, an atomic-level polysaccharide molecular graph is additionally constructed to obtain microstructural features. The first basic multimodal features and the microstructural features are fused and input into a trained prediction model, which directly outputs the predicted immune activity value and its confidence interval. This removes the need for time-consuming and labor-intensive in vitro cell experiments, shortens the immune activity assessment cycle, reduces testing costs, and increases testing throughput, so that high-throughput screening and process optimization of black goji berry polysaccharides can be supported efficiently.

[0025] In some embodiments, step S120, "analyzing the first nuclear magnetic resonance spectroscopy data to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site; constructing an atomic-level polysaccharide molecular graph and generating microstructural features based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site," includes:

Step S1201: The first nuclear magnetic resonance spectroscopy data is subjected to standardization preprocessing, and the preprocessed data is analyzed by a convolutional neural network to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site. A set of monosaccharide residue nodes is constructed based on the monosaccharide composition and the linkage site.

Specifically, the standardization preprocessing can apply baseline correction, phase correction, chemical shift calibration, intensity normalization, and uniform sampling to the first NMR spectroscopy data, in that order. The preprocessed data is then input into a lightweight convolutional neural network with a MobileNetV2 structure. This network is fine-tuned on 500 NMR structures derived from CarbBank and automatically outputs probability distributions over the glycosidic bond type, anomeric carbon configuration, and linkage site.

[0026] Based on the connection topology of the connection sites parsed by the convolutional neural network, and combined with the types and molar ratios of monosaccharides in the monosaccharide composition determined by HPLC, a specific monosaccharide type is assigned to the node through a chemical shift-assisted determination and molar ratio constraint allocation algorithm, thereby generating a set of monosaccharide residue nodes.

[0027] Step S1202: Based on each monosaccharide residue node in the set of monosaccharide residue nodes, determine the nodes of the initial polysaccharide molecular graph; based on the glycosidic bond type and the linkage site, determine the edges of the initial polysaccharide molecular graph. The linkage sites determine which two nodes are connected, and the glycosidic bond type determines the nature of the connection between them. With monosaccharide residues as nodes and glycosidic bonds as edges, an initial polysaccharide molecular graph G=(V,E) can be constructed, where V is the set of nodes and E is the set of edges.

[0028] Step S1203: Based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site, define node features and edge features according to preset rules. The node features comprise an 8-dimensional one-hot encoding of the monosaccharide type, a 1-dimensional binary encoding of the anomeric carbon configuration, and a 4-dimensional binary vector of the substitution states at C2, C3, C4, and C6. The edge features comprise a 5-dimensional one-hot encoding of the linkage type, a 1-dimensional bond length assigned from the standard equilibrium glycosidic bond length of the GLYCAM06j force field, and the dihedral angles φ and ψ that characterize the stereoconfiguration of the linkage site, each 1-dimensional. The GLYCAM06j force field is a general-purpose parameter set for simulating polysaccharide structures; it is used here to provide standard geometric parameters for glycosidic bonds, avoiding the noise and inconsistency of experimental measurements.

[0029] The above encoding method transforms microscopic structural features that cannot be quantified by traditional methods such as glycosidic bond stereochemistry and branched topology into topological data that can be computed by graph neural networks.
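The node and edge encodings can be sketched as fixed-length vectors. The dimensions (8 + 1 + 4 for nodes, 5 + 1 + 1 + 1 for edges) follow the text; the monosaccharide and linkage vocabularies below are hypothetical placeholders, since the text does not list them, and the choice of 1 for the β anomer is also an assumption.

```python
import numpy as np

# Hypothetical vocabularies: the text fixes only their sizes (8 and 5).
MONOSACCHARIDES = ["Ara", "Gal", "Glc", "Man", "Rha", "Xyl", "GalA", "GlcA"]
LINKAGE_TYPES = ["1->2", "1->3", "1->4", "1->6", "other"]
SUBSTITUTION_SITES = ["C2", "C3", "C4", "C6"]

def one_hot(item, vocab):
    v = np.zeros(len(vocab))
    v[vocab.index(item)] = 1.0
    return v

def node_features(sugar, anomer, substituted):
    """13-dim node vector: monosaccharide type, anomeric flag, substitution states."""
    return np.concatenate([
        one_hot(sugar, MONOSACCHARIDES),
        [1.0 if anomer == "beta" else 0.0],   # assumed convention: beta = 1, alpha = 0
        [1.0 if site in substituted else 0.0 for site in SUBSTITUTION_SITES],
    ])

def edge_features(linkage, bond_length, phi, psi):
    """8-dim edge vector: linkage type, bond length, and the two dihedral angles."""
    return np.concatenate([one_hot(linkage, LINKAGE_TYPES), [bond_length, phi, psi]])
```

The bond length and dihedral values are passed in by the caller; in the described method they would come from the force-field library or from the dihedral assignment step.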

[0030] Step S1204: Based on the glycosidic bond type and the coupling constant read from the first nuclear magnetic resonance spectroscopy data, calculate the dihedral conformation information using a dual-path assignment strategy. The dual-path assignment strategy comprises a primary path and a backup path. In the primary path, for glycosidic bonds whose coupling constant is available, the dihedral angle φ is back-calculated from the coupling constant and the glycosidic bond type using the Haasnoot-Altona equation of formula (4) (the formula and its parameter symbols are not reproduced in this text). The parameters of the Haasnoot-Altona equation are used to fit the correspondence between the coupling constant and the dihedral angle φ for each glycosidic bond type. For example, a β-(1→3) bond is a glycosidic bond formed between the C1 position of one monosaccharide residue and the C3 position of an adjacent monosaccharide residue, and usually constitutes the backbone of the polysaccharide; a β-(1→6) bond is a glycosidic bond formed between the C1 position of one monosaccharide residue and the C6 position of an adjacent monosaccharide residue, and usually constitutes a branch point of the polysaccharide. Each of these bond types has its own parameter values, which are likewise not reproduced here.

[0031] The parameter values are selected according to the glycosidic bond type, and the dihedral angle φ is then back-calculated from the coupling constant read from the first nuclear magnetic resonance spectroscopy data.

[0032] The general applicability of this method was verified experimentally: for polysaccharides from 14 black goji berry samples, the φ angle obtained by inversion with this equation deviated on average by only 0.3° from the value measured by X-ray crystallography (XRD) (calculated -59.8°, measured -60.1°), with a correlation coefficient r = 0.92 (p < 0.001). This indicates that the general parameters do not need to be specifically optimized for black goji berry polysaccharides and can be applied to β-(1→3) bond structures such as β-(1→3)-D-glucan. The dihedral angle ψ is calculated by molecular dynamics simulation.

[0033] The backup path is used when the coupling constant is missing, or when a linkage site is marked as uncertain because of low confidence (e.g., a C6 linkage site confidence < 0.7). In the constructed atomic-level polysaccharide molecular graph, such a glycosidic bond is drawn as a dashed line, and its dihedral angles φ and ψ are taken from the GLYCAM06j force field standard library, which provides standard dihedral values for each glycosidic bond type (the specific bond types and values are not reproduced in this text).
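The choice between the two paths can be sketched as a small decision function. The 0.7 threshold comes from the example in the text; everything else is hypothetical, including the names, the `uncertain` flag standing in for the dashed-line marking, and the `invert` callable, which stands in for the Haasnoot-Altona inversion whose parameters are not given here. The library value is supplied by the caller rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CONF_THRESHOLD = 0.7  # from the example in the text (C6 linkage site confidence < 0.7)

@dataclass
class DihedralAssignment:
    phi: float
    source: str      # "primary" or "backup"
    uncertain: bool  # True when the bond would be drawn as a dashed line

def assign_phi(
    coupling_constant: Optional[float],
    site_confidence: float,
    invert: Callable[[float], float],
    library_phi: float,
) -> DihedralAssignment:
    """Use the coupling constant when it is available and the linkage site is
    confidently assigned; otherwise fall back to the library value."""
    if coupling_constant is not None and site_confidence >= CONF_THRESHOLD:
        return DihedralAssignment(invert(coupling_constant), "primary", False)
    return DihedralAssignment(library_phi, "backup", True)
```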

[0034] Step S1205: Based on the monosaccharide residue nodes and the glycosidic bond type, construct the adjacency matrix corresponding to the initial polysaccharide molecular graph. The adjacency matrix can be constructed from the connection relationships between monosaccharide residue nodes.
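A minimal sketch of the adjacency matrix construction follows, assuming residues are indexed from 0 and each glycosidic bond is given as a pair of residue indices. Treating the graph as undirected (a symmetric matrix) is an assumption; the text does not state whether edge direction is retained.

```python
import numpy as np

def adjacency_matrix(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from (i, j) residue index pairs."""
    a = np.zeros((num_nodes, num_nodes), dtype=int)
    for i, j in edges:
        a[i, j] = 1
        a[j, i] = 1  # assumed undirected
    return a
```

For a three-residue backbone with bonds between residues 0 and 1 and between residues 1 and 2, the call would be `adjacency_matrix(3, [(0, 1), (1, 2)])`.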

[0035] Step S1206: Integrate the monosaccharide residue nodes, the edges of the initial polysaccharide molecular graph, the node features, the edge features, the dihedral conformation information, and the adjacency matrix to obtain the atomic-level polysaccharide molecular graph. That is, the components constructed in the preceding steps are combined into one complete atomic-level polysaccharide molecular graph.

[0036] Step S1207: Based on the atomic-level polysaccharide molecular graph, generate microstructural features.

[0037] In the above embodiments, the first nuclear magnetic resonance spectroscopy data is analyzed automatically by a convolutional neural network to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site. On this basis, an atomic-level polysaccharide molecular graph containing nodes, edges, node features, edge features, dihedral conformation information, and an adjacency matrix is constructed. This provides a digital, fine-grained characterization of the microscopic three-dimensional structure of polysaccharides, addresses the difficulty traditional methods have in quantifying the spatial conformation of polysaccharides, provides a reliable data foundation for the subsequent extraction of microstructural features, and improves the completeness and accuracy of structural analysis.

[0038] In some embodiments, step S1207, "generating microstructural features based on the atomic-level polysaccharide molecular graph," includes:

Step S12071: Input the atomic-level polysaccharide molecular graph into a two-layer graph attention network, which extracts and fuses the spatial topology and microscopic chemical features of the graph layer by layer to obtain the deep features of each monosaccharide residue node.

Step S12072: Perform global attention pooling on the deep features of each monosaccharide residue node to obtain the microstructural features.

[0039] The microstructure feature encoding is implemented using a two-layer Graph Attention Network (GAT). The first layer contains four attention heads, each outputting a 16-dimensional feature, which is concatenated to obtain a 64-dimensional node representation. The second layer also uses four attention heads, taking the 64-dimensional feature output from the first layer as input. After linear projection, each head outputs a 32-dimensional feature, which is concatenated to obtain a 128-dimensional node feature. The attention coefficients are calculated using a LeakyReLU activation function with a negative slope of 0.2. Global attention pooling is implemented based on the attention weights of the second layer, normalized using Softmax, and finally outputs a 128-dimensional feature vector (i.e., microstructure features) to fully encode the microstructure and chemical characteristics of the polysaccharide.
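The dimension flow described above (13-dimensional node input, 4 heads × 16 = 64, then 4 heads × 32 = 128, pooled to one 128-dimensional vector) can be traced with a compact NumPy sketch of a standard graph attention layer. It is illustrative only: the weights are random, edge features are ignored, the ELU between layers and the added self-loops are assumptions, and the pooling uses a separate gate vector as a stand-in for the second-layer attention weights mentioned in the text.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def elu(x):
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))

def softmax(x, axis):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(h, adj, w, a_src, a_dst):
    """One multi-head graph attention layer with concatenated head outputs.
    h: (N, F_in); adj: (N, N) with self-loops; w: (H, F_in, F_out);
    a_src, a_dst: (H, F_out)."""
    outs = []
    for k in range(w.shape[0]):
        z = h @ w[k]                                        # (N, F_out)
        e = leaky_relu((z @ a_src[k])[:, None] + (z @ a_dst[k])[None, :])
        e = np.where(adj > 0, e, -1e9)                      # attend to neighbours only
        alpha = softmax(e, axis=1)                          # normalise over neighbours
        outs.append(alpha @ z)
    return np.concatenate(outs, axis=1)

def global_attention_pool(h, gate):
    """Softmax-weighted sum over nodes; the gate vector is an assumption."""
    scores = softmax(h @ gate, axis=0)                      # (N,)
    return scores @ h                                       # (F,)

def init_params(rng, f_in=13, heads=4):
    def layer(fi, fo):
        return (rng.normal(0, 0.1, (heads, fi, fo)),
                rng.normal(0, 0.1, (heads, fo)),
                rng.normal(0, 0.1, (heads, fo)))
    return {"l1": layer(f_in, 16), "l2": layer(64, 32), "gate": rng.normal(0, 0.1, 128)}

def encode(x, adj, params):
    """Node features (N, 13) and adjacency (N, N) -> 128-dim microstructure vector."""
    adj = adj + np.eye(adj.shape[0])                        # self-loops (assumed)
    h1 = elu(gat_layer(x, adj, *params["l1"]))              # (N, 64)
    h2 = gat_layer(h1, adj, *params["l2"])                  # (N, 128)
    return global_attention_pool(h2, params["gate"])        # (128,)
```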

[0040] In the above embodiments, a two-layer graph attention network is used to extract features from atomic-level polysaccharide molecular graphs, and global attention pooling is combined to obtain microstructural features. This can fully explore the spatial topological structure and microscopic chemical correlation information of polysaccharide molecules, effectively capture deep structural features that play a key role in immune activity, avoid the loss of structural information, significantly improve the expression ability and discriminability of microstructural features, and thus improve the accuracy and reliability of immune activity prediction.

[0041] In some embodiments, the method further includes: Step S141: Based on the first basic multimodal features and the first predicted immune activity value, calculate the first contribution of each feature in the first basic multimodal features to the first predicted immune activity value using the Kernel SHAP algorithm. Step S142: Based on the node features, the edge features, the Kernel SHAP algorithm, the attention weights of the two-layer graph attention network, and the first predicted immune activity value, calculate the second contribution of each feature in the microstructural features to the first predicted immune activity value. The Kernel SHAP algorithm is used to calculate the contributions hierarchically. It yields the SHAP values of the first basic multimodal features, where each SHAP value represents the first contribution of the corresponding feature to the first predicted immune activity value. At the same time, the node features and edge features of the atomic-level polysaccharide molecular graph are treated as independent feature groups and combined with the attention weights of the two-layer graph attention network to calculate the SHAP values of the microstructural features. Each of these SHAP values represents the second contribution of the corresponding microstructural feature to the first predicted immune activity value.

[0042] Step S143: Based on the first contribution and the second contribution, sort the corresponding features according to their contribution to obtain the feature order used to characterize the strength of immune activity. The feature ranking includes both basic multimodal features and microstructural features. Ranking them according to their contribution can clarify the degree of influence of each feature on the level of immune activity.

[0043] Step S144: Filter out several key features with the highest contribution, and generate corresponding optimization suggestions based on the contribution level and feature source category.

[0044] The feature source categories include macroscopic structural feature categories and microscopic structural feature categories. Based on the features of different feature source categories, process optimization suggestions or structural optimization suggestions are generated respectively. For example, for macroscopic process parameters with high contribution, operation suggestions for maintaining or adjusting process parameters are given, such as "the proportion of the corresponding component reaches the peak when the ultrasonic time is 30 min, and it is recommended to maintain it"; for microscopic structural fragments with significant contribution, structural optimization suggestions for retaining or modifying the corresponding glycosidic bonds and monosaccharide residues are given.
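Steps S143 and S144 can be sketched as a ranking followed by a category-dependent suggestion. The record layout, the category names `"macro"` and `"micro"`, the suggestion wording, and the choice to rank by absolute contribution are all assumptions made for illustration; the contribution values are made up.

```python
def rank_features(contributions):
    """Sort (name, source, shap_value) records by absolute contribution."""
    return sorted(contributions, key=lambda rec: abs(rec[2]), reverse=True)


def suggest(record):
    """Map a key feature to a suggestion type by its source category."""
    name, source, value = record
    kind = "process optimization" if source == "macro" else "structural optimization"
    action = "maintain or strengthen" if value > 0 else "reduce or modify"
    return f"{kind}: {action} {name} (contribution {value:+.2f})"


# Hypothetical records: (feature name, source category, contribution).
records = [
    ("ultrasonic time", "macro", 0.12),
    ("beta-(1->3) linkage", "micro", 0.31),
    ("arabinose ratio", "macro", -0.20),
    ("alpha-(1->4) linkage", "micro", -0.28),
]
top_k = rank_features(records)[:3]
suggestions = [suggest(rec) for rec in top_k]
```

Ranking by absolute value keeps strongly negative features in the key set, so that a suggestion to reduce or modify them can be generated alongside suggestions to maintain positive ones.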

[0045] In the above embodiments, based on the Kernel SHAP algorithm and the graph attention network weights, the contributions of the first basic multimodal features and the microstructural features to the predicted immune activity value are calculated and ranked. After the key features are screened, targeted optimization suggestions are generated, which makes the prediction results interpretable and clarifies the core factors affecting the immune activity of the polysaccharides. This not only reveals the structure-activity relationship, but also provides direct, practical guidance for optimizing the extraction process, modifying the structure, and developing products based on black goji berry polysaccharides, enhancing the practicality of the method.

[0046] In some embodiments, the method further includes: Step S151: Map the second contribution of each node feature and edge feature to the corresponding position of the atomic-level polysaccharide molecular graph to generate a molecular graph heatmap, in which different colors characterize the positive or negative contribution of each feature to immune activity. Using the node index and edge index of the node features and edge features, the contribution of each feature is mapped to its corresponding position in the atomic-level polysaccharide molecular graph. For example, a contribution of "+0.31" for a certain glycosidic bond type is mapped to the corresponding position in the graph. The three key features with the highest contribution can be screened out, and the heatmap highlights their nodes and edges on the atomic-level polysaccharide molecular graph. For example, red indicates a positive contribution to immune activity, and blue indicates a negative contribution.
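The mapping from contributions to heatmap colors in step S151 can be sketched as follows. The edge keys, the contribution values, and the use of grey for a zero contribution are assumptions for illustration; only the red and blue convention and the top-three highlighting come from the description.

```python
def contribution_color(value):
    """Red for positive, blue for negative, grey for zero contribution."""
    if value > 0:
        return "red"
    if value < 0:
        return "blue"
    return "grey"


def build_heatmap(edge_contributions, top_n=3):
    """Return {edge_index: (value, color, highlighted)} for a molecular graph."""
    ranked = sorted(edge_contributions,
                    key=lambda k: abs(edge_contributions[k]), reverse=True)
    highlighted = set(ranked[:top_n])
    return {idx: (val, contribution_color(val), idx in highlighted)
            for idx, val in edge_contributions.items()}


# Hypothetical edge contributions keyed by (source node, target node).
edges = {(0, 1): 0.31, (1, 2): -0.28, (2, 3): 0.05, (3, 4): -0.01}
heatmap = build_heatmap(edges)
```

Keying the result by edge index keeps the contribution attached to a specific position in the graph, which is what allows a rendering layer to color that bond.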

[0047] Step S152: Determine the confidence level of each connection site. For connection sites with a confidence level lower than a preset threshold, mark the first prompt information. The first prompt information is used to indicate that the confidence level of the connection site is low and it is recommended to supplement the nuclear magnetic resonance spectroscopy verification.

[0048] Glycosidic bond linkage sites with low confidence are marked with dashed borders in the atomic-level polysaccharide molecular graph, and the first prompt message may be "It is recommended to supplement with 2D NMR verification".
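Step S152 amounts to a threshold check on each linkage site. The sketch below assumes a threshold of 0.7, the value this application uses for hard-label confidence; the site names, the dictionary layout, and the "solid" border for confident sites are hypothetical.

```python
CONFIDENCE_THRESHOLD = 0.7  # assumed equal to the hard-label confidence threshold
FIRST_PROMPT = "It is recommended to supplement with 2D NMR verification"


def annotate_sites(site_confidences, threshold=CONFIDENCE_THRESHOLD):
    """Attach a border style and an optional prompt to each linkage site."""
    annotated = {}
    for site, conf in site_confidences.items():
        low = conf < threshold
        annotated[site] = {
            "confidence": conf,
            "border": "dashed" if low else "solid",
            "prompt": FIRST_PROMPT if low else None,
        }
    return annotated


# Hypothetical confidences for two linkage sites.
sites = annotate_sites({"C3": 0.91, "C6": 0.65})
```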

[0049] In the above embodiments, the contribution of each structural feature is mapped to an atomic-level polysaccharide molecule map to generate a heatmap, which intuitively displays the positive and negative contributions of the features and provides prompts and annotations for low-confidence connection sites. This visualizes the impact mechanism of microstructure on immune activity, making it easier for researchers to intuitively understand key structural sites. At the same time, it provides early warning prompts for uncertain structural information, effectively reducing prediction bias caused by structural analysis errors and improving the reliability and safety of the results.

[0050] In some embodiments, the method further includes: Step S153: If it is determined that the data of the black goji berry polysaccharide sample to be tested does not contain the first nuclear magnetic resonance spectrum data, a second prompt message is marked. The second prompt message is used to indicate that there is no first nuclear magnetic resonance spectrum data and the glycosidic bond type and branch topology have not been resolved.

[0051] For the black goji berry polysaccharide sample data with mask=0 (no first NMR spectral data), only the first contribution is calculated. The second prompt message can be "No NMR structural data, glycosidic bond type and branch topology not analyzed. The current conclusion is based on macroscopic features, and it is recommended to supplement NMR verification to improve prediction accuracy."

[0052] The final output can be a standardized PDF or HTML report containing the first predicted immune activity value, confidence interval, molecular map heatmap (generated when first nuclear magnetic resonance spectroscopy data is available), structural uncertainty indication, process optimization suggestions, and relevant literature references.

[0053] In the above embodiments, when there is no NMR spectroscopy data, the corresponding prompt information is automatically marked to clearly indicate that the glycosidic bond and branch topology information is currently unresolved, ensuring the transparency and rigor of the prediction results, avoiding misjudgment due to missing data, and adapting to actual application scenarios without NMR data, while taking into account the universality and standardization of the method.

[0054] In some embodiments, the method further includes: Step S1011: Obtain multiple sets of training black goji berry polysaccharide sample data. Each set of training black goji berry polysaccharide sample data includes monosaccharide composition, molecular weight parameters, basic physicochemical data, spectral data, transcriptome data, metabolome data, process parameters, and the corresponding true value of immune activity. The first category of training black goji berry polysaccharide sample data also includes second nuclear magnetic resonance (NMR) spectroscopy data. Based on the completeness of the sample data, all training black goji berry polysaccharide samples are divided into two categories, with the following grouping and uses. One category consists of 200 training samples with additional 1D/2D NMR spectroscopy data, used for polysaccharide microstructure analysis, of which 130 samples are used for model training, 50 for model validation, and 20 for independent testing. The other category consists of 100 training samples without NMR spectroscopy data, of which 80 are used for supplementary training to verify the effectiveness of the mask mechanism, and 20 are used for independent testing to verify the model's generalization ability in scenarios without NMR data.

[0055] In summary, all samples were ultimately divided into: a training set of 210 samples (130 samples containing NMR and 80 samples without NMR), a validation set of 50 samples (containing NMR samples), and a test set of 40 samples (20 samples containing NMR and 20 samples without NMR).
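The split sizes stated above can be checked with a few lines of arithmetic. The dictionary layout is only an illustrative way of recording the counts given in the text.

```python
# Sample counts per split, taken from the grouping described above.
splits = {
    "train": {"nmr": 130, "no_nmr": 80},
    "validation": {"nmr": 50, "no_nmr": 0},
    "test": {"nmr": 20, "no_nmr": 20},
}

totals = {name: sum(parts.values()) for name, parts in splits.items()}
nmr_total = sum(parts["nmr"] for parts in splits.values())
no_nmr_total = sum(parts["no_nmr"] for parts in splits.values())
grand_total = sum(totals.values())
```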

[0056] Step S1012: Based on each set of training black goji berry polysaccharide sample data, construct the corresponding second multimodal fusion feature. It should be noted that this is similar to the process of obtaining the first multimodal fusion feature in steps S110 to S130. The training black goji berry polysaccharide sample data is first preprocessed to obtain the second basic multimodal features. If the training black goji berry polysaccharide sample data is determined to contain second nuclear magnetic resonance (NMR) spectroscopy data, the second NMR spectroscopy data is analyzed to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site. Based on the glycosidic bond type, anomeric carbon configuration, and linkage site, an atomic-level polysaccharide molecular map is constructed and microstructural features are generated. The second basic multimodal features and the microstructural features are then fused to obtain the second multimodal fusion feature.

[0057] The preprocessing procedure is performed differently depending on whether the training black goji berry polysaccharide sample data contains the second nuclear magnetic resonance spectroscopy data.

[0058] For the 200 training samples of black goji berry polysaccharides containing second NMR spectroscopy data, the monosaccharide composition was normalized to the [0,1] interval using Min-Max scaling, and the molecular weight parameters were normalized using the Z-score. The second NMR spectroscopy data underwent baseline correction, phase correction, chemical shift calibration, intensity normalization, and uniform resampling in sequence. The normalized second NMR spectroscopy data was then input into a lightweight convolutional neural network with a MobileNetV2 structure. This convolutional neural network was fine-tuned on 500 NMR structures derived from CarbBank and automatically output the probability distributions of the glycosidic bond type, anomeric carbon configuration, and linkage site. The highest-probability item was selected through an argmax operation to generate deterministic hard labels (confidence threshold ≥0.7). Feature items with confidence <0.7 were marked as "uncertain" and assigned fallback values from the GLYCAM06j force field library. Transcriptome and metabolome data were log2 transformed and then Z-score normalized. Process parameters were linearly normalized to [0,1]. Missing values in the data were imputed using the K-nearest neighbor method, with K set to 5.
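The scaling rules and the thresholded argmax above can be sketched in plain Python. Several details are assumptions not stated in the text: the use of the population standard deviation, the pseudocount added before the log2 transform, and the label names. K-nearest neighbor imputation and the convolutional network itself are not reproduced.

```python
import math
import statistics


def min_max(values):
    """Scale values linearly to [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def log2_then_z(values, pseudocount=1.0):
    """log2 transform (with an assumed pseudocount), then Z-score."""
    return z_score([math.log2(v + pseudocount) for v in values])


def hard_label(probabilities, threshold=0.7):
    """Argmax label, or 'uncertain' when the top probability is below threshold."""
    label = max(probabilities, key=probabilities.get)
    return label if probabilities[label] >= threshold else "uncertain"


ratios = min_max([5.2, 3.1, 1.7])
expression = log2_then_z([18.6, 8.2, 3.1])
confident = hard_label({"beta-(1->3)": 0.86, "alpha-(1->4)": 0.14})
unsure = hard_label({"C6": 0.65, "C4": 0.35})
```

A label marked "uncertain" here is the point at which the fallback assignment from the force field library would take over.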

[0059] For 100 training samples of black goji berry polysaccharides without second NMR spectroscopy data, all NMR-related processing procedures were skipped. The monosaccharide composition, molecular weight parameters, transcriptome data, metabolome data, and process parameters were processed using the same preprocessing rules as described above. The 32-dimensional macroscopic structural features s were directly calculated from the monosaccharide molar ratio (8 dimensions), molecular weight parameters (3 dimensions), uronic acid content (1 dimension), sulfate content (1 dimension), protein residue (1 dimension), UV characteristic peak area (3 dimensions), and infrared characteristic peak intensity (15 dimensions).

[0060] All 300 samples shared a second basic multimodal feature output, including: s (32-dimensional macroscopic structural features), t (50-dimensional transcriptome data), m (20-dimensional metabolome data), p (4-dimensional process parameters), and y (1-dimensional activity tag, i.e., the true value of immune activity). 200 samples with second NMR spectroscopy data additionally output NMR resolution hard tags and monosaccharide residue node sets for subsequent molecular map construction. 100 samples without second NMR spectroscopy data did not have any NMR-related output, but the s, t, m, p, and y features remained intact.

[0061] To ensure that the input dimensions of the 300 samples are consistent, microstructural features g and a corresponding mask of 1 are generated for the 200 samples containing second nuclear magnetic resonance spectroscopy data; for the 100 samples without second nuclear magnetic resonance spectroscopy data, a 128-dimensional zero vector is used as g, and the corresponding mask is set to 0. The mask variable is passed along with g to the subsequent calculation modules.
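The mask mechanism can be sketched as a pair (g, mask) that always has the same shape. The function names are hypothetical, and multiplying g by the mask is one simple way to make NMR-free samples contribute nothing; it is an assumption about how downstream modules use the mask. The mask values 1 and 0 follow the worked examples in this application.

```python
MICRO_DIM = 128  # dimension of the microstructural feature vector g


def microstructure_input(gat_embedding=None):
    """Return (g, mask): the embedding with mask 1, or zeros with mask 0."""
    if gat_embedding is None:
        return [0.0] * MICRO_DIM, 0
    if len(gat_embedding) != MICRO_DIM:
        raise ValueError(f"expected {MICRO_DIM} dimensions")
    return list(gat_embedding), 1


def masked(g, mask):
    """Multiply g by the mask so that NMR-free samples contribute nothing."""
    return [mask * x for x in g]


g_nmr, mask_nmr = microstructure_input([0.1] * MICRO_DIM)
g_none, mask_none = microstructure_input()
```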

[0062] Step S1013: Input each of the second multimodal fusion features and the corresponding true value of immune activity into the initial model, and obtain the corresponding second predicted immune activity value output by the initial model. Step S1014: Based on the loss function, calculate the error between the second predicted immune activity value and the true immune activity value; iteratively optimize the model parameters of the initial model through the backpropagation algorithm to minimize the loss function; and perform cross-validation. The loss function is calculated using formula (5). The terms of formula (5) are the loss function, the second predicted immune activity value, the true immune activity value, the set of all model parameters, and an L2 regularization term.
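One loss that is consistent with the listed terms is a mean squared error plus an L2 penalty on the parameters. This is an assumption for illustration: the exact form of formula (5), including the error term and the regularization coefficient (called `lam` here), is not given in the text.

```python
def regularized_mse(predictions, targets, parameters, lam=1e-4):
    """Mean squared error plus lam * sum of squared parameters (assumed form)."""
    n = len(predictions)
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
    l2 = lam * sum(w ** 2 for w in parameters)
    return mse + l2


# Toy values chosen only to exercise the function.
loss = regularized_mse([0.85, 0.72], [0.83, 0.70], parameters=[0.5, -0.5], lam=0.01)
```

In this toy call the error term is 0.0004 and the penalty is 0.005, so the penalty dominates; in practice the coefficient would be tuned so that it only discourages large weights.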

[0063] Step S1015: If the loss function is less than a preset threshold and the cross-validation accuracy reaches a preset standard, training is terminated, and the trained prediction model is obtained.

[0064] In the above embodiments, a training set is constructed by collecting multiple sets of samples with different data completeness. The initial model is iteratively optimized based on multimodal fusion features and real immune activity values. Cross-validation is combined to ensure model performance, thus constructing a black goji berry polysaccharide immune activity prediction model with strong generalization ability and high prediction accuracy. It can adapt to the prediction needs of multiple samples and multiple scenarios, and provides stable and reliable model support for achieving rapid, low-cost, and high-throughput immune activity prediction.

[0065] In some embodiments, to address the limitation of small sample size in this application, a transfer learning strategy can be employed. First, based on the CarbBank database (containing 2847 experimentally resolved polysaccharide structures) and the GlyTouCan database (containing over 15000 theoretical glycan structures), 3500 representative glycan structures are obtained through structural similarity clustering. Molecular dynamics simulations are performed using the GLYCAM06j force field to extract low-energy conformations. Based on the Karplus equation and the PPM chemical shift prediction algorithm, ¹H and ¹³C chemical shifts are calculated, generating 8500 theoretical NMR spectra. Further, data augmentation techniques (such as adding Gaussian noise with a signal-to-noise ratio of 20 dB, introducing ±5% baseline drift, and applying ±0.02 ppm chemical shift perturbations) are used to expand the dataset to 10000 samples, thus constructing a general polysaccharide library.
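The three augmentation operations named above can be sketched on a spectrum stored as a list of intensities. The linear shape of the baseline drift, the uniform distribution of the shift perturbation, and the toy spectrum are assumptions; the 20 dB signal-to-noise ratio, the ±5% drift, and the ±0.02 ppm perturbation come from the text.

```python
import math
import random


def add_gaussian_noise(spectrum, snr_db=20.0, rng=random):
    """Add white Gaussian noise so that signal/noise power = 10^(snr_db/10)."""
    signal_power = sum(x * x for x in spectrum) / len(spectrum)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in spectrum]


def add_baseline_drift(spectrum, max_fraction=0.05, rng=random):
    """Add a linear baseline with endpoints within +/-5% of the peak intensity."""
    peak = max(abs(x) for x in spectrum)
    start = rng.uniform(-max_fraction, max_fraction) * peak
    end = rng.uniform(-max_fraction, max_fraction) * peak
    n = len(spectrum)
    return [x + start + (end - start) * i / (n - 1) for i, x in enumerate(spectrum)]


def perturb_shifts(shifts_ppm, max_shift=0.02, rng=random):
    """Jitter each chemical shift by at most +/-0.02 ppm."""
    return [s + rng.uniform(-max_shift, max_shift) for s in shifts_ppm]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [math.exp(-((i - 50) / 5.0) ** 2) for i in range(101)]  # one toy peak
noisy = add_gaussian_noise(clean, rng=rng)
drifted = add_baseline_drift(clean, rng=rng)
shifted = perturb_shifts([4.52, 103.4], rng=rng)
```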

[0066] The graph attention network (GAT) encoder and the modality alignment layer were pre-trained using the aforementioned general polysaccharide library; subsequently, the overall model was fine-tuned on the 200 black goji berry experimental samples. The training process employed the AdamW optimizer, combined with a cosine annealing learning rate schedule and an early stopping mechanism (patience=15), while the SMOTE oversampling method (k=5) was used to balance the distribution of immune activity across the samples.

[0067] During the model fine-tuning phase, a hierarchical learning rate strategy was adopted: the parameters of the first GAT layer were frozen (learning rate set to 0) to preserve the model's ability to encode general sugar chain topology; the learning rate of the second GAT layer was set to 1×10⁻⁵, so that the attention weights were fine-tuned to adapt to the specific linkage patterns of black goji berry polysaccharides; and the learning rate of the modality alignment layer and the prediction head was set to 1×10⁻⁴ to optimize the fusion and prediction of multimodal features. The model was trained for a total of 200 iterations, and the early stopping patience can be set to 15.
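The hierarchical learning rates, the cosine annealing schedule, and the early stopping rule can be sketched without a deep learning framework. The group names, the minimum learning rate of 0, and the reading of the 200 training iterations as epochs are assumptions made for this sketch.

```python
import math

# Base learning rate per parameter group (group names are hypothetical).
BASE_LR = {
    "gat_layer1": 0.0,          # frozen
    "gat_layer2": 1e-5,
    "modality_alignment": 1e-4,
    "prediction_head": 1e-4,
}
TOTAL_EPOCHS = 200
PATIENCE = 15


def cosine_annealed(base_lr, epoch, total=TOTAL_EPOCHS, min_lr=0.0):
    """Cosine annealing from base_lr at epoch 0 down to min_lr at `total`."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total))


def should_stop(val_losses, patience=PATIENCE):
    """True when the best validation loss is at least `patience` epochs old."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience


lrs_at_100 = {name: cosine_annealed(lr, 100) for name, lr in BASE_LR.items()}
```

A frozen group stays at a learning rate of 0 throughout the schedule, so the annealing only affects the groups that are actually being fine-tuned.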

[0068] The embodiments of this application were systematically validated on 300 samples of black goji berry polysaccharides, and the results were significantly better than those of related technologies, as shown in Table 1 (Evaluation Indicators). In Table 1, early weak activity signals are defined as samples with a comprehensive score of 0.3–0.5; the time consumed per sample covers the entire process of data preprocessing, model inference, and visualization generation; and small-sample subset testing reports the average over 5 random partitions.

[0069] The results of the module ablation experiment are shown in Table 2 (Experimental Results of Module Ablation). As shown in Table 2, when only the macroscopic structural features s are used, the baseline R² of the model is 0.71. After introducing a traditional GNN (without explicit edge features), R² increases to 0.76, indicating that the improvement brought by graph structure alone is limited. After introducing a GAT module with dihedral angle edge features, R² increases to 0.84, verifying the key role of microscopic structural representation. When s, t, m, and p are fused by simple concatenation without attention, R² is 0.81, indicating that simple multimodal fusion brings only a limited benefit. After adding an equal attention mechanism, R² increases to 0.85. The complete scheme of the embodiments of this application, which uses structure-dominated attention with residual correction, achieves an R² of 0.89. The complete scheme without pre-training has an R² of 0.82, indicating that the transfer learning strategy improves R² by 0.07.

[0070] The embodiments of this application overcome the bottleneck of microstructure representation in traditional methods by introducing a GAT module containing dihedral angle features. Combined with cross-modal attention and transfer learning strategies, the model achieves an R² of 0.89 on the test set. On this basis, SHAP analysis revealed that key features such as β-(1→3) linkages contributed significantly (mean SHAP value +0.29), improving the recall rate of early weak activity signals to 88.5% and reducing the false negative rate from 41.7% to 11.5%. Dynamic weight allocation by cross-modal attention (the mean weight of microstructural features was 0.52, and the mean weight of transcriptome features was 0.38) resulted in an 18.7% improvement in R² after multimodal fusion compared to using only structural data. 92% of experts believed that the structural highlighting maps and causal chains could directly guide structural modifications (e.g., reducing the proportion of arabinose can improve activity). The transfer learning strategy achieved an R² of 0.83 on a small dataset of 100 cases, a 43.1% improvement over SVR, effectively avoiding 70% of invalid experiments. The generated actionable suggestions were experimentally verified, shortening the process optimization cycle by an average of 40% (the traditional approach requires 3 rounds of experiments, whereas this approach reaches optimal results in 1-2 rounds).

[0071] In one embodiment, process validation (including NMR data) was performed on a highly active sample from Qinghai. The sample, HLJ-QH-2023-114, was taken from Golmud, Qinghai (altitude 2800 m). It was extracted using an ultrasound-assisted cellulase method under the following conditions: ultrasound power 300 W, extraction time 30 min, and pH 5.0. The input data for this sample included: HPLC analysis showing a monosaccharide composition of Glc:Gal:Ara = 5.2:3.1:1.7; GPC analysis showing a molecular weight Mw = 42.3 kDa and a dispersity of 1.28; RNA-seq showing that the FPKM value of the CSLA gene was 18.6; and NMR spectra analyzed by the CNN, confirming a β-(1→3) main chain and β-(1→6) branch structure, with confidence levels both higher than 0.85. Based on this, the system constructed a polysaccharide molecular graph containing 28 nodes and 27 edges, with edge features including a bond length of 1.54 Å and dihedral angles φ = −60°, ψ = −120°, and with the mask marked as 1. The cross-modal attention weights show that the structural modality accounts for 0.52 and the transcriptome modality accounts for 0.38. The model predicted an immune activity value of 0.85 for this sample, with a 95% confidence interval of [0.78, 0.91]. SHAP micropath analysis showed that the contribution of the β-(1→3) linkage edges to activity was +0.31, which was highlighted in red on the heatmap. Based on the analysis results, the system provided an optimization suggestion: maintain the sonication time at 30 minutes to ensure a high proportion of β-(1→3) glycosidic bonds. In subsequent experimental verification, the comprehensive immune activity score of this sample measured by RAW264.7 cell experiments was 0.83, an error of only 2.4% relative to the model prediction. After the process was optimized according to the suggestion, the immune activity increased to 0.91, an increase of 9.6%, while saving screening costs of 2400 yuan.
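The two percentages quoted for this sample can be reproduced with simple arithmetic. Using the measured value as the denominator of the prediction error is an assumption; it is the choice that matches the stated 2.4%.

```python
def relative_error_pct(predicted, measured):
    """Absolute prediction error as a percentage of the measured value."""
    return abs(predicted - measured) / measured * 100


def relative_gain_pct(before, after):
    """Increase from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100


error = relative_error_pct(0.85, 0.83)   # predicted 0.85, measured 0.83
gain = relative_gain_pct(0.83, 0.91)     # measured 0.83, optimized 0.91
```

Rounded to one decimal place, `error` is 2.4 and `gain` is 9.6, in line with the figures reported above.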

[0072] In one embodiment, structural correction verification (including uncertainty feature processing) was performed on an active sample from Xinjiang Uygur Autonomous Region. The sample, HLJ-XJ-2023-089, was taken from Qiemo, Xinjiang Uygur Autonomous Region (altitude 1500 m), and was extracted using a hot water extraction process at 80℃ for 2 hours. The input data for this sample included: HPLC analysis showing a monosaccharide composition of Gal:Glc:Ara = 4.8:2.9:2.3; GPC analysis showing a molecular weight of Mw = 28.7 kDa; RNA-seq showing an FPKM value of 8.2 for the CSLA gene; and NMR spectrum analysis using the CNN, which determined that the main chain structure was α-(1→4), with the confidence level of the C6 linkage site being 0.65, which was lower than the preset threshold of 0.7 and was therefore marked as "uncertain". Based on this, the system constructed a polysaccharide molecular graph containing 22 nodes and 21 edges. The C6 linkage edge was assigned default values from the GLYCAM06j force field library and marked with dashed borders in the molecular graph, with a mask of 1. The model predicted an immune activity value of 0.72 for this sample, with a confidence interval of [0.65, 0.79]. SHAP analysis showed that the contribution of the α-(1→4) linkage to the immune activity was -0.28, which was highlighted in blue in the heatmap. The analysis report also indicated that "the confidence level of the C6 linkage site feature is low, and it is recommended to supplement with 2D NMR verification." Based on the above analysis, the system provided an optimization suggestion: use enzymatic digestion to break down the α-main chain structure and increase the proportion of β-glucan. Subsequent experimental verification showed that after the process was optimized according to the suggestion, the proportion of β-(1→3) glycosidic bonds in the sample increased by 37%, the overall immune activity score rose to 0.86, the error relative to the model prediction was 2.9%, and the overall activity increased by 22.9%. This also avoided three ineffective process attempts and effectively reduced R&D costs.

[0073] In one embodiment, mechanism analysis and decision guidance were performed on a low-activity sample from Gansu (without NMR data). The sample, HLJ-GS-2023-045, was taken from Minqin, Gansu (altitude 1300 m), was extracted using the traditional water extraction and alcohol precipitation method, and belonged to the 100 NMR-free samples, with the mask marked as 0. The input data for this sample included: HPLC determination of the monosaccharide composition as Ara:Gal:Rha = 3.5:2.8:2.1; GPC determination of a molecular weight Mw = 15.2 kDa and a dispersity of 2.1; and RNA-seq showing a CSLA gene FPKM value of 3.1. NMR-related processing was skipped, and the 32-dimensional macroscopic structural feature s was generated directly; a 128-dimensional zero vector was used as the microscopic structural feature g, with the mask set to 0; and the structural features were obtained by mapping s through a fully connected layer. Due to the mask constraint, the contribution of the microscopic structural features was 0. After fusing the remaining modalities, the model predicted an immune activity value of 0.45, with a confidence interval of [0.38, 0.52]. Only the SHAP analysis of macroscopic features was performed, and the results showed that the arabinose proportion and the dispersity were the main influencing factors, both showing significant negative contributions. The analysis report prominently noted: "No NMR structural data; glycosidic bond type and branch topology not analyzed. Current conclusions are based on macroscopic characteristics; supplementary NMR verification is recommended to confirm structural defects." Based on these results, the following decision was made: high arabinose side chains generally correspond to weaker immunomodulatory activity; therefore, this sample is not recommended for immunomodulatory agent development, but rather should be used for antioxidant-related functional research. Experimental verification showed that the measured immunomodulatory activity score was 0.47, a prediction error of 4.3%. The research team adopted the suggestion, shifted to anthocyanin enrichment technology, and developed the material into a functional beverage, avoiding ineffective R&D investment of approximately 5000 yuan. Subsequent supplementary NMR tests confirmed that the polysaccharide branches were sparse and predominantly α-linked, consistent with the results inferred from macroscopic characteristics.

[0074] The three embodiments described above cover different data completeness scenarios, verifying the applicability of the model across the full range of samples. Comprehensive verification results show that the Pearson correlation coefficient between predicted and experimental values is r=0.983 (p<0.001), and the mean absolute error (MAE) is 0.032. The uncertainty labeling mechanism keeps researchers alert to low-confidence features, and after supplementary verification, the prediction accuracy further improves to 91.2%. Effective predictions were achieved for the 100 NMR-free samples under the mask mechanism (R²=0.79), and the missing-data notice in the report guides scientific decision-making. The embodiments of this application demonstrate high accuracy, strong interpretability, and practical guiding value on samples with different levels of data completeness, demonstrating the universality, robustness, and reliability of the technical solution.
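The two summary metrics named above, the Pearson correlation coefficient and the mean absolute error, can be computed as follows. The four value pairs are illustrative toy numbers, not the application's validation data, so the results do not reproduce r=0.983 or MAE=0.032.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_absolute_error(xs, ys):
    """Average absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Illustrative predicted and measured activity scores (made up).
predicted = [0.85, 0.72, 0.45, 0.60]
measured = [0.83, 0.70, 0.47, 0.61]
r = pearson_r(predicted, measured)
mae = mean_absolute_error(predicted, measured)
```

The two metrics are complementary: a high correlation shows that predictions track the measured ranking, while the MAE bounds how far individual predictions sit from the measured scores.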

[0075] Based on the foregoing embodiments, this application provides a black goji berry polysaccharide immune activity prediction system. The system includes various modules and sub-modules, and each unit of each sub-module can be implemented by a processor in an electronic device; of course, it can also be implemented by specific logic circuits. In the implementation process, the processor can be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA), etc.

[0076] Figure 2 is a schematic diagram of the composition and structure of a black goji berry polysaccharide immune activity prediction system provided in an embodiment of this application. As shown in Figure 2, the system 200 includes: a data acquisition and processing module 21, used to acquire the sample data of the black goji berry polysaccharide to be tested, where the sample data includes monosaccharide composition, molecular weight parameters, basic physicochemical data, spectral data, transcriptome data, metabolome data, and process parameters, and to preprocess the sample data to obtain the first basic multimodal features, where the first basic multimodal features include macroscopic structural features, transcriptome features, metabolome features, and process parameter features; an analysis module 22, used to analyze the first nuclear magnetic resonance spectrum data when it is determined that the sample data of the black goji berry polysaccharide to be tested contains the first nuclear magnetic resonance spectrum data, to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site, and to construct an atomic-level polysaccharide molecular map and generate microstructural features based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site; and a first prediction module 23, used to fuse the first basic multimodal features and the microstructural features to obtain a first multimodal fusion feature, and to input the first multimodal fusion feature into the trained prediction model to obtain the first immune activity prediction value output by the prediction model and the confidence interval characterizing the reliability of the prediction result.

[0077] In some possible embodiments, the analysis module 22 includes: a parsing submodule, used to perform normalization preprocessing on the first nuclear magnetic resonance spectroscopy data, to parse the normalized first nuclear magnetic resonance spectroscopy data using a convolutional neural network to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site, and to construct a set of monosaccharide residue nodes based on the monosaccharide composition and the linkage site; a determining submodule, used to determine the nodes of the initial polysaccharide molecular graph based on each monosaccharide residue node in the set of monosaccharide residue nodes, and to determine the edges of the initial polysaccharide molecular graph based on the glycosidic bond type and the linkage site; a defining submodule, used to define node features and edge features according to preset rules based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site; a calculation submodule, used to calculate dihedral conformation information using a dual-path assignment strategy based on the glycosidic bond type and the coupling constant read from the first NMR spectroscopy data; a construction submodule, used to construct the adjacency matrix corresponding to the initial polysaccharide molecular graph based on the monosaccharide residue nodes and the glycosidic bond type; an integration submodule, used to integrate the monosaccharide residue nodes, the edges of the initial polysaccharide molecular graph, the node features, the edge features, the dihedral conformation information, and the adjacency matrix to obtain an atomic-level polysaccharide molecular graph; and a generation submodule, used to generate microstructural features based on the atomic-level polysaccharide molecular graph.

[0078] In some possible embodiments, the generation submodule includes: an extraction unit, used to input the atomic-level polysaccharide molecular graph into a two-layer graph attention network, and to extract and fuse the spatial topology and microscopic chemical features of the atomic-level polysaccharide molecular graph layer by layer through the two-layer graph attention network to obtain the deep features of each monosaccharide residue node; and a pooling unit, used to perform global attention pooling on the deep features of each monosaccharide residue node to obtain the microstructural features.
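The pooling step can be illustrated with a minimal sketch. It assumes that the two-layer graph attention network has already produced one deep feature vector per monosaccharide residue node, and it uses a single linear gate followed by a softmax as the attention scorer. The gate weights and node features are hypothetical placeholders; in practice the gate would be learned together with the network.

```python
import math


def global_attention_pool(node_features, gate_weights):
    """Softmax-weighted sum of node feature vectors.

    Returns the pooled graph-level vector and the per-node attention weights.
    """
    # One scalar score per node from a linear gate (assumed form).
    scores = [sum(w * x for w, x in zip(gate_weights, h)) for h in node_features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(node_features[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, node_features))
              for d in range(dim)]
    return pooled, alphas


# Hypothetical deep features for two residue nodes; a zero gate gives
# uniform attention, so the pooled vector is the plain average.
pooled, alphas = global_attention_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])
```

The returned attention weights are the kind of quantity that can later be reused when attributing contributions to individual nodes.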

[0079] In some possible embodiments, the system further includes: a first calculation module, used to calculate, based on the first basic multimodal features and the first immune activity prediction value, a first contribution of each feature in the first basic multimodal features to the first immune activity prediction value using the Kernel SHAP algorithm; a second calculation module, used to calculate a second contribution of each feature in the microstructural features to the first immune activity prediction value based on the node features, the edge features, the Kernel SHAP algorithm, the attention weights of the two-layer graph attention network, and the first immune activity prediction value; a sorting module, used to sort the corresponding features by contribution based on the first contribution and the second contribution, to obtain a feature ranking that characterizes the strength of each feature's influence on immune activity; and a first generation module, used to select several key features with the highest contributions and to generate corresponding optimization suggestions according to the magnitude of the contribution and the source category of each feature.
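The contribution and sorting steps can be illustrated with a small sketch. Kernel SHAP estimates Shapley values by weighted regression over sampled feature coalitions; for a handful of features the same values can be computed exactly by enumerating every coalition, which is what this sketch does instead. The toy linear model, the feature names, the input values, and the all-zero baseline are hypothetical and stand in for the trained prediction model and real sample data.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical stand-in for the trained prediction model.
def toy_model(v):
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]


names = ["molecular_weight", "uronic_acid", "extraction_temp"]  # hypothetical
phi = exact_shapley(toy_model, [1.0, 3.0, 2.0], [0.0, 0.0, 0.0])
# Rank features by absolute contribution, largest first.
ranking = sorted(range(len(names)), key=lambda i: abs(phi[i]), reverse=True)
ranked_names = [names[i] for i in ranking]
```

The sign of each value indicates whether the feature raises or lowers the prediction relative to the baseline, which is the information the optimization suggestions would draw on.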

[0080] In some possible embodiments, the system further includes: a second generation module, used to map the second contribution of each node feature and edge feature to the corresponding position of the atomic-level polysaccharide molecular graph to generate a molecular graph heatmap, wherein different colors in the molecular graph heatmap characterize the positive or negative contribution of each feature to immune activity; and a first annotation module, used to determine the confidence level of each linkage site and to annotate each linkage site whose confidence level is lower than a preset threshold with first prompt information, wherein the first prompt information indicates that the confidence level of the linkage site is low and recommends supplementary nuclear magnetic resonance spectroscopy verification.
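A minimal sketch of the two annotation rules follows. The threshold of 0.7, the red and blue color choice, the site labels, and the confidence values are assumptions made for illustration; this application only states that a preset threshold and different colors are used.

```python
LOW_CONFIDENCE_PROMPT = (
    "Linkage-site confidence is low; "
    "supplementary NMR verification is recommended."
)


def contribution_color(contribution):
    """Map the sign of a contribution to a heatmap color (assumed palette)."""
    if contribution > 0:
        return "red"
    if contribution < 0:
        return "blue"
    return "grey"


def annotate_sites(site_confidence, threshold=0.7):
    """Return the first prompt information for every low-confidence site."""
    return {site: LOW_CONFIDENCE_PROMPT
            for site, confidence in site_confidence.items()
            if confidence < threshold}


# Hypothetical confidences for two linkage sites.
flags = annotate_sites({"Ara(1->3)Gal": 0.92, "Gal(1->4)Glc": 0.61})
colors = [contribution_color(c) for c in (0.4, -0.2, 0.0)]
```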

[0081] In some possible embodiments, the system further includes: a second annotation module, used to annotate a second prompt message when it is determined that the sample data of the black goji berry polysaccharide to be tested does not contain the first nuclear magnetic resonance spectroscopy data, the second prompt message being used to indicate that there is no first nuclear magnetic resonance spectroscopy data and that the glycosidic bond type and branch topology have not been resolved.

[0082] In some possible embodiments, the system further includes: an acquisition module, used to acquire multiple sets of training black goji berry polysaccharide sample data, each set including monosaccharide composition, molecular weight parameters, basic physicochemical data, spectral data, transcriptome data, metabolome data, process parameters, and the corresponding true value of immune activity, wherein a first number of the sets also include second nuclear magnetic resonance spectroscopy data; a construction module, used to construct a corresponding second multimodal fusion feature based on each set of training black goji berry polysaccharide sample data; a second prediction module, used to input each second multimodal fusion feature and the corresponding true value of immune activity into an initial model and to obtain the corresponding second immune activity prediction value output by the initial model; an optimization module, used to calculate the error between the second immune activity prediction value and the true value of immune activity based on a loss function, to iteratively optimize the model parameters of the initial model through a backpropagation algorithm so as to minimize the loss function, and to perform cross-validation; and a training termination module, used to terminate training when the loss function is less than a preset threshold and the cross-validation accuracy reaches a preset standard, thereby obtaining the trained prediction model.
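The training and termination logic can be sketched on a toy problem. The sketch replaces the initial model with a one-variable linear model trained by plain gradient descent, and it interprets the cross-validation standard as a bound on the mean validation error across folds. Both are assumptions, as are the synthetic data, the learning rate, and the two thresholds.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear(xs, ys, lr=0.05, loss_threshold=1e-4, max_epochs=5000):
    """Gradient descent that stops once the training loss is below threshold."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_epochs):
        if mse(w, b, xs, ys) < loss_threshold:
            break
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        w -= lr * 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * 2 * sum(errors) / n
    return w, b, mse(w, b, xs, ys)


def cross_val_mse(xs, ys, k=3):
    """Mean validation MSE over k contiguous folds."""
    fold = len(xs) // k
    scores = []
    for f in range(k):
        val = set(range(f * fold, (f + 1) * fold))
        train_x = [x for i, x in enumerate(xs) if i not in val]
        train_y = [y for i, y in enumerate(ys) if i not in val]
        w, b, _ = fit_linear(train_x, train_y)
        scores.append(mse(w, b, [xs[i] for i in sorted(val)],
                          [ys[i] for i in sorted(val)]))
    return sum(scores) / k


def training_finished(train_loss, cv_score, loss_threshold=1e-4, cv_standard=0.01):
    """Both conditions must hold before training is terminated."""
    return train_loss < loss_threshold and cv_score <= cv_standard


# Synthetic data following y = 2x + 1, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b, train_loss = fit_linear(xs, ys)
cv_score = cross_val_mse(xs, ys)
done = training_finished(train_loss, cv_score)
```

Requiring both conditions guards against stopping on a model that fits the training sets well but generalizes poorly to held-out sets.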

[0083] It should be understood that the phrase "one embodiment" or "an embodiment" throughout the specification means that a specific feature, structure, or characteristic related to the embodiment is included in at least one embodiment of this application. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. Furthermore, these specific features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. It should be understood that in the various embodiments of this application, the sequence numbers of the above-described processes do not imply a sequential order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application. The sequence numbers of the above-described embodiments are merely descriptive and do not represent the superiority or inferiority of the embodiments.

[0084] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0085] In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods, such as: multiple units or components can be combined, or integrated into another system, or some features can be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the various components shown or discussed can be through some interfaces, and the indirect coupling or communication connection between devices or units can be electrical, mechanical, or other forms.

[0086] The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units; some or all of the units may be selected to achieve the purpose of the embodiments of this application according to actual needs. In addition, each functional unit in the embodiments of this application may be fully integrated into one processing unit, or each unit may be a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or in the form of hardware plus software functional units.

[0087] Alternatively, if the integrated units described above are implemented as software functional modules and sold or used as independent products, they can also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, or the parts that contribute to related technologies, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a device to execute all or part of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, ROMs, magnetic disks, or optical disks.

[0088] The methods disclosed in the several method embodiments provided in this application can be arbitrarily combined to obtain new method embodiments without conflict. The features disclosed in the several method or device embodiments provided in this application can be arbitrarily combined to obtain new method embodiments or device embodiments without conflict.

[0089] The above description is merely an embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A method for predicting the immune activity of black goji berry polysaccharides, characterized in that the method includes: acquiring sample data of the black goji berry polysaccharide to be tested, the sample data including monosaccharide composition, molecular weight parameters, basic physicochemical data, spectral data, transcriptome data, metabolome data, and process parameters; performing data preprocessing on the sample data to obtain first basic multimodal features, the first basic multimodal features including macroscopic structural features, transcriptome features, metabolome features, and process parameter features; if it is determined that the sample data contains first nuclear magnetic resonance spectroscopy data, analyzing the first nuclear magnetic resonance spectroscopy data to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site, and constructing an atomic-level polysaccharide molecular graph and generating microstructural features based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site; fusing the first basic multimodal features and the microstructural features to obtain a first multimodal fusion feature; and inputting the first multimodal fusion feature into the trained prediction model to obtain the first immune activity prediction value output by the prediction model and a confidence interval characterizing the reliability of the prediction result.

2. The method according to claim 1, characterized in that analyzing the first nuclear magnetic resonance spectroscopy data to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site, and constructing an atomic-level polysaccharide molecular graph and generating microstructural features based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site, includes: performing standardized preprocessing on the first nuclear magnetic resonance spectroscopy data, and parsing the preprocessed first nuclear magnetic resonance spectroscopy data through a convolutional neural network to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site; constructing a set of monosaccharide residue nodes based on the monosaccharide composition and the linkage site; determining the nodes of the initial polysaccharide molecular graph based on each monosaccharide residue node in the set of monosaccharide residue nodes, and determining the edges of the initial polysaccharide molecular graph based on the glycosidic bond type and the linkage site; defining node features and edge features according to preset rules based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site; calculating dihedral conformation information using a dual-path assignment strategy based on the glycosidic bond type and the coupling constant read from the first nuclear magnetic resonance spectroscopy data; constructing the adjacency matrix corresponding to the initial polysaccharide molecular graph based on the monosaccharide residue nodes and the glycosidic bond type; integrating the monosaccharide residue nodes, the edges of the initial polysaccharide molecular graph, the node features, the edge features, the dihedral conformation information, and the adjacency matrix to obtain the atomic-level polysaccharide molecular graph; and generating microstructural features based on the atomic-level polysaccharide molecular graph.

3. The method according to claim 2, characterized in that generating microstructural features based on the atomic-level polysaccharide molecular graph includes: inputting the atomic-level polysaccharide molecular graph into a two-layer graph attention network, and extracting and fusing the spatial topology and microscopic chemical features of the atomic-level polysaccharide molecular graph layer by layer through the two-layer graph attention network to obtain the deep features of each monosaccharide residue node; and performing global attention pooling on the deep features of each monosaccharide residue node to obtain the microstructural features.

4. The method according to claim 3, characterized in that the method further includes: calculating, based on the first basic multimodal features and the first immune activity prediction value, a first contribution of each feature in the first basic multimodal features to the first immune activity prediction value using the Kernel SHAP algorithm; calculating a second contribution of each feature in the microstructural features to the first immune activity prediction value based on the node features, the edge features, the Kernel SHAP algorithm, the attention weights of the two-layer graph attention network, and the first immune activity prediction value; sorting the corresponding features by contribution based on the first contribution and the second contribution, to obtain a feature ranking that characterizes the strength of each feature's influence on immune activity; and selecting several key features with the highest contributions and generating corresponding optimization suggestions according to the magnitude of the contribution and the source category of each feature.

5. The method according to claim 4, characterized in that the method further includes: mapping the second contribution of each node feature and edge feature to the corresponding position of the atomic-level polysaccharide molecular graph to generate a molecular graph heatmap, wherein different colors in the molecular graph heatmap characterize the positive or negative contribution of each feature to immune activity; and determining the confidence level of each linkage site and annotating each linkage site whose confidence level is lower than a preset threshold with first prompt information, wherein the first prompt information indicates that the confidence level of the linkage site is low and recommends supplementary nuclear magnetic resonance spectroscopy verification.

6. The method according to claim 4, characterized in that the method further includes: if it is determined that the sample data of the black goji berry polysaccharide to be tested does not contain the first nuclear magnetic resonance spectroscopy data, annotating second prompt information, wherein the second prompt information indicates that there is no first nuclear magnetic resonance spectroscopy data and that the glycosidic bond type and branch topology have not been resolved.

7. The method according to claim 1, characterized in that the method further includes: acquiring multiple sets of training black goji berry polysaccharide sample data, each set including monosaccharide composition, molecular weight parameters, basic physicochemical data, spectral data, transcriptome data, metabolome data, process parameters, and the corresponding true value of immune activity, wherein a first number of the sets also include second nuclear magnetic resonance spectroscopy data; constructing a corresponding second multimodal fusion feature based on each set of training black goji berry polysaccharide sample data; inputting each second multimodal fusion feature and the corresponding true value of immune activity into an initial model to obtain the corresponding second immune activity prediction value output by the initial model; calculating the error between the second immune activity prediction value and the true value of immune activity based on a loss function, iteratively optimizing the model parameters of the initial model through a backpropagation algorithm to minimize the loss function, and performing cross-validation; and if the loss function is less than a preset threshold and the cross-validation accuracy reaches a preset standard, terminating training to obtain the trained prediction model.

8. A system for predicting the immune activity of black goji berry polysaccharides, characterized in that the system includes: a data acquisition and processing module, used to acquire sample data of the black goji berry polysaccharide to be tested, the sample data including monosaccharide composition, molecular weight parameters, basic physicochemical data, spectral data, transcriptome data, metabolome data, and process parameters, and to preprocess the sample data to obtain first basic multimodal features, the first basic multimodal features including macroscopic structural features, transcriptome features, metabolome features, and process parameter features; an analysis module, used to analyze first nuclear magnetic resonance spectroscopy data when it is determined that the sample data contains the first nuclear magnetic resonance spectroscopy data, to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site, and to construct an atomic-level polysaccharide molecular graph and generate microstructural features based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site; and a first prediction module, used to fuse the first basic multimodal features and the microstructural features to obtain a first multimodal fusion feature, and to input the first multimodal fusion feature into the trained prediction model to obtain the first immune activity prediction value output by the prediction model and a confidence interval characterizing the reliability of the prediction result.

9. The system according to claim 8, characterized in that the analysis module includes: a parsing submodule, used to perform standardized preprocessing on the first nuclear magnetic resonance spectroscopy data, to parse the preprocessed first nuclear magnetic resonance spectroscopy data through a convolutional neural network to obtain the glycosidic bond type, anomeric carbon configuration, and linkage site, and to construct a set of monosaccharide residue nodes based on the monosaccharide composition and the linkage site; a determining submodule, used to determine the nodes of the initial polysaccharide molecular graph based on each monosaccharide residue node in the set of monosaccharide residue nodes, and to determine the edges of the initial polysaccharide molecular graph based on the glycosidic bond type and the linkage site; a defining submodule, used to define node features and edge features according to preset rules based on the glycosidic bond type, the anomeric carbon configuration, and the linkage site; a calculation submodule, used to calculate dihedral conformation information using a dual-path assignment strategy based on the glycosidic bond type and the coupling constant read from the first nuclear magnetic resonance spectroscopy data; a construction submodule, used to construct the adjacency matrix corresponding to the initial polysaccharide molecular graph based on the monosaccharide residue nodes and the glycosidic bond type; an integration submodule, used to integrate the monosaccharide residue nodes, the edges of the initial polysaccharide molecular graph, the node features, the edge features, the dihedral conformation information, and the adjacency matrix to obtain the atomic-level polysaccharide molecular graph; and a generation submodule, used to generate microstructural features based on the atomic-level polysaccharide molecular graph.

10. The system according to claim 8, characterized in that the generation submodule includes: an extraction unit, used to input the atomic-level polysaccharide molecular graph into a two-layer graph attention network, and to extract and fuse the spatial topology and microscopic chemical features of the atomic-level polysaccharide molecular graph layer by layer through the two-layer graph attention network to obtain the deep features of each monosaccharide residue node; and a pooling unit, used to perform global attention pooling on the deep features of each monosaccharide residue node to obtain the microstructural features.