Cancer survival prediction method based on multi-omics data and graph attention transformer
By constructing the MOGAT deep neural network model and combining multi-omics data with graph attention Transformer, the heterogeneity and small sample size problems in multi-omics survival prediction were solved, achieving survival prediction among different cancers and improving the model's pan-cancer generality and prediction accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NORTHEAST FORESTRY UNIV
- Filing Date
- 2026-02-11
- Publication Date
- 2026-06-12
AI Technical Summary
Existing multi-omics survival prediction methods suffer from limitations such as 'one model per cancer', lack of topological structure, and insufficient handling of heterogeneity. They cannot effectively utilize the biological pathways and prognostic signals shared between different cancers, resulting in severe overfitting on cancers with small sample sizes. Furthermore, directly mixing pan-cancer data can obscure the specific baseline risks of different cancers.
A pan-cancer survival prediction method based on multi-omics data and graph attention Transformer is adopted. By acquiring multi-omics data and biological network data, a MOGAT deep neural network model is constructed, including a stacked denoising autoencoder module, a graph neural network module and a Transformer module, for feature extraction and cross-modal attention interaction. End-to-end training is performed using the Cox partial likelihood loss function to achieve survival risk score prediction.
It enables survival prediction of multiple cancers under the same set of parameters, has topology awareness and robustness, can focus on core oncogenic pathways in the case of sparse data, and shows the model’s dependence on different modalities through the attention weight matrix of Transformer, thereby improving the interpretability and accuracy of prediction.
Figure CN122201752A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to a pan-cancer survival prediction method based on multi-omics data and graph attention Transformer, belonging to the fields of bioinformatics and medical artificial intelligence.
Background Technology
[0002] Cancer exhibits high intratumoral and intertumoral heterogeneity. In the era of precision oncology, accurate survival prediction is crucial for patient risk stratification and personalized treatment. Traditional TNM staging systems rely primarily on anatomical features and often fail to capture the complex molecular profile of tumors. With the widespread adoption of high-throughput sequencing technologies, multi-omics data (such as mRNA, miRNA, and protein expression) provide abundant resources for elucidating cancer mechanisms. However, existing multi-omics survival prediction methods face three main challenges:
[0003] 1. Limitations of “one cancer, one model”: Most methods (such as DeepSurv) model only a single cancer and cannot take advantage of the biological pathways and prognostic signals shared between different cancers, resulting in severe overfitting on cancers with small sample sizes.
[0004] 2. Lack of topological structure: Existing multi-omics fusion methods (such as simple feature concatenation) ignore complex intermolecular interaction networks (such as the protein-protein interaction (PPI) network) and lose key biological topological information.
[0005] 3. Insufficient handling of heterogeneity: Directly mixing pan-cancer data can cause the model to confuse the specific baseline risks of different cancers, and there is no effective mechanism to distinguish between "common features" and "specific features".
Summary of the Invention
[0006] To address the heterogeneity and small sample size issues in pan-cancer survival prediction, this invention proposes a pan-cancer survival prediction method based on multi-omics data and graph attention Transformer.
[0007] The technical solution adopted by the present invention to solve the above problems includes the following steps:
Step 1: Obtain multi-omics data, clinical survival data, and biological network data from pan-cancer patients;
Step 2: Perform preprocessing and feature filtering on the acquired multi-omics data in sequence;
Step 3: Construct the MOGAT deep neural network model, which includes a stacked denoising autoencoder module, a graph neural network module, and a Transformer module;
Step 4: Input the multi-omics data after feature filtering into the stacked denoising autoencoder module and the graph neural network module, respectively, to extract low-dimensional latent features and topological aggregation features;
Step 5: Construct a pseudo-semantic sequence containing a global classification token, a cancer background embedding, and the features of each omics. Input the pseudo-semantic sequence into the Transformer module for cross-modal attention interaction to obtain the updated global classification token, and calculate the survival risk value based on the updated global classification token;
Step 6: Construct a Cox partial likelihood loss function based on clinical survival data and perform end-to-end training on the MOGAT model;
Step 7: Based on the trained MOGAT model, predict the survival risk score and mine key biomarkers for the patients to be tested.
[0008] Furthermore, the multi-omics data in step 1 includes at least mRNA expression data, miRNA expression data, and proteomics data; the biological network data is protein-protein interaction network data.
[0009] Furthermore, step 2 specifically includes:
Step 2.1: Construct input feature sets for mRNA expression data and miRNA expression data using a hybrid screening strategy that combines knowledge-driven and data-driven approaches;
Step 2.2: Remove and impute outliers in the input feature set;
Step 2.3: Obtain the pan-cancer driver gene list by querying, and forcibly retain the genes in this list as the first-tier features;
Step 2.4: Calculate the variance of each gene in the input feature set after outlier removal and imputation, remove low-variance genes, perform univariate Cox regression analysis on the remaining genes, and select the Top-K genes as the second-tier features based on the significance of the P-values from the Cox regression analysis;
Step 2.5: Merge the first-tier features with the second-tier features and perform Z-score standardization to complete the preprocessing and feature selection of the multi-omics data, and determine the final input dimension of the mRNA data.
[0010] Furthermore, in step 3, the stacked denoising autoencoder module is used to extract transcriptome features; the graph neural network module is used to extract protein topological features; and the Transformer module is used for global fusion of multimodal features. The stacked denoising autoencoder module adopts a pyramid-shaped compression structure, including an input layer, an encoding layer, a bottleneck layer, and a decoding layer connected in sequence. A Dropout mechanism is introduced in the encoding layer to inject random noise into the input data. The dimensions of the intermediate hidden layers are computed dynamically from the input dimension and the bottleneck layer dimension according to formula (1). During the training of the MOGAT deep neural network model, the reconstruction error of the decoder is used to help the encoder learn a robust nonlinear representation of the data. The graph neural network module includes graph convolutional layers, graph attention layers, and average pooling layers. The graph convolutional layers and graph attention layers are used to aggregate node features, and the average pooling layer is used to aggregate the features output by all nodes into the final output feature vector.
[0011] Furthermore, step 4 specifically includes:
Step 4.1: Input the preprocessed mRNA and miRNA data into the stacked denoising autoencoder module to extract the miRNA features and the mRNA features;
Step 4.2: Construct the graph structure G = (V, E), where V is the set of protein nodes and E is the set of PPI interaction edges selected based on a confidence threshold; the topological connectivity of G remains static for all patient samples;
Step 4.3: Assign the proteomics data of each patient sample to V as node feature values, so that the attributes of the graph nodes change dynamically with the samples;
Step 4.4: Aggregate node features using either a graph convolutional layer or a graph attention layer. This aggregation includes information about the node itself and its neighbors, and symmetric normalization is used to balance node degree differences. Then aggregate the features output by all nodes using a global average pooling layer to obtain a patient-level proteomics feature vector, which contains the protein features.
[0012] Furthermore, step 5 specifically includes:
Step 5.1: Construct a learnable parameter vector as the global classification token, which is used to aggregate global survival risk information;
Step 5.2: Based on the patient's cancer type ID, obtain the cancer background embedding vector through embedding layer mapping; this vector provides the model with contextual information about the specific cancer type;
Step 5.3: Align the miRNA features and mRNA features output by the stacked denoising autoencoder module and the protein features output by the graph neural network module to the same dimension through a linear projection layer, and then arrange them sequentially to form the input sequence;
Step 5.4: Input the sequence into the Transformer module, which uses a multi-head self-attention mechanism to calculate the correlation weights between elements in the sequence. During the self-attention calculation, the global classification token serves as a query vector that dynamically attends to and aggregates key information from the other elements of the sequence;
Step 5.5: After processing by the Transformer module, extract only the updated vector at the first position of the sequence as the final multimodal fusion feature; vectors at other positions are discarded during the prediction phase;
Step 5.6: Calculate the survival risk value based on the multimodal fusion feature.
The input sequence is expressed by formula (2).
[0013] Furthermore, step 5.6 specifically includes: The extracted fusion feature is fed into a nonlinear risk prediction head consisting of fully connected layers, which outputs a scalar value representing the patient's log hazard ratio. This scalar value is output as the survival risk value.
[0014] Furthermore, the Cox partial likelihood loss function in step 6 is expressed by formula (3). In formula (3), the event indicator takes the value 1 for a death event and 0 for censoring; the remaining terms are the observation time of patient i, the set of patients still at risk at that time, and the risk score predicted by the model.
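For orientation, the negative log partial likelihood that is standard for Cox-type survival models is written out below. This is a sketch of the textbook form only: the symbols \delta_i (event indicator), t_i (observation time), R(t_i) (risk set), and h_i (predicted risk score) are notation assumed here to match the quantities listed above, and formula (3) may differ in notation or normalization.

```latex
\mathcal{L}_{\mathrm{Cox}}
  = -\sum_{i:\,\delta_i = 1}
    \left( h_i - \log \sum_{j \in R(t_i)} \exp(h_j) \right),
\qquad
R(t_i) = \{\, j : t_j \ge t_i \,\}
```

Censored patients (\delta_i = 0) add no term of their own but still appear in the risk sets of patients whose events occur earlier.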
[0015] In addition, the present invention also proposes a computer device and a computer-readable storage medium.
[0016] The computer device includes a memory and a processor. The memory stores computer programs, and the processor executes the computer programs to implement the aforementioned pan-cancer survival prediction method.
[0017] A computer program is stored on a computer-readable storage medium, and when the computer program is executed by a processor, it implements the aforementioned pan-cancer survival prediction method.
[0018] The beneficial effects of this invention are: 1. Pan-cancer universality: By introducing Cancer Embedding, the model has learned to handle survival predictions for 18 different cancers under the same set of parameters, achieving "one model for multiple uses".
[0019] 2. Topology awareness: By using a GNN to integrate static PPI network knowledge with dynamic expression data, the model captures anomalies in functional modules better than traditional fully connected layers.
[0020] 3. Robustness: The mandatory preservation mechanism of Tier 1 genes ensures that the model can still focus on core oncogenic pathways even in the case of sparse data.
[0021] 4. Interpretability: The attention weight matrix of the Transformer intuitively shows the degree of dependence of the model on different modalities, which helps to discover new prognostic markers.
Attached Figure Description
[0022] Figure 1 is a flowchart of the pan-cancer survival prediction method based on multi-omics data and graph attention Transformer;
Figure 2 is a structural diagram of the MOGAT deep neural network model;
Figure 3 is a schematic diagram of data splitting for training;
Figure 4 is a diagram of the structure and training principle of the stacked denoising autoencoder;
Figure 5 is a schematic diagram of feature propagation in the graph neural network;
Figure 6 is a schematic diagram of the Transformer feature fusion module and the multi-head attention mechanism;
Figure 7 is a schematic diagram illustrating the model's risk stratification of different cancers;
Figures 8 and 9 are comparison charts of the model against other models on a unified dataset;
Figure 10 is a schematic diagram showing the spatial distribution of samples after model processing;
Figure 11 shows the results of the multivariate Cox regression analysis;
Figure 12 shows the calibration curves of the model when predicting 1-year, 3-year, and 5-year survival rates;
Figure 13 shows the results of the decision curve analysis;
Figure 14 is a schematic diagram illustrating the model's attention to different modalities;
Figure 15 shows the results of KEGG pathway enrichment analysis on high-contribution genes identified by the model;
Figures 16 and 17 are schematic diagrams showing the validation results of the model on external datasets.
Detailed Implementation
[0023] As shown in Figure 1, the pan-cancer survival prediction method based on multi-omics data and graph attention Transformer described in this embodiment includes the following steps:
S1: Acquire multi-omics cancer data. The multi-omics data include at least mRNA expression data, miRNA expression data, and proteomics data; the biological network data are protein-protein interaction (PPI) network data.
[0024] S2: Multi-omics data preprocessing;
S201: Construct input feature sets for mRNA expression data and miRNA expression data using a hybrid screening strategy that combines knowledge-driven and data-driven approaches;
S202: Remove and impute outliers in the input feature set;
S203: Obtain the pan-cancer driver gene list by querying and forcibly retain the genes in this list as the first-tier features;
S204: Calculate the variance of each gene in the input feature set after outlier removal and imputation, remove low-variance genes, perform univariate Cox regression analysis on the remaining genes, and select the Top-K genes as the second-tier features based on the significance of the P-values from the Cox regression analysis;
S205: Merge the first-tier features with the second-tier features and perform Z-score standardization to complete the preprocessing and feature selection of the multi-omics data, and determine the final input dimension of the mRNA data.
[0025] S3: Build the deep learning model. As shown in Figure 2, the MOGAT model includes a stacked denoising autoencoder module, a graph neural network module, and a Transformer module; the stacked denoising autoencoder module is used to extract transcriptome features; the graph neural network module is used to extract protein topological features; and the Transformer module is used for global fusion of multimodal features. The stacked denoising autoencoder module adopts a pyramid-shaped compression structure, including an input layer, an encoding layer, a bottleneck layer, and a decoding layer connected in sequence. A Dropout mechanism is introduced in the encoding layer to inject random noise into the input data. The dimensions of the intermediate hidden layers are computed dynamically from the input dimension and the bottleneck layer dimension according to formula (1). During the training of the MOGAT deep neural network model, the reconstruction error of the decoder is used to help the encoder learn a robust nonlinear representation of the data.
[0026] The graph neural network module includes graph convolutional layers, graph attention layers, and average pooling layers. The graph convolutional layers and graph attention layers are used to aggregate node features, while the average pooling layer is used to aggregate the features output by all nodes into the final output feature vector. Specifically, the MOGAT model consists of three parallel feature extraction branches: two SDAE branches process high-dimensional sparse mRNA and miRNA data, respectively, compressing them into low-dimensional dense vectors; a GNN branch utilizes a static protein-protein interaction (PPI) network structure and dynamic sample expression values to extract proteomics topological features. The outputs of these three branches, after dimensional alignment, are combined with learnable CLS tokens and cancer type embeddings, and then fed into a Transformer-based global fusion module. Through multi-layer self-attention mechanisms, the updated CLS vector is finally extracted and output as a patient risk score via a multilayer perceptron (MLP).
[0027] S4: Train the constructed deep learning model based on the preprocessed multi-omics data. As shown in Figure 3, to prevent data leakage and evaluate the model's generalization ability, this implementation adopts a strict 5-fold cross-validation strategy. Data from 5830 patients across 18 types of cancer are divided into five non-overlapping subsets, each containing all cancer types and their molecular subtypes. In each training round, four subsets are used for training, and the remaining subset is used solely for testing. Crucially, all data preprocessing steps (such as computing the means used for missing-value imputation, computing the Z-score normalization parameters, and Cox regression-based feature selection) are fitted only on the training set; the test set is only transformed with the fitted parameters, which prevents the leakage of "future information" into the training phase and ensures the reliability of the evaluation results.
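The splitting and leakage-prevention rules above can be illustrated with a minimal sketch. It assumes a simple round-robin assignment within each cancer type and mean imputation followed by Z-score scaling; the function names `stratified_folds`, `fit_scaler`, and `transform` are hypothetical and are not taken from this embodiment.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def stratified_folds(cancer_types, n_folds=5, seed=0):
    """Assign a fold index to each patient, spreading every cancer type
    across all folds (requires at least n_folds patients per type)."""
    by_type = defaultdict(list)
    for idx, ctype in enumerate(cancer_types):
        by_type[ctype].append(idx)
    rng = random.Random(seed)
    fold_of = [None] * len(cancer_types)
    for indices in by_type.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_folds  # round-robin within one type
    return fold_of


def fit_scaler(train_values):
    """Learn the imputation mean and the Z-score parameters from
    training-fold values only (None marks a missing value)."""
    observed = [v for v in train_values if v is not None]
    mu = mean(observed)
    sigma = pstdev(observed) or 1.0  # guard against constant features
    return mu, sigma


def transform(values, mu, sigma):
    """Impute with the training mean, then standardize with training stats."""
    return [((mu if v is None else v) - mu) / sigma for v in values]
```

In use, `fit_scaler` would be called once per feature on the four training folds, and `transform` would then be applied to both the training folds and the held-out fold with the same `mu` and `sigma`.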
[0028] S401: Input the preprocessed mRNA and miRNA data into the SDAE module to extract low-dimensional latent features; map the proteomics data onto the PPI network nodes and input them into the GNN module to extract topological aggregation features. As shown in Figure 4, the input high-dimensional data is first mapped to a low-dimensional latent space through an encoding layer. To prevent overfitting and enhance robustness, a Dropout mechanism is introduced during encoding to corrupt the input features with random noise. The training objective of the autoencoder is to minimize the reconstruction error, that is, to enable the decoder to reconstruct the original input data from the corrupted low-dimensional features, thereby forcing the encoder to learn the most essential and robust nonlinear feature representation of the data.
[0029] As shown in Figure 5, a graph structure G = (V, E) is constructed, where V is the set of protein nodes and E is the set of PPI interaction edges selected based on a confidence threshold. The topological connectivity of G remains static across all patient samples. The protein expression levels of each patient sample are assigned to V as node feature values, so the attributes of the graph nodes change dynamically with the samples. The node features are aggregated using a graph convolutional layer (GCN) or a graph attention layer (GAT); the aggregation includes information about the node itself and its neighboring nodes, and symmetric normalization is used to balance differences in node degree. Finally, a global mean pooling operation aggregates the features of all nodes into a patient-level proteomics feature vector.
[0030] S402: Construct a pseudo-semantic sequence containing a global classification token (CLS Token), the cancer background embedding, and the features from the various omics, input it into the Transformer module for cross-modal attention interaction, and extract the updated global classification token to calculate the survival risk value;
S40201: Construct a learnable parameter vector as the global classification token, which is used to aggregate global survival risk information;
S40202: Based on the patient's cancer type ID, obtain the cancer background embedding vector through the embedding layer; this vector provides the model with contextual information about the specific cancer type;
S40203: Align the miRNA features and mRNA features output by the SDAE and the protein features output by the GNN to the same dimension using a linear projection layer, then concatenate the above vectors in order to form the input sequence, as expressed in formula (2).
[0031] S40204: As shown in Figure 6, the sequence is input into a multi-layer Transformer encoder, and the correlation weights between elements in the sequence are calculated using a multi-head self-attention mechanism. During the self-attention calculation, the global classification token serves as a query vector that dynamically attends to and aggregates key information from the other elements of the sequence;
S40205: After processing by the Transformer layers, only the updated vector at the first position of the sequence is extracted as the final multimodal fusion feature; vectors at other positions are discarded during the prediction phase.
[0032] S40206: The extracted fusion feature is fed into a nonlinear risk prediction head; the risk prediction head consists of fully connected layers (MLP) and outputs a scalar value representing the patient's log-hazard ratio.
[0033] S403: Construct a Cox partial likelihood loss function based on clinical survival data and perform end-to-end training of the MOGAT model. The Cox partial likelihood loss function is expressed by formula (3). In formula (3), the event indicator takes the value 1 for a death event and 0 for censoring; the remaining terms are the observation time of patient i, the set of patients still at risk at that time, and the risk score predicted by the model.
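As a rough illustration of how such a loss behaves, the sketch below computes the textbook negative log partial likelihood over one batch. It is not the embodiment's implementation: averaging over the number of events and ignoring tied event times are assumptions, and the function name `cox_partial_likelihood_loss` is hypothetical.

```python
import math


def cox_partial_likelihood_loss(risk_scores, times, events):
    """Negative log Cox partial likelihood, averaged over observed events.

    risk_scores: predicted log-hazard ratios, one per patient
    times:       observation times
    events:      1 for a death event, 0 for censoring
    """
    loss = 0.0
    n_events = 0
    for h_i, t_i, d_i in zip(risk_scores, times, events):
        if d_i != 1:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time t_i
        at_risk = [h_j for h_j, t_j in zip(risk_scores, times) if t_j >= t_i]
        # log-sum-exp, shifted by the maximum for numerical stability
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(h - m) for h in at_risk))
        loss -= h_i - log_sum
        n_events += 1
    return loss / max(n_events, 1)
```

The loss is small when patients who die earlier receive higher risk scores than those still at risk, which is the ordering property that end-to-end training is meant to encourage.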
[0034] S5: Perform risk assessment and biomarker detection on new patients based on the trained deep learning model.
Example 1
This example details the construction and training process of the MOGAT model, as shown below:
1. Data Alignment and Preprocessing Mechanism
This embodiment selects 18 cancer types from the TCGA database, totaling 5830 patients. To address the curse of dimensionality (>60,000 dimensions) in mRNA data, this invention designs a unique hybrid screening mechanism:
Step 1 (Prior retention): Based on the COSMIC and TCGA prior knowledge bases, 115 pan-cancer driver genes (such as TP53, KRAS, and PTEN) are pre-identified. These genes are forcibly retained regardless of their statistical significance.
[0035] Step 2 (Statistical screening): In the training set of each fold of the 5-fold cross-validation, the remaining genes undergo variance filtering (removing genes with variance < 0.5) and univariate Cox regression screening (selecting the top-K genes with the smallest p-values).
[0036] Step 3 (Merging): Merge the two gene groups mentioned above and use them as input to the SDAE network.
[0037] Technical effect: This strategy not only utilizes the statistical patterns of big data, but also prevents the loss of key low-expression driver genes by purely data-driven methods.
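The three-step screening above can be sketched as follows. This is a minimal illustration under stated assumptions: the univariate Cox p-values are taken as precomputed on the training fold, population variance is used for the 0.5 threshold, and the function name `hybrid_select` and its parameters are hypothetical.

```python
from statistics import pvariance


def hybrid_select(expr, driver_genes, cox_pvalues, var_threshold=0.5, top_k=2):
    """Two-tier gene selection on one training fold.

    expr:         dict gene -> list of expression values (training samples)
    driver_genes: prior list of pan-cancer driver genes (tier 1)
    cox_pvalues:  dict gene -> univariate Cox p-value (assumed precomputed)
    """
    # Tier 1: driver genes are kept regardless of statistics, if measured
    tier1 = [g for g in driver_genes if g in expr]
    tier1_set = set(tier1)
    # Tier 2: drop low-variance genes, then rank the rest by p-value
    candidates = [
        g for g in expr
        if g not in tier1_set and pvariance(expr[g]) >= var_threshold
    ]
    candidates.sort(key=lambda g: cox_pvalues[g])
    tier2 = candidates[:top_k]
    # Merge: tier 1 first, then the top-K statistically selected genes
    return tier1 + tier2
```

Because tier 1 bypasses both filters, a driver gene with low variance or a weak p-value still reaches the SDAE input, which is the behavior the technical effect above describes.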
[0038] 2. Static Graph and Dynamic Value GNN Design: For proteomics data, this invention constructs a graph neural network with "static topology and dynamic features": Graph Construction: A PPI network containing 487 nodes and edges filtered based on confidence is constructed using the STRING database. The adjacency matrix A is shared and fixed across all samples.
[0039] Feature propagation: For the i-th patient, the protein expression vector is mapped onto the graph nodes. The GNN layers use the formula ... to perform feature propagation.
[0040] Global pooling: To interface with subsequent modules, global mean pooling is used to aggregate the features of all nodes in the graph into a fixed-dimensional vector.
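A minimal sketch of "static topology, dynamic features" is given below. It assumes the widely used GCN propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W) as the symmetric-normalized aggregation; the embodiment's own propagation formula is not reproduced here, and `gcn_layer` and `global_mean_pool` are hypothetical names.

```python
import math


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj:      n x n adjacency matrix shared by all patients (static)
    features: n x f_in node features for one patient (dynamic)
    weight:   f_in x f_out weight matrix
    """
    n = len(adj)
    # Add self-loops so each node keeps its own signal
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization balances high- and low-degree nodes
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    f_in, f_out = len(weight), len(weight[0])
    # Aggregate each node's own and neighboring features
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(f_in)] for i in range(n)]
    # Linear transform followed by ReLU
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(f_in)))
             for o in range(f_out)] for i in range(n)]


def global_mean_pool(node_features):
    """Average node embeddings into one patient-level vector."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]
```

Only `features` changes from patient to patient; `adj` and `weight` are shared, so the pooled vector has the same dimension for every sample.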
[0041] 3. Transformer fusion based on "pseudo-semantic sequences": This is a key step in addressing pan-cancer heterogeneity in this invention. Unlike traditional feature concatenation, this embodiment models multimodal fusion as a sequence interaction problem: Sequence construction: In this embodiment, a sequence S of length 5 is constructed.
[0042] Token1(CLS): A randomly initialized learnable vector, similar to [CLS] in BERT, specifically used to aggregate global information for classification / regression.
[0043] Token2 (Cancer ID): Maps the patient's cancer type (e.g., "BRCA", "GBM") to a 512-dimensional vector through an embedding layer. This tells the model: "We are currently processing breast cancer data; please focus on breast cancer-related features."
[0044] Token3-5(Omics): Feature vectors from miRNA, mRNA, and Protein branches, respectively.
[0045] Attention interaction: The Transformer's self-attention mechanism allows the CLS token to "query" the other tokens. For example, when processing breast cancer samples, the Cancer Token guides the CLS token to focus more on HER2 or BRCA1/2-related omics features, while when processing gliomas, it guides the CLS token to focus on IDH1-related features.
[0046] Predicted output: Ultimately, the model only uses the updated CLS vector for prediction, discarding other tokens. This mechanism greatly reduces noise interference.
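The CLS-as-query idea can be illustrated with a stripped-down sketch. It assumes a single attention head with identity query, key, and value projections, whereas the embodiment uses a multi-layer, multi-head Transformer with learned projections; `cls_attention` is a hypothetical name.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cls_attention(tokens):
    """Scaled dot-product attention with the first token (CLS) as query.

    tokens: [CLS, cancer embedding, miRNA, mRNA, protein], each a list of
            floats with the same dimension.
    Returns the updated CLS vector and its attention weights.
    """
    query = tokens[0]
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in tokens]
    weights = softmax(scores)
    # Updated CLS is the attention-weighted mix of all token vectors
    updated = [sum(w * tok[i] for w, tok in zip(weights, tokens))
               for i in range(d)]
    return updated, weights
```

Only the returned `updated` vector would be passed on to the risk prediction head; the `weights` list is the row of the attention matrix that shows how much the CLS token draws on each of the five tokens.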
[0047] Example 2: Model Training and Validation
Loss function: The Cox partial likelihood loss function with L2 regularization is used.
[0048] Optimization strategy: Use the AdamW optimizer and introduce a learning rate scheduling strategy of Linear Warmup + Cosine Annealing to prevent the model from getting stuck in local optima in the early stages of training.
[0049] Leakage prevention verification: A rigorous 5-fold cross-validation is employed. In particular, the parameters of mean imputation (Imputer) and standardization (Scaler) are calculated only on the training set and then applied to the test set, eliminating the leakage of "future data".
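The Linear Warmup + Cosine Annealing schedule named in the optimization strategy of this example can be sketched as a plain function of the training step. The step counts, base learning rate, and the name `lr_at_step` are illustrative assumptions; the embodiment does not state its hyperparameters.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine curve from base_lr down to min_lr
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The small early learning rates keep the first updates conservative, and the slow cosine decay afterwards lets the optimizer settle gradually instead of stopping abruptly.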
[0050] Example 3: Generalization Ability and Application
Cross-platform validation: The model trained on TCGA (RNA-Seq) was applied directly to the GSE39582 colorectal cancer dataset (microarray). The results showed that the C-index exceeded 0.7, demonstrating that the model learned robust biological characteristics unaffected by the measurement platform.
[0051] Zero-shot transfer: When the model was applied to a previously unseen CLL (chronic lymphocytic leukemia) dataset, significant risk stratification was still achieved by borrowing the Cancer Embedding of bladder cancer (BLCA) as a prior background. This demonstrates the effectiveness of the Cancer Embedding mechanism.
[0052] Applications and Validation:
1. Risk Stratification: Patients were divided into high-risk and low-risk groups based on the median risk score output by the model. The significance of the difference in survival between the two groups was assessed using Kaplan-Meier survival curves and the log-rank test. As shown in Figure 7, the model achieved significant stratification in 7 out of 18 cancers. Figure 7 presents Kaplan-Meier survival curves stratifying patients with 18 different cancers by the risk scores predicted by the MOGAT model. Patients are divided into high-risk and low-risk groups based on the median risk. Results show that in the vast majority of cancers, the survival curves of the two groups are significantly separated (P < 0.05), with the high-risk group exhibiting significantly shorter survival times than the low-risk group, demonstrating the model's strong risk stratification capability and clinical prognostic value.
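The median split and the log-rank comparison described in this item can be sketched as below. This is a textbook two-group log-rank test written with the standard library only; placing scores equal to the median in the low-risk group is an assumption, and the names `median_split` and `logrank_test` are hypothetical.

```python
import math
from statistics import median


def median_split(risk_scores):
    """Label each patient 1 (high risk) or 0 (low risk) by the median score."""
    cutoff = median(risk_scores)
    return [1 if r > cutoff else 0 for r in risk_scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [(g, e, u) for g, e, u in zip(groups, events, times) if u >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, e, u in at_risk if e == 1 and u == t)
        d1 = sum(1 for g, e, u in at_risk if g == 1 and e == 1 and u == t)
        observed += d1
        expected += d * n1 / n  # expected high-risk deaths under the null
        if n > 1:
            # Hypergeometric variance of the high-risk death count
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

A p-value below 0.05 from `logrank_test` corresponds to the "significantly separated" survival curves referred to above.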
[0053] 2. Performance Evaluation: The predictive accuracy of the model is evaluated using the concordance index (C-index). As shown in Figures 8 and 9, the model was compared with baseline models including RSF, CoxPH, SLCGF, FGCNSurv, and DeepProg. Overall, the model in this embodiment performs better on the same dataset.
[0054] Figure 8 is a bar chart comparing the C-index of MOGAT with that of existing mainstream models such as CoxPH, RSF, DeepSurv, DeepProg, and SLCGF. MOGAT achieved the highest prediction accuracy in most cancer types. Figure 9 is a matrix diagram that further illustrates the win-loss relationship between the models. In 90 head-to-head comparisons, MOGAT won 76 times, demonstrating that the performance of this invention as a pan-cancer general model is significantly better than the traditional "single cancer single model" method.
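For readers unfamiliar with the metric, the sketch below computes Harrell's concordance index in its simplest form. Counting tied risk scores as half-concordant and skipping pairs with equal observation times are common conventions assumed here; the function name `concordance_index` is hypothetical and the embodiment's exact evaluation code is not given.

```python
def concordance_index(risk_scores, times, events):
    """Fraction of comparable patient pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's observation time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparison
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death has higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    return concordant / comparable if comparable else float("nan")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the comparisons in Figures 8 and 9 are made.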
[0055] 3. Black-box interpretability: This embodiment visualizes the distribution of sample data after passing through the entire framework. As shown in Figure 10, the data is transformed from a chaotic state to an ordered state.
[0056] Figure 10 visualizes the distribution of hidden-layer features extracted by the model using t-SNE. Compared with the chaotic distribution of the original multi-omics data, the feature space processed by the MOGAT model exhibits a clear manifold structure. Surviving patients and deceased patients show obvious gradient separation or clustering trends in the space, indicating that the model successfully compressed the messy high-dimensional omics data into a low-dimensional representation with clear biological significance and prognostic discrimination.
[0057] 4. Clinical Interpretability: To investigate whether the risk score in this embodiment is a variable independent of clinical findings, this embodiment was tested on 18 types of cancer. As shown in Figure 11, the risk score of the model is a clinically independent risk variable in six types of cancer. As shown in Figure 12, this embodiment evaluated the calibration curves on different cancers, and the results showed a fairly high degree of consistency between predicted and observed survival. As shown in Figure 13, in pan-cancer decision curve analysis, MOGAT yields a higher net benefit than the default strategies in all 18 types of cancer.
[0058] Figure 11 presents the results of the multivariate Cox regression analysis. After adjusting for traditional clinical covariates such as age, sex, pathological stage, and histological grade, the risk score output by MOGAT remained statistically significant in various cancers, with a Hazard Ratio > 1 and P < 0.05. This indicates that MOGAT captures independent molecular prognostic information beyond traditional clinical indicators and can serve as an independent prognostic factor to assist clinical decision-making.
[0059] Figure 12 shows the calibration curves of the model when predicting 1-year, 3-year, and 5-year survival rates. The gray dashed line represents the ideal prediction, where the predicted probability equals the observed survival rate, and the dark line represents the actual performance of the model. The results show that in most cancers, the model's calibration curve closely follows the ideal diagonal, indicating that the survival probability values output by the model are accurate and reliable, without significant overestimation or underestimation.
[0060] Figure 13 presents the results of the decision curve analysis, which evaluates the model's net clinical benefit. Compared to the extreme strategies of "treat everyone" or "don't treat anyone," intervention decisions based on the MOGAT model score (darker solid line) yielded higher net benefits across a wide range of threshold probabilities. This demonstrates that the model not only has superior statistical performance but also has significant practical value in real-world clinical applications, helping to prevent unnecessary overtreatment.
[0061] 5. Biomarker discovery: As shown in Figure 14, the attention weight matrix of the Transformer layer is extracted, and the average attention score for each gene or pathway is calculated. As shown in Figure 15, genes with high scores are considered to have the greatest impact on prognosis and can serve as potential biomarkers for further wet-lab experiments.
[0062] Figure 14 visualizes the attention weight distribution in the Transformer module as a heatmap. The horizontal axis represents the input modalities (Keys), and the vertical axis represents the query vectors (Query, mainly the CLS token). The results show that the attention allocated by the CLS token to mRNA, miRNA, protein, and cancer background is relatively balanced, at around 0.2 each, without "modality collapse," i.e., over-reliance on a single modality. This indicates that the model achieves complementary fusion of multimodal information.
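The averaging behind a heatmap row such as the CLS row of Figure 14 can be sketched as follows. This is an illustration under stated assumptions, not the code of this embodiment: the attention weights are assumed to be available as an array of shape (patients, heads, sequence length, sequence length) whose rows sum to 1, the token order `[CLS, cancer, miRNA, mRNA, protein]` is assumed, and the function name `mean_cls_attention` is hypothetical.

```python
import numpy as np

# Assumed token order of the input sequence (hypothetical labels).
TOKENS = ["CLS", "cancer", "miRNA", "mRNA", "protein"]


def mean_cls_attention(attn, cls_index=0):
    """Average attention paid by the CLS query to every key position.

    attn: array of shape (patients, heads, seq_len, seq_len), where
    attn[p, h, q, k] is the softmax weight of query q on key k.
    Returns a vector of length seq_len, averaged over patients and heads.
    """
    attn = np.asarray(attn, dtype=float)
    cls_rows = attn[:, :, cls_index, :]   # (patients, heads, seq_len)
    return cls_rows.mean(axis=(0, 1))


def most_attended(attn, tokens=TOKENS):
    """Return (token, score) pairs sorted from most to least attended."""
    scores = mean_cls_attention(attn)
    order = np.argsort(scores)[::-1]
    return [(tokens[i], float(scores[i])) for i in order]
```

With five tokens in the sequence, a perfectly uniform allocation gives 0.2 per position, which is consistent with the balanced pattern reported for Figure 14.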
[0063] Figure 15 presents the results of the KEGG pathway enrichment analysis of the high-contribution genes identified by the model. The analysis revealed that these key genes are highly enriched in immune-related pathways at the pan-cancer scale, such as allograft rejection, type I diabetes mellitus, and the T cell receptor signaling pathway. This suggests that the MOGAT model, in predicting survival, captures the state of immune infiltration in the tumor microenvironment, particularly T-cell-mediated cytotoxic responses, and that immune activity is a core mechanism influencing patient prognosis.
[0064] 6. Additional external cohort tests: As shown in Figures 16 and 17, because independent multi-omics datasets are lacking, this embodiment tests the model on additional datasets using a single modality. The results show that the model performs well on the external datasets.
[0065] As shown in Figure 16, in the GSE39582 colorectal cancer cohort (cross-platform validation) and the CGGA glioma cohort (cross-population validation), the model was still able to significantly distinguish between high- and low-risk patients (P < 0.001), indicating that the model learned features that are robust across platforms and populations.
[0066] Figure 17 shows a challenging "zero-shot transfer" test. In the CLL (chronic lymphocytic leukemia) cohort, a liquid tumor on which the model was not trained, the model achieved risk stratification by borrowing the prior embedding of bladder cancer. In addition, in the METABRIC breast cancer cohort, the model achieved accurate prognostic differentiation within different molecular subtypes such as Luminal A and Basal. These results support MOGAT's generalization ability as a pan-cancer foundation model.
[0067] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any simple modifications, equivalent substitutions, and improvements made to the above embodiments without departing from the scope of the present invention, based on the technical essence of the present invention and within the spirit and principles of the present invention, shall still fall within the protection scope of the present invention.
Claims
1. A pan-cancer survival prediction method based on multi-omics data and graph attention Transformer, characterized in that it comprises: Step 1: Obtain multi-omics data, clinical survival data, and biological network data from pan-cancer patients; Step 2: Perform preprocessing and feature filtering on the acquired multi-omics data in sequence; Step 3: Construct the MOGAT deep neural network model, which includes a stacked denoising autoencoder module, a graph neural network module, and a Transformer module; Step 4: Input the multi-omics data after feature filtering into the stacked denoising autoencoder module and the graph neural network module, respectively, to extract low-dimensional latent features and topological aggregation features; Step 5: Construct a pseudo-semantic sequence containing a global classification token, a cancer background embedding, and the features of each omics; input the pseudo-semantic sequence into the Transformer module for cross-modal attention interaction to obtain the updated global classification token; and calculate the survival risk value based on the updated global classification token; Step 6: Construct a Cox partial likelihood loss function based on the clinical survival data and perform end-to-end training on the MOGAT model; Step 7: Based on the trained MOGAT model, predict the survival risk score of the patients to be tested and mine key biomarkers.
2. The pan-cancer survival prediction method based on multi-omics data and graph attention Transformer according to claim 1, characterized in that the multi-omics data in step 1 includes at least mRNA expression data, miRNA expression data, and proteomics data; and the biological network data is protein-protein interaction network data.
3. The pan-cancer survival prediction method based on multi-omics data and graph attention Transformer according to claim 1, characterized in that step 2 specifically includes: Step 2.1: Construct input feature sets for the mRNA expression data and the miRNA expression data using a hybrid screening strategy that combines knowledge-driven and data-driven approaches; Step 2.2: Remove and impute outliers in the input feature set; Step 2.3: Obtain the pan-cancer driver gene list by querying, and forcibly retain the genes in the pan-cancer driver gene list as the first-tier features; Step 2.4: Calculate the variance of each gene in the input feature set after outlier removal and imputation, remove low-variance genes, perform univariate Cox regression analysis on the remaining genes, and select the Top-K genes as the second-tier features based on the significance of the P-values from the Cox regression analysis; Step 2.5: Merge the first-tier features with the second-tier features and perform Z-score standardization to complete the preprocessing and feature selection of the multi-omics data, and determine the final input dimension of the mRNA data.
4. The pan-cancer survival prediction method based on multi-omics data and graph attention Transformer according to claim 1, characterized in that, in step 3, the stacked denoising autoencoder module is used to extract transcriptome features; the graph neural network module is used to extract protein topological features; and the Transformer module is used for global fusion of multimodal features; the stacked denoising autoencoder module adopts a pyramid-shaped compression structure, including an input layer, an encoding layer, a bottleneck layer, and a decoding layer connected in sequence; a Dropout mechanism is introduced in the encoding layer to inject random noise into the input data; the dimensions of the intermediate hidden layers are dynamically computed from the input dimension and the bottleneck layer dimension; during the training of the MOGAT deep neural network model, the reconstruction error of the decoder is used to assist the encoder in learning a robust nonlinear representation of the data; the graph neural network module includes a graph convolutional layer, a graph attention layer, and an average pooling layer; the graph convolutional layer and the graph attention layer are used to aggregate node features, and the average pooling layer is used to aggregate the features output by all nodes into the final output feature vector; the calculation formula for the dimensions of the intermediate hidden layers is formula (1).
5. The pan-cancer survival prediction method based on multi-omics data and graph attention Transformer according to claim 4, characterized in that step 4 specifically includes: Step 4.1: Input the preprocessed mRNA and miRNA data into the stacked denoising autoencoder module to extract miRNA features and mRNA features; Step 4.2: Construct the graph structure G = (V, E), where V is the set of protein nodes and E is the set of PPI interaction edges selected based on a confidence threshold, and the topological connectivity of the graph structure G remains static for all patient samples; Step 4.3: Assign the proteomics data of each patient sample to V as node feature values, so that the attributes of the graph nodes change dynamically with the samples; Step 4.4: Aggregate node features using either a graph convolutional layer or a graph attention layer, where the aggregation includes information from the node itself and its neighbors, and symmetric normalization is used to balance differences in node degree; then aggregate the features output by all nodes using a global average pooling layer to obtain a patient-level proteomics feature vector, wherein the proteomics feature vector comprises the protein features.
6. The pan-cancer survival prediction method based on multi-omics data and graph attention Transformer according to claim 1, characterized in that step 5 specifically includes: Step 5.1: Construct a learnable parameter vector as the global classification token, which is used to aggregate global survival risk information; Step 5.2: Based on the patient's cancer type ID, obtain the cancer background embedding vector through embedding layer mapping, where the cancer background embedding vector is used to provide the model with contextual information about the specific cancer type; Step 5.3: Align the miRNA features and mRNA features output by the stacked denoising autoencoder module and the protein features output by the graph neural network module to the same dimension through a linear projection layer, and then arrange them sequentially to form the input sequence; Step 5.4: Input the sequence into the Transformer module, which uses a multi-head self-attention mechanism to calculate the correlation weights between the elements of the sequence; during the self-attention calculation, the global classification token is used as a query vector to dynamically attend to and aggregate key information from the other elements of the sequence; Step 5.5: After processing by the Transformer module, extract only the updated vector at the first position of the sequence as the final multimodal fusion feature; vectors at other positions are discarded during the prediction phase; Step 5.6: Calculate the survival risk value based on the multimodal fusion feature; the input sequence is expressed by formula (2).
7. The pan-cancer survival prediction method based on multi-omics data and graph attention Transformer according to claim 6, characterized in that step 5.6 specifically includes: the extracted fusion feature is fed into a nonlinear risk prediction head consisting of fully connected layers, which outputs a scalar value representing the patient's log hazard ratio; this scalar value is output as the survival risk value.
8. The pan-cancer survival prediction method based on multi-omics data and graph attention Transformer according to claim 1, characterized in that the Cox partial likelihood loss function in step 6 is expressed by formula (3); the terms of formula (3) are: an event indicator, where 1 represents a death event and 0 represents censoring; the observation time of patient i; the set of patients still at risk at that observation time; and the risk score predicted by the model.
9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the method of any one of claims 1 to 8.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method of any one of claims 1 to 8.
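Two computations named in the claims lend themselves to a short numerical sketch: the symmetrically normalized aggregation with global average pooling of step 4.4 (claim 5), and the Cox partial likelihood loss of step 6 (claim 8). The code below is an illustration under stated assumptions and is not the claimed implementation. The function names, the single graph convolution with ReLU, and the weight matrix are hypothetical. The loss is the standard negative log partial likelihood, -sum_i delta_i * (h_i - log sum_{j in R(T_i)} exp(h_j)), which may differ from formula (3) in normalization or tie handling.

```python
import numpy as np


def gcn_patient_vector(adj, node_feats, weight):
    """One graph convolution with symmetric normalization, then mean pooling.

    adj:        (n, n) static PPI adjacency matrix shared by all patients.
    node_feats: (n, d_in) per-patient protein values used as node features.
    weight:     (d_in, d_out) layer weights (hypothetical, normally learned).
    """
    a_hat = adj + np.eye(adj.shape[0])       # self-loops keep each node's own signal
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # D^{-1/2} (A + I) D^{-1/2} balances nodes of different degree.
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    hidden = np.maximum(a_norm @ node_feats @ weight, 0.0)  # ReLU (assumed)
    return hidden.mean(axis=0)               # global average pooling over nodes


def cox_partial_likelihood_loss(risk, time, event):
    """Negative log Cox partial likelihood (standard form, assumed).

    risk:  (n,) predicted log hazard ratios.
    time:  (n,) observation times.
    event: (n,) 1 = death observed, 0 = censored.
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # at_risk[i, j] is True when patient j is still under observation at time[i].
    at_risk = time[None, :] >= time[:, None]
    shift = risk.max()                       # stabilizes the log-sum-exp
    log_denom = shift + np.log((np.exp(risk - shift)[None, :] * at_risk).sum(axis=1))
    # Only patients with an observed event contribute a term.
    return -np.sum(event * (risk - log_denom))
```

A loss of this form only depends on the ordering of risk scores within each risk set, which is one reason a per-cancer context embedding can coexist with a shared risk head.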