A liver cancer prognosis marker and application thereof

By integrating single-cell and spatial transcriptome data, genes such as TXNRD1, HAO1, C1S, OPTN, and SNX6 were screened as prognostic biomarkers for liver cancer. A prognostic prediction model was constructed, which solved the problem of low response rate to immunotherapy for liver cancer and enabled personalized treatment and prognostic prediction for liver cancer patients.

CN119709995B · Active · Publication Date: 2026-06-26 · SHENZHEN PEOPLES HOSPITAL

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN PEOPLES HOSPITAL
Filing Date
2024-11-14
Publication Date
2026-06-26

Abstract

The application discloses a liver cancer prognosis marker and application thereof. The liver cancer prognosis marker comprises TXNRD1, HAO1, C1S, OPTN and SNX6. The application provides a group of liver cancer prognosis markers, and a prognosis prediction model for liver cancer patients is constructed based on these markers. The markers are obtained by combining gene expression levels, single-cell analysis and spatial transcriptome analysis; the diversity of the datasets and the large sample size improve the reliability and universality of both the markers and the prediction model.

Description

Technical Field

[0001] This invention belongs to the field of bioinformatics technology, specifically relating to a prognostic biomarker for liver cancer and its application.

Background Technology

[0002] In recent years, immunotherapy, especially immune checkpoint inhibitors such as PD-1 / PD-L1 inhibitors, has shown great potential in the treatment of various tumors, bringing new hope to the treatment of liver cancer. However, the liver cancer microenvironment has unique immunosuppressive properties, resulting in low response rates and limited clinical benefits from immunotherapy. Therefore, a deeper understanding of the characteristics of the liver cancer immune microenvironment and the discovery of biomarkers that can predict the efficacy of immunotherapy are of great significance for improving the level of immunotherapy for liver cancer.

[0003] The development of multi-omics technologies, including genomics, transcriptomics, and proteomics, has provided new avenues for elucidating the tumor immune microenvironment. New technologies such as single-cell sequencing and spatial transcriptome sequencing can be used to study tumor heterogeneity and immune cell infiltration at both the single-cell level and spatial dimensions, potentially leading to the discovery of new immunotherapy-related biomarkers. Integrating multi-omics data analysis can also reveal regulatory relationships between different omics, deepening our understanding of the immune regulatory mechanisms in liver cancer. Furthermore, the combination of omics big data and machine learning methods provides new insights for constructing efficient and accurate tumor diagnostic and prognostic prediction models. Diagnostic and prognostic models based on key molecular biomarkers hold the promise of enabling patient stratification and personalized treatment, improving the precision of liver cancer immunotherapy.

Summary of the Invention

[0004] The present invention aims to solve at least one of the technical problems existing in the prior art.

[0005] Therefore, this invention proposes a prognostic biomarker for liver cancer, which can be used to construct a prognostic prediction model for liver cancer patients and further used to predict the efficacy of liver cancer immunotherapy, thereby promoting the precision of liver cancer immunotherapy.

[0006] This invention also proposes a method for constructing a prognostic prediction model for liver cancer patients.

[0007] This invention also proposes a prognostic prediction system for liver cancer patients.

[0008] The present invention also proposes an electronic device.

[0009] The present invention also proposes a storage medium.

[0010] According to a first aspect of the present invention, a prognostic biomarker for liver cancer is proposed, said liver cancer biomarker including TXNRD1, HAO1, C1S, OPTN and SNX6.

[0011] According to a second aspect of the present invention, a method for constructing a prognostic prediction model for liver cancer patients is proposed, the method comprising the following steps:

[0012] S1: Obtain gene expression datasets from liver cancer patients who have received immunotherapy, and screen for differentially expressed genes between patients who respond to and do not respond to immunotherapy;

[0013] S2: Obtain gene expression datasets of liver cancer patients who responded to immunotherapy before and after treatment, and screen out differentially expressed genes between cancer tissues before and after immunotherapy.

[0014] S3: Compare the differentially expressed genes obtained in steps S1 and S2 to obtain genes that are differentially expressed in both gene expression datasets, i.e. immune response-related genes.

[0015] S4: Obtain the gene expression dataset of liver cancer patient tissues, and perform univariate and multivariate logistic regression analysis on the immune response-related genes obtained in step S3 in this dataset to identify differentially expressed genes related to the prognosis of liver cancer patients and establish a risk model.

[0016] The differentially expressed genes associated with liver cancer include TXNRD1, HAO1, C1S, OPTN, and SNX6.

[0017] In some embodiments of the present invention, the gene expression dataset of liver cancer patient tissue after receiving immunotherapy in step S1 includes a spatial transcriptome dataset.

[0018] In some embodiments of the present invention, the immunotherapy described in steps S1 and S2 is immune checkpoint inhibitor therapy.

[0019] In some embodiments of the present invention, the immune checkpoint inhibitor treatment may be selected from one or more of cabozantinib, nivolumab, pembrolizumab, tremelimumab, and durvalumab.

[0020] In some embodiments of the present invention, the risk model is a logistic regression model.

[0021] In some embodiments of the present invention, the calculation formula of the risk model is: Risk score = 0.0146 × TXNRD1 expression level - 0.0014 × C1S expression level + 0.0114 × OPTN expression level + 0.0304 × SNX6 expression level - 0.0019 × HAO1 expression level.

[0022] In some embodiments of the present invention, the expression levels of TXNRD1, HAO1, C1S, OPTN, and SNX6 are all values obtained by applying global-scaling normalization to the gene expression levels using a function.

[0023] In some embodiments of the present invention, the global scaling normalization employs the NormalizeData function.

[0024] In some embodiments of the present invention, a risk score greater than or equal to 0 is determined to be high risk, and a risk score less than 0 is determined to be low risk.

[0025] According to a third aspect of the present invention, a prognostic prediction system for liver cancer patients is provided, the liver cancer patient prognostic prediction system comprising:

[0026] Input unit: Used to input the data to be evaluated;

[0027] Analysis unit: The prognostic prediction model for liver cancer patients constructed by the construction method described in the second aspect of the present invention is used to analyze the data to be evaluated in the input unit;

[0028] Assessment unit: Used to display the risk score of the prognostic prediction model for liver cancer patients;

[0029] A risk score greater than or equal to 0 is considered high risk, and a risk score less than 0 is considered low risk.

[0030] According to a fourth aspect of the present invention, an electronic device is provided, the electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; wherein the processor executes the computer program to implement the construction method as described in the second aspect of the present invention.

[0031] According to a fifth aspect of the invention, a storage medium is provided that stores processor-executable instructions, which, when executed by a processor, are used to perform the construction method described in the second aspect of the invention.

[0032] The present invention has at least the following beneficial effects:

[0033] This invention provides a set of prognostic biomarkers for liver cancer and constructs a prognostic prediction model for liver cancer patients based on these biomarkers. The above-mentioned prognostic biomarkers for liver cancer are obtained by combining gene expression levels, single-cell analysis and spatial transcriptome analysis. The diversity of the dataset and the large sample size improve the reliability and universality of the above-mentioned prognostic biomarkers for liver cancer and the prognostic prediction model for liver cancer patients.

[0034] The liver cancer patient prognosis prediction system provided by this invention has shown the potential to predict survival risk in different populations in an independently validated dataset, exhibiting good stability and universality, and providing valuable reference for clinical decision-making and personalized treatment.

[0035] The prognostic biomarkers and prognostic prediction models for liver cancer provided by this invention not only deepen our understanding of the mechanisms of liver cancer immunotherapy, but also offer new insights for screening suitable patients for immunotherapy and predicting treatment outcomes. Furthermore, combining the prognostic biomarkers provided by this invention with existing biomarkers and clinical indicators holds promise for developing more precise diagnostic and prognostic assessment tools, thereby promoting the precision of liver cancer immunotherapy.

[0036] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Attached Figure Description

[0037] The present invention will be further described below with reference to the accompanying drawings and embodiments, wherein:

[0038] Figure 1 is a schematic diagram showing the distribution of various cell types in the spatial transcriptome data of Example 1 of the present invention;

[0039] Figure 2 shows the results of the differentially expressed gene analysis between the immunotherapy response group and the non-response group in the spatial transcriptome data of Example 1 of the present invention; panel A shows the result of merging malignant cells from 7 samples with the Harmony algorithm, panel B shows a heatmap of the expression patterns of the differentially expressed genes in the two sample groups, panel C shows the GO enrichment results for genes upregulated in the response group, and panel D shows the GO enrichment results for genes downregulated in the response group;

[0040] Figure 3 shows the results of the differentially expressed gene analysis between samples before and after immunotherapy in the single-cell transcriptome data of Example 1 of the present invention; panel A shows the single-cell annotation results of the H68a (before treatment) and H68b (after treatment) samples, panel B shows the GO enrichment results for genes upregulated in the post-treatment sample, and panel C shows the GO enrichment results for genes downregulated in the post-treatment sample;

[0041] Figure 4 shows the results of the differentially expressed gene analysis between pre-immunotherapy and post-immunotherapy samples in the single-cell transcriptome data of Example 1 of the present invention; panel A is a heatmap of the expression patterns of the differentially expressed genes in the two groups of samples, and panel B shows the expression levels of the two genes MT1E and SPP1 in the pre- and post-treatment samples;

[0042] Figure 5 is a graph showing the differential expression results of MT1E in the spatial transcriptome data of Example 1 of the present invention;

[0043] Figure 6 is a graph showing the differential expression results of SPP1 in the spatial transcriptome data of Example 1 of the present invention;

[0044] Figure 7 shows the PCA dimensionality reduction results of the four datasets used in Examples 1 and 2 of the present invention; where A is GSE14520, B is GSE36376, C is GSE25097, and D is PXD006512;

[0045] Figure 8 shows the results of screening prognostic biomarkers for liver cancer using LASSO regression in Example 1 of the present invention; panel A shows the LASSO regression coefficients for the GSE14520 dataset as a function of the penalty parameter, panel B shows the results of cross-validation to select the optimal model, and panel C shows the top 12 stable prognosis-related genes across 100 repeated modeling runs;

[0046] Figure 9 is a graph showing the correlation analysis results between TCGA prognostic data and the expression levels of the hepatocellular carcinoma prognostic markers in Example 2 of the present invention;

[0047] Figure 10 is a graph showing the correlation analysis results between TCGA prognostic data and the expression levels of the hepatocellular carcinoma prognostic markers in Example 2 of the present invention;

[0048] Figure 11 shows the correlation analysis results between the liver cancer prognostic markers and immune microenvironment scores in Example 2 of the present invention; panel A is a heatmap showing the correlation between the liver cancer prognostic markers and the stromal score, immune score and ESTIMATE score, and panel B is a heatmap showing the correlation between the liver cancer prognostic markers and the scores of 24 immunotherapy response-related signatures;

[0049] Figure 12 shows the Kaplan-Meier survival curves used in Example 4 of the present invention to validate the prognostic prediction model for liver cancer patients on four datasets; where A is TCGA data, B is ICGC data, C is the GSE14520 dataset, and D is GSE76427;

[0050] Figure 13 shows the correlation analysis results between the prognostic prediction model for liver cancer patients and TCGA clinical data in Example 4 of the present invention; where A represents patient survival status, B represents primary tumor stage, C represents lymph node involvement stage, D represents distant metastasis stage, E represents overall stage, F represents grade, G represents whether the patient is a carrier of hepatitis B or hepatitis C, and H represents patient age;

[0051] Figure 14 shows the calibration curves plotted based on three datasets in Example 4 of the present invention; where plot A is TCGA data, plot B is ICGC data, and plot C is the GSE14520 dataset;

[0052] Figure 15 shows the ROC curves for 1-year, 3-year, and 5-year survival plotted based on the dataset GSE76427 in Example 4 of the present invention;

[0053] Figure 16 is a statistical result graph showing the analysis of the expression levels of the five liver cancer prognostic biomarkers and the risk scores predicted by the model in Example 4 of the present invention; where panel A is TCGA data, panel B is ICGC data, and panel C is the GSE14520 dataset;

[0054] Figure 17 is a graph showing the results of verifying the risk score of the liver cancer prognosis prediction model on self-collected clinical liver cancer samples in Example 4 of the present invention.

Detailed Implementation

[0055] The following will describe the concept and technical effects of the present invention clearly and completely with reference to embodiments, so as to fully understand the purpose, features and effects of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are all within the scope of protection of the present invention.

[0056] In the description of this invention, "several" means one or more, "multiple" means two or more, "greater than," "less than," and "exceeding" are understood to exclude the stated number, while "above," "below," and "within" are understood to include the stated number. The use of "first" and "second" in the description is merely for distinguishing technical features and should not be construed as indicating or implying relative importance, or implicitly indicating the number of indicated technical features, or implicitly indicating the order of the indicated technical features.

[0058] When a numerical range is disclosed herein, the range is considered continuous and includes the minimum and maximum values of the range, as well as every value between the minimum and maximum values. Furthermore, when the range refers to integers, it includes every integer between the minimum and maximum values of the range. Additionally, when multiple ranges are provided to describe a feature or characteristic, the ranges may be combined. In other words, unless otherwise specified, all ranges disclosed herein should be understood to include any and all subranges subsumed therein.

[0059] Unless otherwise specified, the reagents, methods and equipment used in this invention are all conventional reagents, methods and equipment in this technical field.

[0060] Example 1: Screening of prognostic biomarkers for liver cancer

[0061] This embodiment integrates and screens spatial transcriptome datasets, single-cell transcriptome datasets, and gene expression datasets to obtain a set of prognostic biomarkers for liver cancer. The dataset information used is shown in Table 1.

[0062] Table 1. Dataset information used for screening prognostic biomarkers for liver cancer.

[0063]

[0064] The specific screening method is as follows:

[0065] 1. Identifying gene sets for immunotherapy responses based on spatial transcriptome data:

[0066] 1) Preprocessing of spatial transcriptome data:

[0067] To eliminate the influence of other non-malignant cells on the results, the spatial transcriptome spots were first screened, retaining only those annotated as malignant tumor cells. Immune infiltration regions within the tumor microenvironment were then identified for subsequent screening of differentially expressed genes. The screening process is shown in Figure 1, where HCC01-HCC04 are samples that responded to immunotherapy and HCC05-HCC07 are samples that did not respond to immunotherapy.

[0068] 2) Analysis of spatial transcriptome data:

[0069] The GSE238264 data were normalized using the SCTransform method of the R package 'Seurat'. This method not only standardizes the data but also detects highly variable genes and stores the results in a specific assay (spatial). The Harmony method was used to integrate the SCT data from seven patient samples. This method effectively corrects for technical differences between samples, ensuring data comparability. Using the Harmony algorithm, data from four immunotherapy-responsive samples and three non-responsive samples were unified into the same reference space (as shown in Figure 2A).

[0070] Building on this, 566 genes whose expression differed significantly between patients who responded to immunotherapy and those who did not were identified, forming the spatial transcriptome-based immunotherapy response gene set (as shown in Figure 2B). Of these genes, 534 (94.35%) were upregulated in the response group, while only 32 (5.65%) were downregulated.

[0071] Functional enrichment analysis was performed on the 566 immunotherapy response genes, and the results are shown in Figure 2C and Figure 2D.

[0072] As shown in Figure 2C, the upregulated genes are enriched in biological processes such as leukocyte-mediated immune responses, cell adhesion, and T cell activation; as shown in Figure 2D, the downregulated genes are significantly enriched in pathways such as humoral immune response and complement activation.

[0073] 2. Identifying gene sets for immunotherapy responses based on single-cell transcriptome data:

[0074] 1) Preprocessing of single-cell transcriptome data:

[0075] The 10X Genomics format data was processed using the R package 'Seurat'. After data import, quality control was performed, including detecting the UMI count, gene count, and proportion of mitochondrial genes for each cell, in order to identify and remove low-quality cells.

[0076] Then, the NormalizeData function was used to standardize the data, and the gene expression level of each cell was globally scaled and normalized. Identifying highly variable genes (HVGs) using the 'FindVariableFeatures' method is also an important step in single-cell preprocessing, which helps to highlight biological signals in subsequent analyses.

[0077] Finally, linear or nonlinear dimensionality reduction was performed on the data using the tSNE or UMAP method in order to visualize the results.
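As an illustration of the global-scaling normalization step described above, the following minimal Python sketch mirrors the default LogNormalize behavior of Seurat's NormalizeData: each cell's counts are divided by that cell's total counts, multiplied by a scale factor (10,000 by default), and transformed with log1p. The patent performs this step in R; the function name `log_normalize` and the toy counts below are hypothetical and are not data from the patent.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Global-scaling normalization for a single cell.

    counts: dict mapping gene symbol -> raw count for one cell.
    Returns a dict mapping gene symbol -> log1p(count / total * scale_factor).
    """
    total = sum(counts.values())
    if total == 0:
        # A cell with no counts cannot be scaled; return zeros.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Toy cell with hypothetical raw counts (total = 10).
cell = {"TXNRD1": 3, "HAO1": 0, "C1S": 5, "OPTN": 1, "SNX6": 1}
normalized = log_normalize(cell)
```

Genes with zero counts stay at zero after normalization, and cells with different sequencing depths become comparable because every cell is rescaled to the same total before the log transform.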

[0078] 2) Analysis of single-cell transcriptome data:

[0079] Since patient H68 in the single-cell dataset GSE151530 responded to immunotherapy, only the two samples from patient H68 were retained, one from before treatment (H68a) and the other from after treatment (H68b). The results of the dimensionality reduction analysis are shown in Figure 3A.

[0080] By comparing H68a and H68b, 593 significantly differentially expressed genes were identified, forming the single-cell transcriptome-based immunotherapy response gene set; among them, 406 genes were upregulated after treatment and 187 genes were downregulated after treatment.

[0081] GO functional enrichment analysis showed that the upregulated genes were mainly enriched in metabolic processes involving terpenoids, fatty acids, and alcohols (as shown in Figure 3B), while the downregulated genes were mainly enriched in biological processes such as cytoplasmic translation, ribosome biogenesis, and ATP synthesis (as shown in Figure 3C).

[0082] Comparing the immunotherapy response gene set identified in the single-cell data with the immunotherapy response gene set identified in the spatial transcriptome data revealed that 51 genes showed significant differential expression in both datasets (as shown in Figure 4A). These common differentially expressed genes include SPP1 and MT1E, among others. The differential expression of SPP1 and MT1E was validated in the spatial transcriptome data and the single-cell transcriptome data, and the results are shown in Figure 4B, Figure 5 and Figure 6. MT1E was the most significantly differentially expressed gene in the spatial transcriptome, while SPP1 was significantly upregulated in malignant cells from the four responsive samples in the spatial transcriptome.
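The comparison of the two immunotherapy response gene sets amounts to a set intersection. A minimal Python sketch, assuming each gene set is available as a collection of gene symbols; the function name `shared_de_genes` and the placeholder names GENE_A to GENE_C are hypothetical, and only SPP1 and MT1E are taken from the text:

```python
def shared_de_genes(spatial_de, single_cell_de):
    """Return the genes present in both differential-expression sets, sorted by name."""
    return sorted(set(spatial_de) & set(single_cell_de))


# Placeholder gene sets; GENE_A..GENE_C are illustrative names only.
spatial_de = {"SPP1", "MT1E", "GENE_A", "GENE_B"}
single_cell_de = {"SPP1", "MT1E", "GENE_C"}

common = shared_de_genes(spatial_de, single_cell_de)
```

Applied to the 566-gene and 593-gene sets described above, the same operation would yield the 51 shared genes; the sketch does not check whether the direction of change agrees between the two datasets, since the text does not state that requirement.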

[0083] 3. Screening for prognostic biomarkers for liver cancer based on gene sets derived from immunotherapy responses:

[0084] The GSE14520 dataset involved in this embodiment contains 247 cancer tissue samples and 239 adjacent normal samples;

[0085] This embodiment uses the PCA method to show the structure of the above dataset (as shown in Figure 7). Figure 7 shows a clear separation between cancerous tissue and adjacent normal samples, indicating that the dataset is highly representative.

[0086] 1) Data preprocessing: Remove samples with benign liver diseases (such as cirrhosis) and retain only cancer tissue samples.

[0087] 2) LASSO regression analysis to screen for prognostic biomarkers for liver cancer:

[0088] In order to screen suitable genes from the set of immune response-related genes as prognostic biomarkers for liver cancer, this embodiment uses the LASSO algorithm for feature selection.

[0089] This embodiment uses the R software package glmnet to perform LASSO analysis. Based on the GSE14520 dataset, the aforementioned set of 51 immunotherapy response genes was screened. The family parameter of glmnet was set to "cox" to meet the needs of survival analysis. In each construction of the Cox regression model, after the optimal λ value was selected through cross-validation, a set of key prognosis-related genes (those with coefficients not equal to 0) was obtained. To ensure the robustness of the selected genes, this process was repeated 100 times, the frequency with which each gene appeared in the 100 models was counted, and the genes with the highest frequencies were selected. The change of the LASSO regression coefficients with the penalty parameter is shown in Figure 8A; the results of cross-validation to select the optimal model are shown in Figure 8B; the top 12 genes that appeared most consistently across the 100 repeated modeling runs are shown in Figure 8C. This process identified five key genes (TXNRD1, HAO1, C1S, OPTN, and SNX6) that are common to the spatial transcriptome data GSE238264 and the single-cell transcriptome data GSE151530 and play an important role in predicting the prognosis of HCC patients. These are the liver cancer prognostic biomarkers provided by this invention.
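The stability step described above (repeat the cross-validated LASSO fit, then count how often each gene keeps a non-zero coefficient) can be sketched in Python as follows. This sketch covers only the frequency counting; it assumes the genes selected in each run have already been extracted from the fitted models, which the patent does in R with glmnet. The function names and the three toy runs are hypothetical.

```python
from collections import Counter


def selection_frequency(selected_per_run):
    """Count in how many runs each gene had a non-zero coefficient."""
    counts = Counter()
    for genes in selected_per_run:
        counts.update(set(genes))  # count each gene at most once per run
    return counts


def top_stable_genes(selected_per_run, k):
    """Return the k most frequently selected genes (ties broken by name)."""
    counts = selection_frequency(selected_per_run)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:k]]


# Three toy runs standing in for the 100 repeated fits.
runs = [
    ["TXNRD1", "SNX6", "OPTN"],
    ["TXNRD1", "SNX6"],
    ["TXNRD1", "HAO1"],
]
top = top_stable_genes(runs, 2)
```

Because cross-validation folds are assigned at random, each run can select a slightly different gene set; ranking by selection frequency favors genes whose inclusion does not depend on a particular split.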

[0090] Example 2: Validation of prognostic biomarkers for liver cancer

[0091] This embodiment uses prognostic data from the TCGA and ICGC databases to validate the hepatocellular carcinoma prognostic biomarkers provided in Example 1. The data used are shown in Table 2.

[0092] Table 2. Dataset information used for validation of hepatocellular carcinoma prognostic biomarkers.

[0093]

[0094] 1. Validate hepatocellular carcinoma prognostic biomarkers in TCGA data:

[0095] The TCGA HCC dataset provides comprehensive clinical characteristics of patients, aiding in the study of the relationship between genes and clinical features. This example analyzes the relationship between the expression levels of the hepatocellular carcinoma prognostic markers provided in Example 1 and HCC patient status (dead or alive), T / N / M stage, and pathological stage based on TCGA data. The results are shown in Figure 9 and Figure 10.

[0096] As can be seen from Figure 9, although some analytical results were not statistically significant (p>0.05), TXNRD1, OPTN, and SNX6 were positively correlated with survival status, T / N / M stage, and pathological stage, while HAO1 and C1S were negatively correlated with survival status, T / N / M stage, and pathological stage. These results directly or indirectly demonstrate the usability and accuracy of the hepatocellular carcinoma prognostic markers provided in Example 1.

[0097] As can be seen from Figure 10, although the associations of these 5 genes with some clinical features are not statistically significant, they show a trend of correlation. Specifically, the expression levels of TXNRD1, OPTN, and SNX6 show a positive correlation with pathological stage, while the expression levels of HAO1 and C1S show a negative correlation with pathological stage. The above results directly or indirectly demonstrate the usability and accuracy of the hepatocellular carcinoma prognostic markers provided in Example 1.

[0098] 2. Association analysis between prognostic markers of liver cancer and the immune microenvironment:

[0099] The ESTIMATE algorithm can provide researchers with scores for tumor purity, stromal cell levels, and immune cell infiltration levels in tumor tissue based on expression data. This example analyzes the correlation between TXNRD1, HAO1, C1S, OPTN, and SNX6 and the stromal score, immune score, and ESTIMATE score. The results are shown in Figure 11A.

[0100] As shown in Figure 11A, TXNRD1, SNX6, and OPTN were significantly positively correlated with the stromal score, immune score, and ESTIMATE score, while HAO1 and C1S were negatively correlated. These results suggest that these genes may play an important role in regulating the infiltration of stromal cells and immune cells in the tumor microenvironment.

[0101] This embodiment then evaluated the relationship between TXNRD1, HAO1, C1S, OPTN, and SNX6 and 24 immunotherapy response-related signatures (data on the 24 immunotherapy response-related signatures were obtained from Thorsson, Vésteinn et al. "The Immune Landscape of Cancer." Immunity 48(4) (2018): 812-830.e14. doi:10.1016/j.immuni.2018.03.023), and the results are shown in Figure 11B.

[0102] As shown in Figure 11B, the expression of TXNRD1, OPTN, and SNX6 was significantly positively correlated with these signature scores, while HAO1 and C1S were significantly negatively correlated. These results reveal that these genes may play an important role in regulating the tumor immune microenvironment and influencing the efficacy of immunotherapy.

[0103] Example 3: Establishment of a prognostic prediction model for liver cancer

[0104] This embodiment establishes a liver cancer prognostic prediction model based on the liver cancer prognostic biomarkers and dataset GSE14520 provided in Embodiment 1. The specific method is as follows:

[0105] This embodiment constructs a Cox proportional hazards regression model based on the hepatocellular carcinoma prognostic biomarkers screened in Example 1. The R package glmnet is used to perform LASSO analysis, and the cv.glmnet function is used for cross-validation. This function automatically performs LASSO regression and fits multiple models on a series of regularization parameters λ. Cross-validation evaluates the model performance under different λ values, and the λ that minimizes the mean squared error (MSE) is selected as the optimal regularization parameter. Finally, after determining the optimal λ value, the glmnet function is used again, this time specifying the optimal λ value obtained from cross-validation, to obtain the final LASSO regression model. The risk score calculation formula for the obtained hepatocellular carcinoma prognostic prediction model is as follows:

[0106] HCC prognostic risk score = 0.0146 × TXNRD1 expression level - 0.0014 × C1S expression level + 0.0114 × OPTN expression level + 0.0304 × SNX6 expression level - 0.0019 × HAO1 expression level.

[0107] A risk score greater than or equal to 0 is considered high risk, indicating a poor prognosis; a risk score less than 0 is considered low risk, indicating a good prognosis. The expression levels of the above five genes are all data that have been globally normalized using the NormalizeData function.
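The risk score formula and the threshold rule can be expressed as a short Python sketch. The coefficients and the cutoff of 0 are taken from the formula above; the function names, the dictionary layout, and the sample expression values are hypothetical and assume the expression levels have already been normalized as described.

```python
# Coefficients from the risk score formula given in the description.
COEFFICIENTS = {
    "TXNRD1": 0.0146,
    "C1S": -0.0014,
    "OPTN": 0.0114,
    "SNX6": 0.0304,
    "HAO1": -0.0019,
}


def risk_score(expression):
    """Weighted sum of the five normalized marker expression levels."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing expression values: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def risk_group(score):
    """A score >= 0 is classified as high risk, a score < 0 as low risk."""
    return "high" if score >= 0 else "low"


# Hypothetical normalized expression values for one patient.
sample = {"TXNRD1": 2.0, "C1S": 4.0, "OPTN": 1.0, "SNX6": 1.0, "HAO1": 3.0}
score = risk_score(sample)
group = risk_group(score)
```

For this hypothetical patient the score is 0.0597, which falls in the high-risk group under the stated rule.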

[0108] Example 4: Validation of the Liver Cancer Prognostic Prediction Model

[0109] Based on the liver cancer prognosis prediction model provided in Example 3, this example uses the dataset GSE14520 and clinical data to validate it. The information of the data used is shown in Table 3.

[0110] Table 3. Data used for validating the liver cancer risk assessment model.

[0111]

[0112] To evaluate the predictive performance of the model, this embodiment calculated the C-index and plotted calibration curves. Furthermore, patients were divided into high-risk and low-risk groups, and the survival of the two groups was compared using Kaplan-Meier survival curves and the log-rank test. The results are shown in Figure 12.

[0113] As can be seen from Figure 12, the C-index of the liver cancer prognostic prediction model provided in Example 3 is 0.68 on the TCGA dataset [95% CI: 0.63-0.73, p<0.001], indicating good predictive ability. When patients are divided into high-risk and low-risk groups, Kaplan-Meier survival analysis shows a significant difference between the survival curves of the two groups (p<0.001). Furthermore, the model was independently validated on the ICGC dataset, where its C-index is 0.70 [95% CI: 0.62-0.73, p<0.001] and the survival curves of patients in the high-risk and low-risk groups also differ significantly (p = 0.0043); this indicates that the prognostic prediction model constructed in this invention has good generalization ability. On GSE14520, the C-index of the model is 0.62 [95% CI: 0.56-0.68, p<0.001], and the survival curves of the high-risk and low-risk groups also differ significantly (p = 0.047). On GSE76427, the survival curves of the two groups likewise differ significantly (p = 0.013).
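The p-values reported for the survival curves come from the log-rank test. The following is a minimal sketch of a two-group log-rank test in plain Python, given as an illustration under stated assumptions rather than as the procedure actually used in this example. The function name `logrank_test` and the toy data are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e == 1)    # events at time t
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 degree of freedom
    return chi2, p_value


# Hypothetical toy data: group A (high risk) dies earlier than group B (low risk).
chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(chi2, p)
```

The test compares the observed number of events in one group with the number expected if both groups shared the same survival curve, so a small p-value supports a real difference between the high-risk and low-risk groups.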

[0114] The results in Figure 12 further confirm that the hepatocellular carcinoma (HCC) patient prognostic prediction model constructed in Example 3 has good stability and universality. Through validation on three datasets, the HCC patient prognostic prediction model provided by this invention demonstrates its potential to predict the survival risk of HCC patients in different populations, providing a valuable reference for clinical decision-making and individualized treatment. This example also verifies the practical clinical significance of the model by comparing the risk scores calculated by the HCC patient prognostic prediction model with the clinical characteristics of patients from the TCGA database. The results are shown in Figure 13. As can be seen from Figure 13, the risk score calculated by the prognostic prediction model is significantly higher in HCC patients who died (dead group), and the prognostic risk score is also positively correlated with T stage, AJCC stage and grade.

[0115] To further evaluate the model's predictive accuracy, calibration curves were plotted in this embodiment, and the results are shown in Figure 14. As can be seen from Figure 14, in the three datasets TCGA, ICGC and GSE14520, the predicted 1-year survival rate is in good agreement with the observed survival rate, and the predicted 2-year and 3-year survival rates also show a good trend of agreement, indicating that the prognostic model provided by this invention has high accuracy in short-term prediction.

[0116] This embodiment also analyzes the AUC values of the liver cancer prognostic prediction model at 1 year, 3 years, and 5 years based on the dataset GSE76427, and the results are shown in Figure 15. As can be seen from Figure 15, the liver cancer prognosis prediction model provided by this invention has an AUC greater than 0.5 at 1 year, 3 years and 5 years, indicating that its classification performance is better than random at all three time points.

[0117] This embodiment also analyzed the expression differences of the five hepatocellular carcinoma prognostic markers between the high-risk and low-risk groups, and the results are shown in Figure 16. As can be seen from Figure 16, TXNRD1, OPTN, and SNX6 were significantly upregulated in the high-risk group (SNX6 gene expression levels were missing from the GSE14520 dataset), while HAO1 and C1S were significantly downregulated in the high-risk group. These differences in expression patterns may reflect the different roles of these genes in HCC progression, providing clues for further understanding the molecular mechanisms of liver cancer.

[0118] Finally, this embodiment also collected surgical samples from 80 liver cancer patients at Shenzhen People's Hospital for experimental verification. Total RNA was extracted from the liver cancer samples and subsequently subjected to reverse transcription and qPCR detection. The model was validated based on the expression levels of TXNRD1, HAO1, C1S, OPTN, and SNX6 and the calculation formula of the liver cancer prognostic assessment model provided in Example 3. The results are shown in Figure 17. As can be seen from Figure 17, the liver cancer prognostic assessment model constructed in this invention performs well in assessing liver cancer prognosis (the high-risk group has a poorer prognosis, P=0.0081). The combination of laboratory experiments and big data analysis further verifies the reliability of this prediction model and indicates that it has clinical value.

[0119] Example 5: A Liver Cancer Risk Assessment System

[0120] This embodiment provides a prognostic prediction system for liver cancer patients, specifically including the following three units:

[0121] Input unit: Used to input the data to be evaluated;

[0122] Analysis unit: Used to analyze the data to be evaluated from the input unit, based on the liver cancer patient prognosis prediction model established in Example 3;

[0123] Assessment unit: Used to display the risk score of the prognostic prediction model for liver cancer patients;

[0124] A risk score greater than or equal to 0 is considered high risk, and a risk score less than 0 is considered low risk.
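The division into input, analysis, and assessment units can be sketched as three small Python functions chained together. This is one possible arrangement given for illustration, not the implementation of the claimed system. The names `input_unit`, `analysis_unit`, `assessment_unit`, `evaluate`, and `Assessment` are hypothetical, and the input is assumed to hold expression levels that are already normalized.

```python
from dataclasses import dataclass

GENES = ("TXNRD1", "HAO1", "C1S", "OPTN", "SNX6")

# Coefficients copied from the risk score formula of Example 3.
COEFFICIENTS = {
    "TXNRD1": 0.0146,
    "HAO1": -0.0019,
    "C1S": -0.0014,
    "OPTN": 0.0114,
    "SNX6": 0.0304,
}


@dataclass
class Assessment:
    risk_score: float
    risk_group: str


def input_unit(raw):
    """Input unit: check that all five markers are present and convert to float."""
    missing = [gene for gene in GENES if gene not in raw]
    if missing:
        raise ValueError("missing markers: " + ", ".join(missing))
    return {gene: float(raw[gene]) for gene in GENES}


def analysis_unit(expression):
    """Analysis unit: apply the risk score formula to the validated data."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in GENES)


def assessment_unit(score):
    """Assessment unit: report the score with its risk group (>= 0 is high risk)."""
    return Assessment(score, "high risk" if score >= 0 else "low risk")


def evaluate(raw):
    """Run the three units in sequence on one patient's data."""
    return assessment_unit(analysis_unit(input_unit(raw)))


# Hypothetical patient record, for illustration only.
result = evaluate({"TXNRD1": "0", "HAO1": 10, "C1S": 0, "OPTN": 0, "SNX6": 0})
print(result)
```

Keeping validation in the input unit means the analysis unit can assume complete numeric data, which mirrors the separation of responsibilities described for the system.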

[0125] Example 6: An electronic device

[0126] This embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor;

[0127] When the processor executes the computer program, it implements the method for establishing a prognostic prediction model for liver cancer patients provided in Example 3.

[0128] Example 7: A storage medium

[0129] This embodiment provides a storage medium storing processor-executable instructions, which, when executed by the processor, are used to perform the method for establishing a prognostic prediction model for liver cancer patients provided in Example 3.

[0130] In summary, this invention focuses on multi-omics data related to immunotherapy, discovers a number of new immunotherapy-related genes, and constructs a prognostic prediction model for liver cancer patients. These findings deepen our understanding of the response mechanism of liver cancer immunotherapy, provide new ideas for screening patients suitable for immunotherapy and predicting efficacy, and are expected to promote the precision of liver cancer immunotherapy.

[0131] The embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited to the above embodiments, and various changes can be made within the scope of knowledge possessed by those skilled in the art without departing from the spirit of the present invention. Furthermore, the embodiments of the present invention and the features thereof can be combined with each other unless otherwise specified.

Claims

1. A prognostic prediction system for liver cancer patients, characterized in that the prognostic prediction system for liver cancer patients includes: an input unit, used to input the data to be evaluated; an analysis unit, which constructs a prognostic prediction model for liver cancer patients to analyze the data to be evaluated in the input unit, wherein the model is based on the expression levels of liver cancer biomarkers and calculates a risk score using a risk model formula; the liver cancer biomarkers consist of TXNRD1, HAO1, C1S, OPTN, and SNX6; the risk model calculation formula is: Risk Score = 0.0146 × TXNRD1 expression level - 0.0014 × C1S expression level + 0.0114 × OPTN expression level + 0.0304 × SNX6 expression level - 0.0019 × HAO1 expression level; the TXNRD1, HAO1, C1S, OPTN, and SNX6 expression levels are all values obtained by globally scaling and normalizing the gene expression levels using a function, and the global scaling and normalization uses the NormalizeData function; and an assessment unit, used to display the risk score of the prognostic prediction model for liver cancer patients; wherein a risk score greater than or equal to 0 is considered high risk, and a risk score less than 0 is considered low risk.