A transcriptomic marker combination and methods and systems for glioma prognosis risk stratification

This application constructs a prognostic risk stratification method for gliomas based on a LASSO-Cox risk model and screens out biomarkers such as adenylate cyclase-associated protein 1. It addresses the lack of an objective basis for glioma prognostic assessment in existing technologies and enables precise risk stratification and personalized treatment guidance.

CN122201773A · Pending · Publication Date: 2026-06-12 · YUANTONG HUIZE (SHAANXI) BIOTECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
YUANTONG HUIZE (SHAANXI) BIOTECHNOLOGY CO LTD
Filing Date
2026-03-12
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Current technologies lack objective molecular markers to guide the prognostic assessment of gliomas, so treatment options are often selected without a reliable basis. Existing prognostic models generalize poorly and are difficult to apply stably across different populations.

Method used

A method for prognostic risk stratification of gliomas was developed, comprising data acquisition, preprocessing, multi-module feature screening, and prognostic model construction. The feature screening submodule uses LASSO regression to screen out biomarkers such as adenylate cyclase-associated protein 1 (CAP1), dachsous cadherin-related 2 (DCHS2), and serum deprivation response protein (SDPR). A LASSO-Cox risk model is then constructed, and its risk score is used to stratify patients by risk.

Benefits of technology

It enables precise risk stratification of glioma patients: it effectively distinguishes high-risk from low-risk patients, predicts patient prognosis, and applies to patients of different pathological grades and ethnicities, which reduces the medical burden and guides personalized treatment.



Abstract

The application belongs to the technical field of tumor molecular biology and medical detection, and discloses a transcriptomics marker combination, method and system for glioma prognosis risk stratification. Transcriptomics data and clinical information of glioma patients are acquired first, low-expression genes are eliminated, and quantile normalization is used to eliminate batch effects. Then, a feature screening submodule is constructed using LASSO regression, the optimal parameters are determined through 10-fold cross-validation, a marker combination significantly related to the overall survival of glioma patients is screened, a LASSO-Cox risk model is constructed according to the marker combination and the clinical information, a risk score is obtained, and the glioma patients are divided into high-risk and low-risk groups with the median of the risk score as the threshold. The risk stratification of the application is accurate, can effectively distinguish between high-risk and low-risk glioma patients, and can predict the prognosis of patients.

Description

Technical Field

[0001] This application belongs to the technical field of tumor molecular biology and medical testing, and specifically relates to a transcriptomic biomarker combination, method, and system for prognostic risk stratification of glioma.

Background Technology

[0002] The prognosis of glioma patients is highly heterogeneous; even patients with the same pathological grade can have overall survival times ranging from months to years. Current clinical prognostic assessment relies primarily on pathological grade and clinical characteristics and lacks objective molecular biomarkers for guidance, which leads to considerable uncertainty in treatment selection. Existing prognostic models are mostly based on single algorithms or small-sample data, so they generalize poorly and are difficult to apply stably across different populations. Transcriptomics data reflect tumor molecular characteristics and, combined with machine learning, can uncover prognostic biomarkers; however, single algorithms are easily affected by data distribution, so the selected biomarkers lack specificity. Therefore, constructing a new glioma prognostic system is of great significance for achieving precise risk stratification of gliomas and guiding personalized treatment.

Summary of the Invention

[0003] The purpose of this application is to address the problems of the prior art by providing a combination of transcriptomic biomarkers, methods, and systems for prognostic risk stratification of gliomas.

[0004] To address the technical problem, the technical solution of this application is a method for prognostic risk stratification of gliomas, comprising the following steps:
Step 1: Data acquisition. Transcriptomic data and clinical information of glioma patients are obtained and divided into a training set and an external validation set.
Step 2: Data preprocessing. The transcriptomics data are transformed by log2(x+0.001), samples with more than 20% missing values are removed, and quantile normalization is used to eliminate batch effects, yielding the preprocessed training set and external validation set.
Step 3: Multi-module feature screening. Based on the preprocessed training set, a feature screening sub-module is constructed using LASSO regression, the optimal parameters are determined through 10-fold cross-validation, and a combination of biomarkers significantly associated with the overall survival of glioma patients is screened.
Step 4: Prognostic model construction. Based on the biomarker combination and the clinical information, a LASSO-Cox risk model is constructed to obtain a risk score.
Step 5: Risk stratification. Glioma patients are divided into high-risk and low-risk groups using the median risk score as the threshold, and external validation is performed using the preprocessed external validation set.

[0005] Preferably, step 1 specifically involves: downloading transcriptomic RNA-seq data and clinical information of glioma patients from the TCGA database, which are divided into the TCGA-LGG training set and the TCGA-GBM external validation set. The clinical information includes age, survival time, and survival status. At the same time, two independent cohorts, CGGA mRNAseq_693 and CGGA mRNAseq_325, are also obtained from the CGGA database as external validation sets.

[0006] Preferably, the biomarker combination in step 3 is adenylate cyclase-associated protein 1 (CAP1), dachsous cadherin-related 2 (DCHS2), and serum deprivation response protein (SDPR).

[0007] Preferably, the risk model in step 4 is: $\mathrm{RiskScore} = 6.56 \times 10^{-5} \times \mathrm{CAP1} + 7.03 \times 10^{-5} \times \mathrm{DCHS2} - 1.40 \times 10^{-4} \times \mathrm{SDPR} + 0.0574 \times \mathrm{AGE}$; in the formula, CAP1 is adenylate cyclase-associated protein 1, DCHS2 is dachsous cadherin-related 2, SDPR is serum deprivation response protein, and AGE is the age given in the clinical information.

[0008] Preferably, a combination of transcriptomic biomarkers for glioma prognostic risk stratification is obtained by screening with the method for glioma prognostic risk stratification, wherein the biomarker combination includes adenylate cyclase-associated protein 1 (CAP1), dachsous cadherin-related 2 (DCHS2), and serum deprivation response protein (SDPR).

[0009] Preferably, a system for prognostic risk stratification of gliomas, used to implement the method for prognostic risk stratification of gliomas, includes a data acquisition module, a data preprocessing module, a multi-module feature screening module, a prognostic model construction module, and a risk stratification module.
The data acquisition module acquires transcriptomic data and clinical information of glioma patients and divides them into a training set and an external validation set.
The data preprocessing module transforms the transcriptomics data by log2(x+0.001), removes samples with more than 20% missing values, and uses quantile normalization to eliminate batch effects, yielding the preprocessed training set and external validation set.
The multi-module feature screening module constructs a feature screening sub-module on the preprocessed training set using LASSO regression, determines the optimal parameters through 10-fold cross-validation, and screens for a combination of biomarkers significantly associated with the overall survival of glioma patients.
The prognostic model construction module constructs a LASSO-Cox risk model from the biomarker combination and the clinical information to obtain a risk score.
The risk stratification module classifies glioma patients into high-risk and low-risk groups using the median risk score as the threshold and performs external validation using the preprocessed external validation set.

[0010] Compared with the prior art, the advantages of this application are:
(1) This application provides a method for risk stratification of glioma prognosis. First, transcriptomic data and clinical information of glioma patients are obtained, low-expression genes are removed, and batch effects are eliminated by quantile normalization. Then, LASSO regression is used to construct a feature screening submodule, and the optimal parameters are determined by 10-fold cross-validation. A combination of biomarkers significantly related to the overall survival of glioma patients is screened, and a LASSO-Cox risk model is constructed from the biomarker combination and clinical information to obtain a risk score. The median of the risk score is used as the threshold to divide glioma patients into high-risk and low-risk groups. The risk stratification is accurate, effectively distinguishes high-risk from low-risk glioma patients, and predicts patient prognosis.
(2) This application downloads transcriptomic RNA-seq data and clinical information of glioma patients from the TCGA database and divides them into the TCGA-LGG training set and the TCGA-GBM external validation set. The clinical information includes age, survival time, and survival status. In addition, two independent cohorts, CGGA mRNAseq_693 and CGGA mRNAseq_325, obtained from the CGGA database are used as external validation sets, so the method is compatible with both public database data and clinical sample detection data.
(3) The training set of this application is based on transcriptomic data of glioma patients in the United States, and the model has been validated in two independent Chinese Glioma Genome Atlas (CGGA) cohorts from the Chinese population. It can cover different pathological grades and ethnicities, which supports the reliability of clinical application.
(4) This application explicitly provides a combination of transcriptomic biomarkers for prognostic risk stratification of gliomas and a risk model based on that combination, which can be calculated with conventional software and is convenient for clinicians to use quickly.

Attached Figure Description

[0011] Figure 1 is a flowchart of the method for prognostic risk stratification of glioma according to this application.
Figure 2 shows the coefficient distribution of the characteristic genes screened by the LASSO-Cox risk model in Example 2 of this application, based on the TCGA-LGG cohort.
Figure 3 shows the Kaplan-Meier survival curves for high-risk and low-risk patients in the TCGA-LGG training set in Example 2 of this application.
Figure 4 is a heatmap showing the relationship between risk score and patient survival time, survival status, and biomarker expression levels in the TCGA-LGG training set in Example 2 of this application.
Figure 5 shows the ROC curves of the risk model for 1-year, 3-year, 5-year, 7-year, and 10-year overall survival in the TCGA-LGG training set in Example 2 of this application.
Figure 6 shows the survival stratification effect of the risk model on patients with high-grade gliomas in the independent TCGA-GBM cohort in Example 2 of this application.
Figure 7 shows the Kaplan-Meier survival analysis results for patients in the two external validation cohorts from the Chinese Glioma Genome Atlas (CGGA), derived from the Chinese population, in Example 2 of this application.
Figure 8 shows the ROC curve validation results of the risk model in the two CGGA external validation cohorts in Example 2 of this application.

Detailed Implementation

[0012] The present application is described in detail below with reference to the accompanying drawings and specific embodiments, but the present application is not limited to these embodiments. The present application covers any alternatives, modifications, equivalent methods, and solutions made within the spirit and scope of the present application. To provide the public with a thorough understanding of the present application, specific details are described in detail in the following embodiments, but those skilled in the art will fully understand the present application even without these details.

[0013] This application proposes a method for prognostic risk stratification of gliomas that follows the logic of "data integration - feature screening - model building - external validation" and comprises the following steps:
Step 1: Data acquisition. Transcriptomic data and clinical information of glioma patients are obtained and divided into a training set and an external validation set. The transcriptomic data come from databases such as TCGA and CGGA or from clinical sample RNA-seq data. The expression levels are raw count data, which are standardized and then used for model training.
Step 2: Data preprocessing. The transcriptomics data are transformed by log2(x+0.001), samples with more than 20% missing values are removed, and quantile normalization is used to eliminate batch effects, yielding the preprocessed training set and external validation set.
Step 3: Multi-module feature screening. Based on the preprocessed training set, a feature screening sub-module is constructed using LASSO regression, the optimal parameters are determined through 10-fold cross-validation, and a combination of biomarkers significantly associated with the overall survival of glioma patients is screened.
Step 4: Prognostic model construction. Based on the biomarker combination and the clinical information, a LASSO-Cox risk model is constructed to obtain a risk score.
Step 5: Risk stratification. Glioma patients are divided into high-risk and low-risk groups using the median risk score as the threshold, and external validation is performed using the preprocessed external validation set.
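The preprocessing in Step 2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the matrix layout (genes as rows, samples as columns), the use of None for missing values, the function names, and the toy numbers are all hypothetical, and ties in quantile normalization are broken by row order for simplicity.

```python
import math

def drop_sparse_samples(matrix, max_missing=0.20):
    """Keep samples (columns) whose fraction of missing values (None) is <= max_missing."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    keep = [
        j for j in range(n_samples)
        if sum(row[j] is None for row in matrix) / n_genes <= max_missing
    ]
    return [[row[j] for j in keep] for row in matrix], keep

def log2_transform(matrix, pseudocount=0.001):
    """Apply log2(x + 0.001) to every value; assumes no missing values remain."""
    return [[math.log2(x + pseudocount) for x in row] for row in matrix]

def quantile_normalize(matrix):
    """Give every sample the same distribution: the mean of the sorted columns."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_genes)] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: average value at each rank across samples.
    rank_means = [sum(col[r] for col in sorted_cols) / n_samples for r in range(n_genes)]
    out = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])  # ties broken by row order
        for rank, i in enumerate(order):
            out[i][j] = rank_means[rank]
    return out

# Toy data (hypothetical): 3 genes x 4 samples; the last sample is mostly missing.
raw = [
    [5.0, 4.0, 3.0, None],
    [2.0, 1.0, 4.0, None],
    [3.0, 4.0, 6.0, 1.0],
]
filtered, kept = drop_sparse_samples(raw)  # sample 3 has 2/3 missing, so it is removed
normalized = quantile_normalize(log2_transform(filtered))
```

After normalization, every remaining sample column contains the same set of values in a different order, which is the property quantile normalization relies on to make samples comparable.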

[0014] Preferably, step 1 specifically involves: downloading transcriptomic RNA-seq data and clinical information of glioma patients from the TCGA database, which are divided into the TCGA-LGG training set and the TCGA-GBM external validation set. The clinical information includes age, survival time, and survival status. At the same time, two independent cohorts, CGGA mRNAseq_693 and CGGA mRNAseq_325, are also obtained from the CGGA database as external validation sets.

[0015] The risk model is validated using an external validation set to verify its performance and generalization ability.

[0016] Preferably, the biomarker combination in step 3 is adenylate cyclase-associated protein 1 (CAP1), dachsous cadherin-related 2 (DCHS2), and serum deprivation response protein (SDPR).

[0017] Preferably, the risk model in step 4 is: $\mathrm{RiskScore} = 6.56 \times 10^{-5} \times \mathrm{CAP1} + 7.03 \times 10^{-5} \times \mathrm{DCHS2} - 1.40 \times 10^{-4} \times \mathrm{SDPR} + 0.0574 \times \mathrm{AGE}$; in the formula, CAP1 is adenylate cyclase-associated protein 1, DCHS2 is dachsous cadherin-related 2, SDPR is serum deprivation response protein, and AGE is the age given in the clinical information.

[0018] Preferably, a combination of transcriptomic biomarkers for prognostic risk stratification of gliomas is obtained by screening with the method for prognostic risk stratification of gliomas, wherein the biomarker combination includes adenylate cyclase-associated protein 1 (CAP1), dachsous cadherin-related 2 (DCHS2), and serum deprivation response protein (SDPR).

[0019] Preferably, a system for prognostic risk stratification of gliomas, used to implement the method for prognostic risk stratification of gliomas, includes a data acquisition module, a data preprocessing module, a multi-module feature screening module, a prognostic model construction module, and a risk stratification module.
The data acquisition module acquires transcriptomic data and clinical information of glioma patients and divides them into a training set and an external validation set.
The data preprocessing module transforms the transcriptomics data by log2(x+0.001), removes samples with more than 20% missing values, and uses quantile normalization to eliminate batch effects, yielding the preprocessed training set and external validation set.
The multi-module feature screening module constructs a feature screening sub-module on the preprocessed training set using LASSO regression, determines the optimal parameters through 10-fold cross-validation, and screens for a combination of biomarkers significantly associated with the overall survival of glioma patients.
The prognostic model construction module constructs a LASSO-Cox risk model from the biomarker combination and the clinical information to obtain a risk score.
The risk stratification module classifies glioma patients into high-risk and low-risk groups using the median risk score as the threshold and performs external validation using the preprocessed external validation set.

[0020] This application demonstrates the use of a prognostic risk stratification system in the development of personalized treatment plans for gliomas.

[0021] Example 1: Glioma prognostic risk stratification system.
Data acquisition module: acquires patient transcriptomics data (RNA-seq raw counts) and clinical information (age, survival time, survival status) and organizes them into TCGA-LGG (training set), TCGA-GBM (external validation set), and two CGGA cohorts (external validation sets).
Data preprocessing module: quantile normalization is performed using the limma package in R, low-expression genes (expression of 0 in more than 50% of samples) are removed, missing values are filled with the minimum value, and the data are transformed by log2(x+0.001) before being used for analysis.
Multi-module feature screening module: three algorithms are each used to construct a sub-module. The LASSO regression sub-module uses the glmnet package (λ = 0.01308) to screen feature genes significantly related to overall survival. The intersection of the three sub-modules' results determines the three core biomarkers.
Prognostic model construction module: integrating the expression levels of the three biomarkers and age, a LASSO-Cox risk model is constructed to calculate an individual risk score ($\mathrm{RiskScore} = 6.56 \times 10^{-5} \times \mathrm{CAP1} + 7.03 \times 10^{-5} \times \mathrm{DCHS2} - 1.40 \times 10^{-4} \times \mathrm{SDPR} + 0.0574 \times \mathrm{AGE}$).
Risk stratification module: patients are divided into high-risk and low-risk groups based on the median risk score in the training set, and the Kaplan-Meier method is used to analyze survival differences.
Validation module: validates model performance in three independent cohorts.
Model performance and validation results: in the training set (TCGA-LGG), the overall survival of patients in the high-risk group was significantly shorter than that in the low-risk group (HR = 3.16, 95% CI: 2.15–4.66, log-rank P = 9.2e-10), with AUC values ≥ 0.75 for 1-year, 3-year, and 5-year survival predictions. In the TCGA-GBM cohort, the model effectively stratified the survival risk of patients with high-grade gliomas. Further, in the CGGA mRNAseq_693 (HR = 2.00, P = 8.5e-12) and mRNAseq_325 (HR = 2.66, P = 1.0e-12) cohorts from the Chinese Glioma Genome Atlas database, the model maintained stable risk stratification ability, with AUC values of 0.60–0.87.
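The log-rank P values quoted above compare survival between the high-risk and low-risk groups. As a rough illustration of how such a P value arises, the following plain-Python sketch implements a standard two-group log-rank test; the function name and the toy survival data are hypothetical, and the source does not state which software computed its reported values.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events: 1 = death observed, 0 = censored.

    Returns (chi-square statistic, P value with 1 degree of freedom).
    Assumes both groups contain patients and at least one event time
    has patients at risk in both groups (so the variance is non-zero).
    """
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)                       # at risk overall
        n_a = sum(1 for u, a in zip(times, in_a) if u >= t and a)  # at risk in group A
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d_a = sum(1 for u, e, a in zip(times, events, in_a) if u == t and e == 1 and a)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return chi2, p_value

# Toy data (hypothetical): group A dies early, group B dies late.
chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
```

With only three patients per group the statistic is modest; cohorts of several hundred patients with a clear survival difference can produce very small P values.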

[0022] This glioma risk score has strong clinical applicability: the model only includes 3 biomarkers and age, the detection cost is low, and it can be integrated into routine molecular testing procedures.

[0023] This glioma risk model has been validated across ethnic groups and is applicable to glioma patients of different pathological grades and ethnicities. It can guide precision treatment, recommending intensive chemotherapy regimens for high-risk patients and reducing overtreatment for low-risk patients, thereby lowering the healthcare burden.

[0024] Example 2: Construction and validation of a prognostic risk stratification system for gliomas. This example constructs and validates a glioma prognostic risk stratification model based on The Cancer Genome Atlas (TCGA) and the Chinese Glioma Genome Atlas (CGGA) databases. The specific steps are as follows. Step 1, Data Acquisition: Transcriptomic RNA-seq data (raw count data) and clinical information (including age, survival time, and survival status) of glioma patients were downloaded from the TCGA database. This included 472 cases in the TCGA-LGG (low-grade glioma) training set and 144 cases in the TCGA-GBM (glioblastoma) external validation set. In addition, two independent cohorts (mRNAseq_693 and mRNAseq_325) from the CGGA database, derived from the Chinese population, were obtained as external validation sets.

[0025] Step 2, Data Preprocessing: Low-expression genes (expression of 0 in more than 50% of samples) are removed, the remaining genes are transformed using log2(x+0.001), quantile normalization is used to eliminate batch effects, and missing values are filled with the minimum value in the same group.
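The gene filter and the missing-value fill described in this step can be sketched as below. This is an illustrative sketch only: the source does not define "the same group", so the code assumes it means the same gene across samples, and the function names, gene labels, and counts are hypothetical.

```python
def filter_low_expression(matrix, gene_names, max_zero_fraction=0.5):
    """Drop genes (rows) whose expression is 0 in more than max_zero_fraction of samples."""
    kept_rows, kept_names = [], []
    for name, row in zip(gene_names, matrix):
        zero_fraction = sum(x == 0 for x in row) / len(row)  # None never equals 0
        if zero_fraction <= max_zero_fraction:
            kept_rows.append(row)
            kept_names.append(name)
    return kept_rows, kept_names

def fill_missing_with_minimum(matrix):
    """Replace None with the smallest observed value of the same gene.

    Assumption: "same group" is read as "same gene"; each gene must have
    at least one observed value.
    """
    filled = []
    for row in matrix:
        floor = min(x for x in row if x is not None)
        filled.append([floor if x is None else x for x in row])
    return filled

# Toy counts (hypothetical): 3 genes x 4 samples.
genes = ["GENE_A", "GENE_B", "GENE_C"]
counts = [
    [0, 0, 0, 7],       # zero in 75% of samples: removed
    [12, None, 30, 18],  # one missing value: kept, then filled with 12
    [0, 5, 0, 9],        # zero in exactly 50% of samples: kept
]
kept_rows, kept_genes = filter_low_expression(counts, genes)
filled = fill_missing_with_minimum(kept_rows)
```

Because the threshold is "more than 50%", a gene with zeros in exactly half of the samples is retained in this sketch.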

[0026] Step 3, Feature Selection and Model Building: Based on the TCGA-LGG training set data, LASSO regression was used to screen genes significantly associated with overall survival. The optimal penalty coefficient λ = 0.01308 was determined through 10-fold cross-validation using the R package "glmnet". Finally, the CAP1, DCHS2, and SDPR genes were selected as core prognostic biomarkers. Figure 2 shows the coefficient distribution of the feature genes screened by the LASSO-Cox risk model based on the TCGA-LGG cohort.
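To make the role of the penalty coefficient λ concrete, the sketch below writes out an L1-penalized Cox objective and a random 10-fold assignment in plain Python. It is not the glmnet implementation: the 1/n scaling of the likelihood term, the Breslow-style handling of risk sets, the function names, and the toy data are assumptions made for illustration.

```python
import math
import random

def penalized_cox_loss(beta, features, times, events, lam):
    """Negative Cox partial log-likelihood divided by n, plus lam * sum(|beta|)."""
    n = len(times)
    eta = [sum(b * x for b, x in zip(beta, row)) for row in features]
    loss = 0.0
    for i in range(n):
        if events[i] == 1:
            # Risk set: every patient still under observation at patient i's event time.
            risk_sum = sum(math.exp(eta[j]) for j in range(n) if times[j] >= times[i])
            loss -= eta[i] - math.log(risk_sum)
    return loss / n + lam * sum(abs(b) for b in beta)

def assign_folds(n_samples, k=10, seed=0):
    """Randomly assign each sample to one of k folds of near-equal size."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [0] * n_samples
    for position, sample in enumerate(order):
        folds[sample] = position % k
    return folds

# Toy data (hypothetical): 4 patients, 2 features.
features = [[1.0, 0.2], [0.5, 1.1], [0.1, 0.3], [0.0, 0.9]]
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
lam = 0.01308  # penalty value reported in the text

loss_at_zero = penalized_cox_loss([0.0, 0.0], features, times, events, lam)
folds = assign_folds(472)  # 472 = size of the TCGA-LGG training set in the text
```

In cross-validation, the model would be fitted on nine folds for each candidate λ and scored on the held-out fold; the L1 term is what drives most coefficients to exactly zero, leaving a small set of selected genes.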

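[0027] Step 4, Prognostic Model Construction: The LASSO-Cox prognostic risk model is constructed. The risk model is $\mathrm{RiskScore} = 6.56 \times 10^{-5} \times \mathrm{CAP1} + 7.03 \times 10^{-5} \times \mathrm{DCHS2} - 1.40 \times 10^{-4} \times \mathrm{SDPR} + 0.0574 \times \mathrm{AGE}$.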
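Since the risk model is a linear combination of three expression values and age, it can be evaluated in a few lines. The sketch below uses the coefficients stated in the formula; the function name and the patient values are hypothetical, and the source does not state here on which scale (raw counts or transformed values) the expression inputs should be supplied.

```python
# Coefficients as stated in the risk model formula.
COEFFICIENTS = {
    "CAP1": 6.56e-5,
    "DCHS2": 7.03e-5,
    "SDPR": -1.40e-4,
    "AGE": 0.0574,
}

def risk_score(cap1: float, dchs2: float, sdpr: float, age: float) -> float:
    """Linear risk score: sum of coefficient * value over the four inputs."""
    values = {"CAP1": cap1, "DCHS2": dchs2, "SDPR": sdpr, "AGE": age}
    return sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)

# Hypothetical patient; the expression values and age are illustrative only.
example = risk_score(cap1=12000.0, dchs2=300.0, sdpr=1500.0, age=45.0)
# 0.7872 + 0.02109 - 0.21 + 2.583 = 3.18129
```

The signs match the direction described for the biomarkers: higher CAP1, DCHS2, and age raise the score, while higher SDPR lowers it.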

[0028] Step 5, Risk Stratification and Survival Analysis: Patients are divided into high-risk and low-risk groups based on the median risk score in the training set. Figure 3 shows the Kaplan-Meier survival curves for the high-risk and low-risk groups in the TCGA-LGG training set. Kaplan-Meier survival analysis showed that overall survival in the high-risk group was significantly shorter than in the low-risk group (HR = 3.16, 95% CI: 2.15–4.66; log-rank P = 9.2e-10, the value reported for this cohort in Example 1). As shown in Figure 4, heatmaps of risk score, survival time, survival status, and biomarker expression in the TCGA-LGG training set indicate that high-risk patients have shorter survival times, higher CAP1 and DCHS2 expression, and lower SDPR expression. As shown in Figure 5, time-dependent ROC curve analysis gives AUC values above 0.75 for 1-year, 3-year, 5-year, 7-year, and 10-year survival, indicating that the model has good predictive accuracy.
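The median split and the Kaplan-Meier estimate used in this step can be illustrated with a short plain-Python sketch. The function names and toy data are hypothetical, and two choices are assumptions not settled by the source: scores strictly above the cutoff are labeled high-risk, and a precomputed cutoff (for example the training-set median) can optionally be passed in.

```python
from statistics import median

def stratify_by_median(scores, threshold=None):
    """Label each score 'high' or 'low' relative to the median (or a given cutoff)."""
    cutoff = median(scores) if threshold is None else threshold
    # Assumption: a score exactly equal to the cutoff counts as low-risk.
    return ["high" if s > cutoff else "low" for s in scores], cutoff

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events: 1 = death observed, 0 = censored.
    """
    curve, survival = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Toy data (hypothetical).
groups, cutoff = stratify_by_median([1.2, 3.4, 2.1, 4.0])
curve = kaplan_meier(times=[2, 3, 3, 5], events=[1, 1, 0, 1])
```

Running `kaplan_meier` separately on the high-risk and low-risk groups yields the two curves that a plot such as Figure 3 compares.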

[0029] External independent validation: To verify the model's generalization ability, external validation was performed in the TCGA-GBM cohort and the two CGGA cohorts. In the TCGA-GBM cohort, the model also effectively stratified the survival risk of patients with high-grade gliomas, as shown in Figure 6. In the CGGA mRNAseq_693 and mRNAseq_325 cohorts, the risk model maintained significant survival stratification ability (HRs of 2.00 and 2.66, respectively; Example 1 reports the corresponding P values as 8.5e-12 and 1.0e-12), and the AUC values of the ROC curves were 0.60–0.87, as shown in Figures 7 and 8. These results indicate that the model has good applicability in patients of different ethnicities and pathological grades.
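The AUC values reported for the validation cohorts measure how well the risk score ranks patients who died before a given time above those who survived past it. The sketch below computes a simplified fixed-horizon AUC in plain Python. It is a deliberate simplification rather than the time-dependent ROC estimator behind the reported figures: patients censored before the horizon are simply excluded, and the function name and toy data are hypothetical.

```python
def auc_at_horizon(scores, times, events, horizon):
    """Simplified AUC at a fixed time horizon.

    Cases: patients with an observed death at or before the horizon.
    Controls: patients followed beyond the horizon.
    Patients censored before the horizon are excluded (a simplification).
    Assumes at least one case and one control.
    """
    cases = [s for s, t, e in zip(scores, times, events) if t <= horizon and e == 1]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    pairs, concordant = 0, 0.0
    for case_score in cases:
        for control_score in controls:
            pairs += 1
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5  # ties count as half
    return concordant / pairs

# Toy cohort (hypothetical): times in years, events 1 = death, 0 = censored.
scores = [0.9, 0.8, 0.3, 0.2, 0.5]
times = [1, 2, 6, 7, 3]
events = [1, 1, 0, 1, 0]
auc_5y = auc_at_horizon(scores, times, events, horizon=5)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported range of 0.60–0.87.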

[0030] Clinical application recommendations: This risk model is easy to operate, requiring only the expression levels of three genes and age information. It can be detected using routine qPCR or RNA-seq techniques, facilitating integration into clinical molecular testing workflows. High-risk patients may be recommended for intensive treatment or clinical trial enrollment, while low-risk patients can avoid overtreatment, achieving personalized and precise management.

[0031] The preferred embodiments of this application have been described in detail above. However, this application is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of this application.

[0032] Many other changes and modifications can be made without departing from the concept and scope of this application. It should be understood that this application is not limited to the specific embodiments, and the scope of this application is defined by the appended claims.

Claims

1. A method for glioma prognosis risk stratification, characterized in that it comprises the following steps:
Step 1: Data acquisition. Transcriptomic data and clinical information of glioma patients are obtained and divided into a training set and an external validation set.
Step 2: Data preprocessing. The transcriptomics data are transformed by log2(x+0.001), samples with more than 20% missing values are removed, and quantile normalization is used to eliminate batch effects, yielding the preprocessed training set and external validation set.
Step 3: Multi-module feature screening. Based on the preprocessed training set, a feature screening sub-module is constructed using LASSO regression, the optimal parameters are determined through 10-fold cross-validation, and a combination of biomarkers significantly associated with the overall survival of glioma patients is screened.
Step 4: Prognostic model construction. Based on the biomarker combination and the clinical information, a LASSO-Cox risk model is constructed to obtain a risk score.
Step 5: Risk stratification. Glioma patients are divided into high-risk and low-risk groups using the median risk score as the threshold, and external validation is performed using the preprocessed external validation set.

2. The method for glioma prognosis risk stratification according to claim 1, characterized in that, Step 1 specifically involves downloading transcriptomic RNA-seq data and clinical information of glioma patients from the TCGA database, which are divided into the TCGA-LGG training set and the TCGA-GBM external validation set. The clinical information includes age, survival time, and survival status. At the same time, two independent cohorts, CGGA mRNAseq_693 and CGGA mRNAseq_325, are also obtained from the CGGA database as external validation sets.

3. The method for prognostic risk stratification of glioma according to claim 1, characterized in that the biomarker combination in step 3 is adenylate cyclase-associated protein 1 (CAP1), dachsous cadherin-related 2 (DCHS2), and serum deprivation response protein (SDPR).

4. The method for prognostic risk stratification of glioma according to claim 3, characterized in that the risk model in step 4 is: $\mathrm{RiskScore} = 6.56 \times 10^{-5} \times \mathrm{CAP1} + 7.03 \times 10^{-5} \times \mathrm{DCHS2} - 1.40 \times 10^{-4} \times \mathrm{SDPR} + 0.0574 \times \mathrm{AGE}$; in the formula, RiskScore is the risk score, CAP1 is adenylate cyclase-associated protein 1, DCHS2 is dachsous cadherin-related 2, SDPR is serum deprivation response protein, and AGE is the age given in the clinical information.

5. A combination of transcriptomic biomarkers for prognostic risk stratification of gliomas, characterized in that the marker combination is obtained by screening with the method for prognostic risk stratification of glioma according to any one of claims 1 to 4, and the marker combination includes adenylate cyclase-associated protein 1 (CAP1), dachsous cadherin-related 2 (DCHS2), and serum deprivation response protein (SDPR).

6. A system for prognostic risk stratification of gliomas, characterized in that the system is used to implement the method for prognostic risk stratification of glioma according to any one of claims 1 to 4 and includes a data acquisition module, a data preprocessing module, a multi-module feature screening module, a prognostic model construction module, and a risk stratification module.
The data acquisition module acquires transcriptomic data and clinical information of glioma patients and divides them into a training set and an external validation set.
The data preprocessing module transforms the transcriptomics data by log2(x+0.001), removes samples with more than 20% missing values, and uses quantile normalization to eliminate batch effects, yielding the preprocessed training set and external validation set.
The multi-module feature screening module constructs a feature screening sub-module on the preprocessed training set using LASSO regression, determines the optimal parameters through 10-fold cross-validation, and screens for a combination of biomarkers significantly associated with the overall survival of glioma patients.
The prognostic model construction module constructs a LASSO-Cox risk model from the biomarker combination and the clinical information to obtain a risk score.
The risk stratification module classifies glioma patients into high-risk and low-risk groups using the median risk score as the threshold and performs external validation using the preprocessed external validation set.