A marker combination related to postoperative prognosis of colorectal cancer and application and prediction system thereof

By combining gut microbiota and metabolomics biomarkers, a predictive model was constructed, which solved the problems of accuracy and specificity in predicting the risk of recurrence after colorectal cancer surgery, achieved efficient prognostic assessment, and improved the performance and stability of the predictive model.

CN122235339A · Pending · Publication Date: 2026-06-19 · NANKAI UNIV +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
NANKAI UNIV
Filing Date
2026-03-31
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Current indicators for predicting the risk of recurrence after colorectal cancer surgery have low specificity, adapt poorly to individual heterogeneity, and offer insufficient predictive accuracy. Traditional pathological indicators cannot accurately assess the risk of recurrence in patients.

Method used

By combining gut microbiome and metabolomics analysis, *Peptostreptococcus*, *Fusobacterium*, *Bacteroides*, *Porphyromonas*, and *Prevotella* were screened as microbial biomarkers. Alanylglutamate, putrescine, arginine, histidine, and sebacic acid were used as metabolite biomarkers. A prediction model was constructed using LASSO regression and random forest algorithms.

Benefits of technology

It improved the accuracy and specificity of predicting the risk of recurrence after colorectal cancer surgery. The AUC increased from 0.81 to 0.91, the accuracy was 0.81, the sensitivity was 0.75, and the specificity was 0.88. It can independently assess prognosis and distinguish between patients with high and low risk of recurrence, independent of TNM staging.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention belongs to the fields of bioinformatics and medical testing technology, and specifically relates to a combination of biomarkers related to the postoperative prognosis of colorectal cancer, and to its application and prediction system. Through 16S rRNA gene sequencing of tumor mucosal samples from colorectal cancer patients, combined with long-term postoperative follow-up data, the invention identifies *Peptostreptococcus*, *Fusobacterium*, *Bacteroides*, *Porphyromonas*, and *Prevotella* as closely related to postoperative recurrence of colorectal cancer, with an AUC of 0.81 for assessing postoperative recurrence. On this basis, non-targeted metabolomics analysis shows that combining alanylglutamate, putrescine, arginine, histidine, and sebacic acid with these microbial biomarkers raises the AUC to 0.91 and significantly improves accuracy, specificity, and precision. This enables prognostic stratification of recurrence-free survival in colorectal cancer patients, with higher predictive efficacy and stability.

Description

Technical Field

[0001] This invention belongs to the fields of bioinformatics and medical testing technology, specifically relating to a combination of biomarkers related to the postoperative prognosis of colorectal cancer and their application and prediction system.

Background Technology

[0002] Colorectal cancer (CRC) is the third most common malignant tumor worldwide and the second leading cause of cancer-related deaths. For patients with stage II-III colorectal cancer, surgical resection is the standard treatment, but the recurrence rate is over 30%, with 80% of recurrences occurring within 3 years post-surgery. Traditional prognostic assessments primarily rely on pathological indicators such as TNM staging and tumor grade. However, these indicators have limitations, including significant inter-observer variability and an inability to fully reflect individual patient heterogeneity, leading to inaccurate recurrence risk assessments for some patients and impacting clinical intervention decisions.

[0003] In recent years, the roles of the gut microbiome and metabolome in the development and progression of colorectal cancer have received widespread attention. Studies have shown that the gut microbiota influences colorectal cancer progression through mechanisms such as genotoxicity, immune regulation, and activation of oncogenic signaling pathways, while the metabolome, as the chemical interface between the host and microbes, is closely related to the state of tumor cells. However, existing research largely focuses on identifying single biomarkers and lacks integrated analysis of the microbiome and metabolome, and the accuracy and clinical applicability of predictive models need improvement.

Summary of the Invention

[0004] The purpose of this invention is to address the problems of low specificity of indicators, poor adaptability to individual heterogeneity, and insufficient prediction accuracy in existing colorectal cancer postoperative recurrence risk prediction. It provides a combination of biomarkers with high specificity and sensitivity and constructs a prediction system to achieve accurate prediction and prognostic stratification of colorectal cancer postoperative recurrence risk.

[0005] This invention provides a combination of biomarkers associated with postoperative prognosis of colorectal cancer, the combination of biomarkers including microbial biomarkers; the microbial biomarkers include *Peptostreptococcus*, *Fusobacterium*, *Bacteroides*, *Porphyromonas*, and *Prevotella*.

[0006] Preferably, the biomarker combination further includes metabolite biomarkers; the metabolite biomarkers include one or more of alanylglutamate, putrescine, arginine, histidine, and sebacic acid.

[0007] This invention also provides the application of reagents for detecting the combination of biomarkers described in the above-described technical solutions in the preparation of products for predicting or assessing postoperative prognosis of colorectal cancer.

[0008] Preferably, the prediction or assessment of postoperative prognosis for colorectal cancer includes one or more of the following: (1) determining whether a colorectal cancer patient has experienced recurrence after surgery; (2) distinguishing between colorectal cancer patients with postoperative recurrence and those without; (3) stratifying the risk of future recurrence in colorectal cancer patients after surgery.

[0009] Preferably, the product is a reagent, reagent kit, test strip, digital microfluidic chip, or biosensor.

[0010] This invention provides a method for training a prognostic model after colorectal cancer surgery, including the steps of data acquisition, data preprocessing, feature selection, and model training. The data acquisition includes: acquiring relative abundance data of microorganisms in tumor samples from colorectal cancer patients. The data preprocessing includes: adding a pseudo-count of 1×10⁻⁶ to the relative abundance data of microorganisms and then applying the CLR (centered log-ratio) transformation to obtain the microbial preprocessed data. The feature selection includes: using the LASSO regression algorithm to perform feature selection on the microbial preprocessed data, shrinking the coefficients of features other than the microbial biomarkers to zero, leaving the coefficients of the microbial biomarkers non-zero, and retaining the features with non-zero coefficients as training set data; the microbial biomarkers are the microbial biomarkers in the biomarker combination described in the above technical solution. The model training includes random forest algorithm model training and/or random survival forest algorithm model training. The random forest algorithm model training includes: using the postoperative recurrence status of colorectal cancer patients as output data, training the random forest algorithm model using the training set data to obtain a colorectal cancer postoperative recurrence status prediction model; the postoperative recurrence status is recurrence or no recurrence. The training of the random survival forest algorithm model includes: using the integrated cumulative risk score of colorectal cancer patients after surgery as output data, and using the training set data to train the random survival forest algorithm model to obtain a risk assessment model for recurrence after colorectal cancer surgery.

[0011] Preferably, the data acquisition further includes: acquiring non-targeted metabolomics data of tumor samples from colorectal cancer patients. The data preprocessing also includes: performing z-score normalization on the non-targeted metabolomics data, to a mean of 0 and a standard deviation of 1, to obtain the standardized ion intensity values of the metabolites. The feature selection includes: using the LASSO regression algorithm to perform feature selection on the microbial preprocessed data and the standardized ion intensity values of the metabolites, shrinking the coefficients of features other than the microbial markers and metabolite markers to zero, leaving the coefficients of the microbial markers and metabolite markers non-zero, and retaining the features with non-zero coefficients as training set data; the metabolite markers are the metabolite markers in the marker combination described in the above technical solution.

[0012] Preferably, the training parameters of the random forest algorithm model include the number of estimators, the maximum depth, the minimum number of samples required to split a node, and the minimum number of samples per leaf node. The feature selection is performed using L1 regularization with α=0.01 and a maximum of 1000 iterations. The tumor samples from the colorectal cancer patients are mucosal tissue from the tumor site of the colorectal cancer patients.

[0013] Preferably, the model training process further includes a model validation step: The model validation includes: acquiring additional tumor samples from colorectal cancer patients, processing the additionally acquired tumor samples from colorectal cancer patients according to the steps of data acquisition, data preprocessing, and feature screening to obtain screened features, which are used as validation set data; inputting the validation set data into the trained model and calculating the accuracy.

[0014] This invention provides a prediction system for postoperative prognosis of colorectal cancer, the prediction system comprising a data acquisition module, a data preprocessing module, a feature extraction module, and a model prediction module. The data acquisition module includes a microbial relative abundance data acquisition module, used to acquire microbial relative abundance data in tumor samples from colorectal cancer patients. The data preprocessing module includes a microbial relative abundance data preprocessing module, used to first add a pseudo-count of 1×10⁻⁶ to the relative abundance data and then apply the CLR transformation to obtain the microbial preprocessed data. The feature extraction module is used to perform feature screening on the microbial preprocessed data using the LASSO regression algorithm, shrinking the coefficients of features other than the microbial biomarkers to zero, leaving the coefficients of the microbial biomarkers non-zero, and retaining the features with non-zero coefficients as model input features; the microbial biomarkers are the microbial biomarkers in the biomarker combination described in the above technical solution. The model prediction module includes a colorectal cancer postoperative recurrence status prediction module and/or a colorectal cancer postoperative recurrence risk prediction module. The colorectal cancer postoperative recurrence status prediction module is used to input the model input features into the colorectal cancer postoperative recurrence status prediction model constructed by the training method described in the above technical solution to obtain the prognostic result. The colorectal cancer postoperative recurrence risk prediction module is used to input the model input features into the colorectal cancer postoperative recurrence risk assessment model constructed by the training method described in the above technical solution to obtain the prognostic result.

[0015] Preferably, the data acquisition module further includes a metabolomics data acquisition module for acquiring non-targeted metabolomics data of tumor samples from colorectal cancer patients. The data preprocessing module also includes a non-targeted metabolomics data preprocessing module, which is used to perform z-score normalization on the non-targeted metabolomics data, to a mean of 0 and a standard deviation of 1, to obtain the standardized ion intensity values of the metabolites. The feature extraction module is used to perform feature screening on the microbial preprocessed data and the standardized ion intensity values of the metabolites using the LASSO regression algorithm. The coefficients of features other than the microbial markers and metabolite markers are shrunk to zero, and the coefficients of the microbial markers and metabolite markers remain non-zero. The features with non-zero coefficients are retained as input features of the model. The metabolite markers are the metabolite markers in the marker combination described in the above technical solution.

[0016] The present invention also provides a computer device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the training method for the postoperative prognostic model of colorectal cancer as described in the above technical solutions or to evaluate or predict postoperative colorectal cancer using the prediction system described in the above technical solutions.

[0017] The present invention also provides a computer-readable medium having a computer program stored thereon, which, when executed by a processor, implements the training method for the postoperative prognostic model of colorectal cancer as described in any of the above technical solutions, or uses the prediction system described in the above technical solutions to evaluate or predict postoperative colorectal cancer.

[0018] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the training method for the postoperative prognostic model of colorectal cancer as described in the above technical solutions, or uses the prediction system described in the above technical solutions to evaluate or predict postoperative colorectal cancer.

[0019] Beneficial effects: This invention utilizes 16S rRNA gene sequencing of tumor mucosal samples from colorectal cancer patients, combined with long-term postoperative follow-up data, and employs LASSO regression feature selection and random forest algorithms to screen for five microbial biomarkers closely associated with postoperative recurrence of colorectal cancer: *Peptostreptococcus*, *Fusobacterium*, *Bacteroides*, *Porphyromonas*, and *Prevotella*. These biomarkers can be used to assess postoperative prognosis of colorectal cancer. Results showed that the AUC for assessing postoperative recurrence of colorectal cancer was 0.81, with an accuracy of 0.71, a sensitivity of 0.75, a specificity of 0.67, and a precision of 0.67. Specifically, increased abundance of *Peptostreptococcus* and *Fusobacterium* increased the risk of recurrence, while increased levels of *Bacteroides* decreased the risk of recurrence.

[0020] Furthermore, this invention utilizes LC-MS non-targeted metabolomics analysis of tumor mucosal samples from colorectal cancer patients, combined with long-term postoperative follow-up data, and employs LASSO regression feature selection and random forest algorithms to screen for metabolite biomarkers closely related to postoperative recurrence of colorectal cancer, including alanylglutamate, putrescine, arginine, histidine, and sebacic acid. Combining these metabolite biomarkers with the aforementioned microbial biomarkers allows for the identification of microbial-metabolite interactions (such as co-aggregation of *Fusobacterium* and *Peptostreptococcus*, and the regulation of biofilm formation by arginine and putrescine) in the colorectal cancer recurrence process. This approach has a clear biological functional basis and addresses the limitations of single biomarkers in terms of specificity and sensitivity. The results of the examples show that using *Peptostreptococcus*, *Fusobacterium*, *Bacteroides*, *Porphyromonas*, and *Prevotella* as microbial markers, and alanylglutamate, putrescine, arginine, histidine, and sebacic acid as metabolite markers, the combined use of these markers to assess postoperative recurrence of colorectal cancer improved the AUC to 0.91, with an accuracy of 0.81, a sensitivity of 0.75, a specificity of 0.88, and a precision of 0.86. The predictive performance of the integrated model of five microbial markers and five metabolite markers was significantly improved (P < 0.05). The model can be used independently of TNM staging for prognostic assessment, effectively distinguishing between patients at high and low risk of recurrence. After adjusting for TNM staging, the HR of the risk score was 1.59 (95% CI: 1.35-1.88, P < 0.0001), so the model can achieve accurate prognostic prediction independently of traditional pathological staging, showing higher predictive efficacy and stability, and providing an important tool for assessing the risk of postoperative recurrence and guiding clinical intervention in colorectal cancer patients.

Attached Figure Description

[0021] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the accompanying drawings used in the embodiments will be briefly described below.

[0022] Figure 1 shows the differences in the five microorganisms between the postoperative recurrence group and the non-recurrence group of colorectal cancer patients in Example 1;
Figure 2 shows the differences in the five metabolites between colorectal cancer patients with and without postoperative recurrence in Example 1;
Figure 3 shows the ROC curve of the random forest classification model based on five microbial biomarkers in Example 2;
Figure 4 shows the predictive performance of the random forest classification model based on five microbial biomarkers in Example 2;
Figure 5 shows the feature importance of the five microbial biomarkers in the random forest classification model of Example 2;
Figure 6 shows the results of SHAP analysis of the five microbial biomarkers in the random forest classification model of Example 2;
Figure 7 shows the ROC curve of the random forest classification model based on the genus *Fusobacterium* in Comparative Example 1;
Figure 8 shows the predictive performance of the random forest classification model based on the genus *Fusobacterium* in Comparative Example 1;
Figure 9 shows the ROC curve of the random forest classification model based on the genus *Peptostreptococcus* in Comparative Example 2;
Figure 10 shows the predictive performance of the random forest classification model based on the genus *Peptostreptococcus* in Comparative Example 2;
Figure 11 shows the ROC curve of the random forest classification model based on five metabolite biomarkers in Comparative Example 3;
Figure 12 shows the results of SHAP analysis of the five metabolite biomarkers in the random forest classification model of Comparative Example 3;
Figure 13 shows the ROC curve of the random forest classification model based on 5 microbial biomarkers and 5 metabolite biomarkers in Example 3;
Figure 14 shows the predictive performance of the random forest classification model based on 5 microbial biomarkers and 5 metabolite biomarkers in Example 3;
Figure 15 shows the Kaplan-Meier survival curves of the risk scoring model based on 5 microbial biomarkers and 5 metabolite biomarkers in Example 3;
Figure 16 shows the results of Spearman correlation analysis for the co-aggregation experiment in Example 4;
Figure 17 shows the co-aggregation rate among different strains in the co-aggregation experiment of Example 4;
Figure 18 shows the effect of arginine on the co-aggregation of *Fusobacterium nucleatum* and *Peptostreptococcus anaerobius* in Example 4;
Figure 19 is a fluorescence confocal image of the bacterial biofilm in the biofilm formation experiment of Example 4;
Figure 20 shows the biofilm content in the biofilm formation experiment of Example 4;
Figure 21 shows the effect of putrescine on dual-species biofilm production by *Fusobacterium nucleatum* and *Peptostreptococcus anaerobius* in the biofilm formation experiment of Example 4;
Figure 22 is a fluorescence confocal image of bacterial adhesion to tumor cells in the tumor cell adhesion experiment of Example 4;
Figure 23 shows the effect of co-culture on the adhesion rate of *Fusobacterium nucleatum* and *Peptostreptococcus anaerobius* to tumor cells in Example 4;
Figure 24 shows the effect of co-culture on the intestinal colonization ability of *Fusobacterium nucleatum* and *Peptostreptococcus anaerobius* in Example 4;
Figure 25 shows the effect of co-culture on the colonic mucosa colonization ability of *Fusobacterium nucleatum* in Example 4.

Detailed Implementation

[0023] This invention provides a combination of biomarkers associated with postoperative prognosis of colorectal cancer, the combination of biomarkers including microbial biomarkers including Peptostreptococcus, Fusobacterium, Bacteroides, Porphyromonas, and Prevotella.

[0024] This invention utilizes 16S rRNA gene sequencing of tumor mucosal samples from colorectal cancer patients, combined with long-term postoperative follow-up data, and employs LASSO regression feature selection and random forest algorithms to screen for five microbial biomarkers closely related to postoperative recurrence of colorectal cancer: *Peptostreptococcus*, *Fusobacterium*, *Bacteroides*, *Porphyromonas*, and *Prevotella*. These biomarkers can be used to assess postoperative prognosis of colorectal cancer. Results showed that using *Fusobacterium* as a single biomarker to assess postoperative recurrence of colorectal cancer had an AUC of 0.79 (95% CI: 0.54–1.00), accuracy of 0.65, sensitivity of 0.88, specificity of 0.44, and precision of 0.58. Using *Peptostreptococcus* as a single biomarker to assess postoperative recurrence of colorectal cancer had an AUC of 0.60, accuracy of 0.65, sensitivity of 0.62, specificity of 0.67, and precision of 0.62. Combining Peptostreptococcus, Fusobacterium, Bacteroides, Porphyromonas, and Prevotella species significantly improves key performance indicators such as AUC, accuracy, specificity, and precision in assessing postoperative recurrence of colorectal cancer, demonstrating higher predictive efficacy and stability.
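
As a non-limiting illustration of how an AUC such as those quoted above can be computed, the following sketch uses the rank-based (Mann-Whitney) definition: the AUC is the fraction of (recurrence, non-recurrence) patient pairs in which the recurrence patient receives the higher score, with ties counted as one half. The function name `roc_auc` and the labels and scores are hypothetical and are not data from this invention.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = recurrence, 0 = no recurrence.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]  # predicted recurrence probabilities
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.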

[0025] In one embodiment, the biomarker combination of the present invention further includes metabolite biomarkers; the metabolite biomarkers include one or more of alanylglutamic acid, putrescine, arginine, histidine, and sebacic acid. In one embodiment, the metabolite biomarkers of the present invention are alanylglutamic acid, putrescine, arginine, histidine, and sebacic acid.

[0026] This invention utilizes LC-MS non-targeted metabolomics analysis of tumor mucosal samples from colorectal cancer patients, combined with long-term postoperative follow-up data, and employs LASSO regression feature selection and random forest algorithms to screen for metabolite biomarkers closely related to postoperative recurrence of colorectal cancer, including alanylglutamate, putrescine, arginine, histidine, and sebacic acid. Results show that the AUC for assessing postoperative recurrence of colorectal cancer using a combination of alanylglutamate, putrescine, arginine, histidine, and sebacic acid is only 0.66. However, combining these metabolite biomarkers with the aforementioned microbial biomarkers improves the AUC to 0.91, with an accuracy of 0.81, sensitivity of 0.75, specificity of 0.88, and precision of 0.86. The predictive performance of the integrated model of five microbial biomarkers and five metabolite biomarkers is significantly improved (P < 0.05).
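
For reference, accuracy, sensitivity, specificity, and precision are all derived from the four cells of a confusion matrix. The sketch below shows the arithmetic. The counts used are hypothetical: they were chosen only because they round to the values quoted above for the integrated model (0.81, 0.75, 0.88, 0.86); the underlying counts are not stated in this text.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (positive class = recurrence)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall on recurrence patients
        "specificity": tn / (tn + fp),   # recall on non-recurrence patients
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Hypothetical counts (assumption, not reported data): 8 recurrence and
# 8 non-recurrence patients in a validation set.
metrics = classification_metrics(tp=6, fp=1, tn=7, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```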

[0027] Based on the biomarker combination described in this invention, a risk scoring model can be constructed. The ensemble cumulative risk predicted by the model is used as the comprehensive risk score (i.e., the expected number of recurrence events). Subjects are divided into high-risk and low-risk groups according to the median risk score, thereby achieving prognostic stratification of patients. Kaplan-Meier survival analysis showed that the recurrence-free survival of patients in the high-risk group was significantly shorter than that in the low-risk group (log-rank P < 0.0001). Multivariate Cox regression analysis showed that the risk score was an independent prognostic factor for postoperative recurrence of colorectal cancer.
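
The median-based grouping described above can be sketched as follows. This is a minimal illustration only: the patient identifiers and scores are hypothetical, and assigning a score exactly equal to the median to the low-risk group is an assumption, since the text does not state how ties are handled.

```python
import statistics


def stratify_by_median(risk_scores):
    """Split patients into 'high' and 'low' risk groups at the median risk score.

    Assumption: a score equal to the median is assigned to the low-risk group.
    """
    cutoff = statistics.median(risk_scores.values())
    return {
        patient: ("high" if score > cutoff else "low")
        for patient, score in risk_scores.items()
    }


# Hypothetical risk scores (e.g. ensemble cumulative risk from a survival model).
risk_scores = {"P01": 0.2, "P02": 1.4, "P03": 0.7, "P04": 2.1}
groups = stratify_by_median(risk_scores)
print(groups)
```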

[0028] Given the role of the biomarker combination provided by this invention, the application of reagents for detecting the biomarker combination in the preparation of products for predicting or assessing postoperative prognosis of colorectal cancer also falls within the scope of protection of this invention.

[0029] In one embodiment, the product of this invention is a reagent, kit, test strip, digital microfluidic chip, or biosensor. In another embodiment, the prediction or assessment of postoperative prognosis for colorectal cancer includes one or more of the following: (1) determining whether a colorectal cancer patient has relapsed after surgery; (2) distinguishing between patients with recurrent colorectal cancer and those without recurrence; (3) stratifying the risk of future recurrence in colorectal cancer patients after surgery. In one embodiment, the stratified assessment of colorectal cancer includes prognostic assessment of patients with TNM staging of colorectal cancer, effectively distinguishing between patients with high and low recurrence risk.

[0030] This invention provides a method for training a prognostic model after colorectal cancer surgery, comprising the steps of data acquisition, data preprocessing, feature selection, and model training. The data acquisition includes: acquiring relative abundance data of microorganisms in tumor samples from colorectal cancer patients. The data preprocessing includes: adding a pseudo-count of 1×10⁻⁶ to the relative abundance data of microorganisms and then applying the CLR transformation to obtain the microbial preprocessed data. The feature selection includes: using the LASSO regression algorithm to perform feature selection on the microbial preprocessed data, shrinking the coefficients of features other than the microbial biomarkers to zero, leaving the coefficients of the microbial biomarkers non-zero, and retaining the features with non-zero coefficients as training set data; the microbial biomarkers are the microbial biomarkers in the biomarker combination described in the above technical solution. The random forest algorithm model training includes: using the postoperative recurrence status of colorectal cancer patients as output data, training the random forest algorithm model using the training set data to obtain a colorectal cancer postoperative recurrence status prediction model; the postoperative recurrence status is recurrence or no recurrence. The training of the random survival forest algorithm model includes: using the postoperative recurrence risk of colorectal cancer patients as output data, training the random survival forest algorithm model using the training set data to obtain a postoperative recurrence risk assessment model for colorectal cancer; the postoperative recurrence risk is divided into high risk and low risk.

[0031] This invention involves data acquisition. The data acquisition described in this invention includes obtaining relative abundance data of microorganisms in tumor samples from colorectal cancer patients. In this invention, the relative abundance of a particular microorganism is the proportion of that microorganism's abundance in the total abundance of all microorganisms.
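
The definition of relative abundance given above amounts to dividing each microorganism's abundance by the sample total. A minimal sketch, in which the read counts and the "Other" category are hypothetical and used only for illustration:

```python
def relative_abundance(counts):
    """Convert per-taxon abundances for one sample into proportions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical read counts for a single tumor mucosal sample.
sample_counts = {"Fusobacterium": 120, "Bacteroides": 600, "Prevotella": 80, "Other": 200}
rel = relative_abundance(sample_counts)
print(rel)
```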

[0032] In one implementation method, this invention obtains 16S rRNA gene sequencing data from tumor samples of colorectal cancer patients, performs quality filtering, ASV generation, and species annotation on the 16S rRNA gene sequencing data to obtain microbial relative abundance data. In one implementation method, this invention uses QIIME2 (v2019.4) software, performs quality filtering (truncation parameters: forward read length 280bp, reverse read length 220bp) using the DADA2 plugin to generate ASVs, and performs species annotation based on a Naive Bayes classifier trained on the SILVA database (v138.1). In one implementation method, the 16S rRNA gene sequencing of this invention targets the V3-V4 variable region of the 16S rRNA gene. In one implementation method, the PCR amplification conditions for the 16S rRNA gene sequencing of this invention are: 95℃ pre-denaturation for 3 min; 95℃ denaturation for 30 s, 55℃ annealing for 30 s, 72℃ extension for 30 s, for a total of 35 cycles; 72℃ final extension for 5 min.

[0033] As one implementation method, the data acquisition described in this invention further includes: acquiring non-targeted metabolomics data from tumor samples of colorectal cancer patients. As one implementation method, this invention acquires LC-MS data from tumor samples of colorectal cancer patients and performs peak alignment, retention time correction, peak area extraction, and metabolite identification to obtain the non-targeted metabolomics data. As one implementation method, this invention uses MS-DIAL software to identify metabolites using the HMDB, MassBank, and GNPS databases and a self-built standard library (matching conditions: RT deviation ±0.2 min, MS deviation ±5 ppm, MS/MS matching score ≥80), removes features with missing values >50% or RSD >30% in QC samples, and performs total ion current normalization. As one embodiment, the chromatographic conditions for LC-MS detection according to the present invention are as follows: column temperature 40℃, flow rate 0.3 mL/min, mobile phase A is 0.1% formic acid aqueous solution, mobile phase B is acetonitrile, and the gradient elution program is as follows: 0-2 min: 0% B, 2-6 min: 0-48% B linear gradient, 6-10 min: 48-100% B linear gradient, 10-12 min: 100% B, 12-12.1 min: 100-0% B linear gradient, 12.1-15 min: 0% B.
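
The feature filtering and total ion current (TIC) normalization mentioned above can be sketched as follows. This is an illustrative reading, not the software's actual procedure: it assumes that the missing-value fraction is computed across study samples (with `None` marking a missing value), that RSD is computed across QC injections using the sample standard deviation, and that TIC normalization divides each intensity by the sum of intensities in that sample. All function names and values are hypothetical.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation (%) = 100 * sample SD / mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        return float("inf")
    return 100.0 * statistics.stdev(values) / mean


def keep_feature(sample_values, qc_values, max_missing=0.5, max_rsd=30.0):
    """Keep a metabolite feature unless missing > 50% or QC RSD > 30%."""
    missing_fraction = sum(v is None for v in sample_values) / len(sample_values)
    return missing_fraction <= max_missing and rsd_percent(qc_values) <= max_rsd


def tic_normalize(sample_intensities):
    """Scale one sample's intensities so that they sum to 1."""
    total = sum(sample_intensities)
    return [v / total for v in sample_intensities]


# Hypothetical features: (values in study samples, values in QC injections).
stable = keep_feature([10.0, 12.0, None, 11.0], [100.0, 105.0, 95.0])
noisy = keep_feature([10.0, 12.0, 9.0, 11.0], [100.0, 200.0, 50.0])
sparse = keep_feature([None, None, None, 10.0], [100.0, 105.0, 95.0])
normalized = tic_normalize([10.0, 30.0, 60.0])
print(stable, noisy, sparse, normalized)
```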

[0034] In one embodiment, the tumor sample from a colorectal cancer patient described in this invention is mucosal tissue from the tumor site of a colorectal cancer patient.

[0035] After obtaining the data, the present invention performs data preprocessing to obtain preprocessed data. The data preprocessing of the present invention includes adding a pseudo-count of 1×10⁻⁶ to the relative abundance data of microorganisms and then applying the CLR transformation to obtain the microbial preprocessed data.
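
The CLR (centered log-ratio) transformation maps each component x_i of a composition to ln(x_i) minus the mean of ln(x_j) over all components of the same sample; the pseudo-count keeps the logarithm defined when a relative abundance is zero. A minimal sketch under stated assumptions: the natural logarithm is used and the transform is applied per sample across all taxa, neither of which is specified in the text, and the sample values are hypothetical.

```python
import math

PSEUDOCOUNT = 1e-6  # pseudo-count stated in the text


def clr_transform(rel_abundance, pseudocount=PSEUDOCOUNT):
    """Centered log-ratio transform of one sample's relative abundances."""
    logs = [math.log(x + pseudocount) for x in rel_abundance]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical relative abundances for one sample; one taxon is absent (0.0).
sample = [0.12, 0.60, 0.08, 0.0, 0.20]
clr_values = clr_transform(sample)
print([round(v, 3) for v in clr_values])
```

By construction the CLR values of a sample sum to zero, and the absent taxon receives the most negative value rather than an undefined logarithm.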

[0036] As one implementation method, the data preprocessing of the present invention further includes: performing z-score normalization on the non-targeted metabolomics data, to a mean of 0 and a standard deviation of 1, to obtain the standardized ion intensity values of the metabolites.
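
Z-score normalization subtracts the mean and divides by the standard deviation. The sketch below assumes that the normalization is applied to each metabolite across samples and that the population standard deviation is used; the text specifies neither, and the intensities are hypothetical.

```python
import statistics


def zscore(values):
    """Standardize values to mean 0 and standard deviation 1 (population SD)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - mean) / sd for v in values]


# Hypothetical ion intensities of one metabolite across eight samples.
intensities = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = zscore(intensities)
print(standardized)
```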

[0037] After obtaining the preprocessed data, the present invention performs feature filtering on the preprocessed data to obtain model input features. The feature filtering of the present invention includes: using the LASSO regression algorithm to perform feature filtering on the microbial preprocessed data, shrinking the feature coefficients of non-microbial biomarkers to zero, and adjusting the feature coefficients of microbial biomarkers to be non-zero; the microbial biomarkers are the microbial biomarkers in the biomarker combination described in the above technical solution.

[0038] As one implementation method, when the standardized ion intensity values of metabolites are obtained, the feature selection of this invention includes: using the LASSO regression algorithm to perform feature selection on the microbial preprocessed data and the standardized ion intensity values of metabolites, shrinking the feature coefficients of non-microbial biomarkers and non-metabolite biomarkers to zero, adjusting the feature coefficients of microbial biomarkers and metabolite biomarkers to be non-zero, and retaining the features with non-zero coefficients as training set data; the metabolite biomarkers are the metabolite biomarkers in the biomarker combination described in the above technical solution. By shrinking the coefficients of unimportant features to zero, feature selection reduces data dimensionality and redundant information and improves the model's generalization ability.

[0039] As one implementation method, the feature selection described in this invention uses L1 regularization, α=0.01, and a maximum number of iterations of 1000.
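To show how L1 regularization drives coefficients exactly to zero, here is a small coordinate-descent sketch in plain Python. It assumes the common objective (1/(2n))·||y − Xw − b||² + α·||w||₁; the source gives only α = 0.01 and the iteration cap of 1000, so the objective, the solver, the function names, and the toy data are all assumptions for illustration.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha=0.01, max_iter=1000, tol=1e-8):
    """Fit LASSO weights w and intercept b by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction using every feature except j.
                partial = b + sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            new_w = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
            max_change = max(max_change, abs(new_w - w[j]))
            w[j] = new_w
        b = sum(y[i] - sum(X[i][k] * w[k] for k in range(p)) for i in range(n)) / n
        if max_change < tol:
            break
    return w, b


# Toy data: feature 0 tracks the label, feature 1 is unrelated to it.
X = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y = [0.0, 1.0, 0.0, 1.0]
weights, intercept = lasso_coordinate_descent(X, y)
selected = [j for j, coef in enumerate(weights) if coef != 0.0]
```

Here the informative feature keeps a non-zero coefficient while the unrelated one is shrunk to exactly zero, which is the behavior the selection step relies on.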

[0040] After obtaining the model input features, the present invention performs model training on the model input features; the model training and validation includes random forest algorithm model training and / or random survival forest algorithm model training; The random forest algorithm model training includes: using the postoperative recurrence status of colorectal cancer patients as output data, training the random forest algorithm model using the training set data to obtain a colorectal cancer postoperative recurrence status prediction model; the postoperative recurrence status is recurrence or no recurrence; The training of the random survival forest algorithm model includes: using the integrated cumulative risk score of colorectal cancer patients after surgery as output data, and using the training set data to train the random survival forest algorithm model to obtain a risk assessment model for recurrence after colorectal cancer surgery.

[0041] As one implementation method, the training parameters of the random forest algorithm model of the present invention include the number of estimators, maximum depth, minimum number of sample splits, and minimum number of sample leaf nodes. In the present invention, the parameters of the random forest algorithm model after training are: the number of estimators is 100, Gini impurity is used as the splitting criterion, the maximum depth is set to None, the minimum number of sample splits is 2, and the minimum number of sample leaf nodes is 1. As one implementation method, the random survival forest algorithm model of the present invention is trained using five-fold cross-validation. The colorectal cancer postoperative recurrence status prediction model obtained by the present invention can predict whether colorectal cancer patients will relapse after surgery. Specifically, after features are input into the colorectal cancer postoperative recurrence status prediction model, the model outputs "high recurrence probability" or "low recurrence probability" together with the corresponding prediction probability (0-100%). A prediction probability ≥60% is judged as high recurrence probability (recurrence group), and <60% is judged as low recurrence probability (non-recurrence group), thus classifying colorectal cancer patients into the recurrence group or the non-recurrence group.
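The decision rule described above can be written as a one-line threshold on the predicted probability. The function name and the example probabilities are hypothetical; only the 60% cutoff and the two output labels come from the text.

```python
def classify_recurrence(probability, threshold=0.60):
    """Map a predicted recurrence probability (0 to 1) to the two output labels."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability >= threshold:
        return "high recurrence probability"
    return "low recurrence probability"


# Hypothetical model outputs for three patients.
labels = [classify_recurrence(p) for p in (0.82, 0.60, 0.59)]
```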

[0042] As one implementation method, when training the random survival forest algorithm model using the training set data, the present invention calculates the ensemble cumulative risk for each colorectal cancer patient after inputting the training set data, and defines it as the risk score. Based on the median risk score, colorectal cancer patients are divided into high-risk and low-risk groups. The colorectal cancer postoperative recurrence risk assessment model obtained by the present invention can thus divide colorectal cancer patients into high-risk and low-risk groups. Specifically, after features are input into the colorectal cancer postoperative recurrence risk assessment model, the system outputs a comprehensive risk score ranging from 0.1 to 9.2. As one implementation method, the present invention uses a median of 2.02, with a score ≥2.02 considered high-risk and <2.02 considered low-risk.
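A minimal sketch of the median-based stratification, assuming risk scores have already been produced by the survival model. The function name and the example scores are hypothetical; the 2.02 cutoff and the "≥ is high-risk" convention follow the text.

```python
from statistics import median


def stratify_by_median(scores, cutoff=None):
    """Label each risk score as high-risk (>= cutoff) or low-risk (< cutoff)."""
    if cutoff is None:
        cutoff = median(scores)  # derive the cutoff from the cohort itself
    return ["high-risk" if s >= cutoff else "low-risk" for s in scores]


# Hypothetical risk scores; the middle value equals the reported median.
scores = [0.5, 1.0, 2.02, 3.0, 9.2]
groups = stratify_by_median(scores)
fixed = stratify_by_median([0.1, 2.02, 9.2], cutoff=2.02)
```

Passing `cutoff=2.02` explicitly applies the reported threshold to new patients instead of recomputing a median from them.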

[0043] As one implementation method, after model training, the present invention further includes a model validation step. The model validation includes: acquiring additional tumor samples from colorectal cancer patients; processing the additionally acquired tumor samples according to the steps of data acquisition, data preprocessing, and feature selection to obtain selected features, which serve as validation set data; inputting the validation set data into the trained model and calculating the accuracy. As one implementation method, during model validation, it is preferable that the classification accuracy on the independent validation set is >80%, demonstrating that the model has good generalization ability and clinical auxiliary diagnostic value.

[0044] This invention provides a predictive system for postoperative prognosis of colorectal cancer, the predictive system comprising a data acquisition module, a data preprocessing module, a feature extraction module, and a model prediction module; the data acquisition module includes a microbial relative abundance data acquisition module, used to acquire microbial relative abundance data in tumor samples from colorectal cancer patients; the data preprocessing module includes a microbial relative abundance data preprocessing module, used to first add a pseudo-count of 1×10⁻⁶ to the relative abundance and then perform a CLR transformation to obtain the microbial preprocessed data. The feature extraction module is used to perform feature screening on the microbial preprocessed data using the LASSO regression algorithm, shrinking the feature coefficients of non-microbial biomarkers to zero, adjusting the feature coefficients of microbial biomarkers to be non-zero, and retaining the features with non-zero coefficients as input features of the model; the microbial biomarkers are the microbial biomarkers in the biomarker combination described in the above technical solution. The model prediction module includes a colorectal cancer postoperative recurrence status prediction module and / or a colorectal cancer postoperative recurrence risk prediction module. The colorectal cancer postoperative recurrence status prediction module is used to input the model input features into the colorectal cancer postoperative recurrence status prediction model constructed by the training method described in the above technical solution to obtain the prognostic result. The colorectal cancer postoperative recurrence risk prediction module is used to input the model input features into the colorectal cancer postoperative recurrence risk assessment model constructed by the training method described in the above technical solution to obtain the prognostic result.

[0045] The data acquisition module of this invention includes a microbial relative abundance data acquisition module, which acquires the microbial relative abundance data in tumor samples from colorectal cancer patients. The method by which the microbial relative abundance data acquisition module acquires the microbial relative abundance data is consistent with the preceding description and will not be repeated here.

[0046] In one implementation, the data acquisition module of the present invention further includes a metabolomics data acquisition module for acquiring non-targeted metabolomics data from tumor samples of colorectal cancer patients. The method by which the metabolomics data acquisition module of the present invention acquires non-targeted metabolomics data is consistent with that described above and will not be repeated here.

[0047] As one implementation, the data preprocessing module of the present invention further includes a non-targeted metabolomics data preprocessing module, which performs z-score normalization on the non-targeted metabolomics data, so that each metabolite has mean 0 and standard deviation 1, and obtains the standardized ion intensity values of the metabolites.

[0048] As one implementation method, when the standardized ion intensity values of metabolites are obtained, the feature extraction module of the present invention is used to perform feature screening on the microbial preprocessed data and the standardized ion intensity values of metabolites using the LASSO regression algorithm, shrinking the feature coefficients of non-microbial markers and non-metabolite markers to zero, adjusting the feature coefficients of microbial markers and metabolite markers to be non-zero, and retaining the features with non-zero coefficients as model input features; the metabolite markers are the metabolite markers in the marker combination described in the above technical solution.

[0049] Based on the prediction system described in this invention, only routine 16S rRNA gene sequencing is required to detect the relative abundance data of microorganisms in samples from colorectal cancer patients. This data is then input into the data preprocessing module. After data preprocessing, feature extraction, and prediction by the colorectal cancer postoperative recurrence status prediction model, the system can accurately predict whether the colorectal cancer patient will experience postoperative recurrence. Likewise, after data preprocessing, feature extraction, and prediction by the colorectal cancer postoperative recurrence risk assessment model, the system can accurately determine the postoperative prognostic stratification of the colorectal cancer patient. In addition to the relative abundance data of microorganisms, metabolomics data can be obtained from the same samples using LC-MS metabolomics detection technology. Inputting these detection results into the prediction system can further improve the accuracy and sensitivity of the assessment. The prediction system of this invention is simple to operate, can achieve automated analysis, and is suitable for routine clinical applications.

[0050] The present invention also provides a computer device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the training method for the postoperative prognostic model of colorectal cancer as described in the above technical solutions or to evaluate or predict postoperative colorectal cancer using the prediction system described in the above technical solutions.

[0051] The present invention also provides a computer-readable medium having a computer program stored thereon, which, when executed by a processor, implements the training method for the postoperative prognostic model of colorectal cancer as described in any of the above technical solutions, or uses the prediction system described in the above technical solutions to evaluate or predict postoperative colorectal cancer.

[0052] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the training method for the postoperative prognostic model of colorectal cancer as described in the above technical solutions, or uses the prediction system described in the above technical solutions to evaluate or predict postoperative colorectal cancer.

[0053] To further illustrate the present invention, the following detailed description, in conjunction with the accompanying drawings and embodiments, provides a combination of biomarkers related to postoperative prognosis of colorectal cancer, their application, and a prediction system, but these descriptions should not be construed as limiting the scope of protection of the present invention.

[0054] Example 1 Screening and identification of marker combinations 1. Sample collection and ethical approval This study included 126 patients with colorectal cancer who underwent surgery at Nankai University People's Hospital between October 2021 and December 2022. All patients were pathologically confirmed to have colorectal cancer. Exclusion criteria included: 1) a history of colorectal cancer surgery or other malignant tumors; 2) preoperative chemotherapy, radiotherapy, immunotherapy, or neoadjuvant therapy; 3) concurrent intestinal obstruction or other serious systemic diseases; and 4) use of antibiotics within one month prior to surgery. The study protocol complied with the principles of the Declaration of Helsinki and was approved by the Ethics Committee of Nankai University People's Hospital (Approval No.: 2021-B37). All participants signed written informed consent forms. Mucosal tissue (approximately 0.5cm × 0.5cm) was aseptically collected from the tumor site, immediately placed in cryopreservation tubes, flash-frozen in liquid nitrogen, and then stored at -80°C. Ultimately, 115 samples were subjected to 16S rRNA gene sequencing and LC-MS non-targeted metabolomics testing. All patients were followed up until June 25, 2025. Among them, 27 cases relapsed (relapse rate 21.4%), with a median relapse time of 12.80 months (IQR 7.12-21.25 months).

[0055] 2. 16S rRNA gene sequencing and microbial biomarker screening (1) DNA extraction: Genomic DNA was extracted from the samples using the ZR Fungal / Bacterial DNA Kit (Zymo Research, Irvine, CA, USA), strictly following the kit instructions. DNA integrity was verified by 1.0% agarose gel electrophoresis (120V, 30min), and clear DNA bands without tailing were observed. Quantification was performed using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE), ensuring a DNA concentration ≥50ng / μl and an A260 / A280 ratio between 1.8 and 2.0.

[0056] (2) PCR amplification: Primers were designed targeting the V3-V4 variable region of the 16S rRNA gene. Forward primer: 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3' (SEQ ID NO:1); Reverse primer: 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3' (SEQ ID NO:2). PCR reaction system (25 μl): 12.5 μl of 2×Taq PCR Master Mix, 1 μl each of forward and reverse primers (10 μM), 2 μl of template DNA, and 8.5 μl of enzyme-free water. Amplification conditions: 95℃ pre-denaturation for 3 min; 95℃ denaturation for 30 s, 55℃ annealing for 30 s, 72℃ extension for 30 s, for a total of 35 cycles; 72℃ final extension for 5 min.

[0057] (3) Sequencing and Data Analysis: The amplified products were purified using Vazyme VAHTS™ DNA Clean Beads, quantified using a Qubit 4 fluorometer, pooled in equal amounts, and then subjected to paired-end sequencing (2×300bp). The sequencing data were analyzed using QIIME2 (v2019.4), and ASVs were generated after quality filtering with the DADA2 plugin (truncation parameters: forward read length 280bp, reverse read length 220bp, maximum expected error E=2). Species annotation was performed using a Naive Bayes classifier trained on the SILVA database (v138.1); sequences not classified to the bacterial domain, as well as chloroplast and mitochondrial sequences (accounting for <0.01%), were removed. α-diversity (Ace, Chao, Sobs, Shannon indices) and β-diversity (Bray-Curtis distance, unweighted UniFrac distance) were calculated using Mothur software, and differential microbial analysis was performed using ANCOM-BC2 (q<0.05 indicates a significant difference).

[0058] (4) Microbial biomarker screening: Based on the postoperative recurrence status, the relative abundance of the detected microbial genera (a total of 434 genera) was used as candidate features. The LASSO regression algorithm (α=0.01, maximum iterations 1000) was used for feature selection. Feature selection frequency was calculated using stratified five-fold cross-validation. Five microbial biomarkers with non-zero coefficients were retained: *Peptostreptococcus*, *Fusobacterium*, *Bacteroides*, *Porphyromonas*, and *Prevotella*. Differential analysis showed that the abundance of *Peptostreptococcus* and *Fusobacterium* was significantly increased in the recurrence group (*Peptostreptococcus*: recurrence group vs. non-recurrence group = 2.28 times, q=0.02; *Fusobacterium*: 3.0 times, q=0.006), while *Bacteroides* was significantly decreased in the recurrence group (q=0.035). The abundance of *Porphyromonas* and *Prevotella* showed an increasing trend in the recurrence group, but the differences did not reach statistical significance. See Figure 1 for details. It is speculated that *Peptostreptococcus*, *Fusobacterium*, *Bacteroides*, *Porphyromonas*, and *Prevotella* could serve as microbial biomarkers for assessing the prognosis of colorectal cancer.

[0059] 3. LC-MS metabolomics detection and metabolite biomarker screening (1) Sample pretreatment: Take about 50 mg of tumor mucosa sample, add 1 ml of ice bath methanol aqueous solution (4:1, v / v), vortex mix for 30 s; sonicate on ice bath for 30 min (power 180 W, working for 3 s, intermittent for 5 s); incubate at -20 ℃ for 1 h; centrifuge at 16000×g, 4 ℃ for 20 min, and take the supernatant into a new centrifuge tube; freeze dry in a vacuum freeze dryer, then add 100 μl of methanol aqueous solution (1:1, v / v) to reconstitute; centrifuge at 20000×g, 4 ℃ for 15 min, and take the supernatant for LC-MS analysis.

[0060] (2) LC-MS detection: A SHIMADZU LC-30A UHPLC system combined with a Q Exactive™ Plus mass spectrometer (Thermo Scientific) was used. The chromatographic column was an ACQUITY UPLC® HSS T3 (2.1×100mm, 1.8μm; Waters, Milford, MA, USA). Chromatographic conditions: column temperature 40℃, flow rate 0.3ml / min, mobile phase A was 0.1% formic acid aqueous solution, mobile phase B was acetonitrile, gradient elution program: 0-2min (0% B), 2-6min (0-48% B linear gradient), 6-10min (48-100% B linear gradient), 10-12min (100% B hold), 12-12.1min (100-0% B), 12.1-15min (0% B equilibration). Mass spectrometry conditions: Electrospray ionization (ESI) mode, scan range 70-1050 m / z, first-order mass spectrometry resolution 70000, second-order mass spectrometry resolution 17500, collision energies 20, 40, and 60 eV.
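The gradient elution program above is a piecewise-linear function of time, so the mobile phase composition at any moment can be looked up by interpolation between the listed breakpoints. The sketch below encodes the breakpoints from the text; the function name `percent_b` is a hypothetical helper.

```python
# (time in min, % mobile phase B) breakpoints taken from the gradient program.
GRADIENT = [
    (0.0, 0.0), (2.0, 0.0), (6.0, 48.0), (10.0, 100.0),
    (12.0, 100.0), (12.1, 0.0), (15.0, 0.0),
]


def percent_b(t):
    """Return % B at time t (min) by linear interpolation between breakpoints."""
    if t < GRADIENT[0][0] or t > GRADIENT[-1][0]:
        raise ValueError("time outside the 0-15 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


# Composition at a few illustrative time points.
profile = {t: percent_b(t) for t in (1.0, 4.0, 8.0, 11.0, 13.0)}
```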

[0061] (3) Data Analysis and Metabolite Biomarker Screening: Raw data underwent peak alignment, retention time correction, and peak area extraction using MSDIAL software. Metabolites were identified using the HMDB, MassBank, and GNPS databases and a self-built standard library (matching conditions: RT deviation ±0.2 min, MS deviation ±5 ppm, MS / MS matching score ≥80). Features with missing values >50% or RSD >30% in QC samples were removed, retaining 721 endogenous metabolites. OPLS-DA was used to compare the relapse and non-relapse groups, and 46 differential metabolites were identified with P<0.05 and VIP>1. Using the normalized intensity values of these 46 differential metabolites as candidate features, the LASSO regression algorithm was used to screen out five metabolite biomarkers: alanylglutamate, putrescine, arginine, histidine, and sebacic acid. Differential analysis showed that alanylglutamate and putrescine were significantly enriched in the relapse group (alanylglutamate q=0.172, putrescine q=0.053), while arginine, histidine, and sebacic acid were significantly downregulated in the relapse group (arginine q=0.038, histidine q=0.151, sebacic acid q=0.169). See Figure 2 for details. It is speculated that alanylglutamate, putrescine, arginine, histidine, and sebacic acid could serve as metabolite markers for assessing the prognosis of colorectal cancer.
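The differential-metabolite filter in this step keeps a metabolite only when both criteria hold, a P value below 0.05 and an OPLS-DA VIP above 1. A minimal sketch follows; the record layout, the function name, and the example values are hypothetical and are not results from the source.

```python
def is_differential(p_value, vip, p_cutoff=0.05, vip_cutoff=1.0):
    """Keep a metabolite only if P < 0.05 and VIP > 1 both hold."""
    return p_value < p_cutoff and vip > vip_cutoff


# Hypothetical (name, P value, VIP) records, for illustration only.
records = [
    ("metabolite_1", 0.010, 1.8),
    ("metabolite_2", 0.200, 2.1),  # fails the P criterion
    ("metabolite_3", 0.030, 0.7),  # fails the VIP criterion
]
candidates = [name for name, p, vip in records if is_differential(p, vip)]
```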

[0062] Example 2 Construction and Performance Validation of a Colorectal Cancer Prognostic Prediction System Based on Five Microbial Biomarkers 1. Microbial data preprocessing: The relative abundance of the 434 genera in Example 1 was zero-filled (a pseudo-count of 1×10⁻⁶, as an absolute value, was added), and then a centered log-ratio (CLR) transformation was performed to eliminate the compositional bias of the data.

[0063] 2. Feature Extraction and Model Building (1) The LASSO regression algorithm was used to screen the features of 434 bacterial genera in Example 1. The L1 regularization coefficient α=0.01 was set to shrink the feature coefficients of non-microbial markers to zero, and the feature coefficients of microbial markers were adjusted to be non-zero. The non-zero coefficients were retained as input features of the model.

[0064] (2) Classification Model Construction and Optimization: A recurrence status prediction model was constructed using the Random Forest (RF) algorithm. Stratified sampling was used to divide the 54 colorectal cancer patients shown in Table 1 into a training set (38 cases) and a test set (16 cases). The training set was used for model training and parameter optimization, and the test set was used to validate model performance, ultimately achieving accurate prediction of postoperative recurrence status in colorectal cancer. Specifically, model parameters were optimized through grid search: number of estimators (50, 100, 200), maximum depth (None, 5, 10), minimum number of sample splits (2, 5, 10), and minimum number of sample leaf nodes (1, 2, 5). The optimal parameter combination was: estimators=100, maximum depth=None, minimum number of sample splits=2, minimum number of sample leaf nodes=1, using Gini impurity as the splitting criterion.
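To make the size of this search concrete, the grid can be enumerated as the Cartesian product of the four parameter lists. The key names below follow the parameter names used by scikit-learn's `RandomForestClassifier`; that mapping is an assumption, since the source does not name the library, and `iter_grid` is a hypothetical helper.

```python
from itertools import product

# Candidate values as listed in the text; key names are assumed (see above).
PARAM_GRID = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}


def iter_grid(grid):
    """Yield every parameter combination in the grid as a dict."""
    keys = list(grid)
    for combo in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, combo))


candidates = list(iter_grid(PARAM_GRID))

# The combination reported as optimal in the text.
BEST = {
    "n_estimators": 100,
    "max_depth": None,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
}
```

Each of the resulting combinations would be fitted on the training set and scored; the scoring metric used for the search is not stated in the source.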

[0065] Table 1. Sample information of colon cancer patients

[0066] 3. Model Performance Validation (1) Performance validation results of the classification model (prediction model for postoperative recurrence of colorectal cancer): As shown in Figures 3 and 4, in the training set, the model built on the five microbial biomarkers has an AUC of 1.00, demonstrating good discriminative ability. In the test set, the model with the five microbial biomarkers has an AUC of 0.81, an accuracy of 0.71, a sensitivity of 0.75, a specificity of 0.67, and a precision of 0.67.

[0067] (2) Model interpretability analysis: Feature importance was assessed using random forest, and SHAP (Shapley Additive exPlanations) analysis was used to evaluate the contribution of each feature to the model's prediction results. As shown in Figures 5 and 6, the importance of the five microbial biomarkers ranked as follows: *Peptostreptococcus* 35%, *Fusobacterium* 28%, *Bacteroides* 20%, *Prevotella* 8.6%, and *Porphyromonas* 8.4%. *Peptostreptococcus* and *Fusobacterium* had the highest mean absolute SHAP values, making them the key features influencing recurrence prediction. Specifically, increased abundance of *Peptostreptococcus* and *Fusobacterium* increased the risk of recurrence (positive SHAP value), while increased abundance of *Bacteroides* decreased the risk of recurrence (negative SHAP value).

[0068] Comparative Example 1 Construction and Performance Validation of a Fusobacterium-Based Colorectal Cancer Prognostic Prediction System. This comparative example is the same as Example 2, except that during feature extraction only *Fusobacterium* is used as the marker, and the relative abundance of *Fusobacterium* is directly used as the model input feature. The classification model performance validation (test set) results are shown in Figures 7 and 8: the prediction model constructed using *Fusobacterium* as a single feature had an AUC of 0.79 (95% CI: 0.54–1.00), accuracy of 0.65, sensitivity of 0.88, specificity of 0.44, and precision of 0.58. The prediction model constructed in Example 2 outperformed the prediction model constructed using *Fusobacterium* as a single feature in terms of key performance indicators such as AUC, accuracy, specificity, and precision, demonstrating higher predictive efficacy and stability.

[0069] Comparative Example 2 Construction and Performance Validation of a Colorectal Cancer Prognostic Prediction System Based on Peptostreptococcus. This comparative example is the same as Example 2, except that during feature extraction only *Peptostreptococcus* is used as the marker, and the relative abundance of *Peptostreptococcus* is directly used as the model input feature. The classification model performance validation (test set) results are shown in Figures 9 and 10: the prediction model constructed using *Peptostreptococcus* as a single feature had an AUC of 0.60 (95% CI: 0.27-0.87), accuracy of 0.65, sensitivity of 0.62, specificity of 0.67, and precision of 0.62. The prediction model constructed in Example 2 outperformed the prediction model constructed using *Peptostreptococcus* as a single feature in terms of key performance indicators such as AUC, accuracy, specificity, and precision, demonstrating higher predictive efficacy and stability.

[0070] Comparative Example 3 Construction and Performance Validation of a Colorectal Cancer Prognostic Prediction System Based on Five Metabolic Biomarkers 1. Metabolite data preprocessing: The intensity values of the 46 differential metabolites in Example 1 were z-score standardized (mean=0, standard deviation=1) to eliminate dimensional differences between different metabolites.

[0071] 2. Feature Extraction and Model Building (1) The LASSO regression algorithm was used to screen the features of the 46 metabolites in Example 1. The L1 regularization coefficient was α=0.01 and the maximum number of iterations was 1000. The selection frequency of each feature was calculated in stratified five-fold cross-validation. The five metabolite markers had a selection frequency greater than 0 and were retained as the input features of the model.

[0072] (2) Classification model construction and optimization: The classification model is constructed and optimized in accordance with the method in step 2 (2) of Example 2.

[0073] 3. Model Performance Validation (1) Performance validation results of the classification model: As shown in Figure 11, the AUC of the five-metabolite-biomarker model in the training set was 1.00, demonstrating good discriminative ability. The AUC of the five-metabolite-biomarker model in the test set was 0.66 (95% CI: 0.38–0.91).

[0074] (2) Model interpretability analysis: SHAP (Shapley Additive exPlanations) analysis was used to evaluate the contribution of each feature to the model's prediction results. As shown in Figure 12, putrescine has the highest mean absolute SHAP value and is the key feature affecting relapse prediction. Elevated putrescine concentration increases the risk of relapse (positive SHAP value), while elevated arginine levels decrease the risk of relapse (negative SHAP value).

[0075] Example 3 Construction and Performance Validation of a Colorectal Cancer Prognostic Prediction System Based on Five Microbial Markers and Five Metabolic Markers 1. Data Preprocessing (1) Microbial data preprocessing: The relative abundance of the 434 genera in Example 1 was zero-filled (a pseudo-count of 1×10⁻⁶ was added), and then a centered log-ratio (CLR) transformation was performed to eliminate the compositional bias of the data.

[0076] (2) Metabolite data preprocessing: The intensity values of the 46 differential metabolites in Example 1 were standardized by z-score (mean=0, standard deviation=1) to eliminate the dimensional differences between different metabolites.

[0077] 2. Feature Extraction and Model Building (1) The LASSO regression algorithm was used to screen features of the 434 bacterial genera and 46 metabolites in Example 1. The L1 regularization coefficient was α=0.01 and the maximum number of iterations was 1000. The selection frequency of each feature was calculated in stratified five-fold cross-validation. The selection frequencies of the five microbial markers and the five metabolite markers were all greater than 0, and these markers were retained as input features of the model.

[0078] (2) Classification model construction and optimization: The classification model is constructed and optimized in accordance with the method in step 2 (2) of Example 2.

[0079] (3) Risk scoring model construction: A risk scoring model was constructed using the Random Survival Forest (RSF) algorithm, with recurrence-free survival time and recurrence status as outcome indicators. 115 patients with complete surgery dates and follow-up data were included. To ensure the unbiasedness of risk estimation, 1000 survival trees were fitted to each training set using a stratified five-fold cross-validation scheme. The ensemble cumulative risk was calculated for each patient using the predictive function of the RSF model and defined as a risk score (representing the expected number of recurrence events). Finally, patients were divided into a high-risk group (n=57) and a low-risk group (n=58) based on the median risk score (2.02).

[0080] 3. Model Performance Validation (1) Performance validation results of the classification model (prediction model for postoperative recurrence of colorectal cancer): As shown in Figures 13 and 14, the AUC of the model with 5 microbial biomarkers and 5 metabolite biomarkers in the training set was 1.00, demonstrating good discriminative ability. In the test set, the model had an AUC of 0.91, accuracy of 0.81, sensitivity of 0.75, specificity of 0.88, and precision of 0.86. The predictive performance of the integrated model with 5 microbial biomarkers and 5 metabolite biomarkers was significantly improved compared to that with 5 microbial biomarkers or 5 metabolite biomarkers alone (P<0.05).
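The four threshold-dependent metrics reported here all derive from a single confusion matrix. The source does not give the confusion matrix itself; the counts in this sketch are hypothetical values for a 16-case test set, chosen only because they reproduce the reported rounded figures. The function name is likewise illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, and precision from counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall on recurrence cases
        "specificity": tn / (tn + fp),  # recall on non-recurrence cases
        "precision": tp / (tp + fp),    # fraction of predicted recurrences that are true
    }


# Hypothetical counts (not from the source): 8 recurrence, 8 non-recurrence cases.
metrics = classification_metrics(tp=6, fp=1, tn=7, fn=2)
```

With these assumed counts, accuracy is 13/16, sensitivity 6/8, specificity 7/8, and precision 6/7, which round to the values stated in the text.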

[0081] (2) Performance validation results of the risk scoring model (postoperative recurrence risk assessment model for colorectal cancer): As shown in Figure 15, Kaplan-Meier survival analysis revealed that relapse-free survival was significantly shorter in the high-risk group than in the low-risk group (log-rank P<0.0001). Multivariate Cox regression analysis showed that the risk score was an independent prognostic factor for postoperative recurrence of colorectal cancer (after adjusting for TNM staging, HR=1.59, 95% CI: 1.35-1.88, P<0.0001).

[0082] Example 4 Functional verification of marker combination 1. Bacterial co-aggregation experiment (1) Experimental materials: The representative strain of *Fusobacterium*, *Fusobacterium nucleatum* (*F. nucleatum*) ATCC 25586, and the representative strain of *Peptostreptococcus*, *Peptostreptococcus anaerobius* (*P. anaerobius*) ATCC 27337, were purchased from Guangdong Provincial Microbial Culture Collection Center; Brain Heart Infusion (BHI) medium, hemin, vitamin K1, and L-cysteine hydrochloride monohydrate were purchased from Solarbio; L-arginine and D-galactose were purchased from Sigma.

[0083] (2) Experimental method: *F. nucleatum* and *P. anaerobius* were inoculated into BHI medium supplemented with 5 μg / ml hemin, 1 μg / ml vitamin K1, and 0.5 mg / ml L-cysteine hydrochloride monohydrate, and cultured anaerobically at 37°C for 24 h until the logarithmic growth phase. The bacteria were collected by centrifugation at 5000 rpm for 5 min, washed twice with sterile PBS, and the bacterial concentration was adjusted to OD600 = 1.0. Equal volumes (0.5 ml each) of the *F. nucleatum* and *P. anaerobius* suspensions were mixed, placed in a cuvette, and gently vortexed for 10 seconds, and the OD600 value was measured immediately (0 h); after anaerobic incubation at 37℃ for 2 h and 5 h, the OD600 value was measured again. The aggregation rate was calculated as: Aggregation rate (%) = 100% - (OD600 value at 2 h or 5 h / OD600 value at 0 h) × 100%. *F. nucleatum* single culture, *P. anaerobius* single culture, *F. nucleatum* + *E. coli* (MG1655), and *P. anaerobius* + *E. coli* were set as control groups; PBS (control), D-galactose (Fap2 inhibitor, 50mM), and L-arginine (RadD inhibitor, 50mM) treatment groups were set up to explore the aggregation mechanism.
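The aggregation rate is the percentage drop in optical density relative to the 0 h reading, since aggregated cells settle out of suspension. A short sketch of the calculation follows; the function name and the OD600 readings are hypothetical and are not measurements from the source.

```python
def aggregation_rate(od_t, od_0):
    """Aggregation rate (%) = 100 - (OD600 at time t / OD600 at 0 h) * 100."""
    if od_0 <= 0:
        raise ValueError("the 0 h OD600 reading must be positive")
    return 100.0 - (od_t / od_0) * 100.0


# Hypothetical readings: 1.00 at 0 h, 0.42 at 2 h, 0.21 at 5 h.
rate_2h = aggregation_rate(0.42, 1.00)
rate_5h = aggregation_rate(0.21, 1.00)
```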

[0084] (3) The experimental results are shown in Figures 16-18. Spearman correlation analysis of the abundance of the related microorganisms revealed a significant positive correlation between *Fusobacterium* and *Peptostreptococcus* (R = 0.42, P < 0.01), suggesting a possible synergistic effect between the two in the gut microbiota. To further verify the interaction between the two bacteria, a bacterial aggregation experiment was conducted. The self-aggregation rate was 15.4% for F. nucleatum in single culture, 3.6% for P. anaerobius in single culture, and 8.5% for E. coli in single culture. The aggregation rate of F. nucleatum and P. anaerobius was 58.0% after 2 hours of co-culture and 79.1% after 5 hours, whereas the aggregation rates of F. nucleatum + E. coli and P. anaerobius + E. coli were <5.7% after 5 hours of co-culture. PBS and D-galactose treatment had no significant effect on the co-aggregation of F. nucleatum and P. anaerobius, whereas L-arginine treatment significantly reduced the aggregation rate (a decrease of 28.5% at 2 h and 17.4% at 5 h, P < 0.001), indicating that the co-aggregation of F. nucleatum and P. anaerobius depends on RadD-mediated adhesion.
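
For readers unfamiliar with the Spearman correlation used above, the following is a minimal sketch that computes it as the Pearson correlation of average ranks. The helper names and the abundance values are hypothetical and are not data from the study.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical relative abundances of two genera across five samples
fuso = [0.1, 0.4, 0.2, 0.8, 0.5]
pepto = [0.05, 0.3, 0.1, 0.9, 0.2]
print(f"Spearman rho = {spearman_rho(fuso, pepto):.2f}")
```

Because the statistic depends only on ranks, it is insensitive to the skewed distributions that are typical of microbial abundance data.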

[0085] 2. Dual-species biofilm formation experiment (1) Experimental method: F. nucleatum and P. anaerobius in the logarithmic growth phase were diluted to 5×10⁸ cells/ml and mixed at a 1:1 ratio, and 200 μl was seeded into a 96-well plate and incubated anaerobically at 37°C for 24 h. The culture medium was discarded, and the wells were washed 2-3 times with sterile PBS and air-dried for 1 h. 100 μl of crystal violet solution (0.1%) was added for staining for 30 min. The staining solution was discarded, and the wells were washed 2-3 times with sterile PBS and air-dried. 200 μl of 96% ethanol was added and incubated for 30 min to dissolve the crystal violet. The OD595 value was then measured to reflect the biofilm content. F. nucleatum single culture and P. anaerobius single culture were set up as control groups; treatment groups with different concentrations of putrescine (10 μM, 50 μM) were set up, with PBS as the control group, to explore the effect of putrescine on biofilm formation.

[0086] (2) The experimental results are shown in Figures 19-21. Under single-culture conditions, P. anaerobius had difficulty forming a distinct biofilm structure, whereas in the co-culture system with F. nucleatum, P. anaerobius could aggregate and cross-link with F. nucleatum to form a relatively dense biofilm. P. anaerobius in single culture formed almost no biofilm (OD595 = 0.43), F. nucleatum in single culture formed a biofilm with OD595 = 0.78, and the co-culture of F. nucleatum and P. anaerobius formed a biofilm with OD595 = 1.35 (P < 0.001 vs F. nucleatum single culture). The 10 μM putrescine treatment group had OD595 = 1.28 (P = 0.056 vs control group) and the 50 μM putrescine treatment group had OD595 = 1.93 (P < 0.05 vs control group), indicating that putrescine enhances the formation of dual-species biofilms by F. nucleatum and P. anaerobius in a concentration-dependent manner.

[0087] 3. Tumor cell adhesion experiment (1) Experimental materials: Human colorectal cancer cell line HCT116 was purchased from the Cell Bank of the Chinese Academy of Sciences; DMEM medium, fetal bovine serum (FBS), penicillin-streptomycin were purchased from Gibco; CellTracker™ Green CMFDA, CellTracker™ Red CMTPX, and DAPI were purchased from Thermo Scientific.

[0088] (2) Experimental methods: HCT116 cells were seeded in 12-well plates and cultured in DMEM medium containing 10% FBS and 1% penicillin-streptomycin until 80% confluence. F. nucleatum was stained with CellTracker™ Green CMFDA and P. anaerobius was stained with CellTracker™ Red CMTPX; the bacteria were incubated at 37°C for 1 hour and washed three times with sterile PBS. F. nucleatum and P. anaerobius, separately or mixed at a 1:1 ratio, were added to the HCT116 cell monolayers at an MOI of 50 and incubated at 37°C and 5% CO2 for 1 h. The cells were washed twice with sterile PBS to remove non-adherent bacteria, fixed with 4% paraformaldehyde for 30 min, and stained with DAPI for 10 min. Bacterial adhesion was observed using a laser confocal microscope (Zeiss LSM 800). Fluorescence intensity was quantified using a fluorescence microplate reader (Spark, Tecan) to reflect the amount of bacterial adhesion.

[0089] (3) The experimental results are shown in Figure 22 and Figure 23. Laser confocal microscopy showed that F. nucleatum and P. anaerobius could each adhere individually to the surface of HCT116 cells; during co-culture, P. anaerobius mainly adhered to the surface of F. nucleatum, forming bacterial aggregates that anchored to the surface of the tumor cells. The quantitative fluorescence results showed that the amount of F. nucleatum adhesion in the F. nucleatum and P. anaerobius co-culture group was significantly higher than in the F. nucleatum single-culture group (relative fluorescence intensity: 10⁹ vs 100, P < 0.05), while the amount of P. anaerobius adhesion did not change significantly (P > 0.05), indicating that P. anaerobius can enhance the adhesion of F. nucleatum to colorectal cancer cells.

[0090] 4. Animal experiment verification (1) Experimental animals: Eight-week-old male C57BL/6J mice (SPF grade) were purchased from Vital River and housed at the Animal Experiment Center of Nankai University (12 h light/12 h dark cycle, free access to water and food). The experiments were conducted after one week of acclimatization. The animal experimental protocol was approved by the Animal Ethics Committee of Nankai University (approval number: SYXK2019-0001).

[0091] (2) Experimental methods: Mice were given an antibiotic cocktail containing streptomycin (1 mg/ml), gentamicin (1 mg/ml), ampicillin (0.5 mg/ml), and vancomycin (0.5 mg/ml) for 3 days, followed by 2 days of sterile water, to establish a gut microbiota depletion model. Mice were randomly divided into 3 groups (n = 5/group): the F. nucleatum group, the P. anaerobius group, and the F. nucleatum + P. anaerobius co-culture group. Bacterial suspension was administered by gavage at 1×10⁸ CFU/mouse/day for 5 consecutive days; the control group received an equal volume of PBS. Fecal samples were collected on days 1, 3, 6, and 9 after gavage and stored at -80°C. Mice were sacrificed 9 days later, and colonic mucosal tissue was collected. qRT-PCR was used to quantify the F. nucleatum and P. anaerobius loads in the fecal samples and colonic mucosal tissue. Primer sequences: F. nucleatum forward 5'-CTGTATTGCGTTGGAAACTGTGTAA-3' (SEQ ID NO:3), reverse 5'-TACCAGGGTATCTAATCCTGTTTGC-3' (SEQ ID NO:4); P. anaerobius forward 5'-AGACGAATTCAAGTCAGTAAATACA-3' (SEQ ID NO:5), reverse 5'-CTCCTATCCACCAGGATATCAA-3' (SEQ ID NO:6).

[0092] (3) The experimental results are shown in Figure 24 and Figure 25. One day after the gavage was completed, there was no significant difference in fecal F. nucleatum load between the F. nucleatum group and the co-culture group; subsequently, the fecal F. nucleatum load of the co-culture group gradually increased and at 9 days was significantly higher than that of the F. nucleatum group (P < 0.01). The fecal P. anaerobius load of the P. anaerobius group and the co-culture group fell below the detection limit three days after gavage. In the colonic mucosa, the F. nucleatum load of the co-culture group was significantly higher than that of the F. nucleatum group (P = 0.051), indicating that P. anaerobius can enhance the colonization ability of F. nucleatum in the mouse colonic mucosa.

[0093] Example 5 A predictive system for postoperative prognosis of colorectal cancer includes a data acquisition module, a data preprocessing module, a feature extraction module, and a model prediction module; The data acquisition module includes a microbial relative abundance data acquisition module and a metabolomics data acquisition module. The microbial relative abundance data acquisition module performs quality filtering, ASV generation, and species annotation on the 16S rRNA gene sequencing data to obtain microbial relative abundance data. The metabolomics data acquisition module performs peak alignment, retention time correction, peak area extraction, and metabolite identification on the LC-MS metabolomics detection results to obtain non-targeted metabolomics data.

[0094] The data preprocessing module includes a microbial relative abundance data preprocessing module and a non-targeted metabolomics data preprocessing module; the microbial relative abundance data preprocessing module first adds a pseudo-count of 1×10⁻⁶ to the relative abundance and then applies a CLR transformation to obtain microbial preprocessed data. The non-targeted metabolomics data preprocessing module performs z-score standardization on the non-targeted metabolomics data, setting the mean to 0 and the standard deviation to 1, to obtain the standardized ion intensity values of metabolites, which are used as metabolite preprocessed data.
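
The two preprocessing steps can be sketched as follows. This is an illustrative example rather than the implementation of the invention: the function names and input values are hypothetical, and the use of the population standard deviation in the z-score is an assumption because the text does not specify it.

```python
import math

PSEUDO_COUNT = 1e-6  # pseudo-count stated in the text

def clr_transform(rel_abundance):
    """Centered log-ratio transform of one sample's relative abundances.

    Each value becomes log(x + pseudo-count) minus the mean of those logs,
    i.e. the log of the value divided by the geometric mean of the sample.
    """
    logs = [math.log(x + PSEUDO_COUNT) for x in rel_abundance]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def zscore(values):
    """Standardize one metabolite across samples to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population standard deviation (divide by n)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(v - mean) / sd for v in values]

# Hypothetical inputs: one sample's genus-level relative abundances,
# and one metabolite's ion intensities across four samples
sample_abundance = [0.5, 0.3, 0.2, 0.0]
metabolite_intensity = [1200.0, 3400.0, 2100.0, 2900.0]
print([round(v, 3) for v in clr_transform(sample_abundance)])
print([round(v, 3) for v in zscore(metabolite_intensity)])
```

The pseudo-count keeps the logarithm defined for taxa with zero relative abundance, and the CLR values of each sample sum to zero by construction.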

[0095] The feature extraction module uses the LASSO regression algorithm to screen features in the pre-processed microbial and metabolite data. The L1 regularization coefficient α = 0.01 and the maximum number of iterations is 1000. Coefficients of unimportant features are reduced to zero. Non-zero coefficients of microbial and metabolite markers are retained as input features to the model. The microbial markers and metabolite markers are those determined in Example 1. The model prediction module includes a colorectal cancer postoperative recurrence status prediction module and / or a colorectal cancer postoperative recurrence risk prediction module; the colorectal cancer postoperative recurrence status prediction module is a classification model constructed by the training method described in Example 3, and the colorectal cancer postoperative recurrence risk prediction module is a risk scoring model constructed by the training method described in Example 3.
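
To illustrate how L1 regularization shrinks coefficients to exactly zero, the sketch below implements LASSO by cyclic coordinate descent with soft-thresholding, using α = 0.01 and at most 1000 iterations as stated above. It assumes the common objective (1/(2n))·||y - Xw - b||² + α·||w||₁, which the text does not spell out; the function names, feature names, and toy data are hypothetical.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_coordinate_descent(X, y, alpha=0.01, max_iter=1000, tol=1e-8):
    """Minimize (1/(2n))*||y - Xw - b||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    intercept = sum(y) / n
    residual = [y[i] - intercept for i in range(n)]  # y - Xw - b with w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that excludes its own contribution
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
                max_change = max(max_change, abs(delta))
        # Re-fit the unpenalized intercept
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
        if max_change < tol:
            break
    return w, intercept

# Toy data (hypothetical): only the first feature tracks the outcome
feature_names = ["feature_A", "feature_B", "feature_C"]
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [1, 1, 0, 0]  # 1 = recurrence, 0 = no recurrence

coefs, b = lasso_coordinate_descent(X, y)
selected = [name for name, c in zip(feature_names, coefs) if c != 0.0]
print("coefficients:", coefs)
print("selected features:", selected)
```

Features whose correlation with the residual stays within ±α receive a coefficient of exactly zero and are dropped, which is why the retained set can be read directly from the non-zero coefficients.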

[0096] Application Example 1 1. Target population: The method is applicable to patients with stage I-III colorectal cancer who undergo surgical treatment, especially patients with the same TNM stage whose recurrence risk is uncertain. It can assist clinicians in developing individualized treatment and follow-up plans.

[0097] 2. Testing Procedure and Result Interpretation 2.1 Sample collection: Tumor mucosal samples are collected intraoperatively from colorectal cancer patients and then preserved and transported according to the method in Example 1.

[0098] 2.2 Detection and Data Analysis: The samples were sent to the testing laboratory and 16S rRNA gene sequencing and LC-MS metabolomics detection were performed according to the method in Example 1 to obtain microbial relative abundance data and non-targeted metabolomics data, which were then input into the prediction system obtained in Example 5.

[0099] 2.3 Results Output and Interpretation: (1) Relapse status prediction results: The system outputs "high relapse probability" or "low relapse probability" and the corresponding predicted probability (0-100%). A predicted probability ≥60% is judged as high relapse probability (relapse group), and <60% is judged as low relapse probability (non-relapse group).
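
The decision rule for the relapse status output can be sketched as below. The 60% cutoff and the two labels come from the text; the function name and the example probabilities are hypothetical.

```python
def classify_recurrence(probability_pct, threshold_pct=60.0):
    """Map a predicted relapse probability (0-100%) to the label the system reports."""
    if not 0.0 <= probability_pct <= 100.0:
        raise ValueError("probability must be between 0 and 100")
    if probability_pct >= threshold_pct:
        return "high relapse probability"
    return "low relapse probability"

# Hypothetical predicted probabilities for three patients
for prob in (72.5, 60.0, 41.0):
    print(f"{prob:.1f}% -> {classify_recurrence(prob)}")
```

Note that a probability of exactly 60% falls in the high relapse probability group, because the rule uses ≥60%.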

[0100] (2) Risk scoring and prognostic stratification: The system outputs a comprehensive risk score (range 0.1~9.2) and a prognostic stratification result (high-risk group / low-risk group). Patients in the high-risk group have a significantly increased risk of postoperative recurrence. For these patients, it is recommended to increase the intensity of postoperative adjuvant chemotherapy, for example by extending the chemotherapy cycle, and to shorten the follow-up interval to once every 3 months, with follow-up including tumor marker detection, imaging examinations, etc. Patients in the low-risk group have a lower risk of postoperative recurrence. They can receive conventional adjuvant therapy and follow-up protocols, with follow-up once every 6 months, to avoid overtreatment.

[0101] As can be seen from the above, the biomarker combination and prediction system provided by this invention is closely related to the postoperative prognosis of colorectal cancer. It can be used to predict or assess whether colorectal cancer patients will relapse after surgery, distinguish between patients with recurrent colorectal cancer and those without recurrence, and perform stratified assessment of the future recurrence risk of colorectal cancer patients after surgery. It has high specificity, high sensitivity, and good accuracy.

[0102] Although the above embodiments describe the present invention in detail, they are only some of the embodiments of the present invention, not all of them. Those skilled in the art can obtain other embodiments based on these embodiments without creative effort, and all such embodiments fall within the protection scope of the present invention.

Claims

1. A combination of biomarkers associated with postoperative prognosis of colorectal cancer, characterized in that the biomarker combination includes microbial biomarkers; the microbial biomarkers include Peptostreptococcus, Fusobacterium, Bacteroides, Porphyromonas, and Prevotella.

2. The marker combination according to claim 1, characterized in that the biomarker combination further includes metabolite biomarkers; the metabolite biomarkers include one or more of alanine glutamate, putrescine, arginine, histidine, and sebacic acid.

3. The use of a reagent for detecting the combination of biomarkers as described in claim 1 or 2 in the preparation of products for predicting or assessing postoperative prognosis of colorectal cancer.

4. The use according to claim 3, characterized in that the prediction or assessment of postoperative prognosis of colorectal cancer includes one or more of the following: (1) determining whether colorectal cancer patients will have a recurrence after surgery; (2) distinguishing colorectal cancer patients with postoperative recurrence from those without postoperative recurrence; (3) stratifying colorectal cancer patients by their risk of future recurrence after surgery.

5. The use according to claim 3 or 4, characterized in that the products are reagents, kits, test strips, digital microfluidic chips, or biosensors.

6. A training method for a postoperative prognostic model of colorectal cancer, characterized in that the method includes data acquisition, data preprocessing, feature selection, and model training; the data acquisition includes: acquiring relative abundance data of microorganisms in tumor samples from colorectal cancer patients; the data preprocessing includes: adding a pseudo-count of 1×10⁻⁶ to the relative abundance data of microorganisms and then applying a CLR transformation to obtain microbial preprocessed data; the feature selection includes: using the LASSO regression algorithm to perform feature selection on the microbial preprocessed data, shrinking the coefficients of features other than the microbial biomarkers to zero, leaving the coefficients of the microbial biomarkers non-zero, and retaining the features with non-zero coefficients as training set data; the microbial biomarkers are the microbial biomarkers in the biomarker combination of claim 1 or 2; the model training includes random forest algorithm model training and/or random survival forest algorithm model training; the random forest algorithm model training includes: using the postoperative recurrence status of colorectal cancer patients as output data and training the random forest algorithm model with the training set data to obtain a colorectal cancer postoperative recurrence status prediction model; the postoperative recurrence status is recurrence or no recurrence; the random survival forest algorithm model training includes: using the integrated cumulative risk score of colorectal cancer patients after surgery as output data and training the random survival forest algorithm model with the training set data to obtain a colorectal cancer postoperative recurrence risk assessment model.

7. The training method according to claim 6, characterized in that the data acquisition further includes: acquiring non-targeted metabolomics data of tumor samples from colorectal cancer patients; the data preprocessing further includes: performing z-score normalization on the non-targeted metabolomics data, setting the mean to 0 and the standard deviation to 1, to obtain the standardized ion intensity values of the metabolites; the feature selection includes: using the LASSO regression algorithm to perform feature selection on the microbial preprocessed data and the standardized ion intensity values of the metabolites, shrinking the coefficients of features that are neither microbial markers nor metabolite markers to zero, leaving the coefficients of the microbial markers and metabolite markers non-zero, and retaining the features with non-zero coefficients as training set data; the metabolite markers are the metabolite markers in the marker combination of claim 2.

8. The training method according to claim 6 or 7, characterized in that the training parameters of the random forest algorithm model include the number of estimators, the maximum depth, the minimum number of samples required to split a node, and the minimum number of samples per leaf node; the feature selection is performed using L1 regularization with α = 0.01 and a maximum of 1000 iterations; the tumor samples from the colorectal cancer patients are mucosal tissue from the tumor site of the colorectal cancer patients.

9. The training method according to claim 6 or 7, characterized in that the model training further includes a model validation step; the model validation includes: acquiring additional tumor samples from colorectal cancer patients; processing the additional tumor samples according to the data acquisition, data preprocessing, and feature selection steps to obtain selected features, which are used as validation set data; and inputting the validation set data into the trained model and calculating the accuracy.

10. A predictive system for postoperative prognosis of colorectal cancer, characterized in that the prediction system includes a data acquisition module, a data preprocessing module, a feature extraction module, and a model prediction module; the data acquisition module includes a microbial relative abundance data acquisition module, used to acquire microbial relative abundance data in tumor samples from colorectal cancer patients; the data preprocessing module includes a microbial relative abundance data preprocessing module, used to first add a pseudo-count of 1×10⁻⁶ to the relative abundance and then apply a CLR transformation to obtain microbial preprocessed data; the feature extraction module is used to perform feature selection on the microbial preprocessed data using the LASSO regression algorithm, shrinking the coefficients of features other than the microbial biomarkers to zero, leaving the coefficients of the microbial biomarkers non-zero, and retaining the features with non-zero coefficients as model input features; the microbial biomarkers are the microbial biomarkers in the biomarker combination described in claim 1 or 2; the model prediction module includes a colorectal cancer postoperative recurrence status prediction module and/or a colorectal cancer postoperative recurrence risk prediction module; the colorectal cancer postoperative recurrence status prediction module is used to input the model input features into the colorectal cancer postoperative recurrence status prediction model constructed by the training method according to any one of claims 6 to 9 to obtain the prognostic result; the colorectal cancer postoperative recurrence risk prediction module is used to input the model input features into the colorectal cancer postoperative recurrence risk assessment model constructed by the training method according to any one of claims 6 to 9 to obtain the prognostic result.

11. The prediction system according to claim 10, characterized in that the data acquisition module further includes a metabolomics data acquisition module, used to acquire non-targeted metabolomics data of tumor samples from colorectal cancer patients; the data preprocessing module further includes a non-targeted metabolomics data preprocessing module, used to perform z-score normalization on the non-targeted metabolomics data, setting the mean to 0 and the standard deviation to 1, to obtain the standardized ion intensity values of metabolites; the feature extraction module is used to perform feature selection on the microbial preprocessed data and the standardized ion intensity values of the metabolites using the LASSO regression algorithm, shrinking the coefficients of features that are neither microbial markers nor metabolite markers to zero, leaving the coefficients of the microbial markers and metabolite markers non-zero, and retaining the features with non-zero coefficients as model input features; the metabolite biomarkers are the metabolite biomarkers in the biomarker combination described in claim 2.

12. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the training method for the postoperative prognostic model of colorectal cancer according to any one of claims 6 to 9 or the prediction system according to claim 10 or 11.

13. A computer-readable medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the training method for the postoperative prognostic model of colorectal cancer according to any one of claims 6 to 9, or uses the prediction system according to claim 10 or 11 to evaluate or predict the postoperative prognosis of colorectal cancer.

14. A computer program product, comprising a computer program, characterized in that the computer program, when executed by a processor, implements the training method for the postoperative prognostic model of colorectal cancer according to any one of claims 6 to 9, or uses the prediction system according to claim 10 or 11 to evaluate or predict the postoperative prognosis of colorectal cancer.