A method and device for predicting the microbiome features and prognosis of colorectal cancer patients
By extracting features from tissue section images of colorectal cancer patients and performing cluster analysis, the method overcomes the difficulty of detecting microorganisms in tumor tissue, allows the patient's prognostic risk to be predicted accurately, and improves the accuracy and reliability of the information available for colorectal cancer treatment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GENEIS TECH BEIJING CO LTD
- Filing Date
- 2022-11-30
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies are insufficient to accurately detect the microbial characteristics in tumor tissues of colorectal cancer patients. Furthermore, the low microbial biomass in tumor samples and interference from host DNA make detection difficult, and the problem of sample contamination has not been effectively resolved.
By acquiring hematoxylin-eosin stained tissue images of colorectal cancer patients, feature extraction and cluster analysis are performed to determine the category of the sample, thereby obtaining tissue microbial characteristics and prognostic risk.
It provides an effective and rapid way to obtain tumor tissue microbial information, which can accurately predict the prognostic risk of patients and provide more accurate, reliable and comprehensive test results for the clinical treatment of colorectal cancer.
Figure CN115798569B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of biology, and more specifically, to a method and apparatus for predicting the tissue microbial characteristics and prognosis of colorectal cancer patients.
Background Technology
[0002] Colorectal cancer (CRC) is the third most common malignant tumor worldwide, severely threatening patients' survival and quality of life. Despite the emergence of new treatment methods, the overall survival rate for CRC patients remains low because most patients are diagnosed at an advanced stage. Recently, numerous studies have explored prognostic indicators for CRC from multiple perspectives and disciplines, achieving promising predictive results, such as digital pathological sections, in situ immune cell infiltration of tumors, and the tumor microbiome. However, the association between pathological images, intratumoral non-cancerous cells, and tumor tissue microbiota remains unknown, and the commonalities among these independent factors in predicting CRC prognosis still need to be explored.
[0003] Mounting evidence suggests that tumor evolution is largely dependent on the tumor microenvironment (TME). The TME, the internal environment in which tumor cells arise and survive, is a complex and integrated system containing tumor cells, immune cells, stromal cells, and non-cellular components such as cytokines. In tumor development, the tumor microenvironment is not a "silent bystander" but rather an "active promoter of cancer progression." Besides the tumor cells themselves, stromal cells and immune cells are the most abundant components in tumor tissue. Studies have found that patients with high stromal scores have poorer overall survival, and stromal scores can serve as an independent prognostic factor for primary gastric cancer. Furthermore, research indicates that immune scores provide a reliable estimate of the recurrence risk in colorectal cancer patients and could be incorporated as a new component of the cancer TNM-immune classification. Therefore, in-depth research into important non-cancerous tumor-resident cell types within the TME is crucial for the development of next-generation tumor stromal-targeted therapies.
[0004] Accurately detecting microorganisms in tumor tissue is a challenging task. First, the microbial biomass in tumor samples is very low. Furthermore, tumor samples exhibit a high host-to-bacterial DNA ratio, so host DNA can interfere with amplicon-based 16S rRNA gene sequencing, reducing the PCR amplification efficiency of bacterial sequences or even causing amplification to fail. If metagenomic sequencing is used, obtaining the gene sequences of the microbial community is difficult without appropriate enrichment procedures. Another issue requiring special attention is sample contamination: the microbial community detected in tumor tissue may not originate from within the tumor itself, but rather from contaminating DNA. For samples with high microbial biomass, the impact of contamination on the microbial community is limited and often negligible. However, for samples with low microbial biomass, contaminating DNA can easily outweigh the low-level microbial DNA in the sample and dominate the sample's microbiome information.
[0005] In view of this, the present invention is proposed.
Summary of the Invention
[0006] The purpose of this invention is to provide a method and apparatus for predicting the tissue microbial characteristics and prognosis of colorectal cancer patients.
[0007] This invention is implemented as follows:
[0008] In a first aspect, embodiments of the present invention provide a method for predicting the tissue microbial characteristics of colorectal cancer patients, comprising: acquiring clustering results of multiple known samples and tissue microbial characteristics corresponding to each class of samples in the clustering results; wherein, the samples are hematoxylin-eosin stained section images of tissue sections from colorectal cancer patients; acquiring a sample to be tested, performing feature extraction to obtain the section features of the sample to be tested; based on the section features of the sample to be tested, determining the class to which the sample to be tested belongs in the clustering results, and taking the tissue microbial characteristics corresponding to the class to which the sample to be tested belongs as the tissue microbial characteristics possessed by the sample to be tested.
[0009] In a second aspect, embodiments of the present invention provide a device for predicting the tissue microbial characteristics of colorectal cancer patients, comprising: an acquisition module, a feature extraction module, and a prediction module.
[0010] The acquisition module is used to acquire the sample to be tested, the clustering results of multiple known samples as described in the foregoing embodiments, and the tissue microbial characteristics corresponding to each class of samples in the clustering results;
[0011] The feature extraction module is used to extract features from the sample to be tested and obtain slice features; wherein the feature extraction steps are as described in the previous embodiments;
[0012] The prediction module is used to determine the category of the sample to be tested in the clustering results based on the slice characteristics of the sample to be tested, and to take the tissue microbial characteristics corresponding to the category of the sample to be tested as the tissue microbial characteristics of the sample to be tested; wherein, the step of determining the category of the sample to be tested in the clustering results is as described in the previous embodiment.
[0013] In a third aspect, embodiments of the present invention provide a device for predicting the prognostic risk of colorectal cancer patients, which includes: an acquisition module, a feature extraction module, and a prediction module.
[0014] The acquisition module is used to acquire the clustering results of the sample to be tested, multiple known samples, and the prognostic risk corresponding to each class of samples in the clustering results; wherein, the samples and the clustering results are as described in the foregoing embodiments;
[0015] The feature extraction module is used to extract features from the sample to be tested and obtain slice features; wherein the feature extraction steps are as described in the previous embodiments;
[0016] The prediction module is used to determine the category of the sample to be tested in the clustering results based on the slice characteristics of the sample to be tested, and to take the prognostic risk corresponding to the category of the sample to be tested as the prognostic risk of the sample to be tested; wherein, the step of determining the category of the sample to be tested in the clustering results is as described in the previous embodiment.
[0017] In a fourth aspect, embodiments of the present invention provide an electronic device, which includes a processor and a memory, wherein the memory is used to store a program which, when executed by the processor, causes the processor to implement a method for predicting the prognostic risk of colorectal cancer patients or a method for predicting the tissue microbial characteristics of colorectal cancer patients as described in the foregoing embodiments.
[0018] The method for predicting the prognostic risk of colorectal cancer patients includes:
[0019] Obtain clustering results for multiple known samples and the prognostic risk corresponding to each class of samples in the clustering results; wherein, the samples and the clustering results are as described in the foregoing embodiments;
[0020] After obtaining the sample to be tested and performing feature extraction, the slice features of the sample to be tested are obtained; wherein, the feature extraction steps are as described in the previous embodiments;
[0021] Based on the slice characteristics of the sample to be tested, the category to which the sample to be tested belongs in the clustering result is determined, and the prognostic risk corresponding to the category to which the sample to be tested belongs is taken as the prognostic risk of the sample to be tested; wherein, the step of determining the category to which the sample to be tested belongs in the clustering result is as described in the previous embodiment.
[0022] In a fifth aspect, embodiments of the present invention provide a computer-readable medium storing a computer program, which, when executed by a processor, implements the method for predicting the tissue microbial characteristics of colorectal cancer patients as described in the foregoing embodiments or the method for predicting the prognostic risk of colorectal cancer patients as described in the foregoing embodiments.
[0023] The present invention has the following beneficial effects:
[0024] This predictive method extracts image features from H&E-stained sections of colorectal cancer tissue from colorectal cancer patients, thereby non-quantitatively obtaining the microbial community composition of tumor tissue. This provides an effective and rapid way to obtain microbial information from tumor tissue. Furthermore, patients with different tissue microbial community compositions have different prognoses due to the influence of intratumoral microorganisms. Therefore, the obtained microbial community composition can further predict the prognostic risk of patients, providing more accurate, reliable, and comprehensive detection results for the clinical treatment of colorectal cancer.
Attached Figure Description
[0025] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0026] Figure 1 Integration of H&E-stained slides, tissue microbial abundance, and host gene expression in CRC; (a) study design overview: this application collected 607 H&E-stained slides and 533 paired colorectal tissue microbiome and host transcriptome samples; each of the 533 CRC patients had a paired H&E-stained slide, tissue microbiome sample, and host transcriptome sample; features of the H&E-stained slides, including color and morphological features, were extracted to obtain an image feature matrix; for the host transcriptome, the stromal cell content, immune cell content, and tumor purity in the tumor microenvironment were calculated; (b) heatmap showing the H&E-stained slide features of all samples, clustered by Ward.D2 hierarchical clustering based on feature values and Euclidean distance; (c) silhouette width of each sample in the two PAM clusters and the average silhouette width of the two clusters; (d) average silhouette width for numbers of clusters k from 1 to 10; (e) PCA based on the H&E-stained slide features of the two PAM clusters; R² was obtained by the ANOSIM test, with a p-value below 0.05 taken as the significance threshold;
[0027] Figure 2 Colorectal tissue microbiota of the two groups of patients; (a) Procrustes analysis showing the association between the H&E-stained slide features and the genus-level tissue microbiota; (b) PCoA plot and (c) NMDS plot based on the Bray-Curtis dissimilarity matrix of the genus-level tissue microbiota of the two groups, with R² calculated by the PERMANOVA test; (d) comparison of the alpha diversity (Shannon index) of tissue microorganisms between the two groups at the phylum, class, order, family, and genus levels, with p-values calculated using the Wilcoxon rank-sum test; (e) heatmap comparing the mean relative abundance of 33 phylum-level taxa between the two groups, with taxon names in red indicating significant differences in relative abundance between the two groups; (f) comparison of the relative abundance of Proteobacteria, (g) Chlamydiae, (h) Actinobacteria, (i) Tenericutes, (j) Bacteroidetes, and (k) Planctomycetes between the two groups;
[0028] Figure 3 Role of tissue microbiota in the formation of the two groups of CRC patients; (a) PCoA plots of the tissue microbiota of both groups of samples before and after removal of Proteobacteria and Actinobacteria, with R² calculated by the PERMANOVA test; (b) among the 150 genera with the highest relative abundance, 125 (83%) differed significantly in relative abundance between the two groups, with P-values calculated using the Wilcoxon rank-sum test; (c) heatmap of the relative abundance of the 97 genera that differed significantly between the two clusters (P<0.01), with a Sankey diagram showing the phylum to which each of the 97 genera belongs; the percentage after each phylum name indicates the proportion of significantly different genera belonging to that phylum;
[0029] Figure 4 Differences in the non-cancerous cell components of the tumor microenvironment and in the tissue microbiota are potential factors contributing to the differences between the two groups; (a) flowchart giving an overview of the tumor microenvironment analysis procedure; (b) box plots showing the differences in stromal score, immune score, and tumor purity between the two groups; p-values were calculated using the Wilcoxon rank-sum test, with p-values below 0.05 considered significant; (c) Kaplan-Meier (KM) curves of overall survival (Group 1 vs. Group 2); (d) correlations between the eight phyla with significantly different relative abundance between the two groups and the stromal score, immune score, and tumor purity; correlation coefficients and p-values were calculated using Spearman correlation analysis, *P<0.05, **P<0.01, ***P<0.001; (e) correlation network of genus-level taxa with the stromal score, immune score, and tumor purity; circular and triangular nodes represent taxa and tumor microenvironment measures, respectively; node size represents the relative abundance of the taxon, and node color represents the phylum to which it belongs; a red border indicates a positive correlation and a blue border a negative correlation; the bar chart shows the difference in relative abundance of each taxon between the two groups.
Detailed Implementation
[0030] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below. Where specific conditions are not specified in the embodiments, conventional conditions or conditions recommended by the manufacturer shall apply. Reagents or instruments whose manufacturers are not specified are all conventional products that can be purchased commercially.
[0031] First, embodiments of the present invention provide a method for predicting the tissue microbial characteristics of colorectal cancer patients, comprising:
[0032] Acquire clustering results of multiple known samples and tissue microbial characteristics corresponding to each cluster of samples in the clustering results; wherein, the samples (to be tested and / or known) are hematoxylin-eosin stained section images of tissue sections from colorectal cancer patients;
[0033] After obtaining the sample to be tested and performing feature extraction, the slice features of the sample to be tested are obtained. Based on the slice features of the sample to be tested, the category to which the sample to be tested belongs in the clustering results is determined, and the tissue microbial features corresponding to the category to which the sample to be tested belongs are taken as the tissue microbial features of the sample to be tested.
[0034] Through a series of inventive efforts, the inventors of this application have, for the first time, proposed and verified that by extracting image features from H&E-stained sections of colorectal cancer tissue from colorectal cancer patients, the microbial community composition of tumor tissue can be obtained, providing an effective and rapid method for acquiring microbial information of tumor tissue. Furthermore, patients with different tissue microbial community compositions have different prognoses; therefore, the obtained microbial community composition can be used to further predict the prognostic risk of patients, providing more accurate, reliable, and comprehensive detection results for the clinical treatment of colorectal cancer.
[0035] In some embodiments, the number of known samples is at least a lower bound chosen from the range 5 to 400; specifically, the lower bound may be any one of the following values, or fall within a range defined by any two of them: 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300, 320, 340, 360, 380 and 400.
[0036] In some embodiments, the known sample may be a sample of known information or results, and may be derived from any one or more of known public databases, publicly available literature, patents, and clinical samples.
[0037] The “hematoxylin-eosin stained slide images” in this article include whole slide imaging (WSI) of hematoxylin-eosin (H&E) stained slides. Therefore, the “slide features” mentioned in this article are also abbreviated as “WSI features”.
[0038] In some embodiments, the specific parameters (such as image size, resolution, etc.) of the images of the known samples and / or the samples to be tested can refer to conventional H&E staining images.
[0039] In some embodiments, the clustering result is obtained by: extracting features from the multiple known samples to obtain the slice features of the known samples; and clustering the known samples based on those slice features.
[0040] In some embodiments, the feature extraction further includes: preprocessing the image to be extracted.
[0041] In some embodiments, the preprocessing includes: converting the image from RGB to HSV color space, binarizing the image to obtain the foreground and background regions of the image, and extracting the features from the foreground region.
[0042] In some embodiments, the binarization process includes: binarizing a grayscale image based on a threshold; performing dilation and erosion morphological closing operations on the binarized image while smoothing the foreground contour; and moving the detected approximate foreground contour according to an offset to obtain the final foreground region (tissue region).
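The preprocessing described above can be illustrated with a minimal, standard-library sketch. Several points are assumptions and are not stated in the text: the saturation channel of the HSV image is the channel that is thresholded, the threshold value is 0.1, and the structuring element is a 3×3 square. Median filtering and the contour offset step are omitted. The function names are hypothetical.

```python
import colorsys

def saturation_channel(rgb_image):
    """Convert rows of (R, G, B) pixels in 0..255 to HSV and keep S (0..1)."""
    return [[colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for (r, g, b) in row]
            for row in rgb_image]

def binarize(channel, threshold):
    """Assumption: stained tissue is more saturated than the white background."""
    return [[1 if value > threshold else 0 for value in row] for row in channel]

def _filter3x3(mask, op):
    """Apply op (max = dilation, min = erosion) over each 3x3 neighbourhood.
    Pixels outside the image are ignored."""
    h, w = len(mask), len(mask[0])
    return [[op(mask[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]

def closing(mask):
    """Morphological closing: dilation followed by erosion."""
    return _filter3x3(_filter3x3(mask, max), min)

def tissue_mask(rgb_image, threshold=0.1):
    """Foreground (tissue) mask; threshold is an illustrative value only."""
    return closing(binarize(saturation_channel(rgb_image), threshold))
```

On a toy image with a pink square containing a single white pixel, the closing step fills the one-pixel hole while leaving the white background untouched. A production pipeline would more likely rely on an image-processing library than on nested lists.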
[0043] In some embodiments, the feature extraction includes: extracting at least one of color features and Hu moments.
[0044] In some embodiments, the color feature includes at least one of the color mean, variance, and slope.
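As an illustration of the color features, the sketch below computes three statistics per color channel over the foreground pixels. It reads "slope" as skewness, the third standardized moment commonly used in color-moment descriptors; that reading is an assumption, as are the function names and the choice to concatenate the three HSV channels into a nine-element vector.

```python
def color_moments(values):
    """Return (mean, variance, skewness) of one channel's pixel values."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return mean, 0.0, 0.0  # a constant channel has no skew
    third = sum((v - mean) ** 3 for v in values) / n
    return mean, variance, third / variance ** 1.5

def color_features(hsv_pixels):
    """hsv_pixels: list of (H, S, V) tuples taken from the tissue region.
    Returns [H mean, H var, H skew, S mean, ..., V skew]."""
    features = []
    for channel in zip(*hsv_pixels):
        features.extend(color_moments(channel))
    return features
```

Calling `color_features` on the foreground pixels of one slide yields one row of the image feature matrix used for clustering, to which shape descriptors such as Hu moments can be appended.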
[0045] In some embodiments, the clustering includes any one of k-medoids clustering, k-means clustering, and CLARANS clustering.
[0046] In some embodiments, the clustering is k-medoids clustering.
[0047] In some embodiments, the number of clusters is 2 to 10, specifically any one or any two of 2, 3, 4, 5, 6, 7, 8, 9 and 10, preferably 2.
[0048] In some embodiments, the tissue microbial characteristics corresponding to each type of sample are obtained by statistical analysis of the tissue microbial characteristics of each image in that type of sample.
[0049] In some embodiments, the tissue microbial characteristics corresponding to each type of sample are: tissue microbial characteristics common to each known type of sample.
[0050] In some embodiments, the tissue microbial characteristics include: the composition of the microbial community in the tissue.
[0051] In some embodiments, the composition of the microbial community includes: microbial species and relative abundance of microorganisms.
[0052] In some embodiments, the composition of the microbial community includes: the dominant microbial species and the relative abundance of the dominant microbial species.
[0053] In this article, "dominant microorganisms" can refer to: a microbial population that has the best living conditions, the fastest reproduction rate, and / or the largest number of microorganisms under a certain environment, or a microbial population that dominates a habitat.
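One simple way to derive per-cluster tissue microbial characteristics is to average the relative abundance of each taxon over the known samples in a cluster and then rank the taxa. The sketch below assumes this averaging approach; the text only says "statistical analysis". The function names and the `top_n` parameter are hypothetical.

```python
from collections import defaultdict

def cluster_profiles(labels, abundances):
    """labels[i] is the cluster of known sample i; abundances[i] maps
    taxon name -> relative abundance for that sample.
    Returns {cluster: {taxon: mean relative abundance}}; a taxon missing
    from a sample counts as zero for that sample."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for label, profile in zip(labels, abundances):
        counts[label] += 1
        for taxon, value in profile.items():
            sums[label][taxon] += value
    return {label: {taxon: total / counts[label] for taxon, total in taxa.items()}
            for label, taxa in sums.items()}

def dominant_taxa(profile, top_n=2):
    """Taxa with the highest mean relative abundance in one cluster."""
    return sorted(profile, key=profile.get, reverse=True)[:top_n]
```

For instance, `dominant_taxa(cluster_profiles(labels, abundances)[0])` lists the two most abundant taxa of cluster 0. Which statistic is actually used to summarize each cluster (mean, median, or a significance test between clusters) is not specified here.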
[0054] In some embodiments, when the number of clusters is 2, the obtained tissue microbial characteristics include: one group of samples has a higher abundance of Actinobacteria; the other group has a higher abundance of Proteobacteria and Bacteroidetes, and the prognostic risk of the latter group is relatively higher.
[0055] In some embodiments, the step of determining the category to which the test sample belongs in the clustering results includes: constructing a classification model based on multiple known slice features and clustering results, combined with a classification algorithm; the classification model can determine the category to which the test sample belongs in the clustering results based on the slice features of the sample.
[0056] In some embodiments, the classification algorithm is selected from any one of the following: k-nearest neighbor algorithm, support vector machine classification algorithm, and linear discriminant analysis algorithm.
[0057] This invention provides a device for predicting the microbial characteristics of tissues from colorectal cancer patients, comprising:
[0058] The acquisition module is used to acquire the sample to be tested, the clustering results of multiple known samples as described in any of the foregoing embodiments, and the tissue microbial characteristics corresponding to each class of samples in the clustering results;
[0059] The feature extraction module is used to extract features from the sample to be tested and obtain slice features; wherein the feature extraction steps are as described in any of the foregoing embodiments;
[0060] The prediction module is used to determine the category to which the sample belongs in the clustering result based on the slice characteristics of the sample to be tested, and to take the tissue microbial characteristics corresponding to the category to which the sample belongs as the tissue microbial characteristics of the sample to be tested; wherein, the step of determining the category to which the sample belongs in the clustering result is as described in any of the foregoing embodiments.
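The three-module structure can be pictured as a small composition of interchangeable parts. The sketch below is only an illustration: the class name, field names, and use of plain callables are assumptions, and the same structure would serve the prognostic-risk device by storing a risk level per cluster instead of microbial characteristics.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence

@dataclass
class PredictionDevice:
    # Acquisition module output: what is known about each cluster.
    cluster_characteristics: Dict[int, Any]
    # Feature extraction module: image -> slice feature vector.
    extract_features: Callable[[Any], Sequence[float]]
    # Classifier built from the known samples: features -> cluster label.
    assign_cluster: Callable[[Sequence[float]], int]

    def predict(self, image: Any) -> Any:
        """Prediction module: look up the characteristics of the
        cluster to which the sample to be tested is assigned."""
        features = self.extract_features(image)
        cluster = self.assign_cluster(features)
        return self.cluster_characteristics[cluster]
```

Keeping the extractor and classifier as injected callables means the clustering result and the classification algorithm can be replaced without changing the prediction step.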
[0061] On the other hand, embodiments of the present invention also provide a device for predicting the prognostic risk of colorectal cancer patients, comprising:
[0062] The acquisition module is used to acquire the sample to be tested, the clustering results of multiple known samples, and the prognostic risk corresponding to each class of samples in the clustering results; wherein, the sample and the clustering results are as described in any of the foregoing embodiments;
[0063] The feature extraction module is used to extract features from the sample to be tested and obtain slice features; wherein the feature extraction steps are as described in any of the foregoing embodiments;
[0064] The prediction module is used to determine the category to which the test sample belongs in the clustering result based on the slice characteristics of the test sample, and to take the prognostic risk corresponding to the category to which the test sample belongs as the prognostic risk of the test sample; wherein, the step of determining the category to which the test sample belongs in the clustering result is as described in any of the foregoing embodiments.
[0065] In some embodiments, the prognostic risk corresponding to each class of samples is obtained in the same way as the tissue microbial characteristics corresponding to each class of samples described in any of the foregoing embodiments: based on the prognostic risk of the known samples in each class, the common or corresponding prognostic risk of that class is obtained statistically. For example, the prognostic risk may be relatively high or relatively low.
[0066] On the other hand, embodiments of the present invention also provide an electronic device, which includes a processor and a memory, the memory being used to store a program, which, when executed by the processor, causes the processor to implement a method for predicting the prognostic risk of colorectal cancer patients or a method for predicting the tissue microbial characteristics of colorectal cancer patients as described in any of the foregoing embodiments.
[0067] The methods for predicting the prognostic risk of colorectal cancer patients include:
[0068] Obtain clustering results for multiple known samples and the prognostic risk corresponding to each class of samples in the clustering results; wherein, the samples, the clustering results, and the prognostic risk corresponding to each class of samples are as described in any of the foregoing embodiments;
[0069] Obtain the sample to be tested, perform feature extraction as described in any of the foregoing embodiments, and obtain the slice features of the sample to be tested;
[0070] Based on the slice features of the sample to be tested, the category to which the sample to be tested belongs in the clustering result is determined, and the prognostic risk corresponding to the category to which the sample to be tested belongs is taken as the prognostic risk of the sample to be tested; wherein, the step of determining the category to which the sample to be tested belongs in the sample image clustering result is as described in any of the foregoing embodiments.
[0071] Understandably, the method for predicting the prognostic risk of colorectal cancer patients is largely the same as the method for predicting tissue microbial characteristics; the only difference is that the former outputs a prognostic risk, whereas the latter outputs a tissue microbial characteristic.
[0072] Electronic devices may include memory, processor, bus, and communication interface, which are electrically connected directly or indirectly to enable data transmission or interaction. For example, these components may be electrically connected to each other via one or more buses or signal lines.
[0073] The memory can be, but is not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc.
[0074] The processor can be an integrated circuit chip with signal processing capabilities. The processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0075] The electronic device can be a server, cloud platform, mobile phone, tablet computer, laptop computer, ultra-mobile personal computer (UMPC), handheld computer, netbook, personal digital assistant (PDA), wearable electronic device, virtual reality device, etc. Therefore, the embodiments of this application do not limit the types of electronic devices.
[0076] Furthermore, embodiments of the present invention also provide a computer-readable medium storing a computer program, which, when executed by a processor, implements the method for predicting the tissue microbial characteristics of colorectal cancer patients as described in any of the foregoing embodiments, or the method for predicting the prognostic risk of colorectal cancer patients as described in any of the foregoing embodiments.
[0077] Computer-readable media can be general-purpose storage media, such as removable disks and hard drives.
[0078] The features and performance of the present invention will be further described in detail below with reference to embodiments.
[0079] Example 1
[0080] A method for inferring tissue microbial characteristics and prognostic risk of colorectal cancer patients based on tissue images of colorectal cancer cases, comprising the following steps.
[0081] Obtain hematoxylin and eosin (H&E) pathological images (WSI) of tissue sections from known colorectal cancer patients, along with their corresponding tissue microbial characteristics and / or prognostic risks, and extract features from the images:
[0082] (1) The WSI is read into memory at 20× magnification and converted from the RGB to the HSV color space. Median filtering is used to suppress noise.
[0083] (2) To better distinguish the foreground (tissue region) from the background, binarization is performed in the following manner:
[0084] Binarize grayscale images based on thresholds;
[0085] The binarized image is first dilated and then eroded (morphological closing operation) to eliminate narrow discontinuities and small holes in the foreground, while smoothing the foreground contour.
[0086] After moving the detected foreground approximate contour according to the offset, the final tissue region is obtained;
[0087] (3) For the tissue region, extract and store the corresponding moment features and color features (mean, variance and slope) as H&E staining section features for downstream experimental analysis.
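For the moment features in step (3), the following sketch computes the first two of Hu's seven invariant moments from a binary tissue mask, using the standard definitions of central and scale-normalized moments. Restricting the example to two invariants and computing them on the binary mask rather than on pixel intensities are simplifications made here, and the function name is hypothetical.

```python
def hu_first_two(mask):
    """mask: rows of 0/1 values marking the tissue region.
    Returns (h1, h2), the first two Hu invariant moments."""
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = len(pixels)
    if m00 == 0:
        raise ValueError("mask contains no foreground pixels")
    cx = sum(x for x, _ in pixels) / m00  # centroid
    cy = sum(y for _, y in pixels) / m00

    def eta(p, q):
        """Scale-normalized central moment of order (p, q)."""
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return h1, h2
```

The two values do not change when the tissue region is shifted within the image or rotated by 90 degrees, which is why moment invariants are useful as shape descriptors for slides scanned in arbitrary positions.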
[0088] The k-medoids clustering method Partitioning Around Medoids (PAM) was used to cluster the H&E stained slide features. The PAM steps include: ① selecting k objects as medoids; ② calculating the dissimilarity matrix; ③ assigning each object to its nearest medoid; ④ checking whether any object in each cluster reduces the average dissimilarity coefficient, and if so, using the object with the largest reduction as the new medoid of that cluster; ⑤ if at least one medoid has changed, returning to step ③, otherwise ending the algorithm. This embodiment calculated the average silhouette width for numbers of clusters from 1 to 10. Silhouette width measures how well an object fits its cluster: it compares the average distance between the object and the other objects in the same cluster with the average distance between the object and all objects in the nearest neighboring cluster. The silhouette width ranges from -1 to 1; a larger value indicates better clustering, while a negative value means the object may have been assigned to the wrong cluster. Because the average silhouette width was largest at this value, the optimal number of clusters for the WSI features was determined to be 2.
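The clustering steps and the silhouette-based choice of k can be sketched in plain Python as follows. This is a simplified reading of the five steps, not the implementation used in the embodiment. The assumptions are: Euclidean distance as the dissimilarity, the first k objects as the initial medoids, a silhouette width of 0 for single-member clusters and for k = 1, and hypothetical function names.

```python
import math
from statistics import fmean

def dissimilarity_matrix(points):
    """Step 2: pairwise Euclidean distances between feature vectors."""
    return [[math.dist(p, q) for q in points] for p in points]

def pam(dist, k):
    """Steps 1 and 3 to 5. Returns (medoid indices, cluster label per object)."""
    n = len(dist)
    medoids = list(range(k))  # step 1: arbitrary initial medoids (assumption)
    while True:
        # Step 3: assign every object to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Step 4: within each cluster, pick the member with the lowest total
        # dissimilarity to the other members, if it beats the current medoid.
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            best = medoids[c]
            best_cost = sum(dist[best][j] for j in members)
            for m in members:
                cost = sum(dist[m][j] for j in members)
                if cost < best_cost:
                    best, best_cost = m, cost
            updated.append(best)
        # Step 5: stop when no medoid changed.
        if updated == medoids:
            return medoids, labels
        medoids = updated

def average_silhouette(dist, labels):
    """Mean silhouette width s = (b - a) / max(a, b) over all objects."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0  # convention used here; silhouette needs two clusters
    widths = []
    for i, own in enumerate(labels):
        same = [dist[i][j] for j, lab in enumerate(labels) if lab == own and j != i]
        if not same:
            widths.append(0.0)  # single-member cluster
            continue
        a = fmean(same)
        b = min(fmean(dist[i][j] for j, lab in enumerate(labels) if lab == c)
                for c in clusters if c != own)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return fmean(widths)

def best_k(points, candidates):
    """Pick the candidate k with the largest average silhouette width."""
    dist = dissimilarity_matrix(points)
    scores = {k: average_silhouette(dist, pam(dist, k)[1]) for k in candidates}
    return max(scores, key=scores.get), scores
```

For example, `best_k(feature_rows, range(2, 11))` returns the selected number of clusters together with the score of each candidate. The result of this simplified variant can depend on the initial medoids, so a practical analysis would typically use an established PAM implementation.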
[0089] Based on the tissue microbial characteristics and / or prognostic risks of the acquired sample images, the tissue microbial characteristics and / or prognostic risks corresponding to each type of sample image are statistically obtained.
[0090] In practical clinical applications:
[0091] To predict the tissue microbial characteristics and/or prognostic risk of a test sample (a colorectal cancer patient), the patient must first be assigned to one of the groups in the clustering results of the known sample images described above. The H&E stained slide features of the patient are extracted using the feature extraction method above. Using the k-nearest neighbor algorithm, the distance from the prediction point (the patient whose group is to be determined) to each sample point (a colorectal cancer patient in the existing public dataset) is calculated, the k nearest neighbors of the prediction point are found, and the most frequent category among those k neighbors is taken as the category of the prediction point.
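The k-nearest-neighbor assignment described above can be sketched in a few lines. The feature vectors, the group labels, the value k = 3, and the use of Euclidean distance are assumptions for illustration only; the embodiment does not state them.

```python
from collections import Counter
from math import dist

def knn_predict(known_features, known_groups, query, k=3):
    """Return the most frequent group among the k known samples nearest to query."""
    nearest = sorted(range(len(known_features)),
                     key=lambda i: dist(known_features[i], query))[:k]
    votes = Counter(known_groups[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical slide features of known samples and their cluster membership.
known_features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
known_groups = [1, 1, 1, 2, 2, 2]

group_a = knn_predict(known_features, known_groups, (9, 9))
group_b = knn_predict(known_features, known_groups, (0.5, 0.5))
```

The predicted group then selects which cluster-level microbial characteristics and prognostic risk are reported for the test sample.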
[0092] Based on the category of the sample to be tested, obtain the tissue microbial characteristics and / or prognostic risk of the sample.
[0093] Example 2
[0094] First, H&E stained slides from 607 colorectal cancer patients and paired tissue microbiome and host transcriptome data from 533 colorectal cancer patients were downloaded from The Cancer Genome Atlas (TCGA). As shown in Figure 1, the specific implementation process is as follows:
[0095] 1. A total of 607 H&E stained slides from CRC patients were obtained from TCGA. Following the method in Example 1, the H&E stained slide features (WSI features) were extracted, and based on the PAM clustering method described in Example 1, all samples were divided into two major categories (Figure 1(b)). The 607 WSI samples were divided into two clusters of 321 and 286 samples, with mean silhouette widths of 0.40 and 0.36, respectively (Figure 1(c)). The silhouette plot (Figure 1(d)) further validated the clustering quality, showing that the average silhouette width was highest (0.38) when k = 2, indicating that this is the optimal number of clusters. The ANOSIM test showed that the WSI features of the two PAM clusters were significantly different (Figure 1(e), P < 0.0001, R² = 54%).
[0096] 2. Using Procrustes correlation analysis, we analyzed the relationship between H&E stained section characteristics and microbial communities and non-cancerous components in the tumor microenvironment.
[0097] Hellinger transformations were applied to the H&E stained slide feature matrix, the genus-level abundance matrix, and the non-cancerous cell component matrix. Then, for each of these three matrices, the Bray-Curtis dissimilarity between samples was calculated using the "vegdist" function in the R package "vegan". PCoA was used to ordinate each dissimilarity matrix. Rotation of the two matrices being compared was performed using the "procrustes" function in the R package "vegan". The symmetric Procrustes correlation coefficient r, the sum of squared distances, and the p-value were calculated with the "protest" function in the "vegan" package using 9999 permutations.
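The first two operations in this paragraph have simple closed forms, sketched here in Python rather than R so that the arithmetic is visible. The Hellinger transformation replaces each value by the square root of its share of the row total, and the Bray-Curtis dissimilarity is the sum of absolute differences divided by the sum of all values. The sample vectors are invented, and the ordination and Procrustes steps are not reproduced.

```python
from math import sqrt

def hellinger(row):
    """Square root of each value's proportion of the row total."""
    total = sum(row)
    if total == 0:
        raise ValueError("row sums to zero")
    return [sqrt(v / total) for v in row]

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))

# Hypothetical genus counts for three samples.
samples = [[1, 3, 0], [2, 6, 0], [0, 0, 5]]
transformed = [hellinger(row) for row in samples]
d01 = bray_curtis(transformed[0], transformed[1])
d02 = bray_curtis(transformed[0], transformed[2])
```

Samples 0 and 1 have identical proportions, so their dissimilarity after the transformation is zero, while samples 0 and 2 share no genus and reach the maximum value of 1.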
[0098] Procrustes analysis confirmed a significant correlation between microbial community composition and WSI features (Figure 2(a), P = 0.001, correlation coefficient R² = 0.2463), indicating that CRC samples with similar microbial characteristics also have similar WSI features. Furthermore, at the genus level, there were significant differences in bacterial communities between the two groups (ANOSIM test, P = 0.001; Figure 2(b, c)). Compared to group 2, the alpha diversity (Shannon diversity index) in group 1 was reduced at all taxonomic levels (Figure 2(d)). For example, the mean alpha diversity at the phylum level was 1.22 for group 1 and 1.26 for group 2 (Wilcoxon P = 0.0038); at the genus level, it was 2.83 for group 1 and 3.16 for group 2 (Wilcoxon P = 1.2e-7). In summary, these results indicate that the differences in WSI features between the two groups are mainly due to differences in colorectal tissue microbial content.
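For reference, the Shannon diversity index used as the alpha diversity measure is H = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. A minimal sketch follows; the natural logarithm and the example abundance vectors are assumptions, since the embodiment does not state the logarithm base.

```python
from math import log

def shannon(abundances):
    """Shannon index with natural logarithm; zero-abundance taxa contribute nothing."""
    total = sum(abundances)
    return -sum((a / total) * log(a / total) for a in abundances if a > 0)

# Hypothetical abundance profiles: an even community and a dominated one.
even = shannon([25, 25, 25, 25])
dominated = shannon([97, 1, 1, 1])
```

An even community of four taxa reaches the maximum ln 4, whereas a community dominated by one taxon scores much lower, which is the sense in which group 1 is described as less diverse.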
[0099] 3. Identify the bacteria that contributed the most to these two groups.
[0100] This embodiment analyzed the taxa enriched in one cluster relative to the other at the phylum level. The sum of the relative abundances of Proteobacteria, Firmicutes, and Actinobacteria in the collected CRC patient cohort was 91.8%. The results indicated a significant difference in the relative abundance of dominant phyla between the two groups of CRC patients. Between the two groups, 12 of 33 phyla showed significant differences in abundance (Wilcoxon test, ...; Figure 2(e)), of which 11 were significantly enriched in group 2.
[0101] At the phylum level, the abundance of Proteobacteria, Chlamydiae, and Bacteroidetes was significantly increased in group 2 (Figure 2(f, g, j)), while group 1 had a higher abundance of Actinobacteria (Figure 2(h)). To assess the importance of Proteobacteria and Actinobacteria in the formation of these two clusters, this embodiment removed them from the samples containing them. This resulted in a 3-fold decrease (3% to 1%) in the variance explained by the two clusters (PERMANOVA test; Figure 3(a)).
[0102] At the genus level, among the 150 genera with the highest relative abundance, 125 (83%) showed significant differences in abundance between the two groups (ANOVA, P < 0.05; Figure 3(b)). Furthermore, the heatmap showed differences in the abundance of 97 genera between the two clusters (P < 0.01), with 89 (91.8%) of the 97 genera being more abundant in group 2. Of these 97 significantly different genera, 48 (49.5%) belonged to the phylum Proteobacteria and 12 (12.4%) belonged to the phylum Actinobacteria. These results suggest that Proteobacteria and Actinobacteria play a key role in the division between the two groups (Figure 3(c)).
[0103] 4. Assess the differences between the two groups of tumor microenvironments derived from pathological image features.
[0104] The ESTIMATE algorithm was used to calculate the stromal score, immune score, ESTIMATE score, and tumor purity. The stromal score reflects the presence of stromal cells in the tumor tissue, and the immune score reflects the infiltration of immune cells in the tumor tissue. The ESTIMATE score is the sum of the stromal and immune scores. The algorithm uses gene expression data to output estimated levels of infiltrating stromal cells and immune cells, as well as estimated tumor purity. It integrates expression data from six platforms, totaling 10,412 common genes, and after screening, obtains stromal features (141 genes) and immune features (41 genes).
[0105] Microorganisms are known to profoundly influence the tumor microenvironment (TME), so the two groups were compared in this respect. The data analysis workflow is shown in Figure 4(a). The analysis revealed a higher stromal score in group 2 (Figure 4, P = 0.0071). There was no statistically significant difference in immune scores between the two groups (P = 0.72). Tumor purity in group 1 was slightly higher than that in group 2, but the difference was not statistically significant (P = 0.16). Furthermore, the impact of stromal and immune cell components in the TME on patient prognosis was explored. Although the difference was not significant, CRC patients in group 2, i.e., those with significantly higher stromal scores, had poorer overall survival (Figure 4(c), P = 0.38). This finding is consistent with previous studies. One study showed that the stromal score can serve as a biomarker and independent prognostic factor for primary gastric cancer. Another study collected gene expression profiles and clinical information from 1635 CRC patients from the TCGA and GEO databases and concluded that higher stromal scores, higher immune scores, and lower tumor purity were associated with more advanced tumor stage and poorer overall survival. In summary, the data suggest that although stromal cells in the TME are diverse and complex, they may favor tumor cell survival, thereby reducing patient survival.
[0106] The significant differences in stromal scores between the two groups obtained by clustering H&E stained slide features are reliable. Recently, computer vision technology has been applied to tumor pathology specimens, enabling the automatic identification and classification of various cell types within the tumor microenvironment. One study suggests that stromal cell ratios based on automated image analysis may be a potential predictor of high recurrence risk in ovarian cancer patients after platinum-based chemotherapy. Although the approach is not quantitative, this invention proposes that the prognosis of CRC patients can be preliminarily determined based on the extraction and analysis of H&E stained slide features, which has significant implications for reducing clinical burden, assisting physicians in diagnosis and treatment, and improving patient survival.
[0107] To explore the impact of tissue microbiota on the TME, the correlation between microbiota and immune cells, stromal cells, and tumor purity was further analyzed. As shown in Figure 4(d), the eight phyla that were significantly associated with pathological image features were also significantly associated with non-cancerous cells in tumor tissue (Spearman, P < 0.05). These results indicate that the heterogeneity of the microbiome within colorectal tumors drives the differentiation of the non-cancerous components of the tumor microenvironment (TME), which is further manifested in H&E stained sections.
[0108] To further investigate the relationship between genus-level bacteria and immune scores, stromal scores, and tumor purity, a correlation network was constructed from the relative abundance of the top 30 genera whose relative abundance differed significantly between the two groups (Figure 4(e)). The stromal score was positively correlated with Acinetobacter and Campylobacter, both belonging to the phylum Proteobacteria. Most genera, such as Enterococcus, Streptococcus, and Escherichia, showed a significant negative correlation with the immune score, suggesting immune dysregulation in the tumor microenvironment of CRC patients. All genera significantly correlated with tumor purity, such as Bordetella and Shigella, were positively correlated with it.
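The Spearman correlation underlying the two preceding paragraphs is the Pearson correlation of rank-transformed values, with tied values sharing their average rank. The sketch below shows the calculation only; the abundance and score vectors are invented, and significance testing is omitted.

```python
from math import sqrt
from statistics import fmean

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-sample values: one genus abundance and two scores.
genus_abundance = [0.01, 0.02, 0.05, 0.40]
stromal_score = [10.0, 20.0, 25.0, 100.0]
immune_score = [9.0, 7.0, 4.0, 1.0]
rho_stromal = spearman(genus_abundance, stromal_score)
rho_immune = spearman(genus_abundance, immune_score)
```

Because only ranks matter, the strongly non-linear but monotonic relation in the example still yields a coefficient of 1, and the decreasing relation yields -1.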
[0109] In summary, the tissue microbiome characteristics of colorectal cancer patients can be determined qualitatively from H&E stained sections of colorectal tissue. Similar to the concept of enterotypes, if a sample is classified as group 1 based on WSI features, the abundance of Actinobacteria in the tissue microbiome will be higher; if it is classified as group 2, it tends to have more Proteobacteria and Bacteroidetes. Furthermore, in group 2 the stromal cell component in the patient's tumor tissue is significantly increased, which may lead to poorer survival. This may be because the increased relative abundance of Acinetobacter and Campylobacter in the tumor tissue contributes to the increased stromal cell content in the tumor microenvironment. Ultimately, the stromal components in the tumor microenvironment interact with tumor cells, regulating tumor growth, promoting tumor metastasis and spread, and creating barriers that prevent conventional chemotherapy from reaching tumor cells, thereby increasing tumor cell resistance and leading to poor patient prognosis.
[0110] Hematoxylin and eosin (H&E) stained pathology images are essential for cancer patients, forming the foundation of pathological diagnosis and serving as the cornerstone of quality assurance. With the continuous development of medical technology and cancer treatment, the era of personalized medicine has arrived. Fully automated single-drop staining technology ensures that each stained slide uses fresh reagents under standardized conditions, reducing the human error that may occur when different technicians weigh, dissolve, and filter reagents, and thus improving the stability of staining quality. This invention proposes adding a step to this automated process to extract the features of the stained pathological images. Based on these image features, qualitative information on the colorectal tissue microbiota of colorectal cancer patients can be obtained, similar to enterotypes. Furthermore, due to the influence of intratumoral microbiota, patients with different tissue microbial contents have different prognoses. Therefore, further research that incorporates image features can improve the understanding of the prognosis of colorectal cancer patients. In this way, while high-quality test results are ensured, clinicians can obtain more accurate, reliable, and comprehensive results, supporting the development of personalized medicine. In the future, this approach is expected to play an important role in early cancer diagnosis, prognostic assessment, and targeted therapy guidance.
[0111] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for predicting the tissue microbial characteristics of colorectal cancer patients, characterized in that it includes: obtaining clustering results of multiple known samples and the tissue microbial characteristics corresponding to each cluster in the clustering results, wherein the samples are hematoxylin-eosin stained slide images of tissue sections from colorectal cancer patients; the clustering results are obtained by extracting features from the multiple known samples to obtain slide features of the known samples, and clustering the multiple samples based on the slide features; the feature extraction further includes preprocessing the image from which features are to be extracted; the preprocessing includes converting the image from RGB to HSV color space, binarizing the image to obtain the foreground and background regions of the image, and performing the feature extraction on the foreground region; the feature extraction includes extracting color features and Hu moments; the color features include at least one of the color mean, variance, and skewness; and obtaining the sample to be tested and performing feature extraction to obtain the slide features of the sample to be tested, determining, based on the slide features of the sample to be tested, the category to which the sample to be tested belongs in the clustering results, and taking the tissue microbial characteristics corresponding to that category as the tissue microbial characteristics of the sample to be tested.
2. The prediction method according to claim 1, characterized in that the binarization process includes: binarizing the grayscale image based on a threshold; performing a morphological closing operation (dilation followed by erosion) on the binarized image while smoothing the foreground contour; and moving the detected approximate foreground contour according to the offset to obtain the final foreground region.
3. The prediction method according to claim 1 or 2, characterized in that the clustering includes any one of k-medoids clustering, k-means clustering, and CLARANS clustering.
4. The prediction method according to claim 3, characterized in that the clustering is k-medoids clustering.
5. The prediction method according to claim 1 or 2, characterized in that the number of clusters in the clustering is 2 to 10.
6. The prediction method according to claim 5, characterized in that the number of clusters in the clustering is 2.
7. The prediction method according to claim 1 or 2, characterized in that the tissue microbial characteristics corresponding to each type of sample are obtained by statistical analysis of the tissue microbial characteristics of each image in that type of sample.
8. The prediction method according to claim 1 or 2, characterized in that the tissue microbial characteristics corresponding to each type of sample are the tissue microbial characteristics common to the known samples of that type.
9. The prediction method according to claim 1 or 2, characterized in that the tissue microbial characteristics include the composition of the microbial community in the tissue.
10. The prediction method according to claim 9, characterized in that the composition of the microbial community includes the types of microorganisms and their relative abundance.
11. The prediction method according to claim 9, characterized in that the composition of the microbial community includes the dominant microbial species and the relative abundance of the dominant microbial species.
12. The prediction method according to claim 1 or 2, characterized in that the step of determining the category of the sample to be tested in the clustering results includes: constructing a classification model based on the slide features of multiple known samples and the clustering results, combined with a classification algorithm; the classification model can determine the category of the sample to be tested in the clustering results based on the slide features of the sample.
13. The prediction method according to claim 12, characterized in that the classification algorithm is selected from any one of the following: k-nearest neighbor algorithm, support vector machine classification algorithm, and linear discriminant analysis algorithm.
14. A device for predicting the tissue microbial characteristics of colorectal cancer patients, characterized in that it includes: an acquisition module, used to acquire the sample to be tested, the clustering results of multiple known samples as described in any one of claims 1 to 13, and the tissue microbial characteristics corresponding to each class of samples in the clustering results; a feature extraction module, used to extract features from the sample to be tested and obtain slide features, wherein the feature extraction steps are as described in any one of claims 1 to 13; and a prediction module, used to determine the category of the sample to be tested in the clustering results based on the slide features of the sample to be tested, and to take the tissue microbial characteristics corresponding to the category of the sample to be tested as the tissue microbial characteristics of the sample to be tested, wherein the step of determining the category of the sample to be tested in the clustering results is as described in any one of claims 1 to 13.
15. A device for predicting the prognostic risk of colorectal cancer patients, characterized in that it includes: an acquisition module, used to acquire the sample to be tested, the clustering results of multiple known samples, and the prognostic risk corresponding to each class of samples in the clustering results, wherein the sample and the clustering results are as described in any one of claims 1 to 13; a feature extraction module, used to extract features from the sample to be tested and obtain slide features, wherein the feature extraction steps are as described in any one of claims 1 to 13; and a prediction module, used to determine the category of the sample to be tested in the clustering results based on the slide features of the sample to be tested, and to take the prognostic risk corresponding to the category of the sample to be tested as the prognostic risk of the sample to be tested, wherein the step of determining the category of the sample to be tested in the clustering results is as described in any one of claims 1 to 13.
16. An electronic device, characterized in that it includes a processor and a memory, the memory being used to store a program that, when executed by the processor, causes the processor to implement a method for predicting the prognostic risk of colorectal cancer patients or the method for predicting the tissue microbial characteristics of colorectal cancer patients as described in any one of claims 1 to 13. The method for predicting the prognostic risk of colorectal cancer patients includes: obtaining clustering results of multiple known samples and the prognostic risk corresponding to each class of samples in the clustering results, wherein the samples and the clustering results are as described in any one of claims 1 to 13; obtaining the sample to be tested and performing feature extraction to obtain the slide features of the sample to be tested, wherein the feature extraction step is as described in any one of claims 1 to 13; and determining, based on the slide features of the sample to be tested, the category to which the sample to be tested belongs in the clustering results, and taking the prognostic risk corresponding to that category as the prognostic risk of the sample to be tested, wherein the step of determining the category to which the sample to be tested belongs in the clustering results is as described in any one of claims 1 to 13.
17. A computer-readable medium, characterized in that the computer-readable medium stores a computer program that, when executed by a processor, implements the method for predicting the tissue microbial characteristics of colorectal cancer patients as described in any one of claims 1 to 13, or the method for predicting the prognostic risk of colorectal cancer patients as described in claim 16.