A data-efficient multi-scale pathological image classification method

By employing a weakly supervised classification method that combines random partial segmentation and multi-scale feature extraction, the problems of data redundancy and overfitting in pathological image processing are solved, achieving efficient and accurate pathological image classification.

CN116152566B · Active · Publication Date: 2026-06-12 · YUNNAN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
YUNNAN UNIV
Filing Date
2023-03-02
Publication Date
2026-06-12

Abstract

The application discloses a data-efficient multi-scale pathological image classification method. The first stage performs tissue region extraction and patching on the pathological image at multiple magnifications and extracts features. First, the whole slide image is read into memory at low magnification, and the color space is converted from RGB to HSV. The edges are then smoothed by median blurring, the binary segmentation threshold of the image is determined, and small gaps and holes are filled by an additional morphological closing. The approximate contours of the detected foreground objects are filtered according to the region threshold and stored for downstream processing. Then, at a given magnification, 224x224 tiles are cropped from the segmented foreground contours with a non-overlapping sliding window, and the tiles, their coordinates, and the WSI metadata are stored in the HDF5 hierarchical data format. The second stage randomly selects a portion of the features as model input for classification prediction, repeats this multiple times, and selects the top-k base learners for bagging integration to obtain the final classification result.

Description

Technical Field

[0001] This invention belongs to the field of image classification technology, and in particular relates to a data-efficient method for multi-scale pathological image classification.

Background Technology

[0002] In clinical medicine, pathological examination has been considered the gold standard for cancer diagnosis for over 100 years. Pathological images at different magnifications contain detailed information about tissue structures and cell shapes, aiding pathologists in making judgments. Manually analyzing large numbers of medical images is lengthy and time-consuming, easily leading to human bias and errors; delayed or incorrect diagnoses can harm patients. With the successful application of deep learning in the medical field, using deep learning technology to classify pathological images has become an important research direction.

[0003] Automated analysis of histopathological images has been applied from different perspectives of deep learning (e.g., supervised, weakly supervised, unsupervised, and transfer learning) to various tasks in histology (e.g., cell or nucleus segmentation, tissue classification, tumor detection, disease prediction, and prognosis), and has been applied to multiple cancer types. For classification and prediction using WSIs, a single histopathological image is on the order of billions of pixels, containing over one million descriptive objects. Feature extraction computation is extremely demanding, making direct input into neural networks difficult to implement. Directly downsampling WSIs to fit the neural network results in the loss of a significant amount of important detailed information.

[0004] Lu MY et al. (Nature Biomedical Engineering, 2021, 5(6)) disclosed CLAM, a weakly supervised method with high data efficiency, strong interpretability, and strong domain adaptability. It requires only slide-level labels and uses attention-based learning to automatically identify sub-regions of high diagnostic value in order to classify the whole slide accurately. It also uses instance-level clustering to constrain and refine the feature space of the identified representative regions. CLAM adds a binary clustering network to the attention-based multi-instance classification of AB-MIL to generate pseudo-labels as an additional supervision signal. For tumor subtypes, domain knowledge can be added to perform constrained clustering. This method inputs all patches into the model for prediction. In histopathological slides with a small proportion of positive instances in particular, negative instances far outnumber positive ones, so identifying positive instances under MIL conditions is very challenging for the model. These factors together lead to serious overfitting. Inputting all patches into the model also causes data redundancy and low data utilization: the computational load is large, but the gain in classification performance is small. Given the enormous size of a whole slide image, the units the model processes directly are patches cropped from it. Multi-instance learning models for whole slide image classification are essentially designed to identify the patches that mainly determine the slide label, but when all patches are input into the model, the ratio of positive to negative instances is severely imbalanced.

Summary of the Invention

[0005] To address the aforementioned problems, this invention provides a data-efficient multi-scale pathological image classification method.

[0006] The technical solution adopted in this invention is a data-efficient multi-scale pathological image classification method, comprising the following steps:

[0007] S1. Segment the image and divide it into patches;

[0008] S2. Extract features from the patched image data;

[0009] S3. Perform a top-k ensemble based on weakly supervised classification of random partial patches.

[0010] Furthermore, the specific steps of S1 are as follows:

[0011] First, the entire slide image is read into memory, and the color space is converted from RGB to HSV. Then, the binary segmentation threshold of the image is determined. Finally, the approximate contours of the detected foreground objects are filtered based on the region threshold and stored.

[0012] Specifically, the process of determining the threshold is as follows:

[0013] First, determine the proportions $\omega_0$ and $\omega_1$ of the two classes of pixels in the image, namely the background and the foreground, according to the following formula:

[0014] $$\omega_i = \frac{N_i}{M \times N}, \quad i \in \{0, 1\}$$

[0015] Then, determine the mean gray values $\mu_0$ and $\mu_1$ of the two classes of pixels, namely the background and the foreground, according to the following formula:

[0016] $$\mu_i = \frac{\mathrm{Sum}_i}{N_i}, \quad i \in \{0, 1\}$$

[0017] The inter-class variance $\sigma^2$ is determined according to the following formula. Finally, iterate through each gray value T in the image gray-level range 0-255 and find the gray value that maximizes the inter-class variance; this is the threshold L to be obtained.

[0018] $$\sigma^2 = \omega_0(\mu_0 - \mu)^2 + \omega_1(\mu_1 - \mu)^2$$

[0019] Here, $\omega_i$ is the proportion of class-i pixels in the image, $N_i$ is the number of background or foreground pixels, M×N is the image size, $\mu_i$ is the mean gray value of the class-i pixels, $\mathrm{Sum}_i$ is the sum of the gray values of the background or foreground pixels, and $\mu$ is the overall mean gray value of the image, with $\mu = \omega_0\mu_0 + \omega_1\mu_1$.
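The threshold search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the image is given as a flat sequence of integer gray values in 0-255, and it assumes that class 0 consists of the pixels with gray value at most T (which class is the background depends on the image). The function name `otsu_threshold` is hypothetical.

```python
from typing import Sequence


def otsu_threshold(gray_values: Sequence[int]) -> int:
    """Return the gray level T in 0..255 that maximizes the inter-class variance."""
    total = len(gray_values)                  # M x N, the number of pixels
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total_sum = sum(level * count for level, count in enumerate(hist))
    mu = total_sum / total                    # overall mean gray value

    best_t, best_var = 0, -1.0
    n0 = 0      # N_0: number of pixels with gray value <= T (assumed class 0)
    sum0 = 0    # Sum_0: sum of the gray values of those pixels
    for t in range(256):
        n0 += hist[t]
        sum0 += t * hist[t]
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue                          # one class is empty, skip this T
        w0, w1 = n0 / total, n1 / total       # omega_0, omega_1
        mu0, mu1 = sum0 / n0, (total_sum - sum0) / n1
        var = w0 * (mu0 - mu) ** 2 + w1 * (mu1 - mu) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

For an image with two well-separated gray levels, any T between the two levels yields the same maximal variance, and the sketch returns the smallest such T.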

[0020] Specifically, the block division steps described in S1 are as follows:

[0021] At a specific magnification, small patches are cropped from the segmented foreground contours using a non-overlapping sliding window; only patches in which tissue accounts for more than 70% of the area are retained, and the rest are discarded. The patches, their coordinates, and the WSI metadata are stored in the HDF5 hierarchical data format, with one HDF5 file per slide.

[0022] Specifically, the magnifications are 20x and 5x. The patch size is set to 256×256 at 20x magnification and 64×64 at 5x magnification.

[0023] Furthermore, the S2 feature extraction step specifically involves using a ResNet50 model pre-trained on ImageNet to convert the patches obtained in S1 at the two magnifications into 1024-dimensional feature vectors, and storing all feature vectors corresponding to each whole slide image in one HDF5 file.

[0024] Furthermore, the specific steps of the topk ensemble based on random partial block weakly supervised classification in S3 are as follows:

[0025] In each iteration, a fraction 1/k of the features obtained in S2 is randomly selected from the feature set. All selected features are aligned and concatenated according to the positions of their corresponding image patches, and the result is input into the ABMIL classification model. The resulting logits are passed through a softmax layer to obtain the predicted probability of each class, and the class with the highest probability is the model's prediction for the current sample. This process is repeated m times to obtain m different models, i.e., m base learners. Here, m and k are hyperparameters, with m > k.

[0026] The performance of the m models is evaluated on the validation set, the models are ranked by error rate, and the k best-performing base learners are selected.

[0027] The classification results of the k selected models are integrated, and the area under the ROC curve (AUC) is calculated.

[0028] Specifically, during concatenation, all features extracted at 20x magnification and at 5x magnification are aligned and concatenated according to the positions of the corresponding patches. The dimension of the feature matrix changes from Q×1024 to Q×2048, where Q is the number of patches generated from the whole slide image.
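The random selection and the position-aligned concatenation can be illustrated with a short sketch. It assumes that row i of each feature list describes the same patch position at 20x and at 5x, and that sampling is applied to patch positions before concatenation; the text does not fix this order, so both points are assumptions. The function name `sample_and_concat` and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Sequence


def sample_and_concat(feats_20x: Sequence[Sequence[float]],
                      feats_5x: Sequence[Sequence[float]],
                      k: int, seed: int = 0) -> List[List[float]]:
    """Randomly keep about 1/k of the patch positions and concatenate both scales."""
    assert len(feats_20x) == len(feats_5x), "row i must describe the same patch position"
    q = len(feats_20x)                            # Q, the number of patches
    n_keep = max(1, q // k)                       # roughly a fraction 1/k of the features
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(q), n_keep))   # sampling without replacement
    # Row-wise concatenation: d columns at 20x plus d columns at 5x gives 2d columns.
    return [list(feats_20x[i]) + list(feats_5x[i]) for i in kept]
```

With 1024-dimensional features at each magnification, every returned row has 2048 columns, which matches the change from Q×1024 to Q×2048 described above.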

[0029] Specifically, the AUC is obtained by the following formulas:

[0030] $$AUC = \frac{\sum I(P_a, P_b)}{A \times B}$$

[0031] $$I(P_a, P_b) = \begin{cases} 1, & P_a > P_b \\ 0.5, & P_a = P_b \\ 0, & P_a < P_b \end{cases}$$

[0032] Here, $P_a$ is the predicted probability of a positive sample, $P_b$ is the predicted probability of a negative sample, A is the number of positive samples, and B is the number of negative samples. The sum runs over all positive-negative sample pairs, so $\sum I(P_a, P_b)$ counts the pairs in which the predicted value of the positive sample is greater than that of the negative sample; a pair whose two predicted values are exactly equal is counted as 0.5.
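The pairwise definition of the AUC can be written directly as a double loop. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative); the function name `pairwise_auc` is hypothetical.

```python
from typing import Sequence


def pairwise_auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]   # P_a values, A of them
    neg = [p for p, y in zip(probs, labels) if y == 0]   # P_b values, B of them
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    score = 0.0
    for pa in pos:
        for pb in neg:
            if pa > pb:
                score += 1.0
            elif pa == pb:
                score += 0.5
    return score / (len(pos) * len(neg))
```

For example, with positive predictions 0.9 and 0.8 and negative predictions 0.3 and 0.8, three pairs are ranked correctly and one is tied, giving (3 + 0.5) / 4 = 0.875.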

[0033] The beneficial effects of this invention are:

[0034] This invention uses random sampling to select features, inputting only a portion of the patches into the model each time instead of all patches at once. Random sampling effectively alleviates overfitting and avoids data redundancy and low data utilization. Furthermore, using the top-k base learners for a bagging ensemble improves the stability of the algorithm and enhances classification performance.

Brief Description of the Drawings

[0035] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0036] Figure 1 is a diagram showing the correspondence between patches at different magnifications;

[0037] Figure 2 is a schematic diagram of feature extraction;

[0038] Figure 3 shows the architecture of the ResNet50 model: (a) is the overall architecture diagram of the ResNet50 model, (b) is the expanded diagram of the conv block, and (c) is the expanded diagram of the identity block;

[0039] Figure 4 is an architecture diagram of the ABMIL model;

[0040] Figure 5 is a diagram of the Top-K Bagging architecture.

Detailed Implementation

[0041] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.

[0042] Existing models typically use features from the entire image at a single magnification for prediction, resulting in data redundancy. Therefore, this invention proposes a top-k ensemble for weakly supervised classification with random partial block segmentation, incorporating information from multiple magnifications to improve classification performance.

[0043] S1: Segmentation and Blocking

[0044] S11: First, read the entire slide image into memory at a low magnification and convert the color space from RGB to HSV to determine the segmentation threshold later; then determine the binary segmentation threshold of the image; finally, filter the approximate contours of the detected foreground objects according to the region threshold and store them for downstream processing.

[0045] The implementation process is as follows:

[0046] Assume the optimal threshold to be found is L, which divides the image into background and foreground. First, the proportions $\omega_0$ and $\omega_1$ of the two classes of pixels in the image are calculated according to formula (1). Then, the mean gray values $\mu_0$ and $\mu_1$ of the two classes of pixels are calculated according to formula (2). Next, the inter-class variance $\sigma^2$ is calculated according to formula (3). Finally, each gray value T in the image gray-level range 0-255 is traversed to find the gray value that maximizes the inter-class variance, which is the desired threshold L.

[0047] $$\omega_i = \frac{N_i}{M \times N} \tag{1}$$

[0048] $$\mu_i = \frac{\mathrm{Sum}_i}{N_i} \tag{2}$$

[0049] $$\sigma^2 = \omega_0(\mu_0 - \mu)^2 + \omega_1(\mu_1 - \mu)^2 \tag{3}$$

[0050] Here, $\omega_i$ is the proportion of class-i pixels in the image, $\mu_i$ is the mean gray value of the class-i pixels, M×N is the image size, $N_i$ is the number of foreground or background pixels, and $\mathrm{Sum}_i$ is the sum of the gray values of the foreground or background pixels. $\mu$ is the overall mean gray value of the image, with $\mu = \omega_0\mu_0 + \omega_1\mu_1$.

[0051] S12 block division:

[0052] As shown in Figure 1, at a specific magnification, small patches are cropped from the segmented foreground contours using a non-overlapping sliding window. Only patches in which tissue accounts for more than 70% of the area are retained, and the rest are discarded. The patches, their coordinates, and the WSI metadata are stored in the HDF5 hierarchical data format, with one HDF5 file per slide. Specifically, the patch size is set to 256×256 at 20x magnification and 64×64 at 5x magnification, so that the two magnifications can be aligned after feature extraction.
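The patching rule and the reason for the two patch sizes can be illustrated as follows. This sketch assumes the segmentation result is available as a binary tissue mask (1 for tissue, 0 for background) rather than as contours, which is a simplification; the function names `tile_coordinates` and `to_5x` are hypothetical.

```python
from typing import List, Tuple


def tile_coordinates(mask: List[List[int]], patch: int,
                     min_tissue: float = 0.7) -> List[Tuple[int, int]]:
    """Top-left (x, y) of non-overlapping patches whose tissue fraction exceeds min_tissue."""
    height, width = len(mask), len(mask[0])
    coords = []
    for y in range(0, height - patch + 1, patch):      # stride equals patch size: no overlap
        for x in range(0, width - patch + 1, patch):
            tissue = sum(mask[yy][xx]
                         for yy in range(y, y + patch)
                         for xx in range(x, x + patch))
            if tissue / (patch * patch) > min_tissue:
                coords.append((x, y))
    return coords


def to_5x(coord_20x: Tuple[int, int]) -> Tuple[int, int]:
    """Map a 20x pixel coordinate to 5x; the magnification ratio is 20 / 5 = 4."""
    x, y = coord_20x
    return (x // 4, y // 4)
```

Because the magnification ratio is 4, a 256×256 patch at 20x and a 64×64 patch at 5x cover the same tissue region, so the patch at 20x coordinate (x, y) pairs with the patch at 5x coordinate (x/4, y/4).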

[0053] S2: Feature Extraction

[0054] As shown in Figure 2, for each whole slide image, a deep convolutional neural network is used to compute a low-dimensional feature representation for each patch. Specifically, a ResNet50 model pre-trained on ImageNet is used to convert the patches obtained in S1 at the two magnifications into 1024-dimensional feature vectors. All feature vectors corresponding to each whole slide image are stored in a single HDF5 file.

[0055] The architecture of the ResNet50 model is shown in Figure 3, where Figure 3(a) is the overall architecture, Figure 3(b) is the expanded diagram of the conv block, and Figure 3(c) is the expanded diagram of the identity block.

[0056] S3: Topk ensemble based on weakly supervised classification using random partial block segmentation

[0057] For each whole slide image, a fraction 1/k of the features is randomly drawn in each iteration from the feature set obtained in S2, i.e., from all features stored in the corresponding HDF5 file. The features extracted at the two magnifications are then concatenated according to the positions of the corresponding patches. After concatenation, the feature dimension changes from Q×1024 to Q×2048, where Q is the number of patches generated from the whole slide image. The concatenated features are input into the ABMIL classification model shown in Figure 4, where $h_k$ denotes the input feature matrix and $a_k$ denotes the output attention matrix. The model outputs logits, which are passed through a softmax layer to obtain the predicted probability of each class; the class with the highest probability is the model's prediction for the current sample. This process is repeated m times (where m and k are hyperparameters and m > k) to obtain m different models, i.e., m base learners.
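The description names ABMIL and the symbols h_k and a_k but does not spell out the attention formula. The sketch below therefore assumes the attention pooling commonly used in attention-based multi-instance learning, a_k = softmax_k(wᵀ tanh(V h_k)), followed by a linear classifier on the attention-weighted bag embedding. The parameters `V`, `w`, and `W_cls` are hypothetical stand-ins for trained weights, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Vector) -> List[float]:
    top = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def abmil_forward(h: Matrix, V: Matrix, w: Vector,
                  W_cls: Matrix) -> Tuple[List[float], List[float]]:
    """Return (attention weights a_k, class probabilities) for one bag of features h_k."""
    # Unnormalized attention score of each patch: w^T tanh(V h_k)  (assumed form)
    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, matvec(V, hk))) for hk in h]
    a = softmax(scores)                      # one weight per patch, summing to 1
    dim = len(h[0])
    # Bag embedding: attention-weighted sum of the patch features
    z = [sum(a[k] * h[k][d] for k in range(len(h))) for d in range(dim)]
    probs = softmax(matvec(W_cls, z))        # logits -> predicted class probabilities
    return a, probs
```

The predicted class is the index of the largest entry of `probs`; the attention weights `a` indicate how strongly each patch contributes to the slide-level prediction.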

[0058] As shown in Figure 5, the performance of the above m models is evaluated on the validation set using the same method, and the models are ranked by error rate. The k base learners with the best performance are selected.
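The selection of the k best base learners, and a simple bagging rule over them, can be sketched as follows. The sketch assumes each base learner is summarized by a validation error rate and by one positive-class probability per test slide; averaging these probabilities across the selected learners is used as the bagging rule. The function names are hypothetical.

```python
from typing import List, Sequence


def select_top_k(val_error_rates: Sequence[float], k: int) -> List[int]:
    """Indices of the k base learners with the lowest validation error rate."""
    order = sorted(range(len(val_error_rates)), key=lambda i: val_error_rates[i])
    return order[:k]


def bagged_probability(learner_probs: Sequence[Sequence[float]],
                       selected: Sequence[int]) -> List[float]:
    """Average the positive-class probability of the selected learners for each slide."""
    n_slides = len(learner_probs[0])
    return [sum(learner_probs[i][s] for i in selected) / len(selected)
            for s in range(n_slides)]
```

With m = 5 learners whose validation error rates are 0.30, 0.10, 0.25, 0.05, and 0.20, and k = 3, the learners at indices 3, 1, and 4 are kept, and only their predictions enter the average.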

[0059] The classification results of the k models are integrated. Specifically, the classification accuracy (ACC) is obtained by averaging the accuracies of the k base learners, and the area under the ROC curve (AUC) is obtained by averaging the predicted probabilities output by the k base learners and then evaluating the averaged probabilities against the image labels. For all experiments, AUC is the primary performance metric because it is more comprehensive and is insensitive to class imbalance. The formula for calculating ACC is expressed as follows:

[0060] $$ACC = \frac{\text{number of correctly predicted samples}}{\text{total number of samples}}$$

[0061] The formula for calculating AUC is as follows:

[0062] $$AUC = \frac{\sum I(P_a, P_b)}{A \times B}, \qquad I(P_a, P_b) = \begin{cases} 1, & P_a > P_b \\ 0.5, & P_a = P_b \\ 0, & P_a < P_b \end{cases}$$

[0063] Here, $P_a$ is the predicted probability of a positive sample, $P_b$ is the predicted probability of a negative sample, A is the number of positive samples, and B is the number of negative samples. The sum runs over all positive-negative sample pairs, so $\sum I(P_a, P_b)$ counts the pairs in which the predicted value of the positive sample is greater than that of the negative sample; a pair whose two predicted values are exactly equal is counted as 0.5.

[0064] Example 1

[0065] The CAMELYON16 dataset contains 399 images: 270 training images and 129 test images. The training set includes 159 normal images and 111 tumor images; the test set contains 80 normal images and 49 tumor images. On CAMELYON16, the 270 digital pathology images in the training set were randomly partitioned, with 90% used for training and 10% for validation, and the models were tested on the 129 officially provided test images. The batch size was 1, and all experiments were evaluated using 10-fold cross-validation. The experimental results are shown in Table 1. ACC is the ratio of the number of correctly predicted samples to the total number of samples. AUC is the area under the ROC curve; it equals the probability that, for a randomly selected positive sample and a randomly selected negative sample, the classifier scores the positive sample higher than the negative sample, where a sample scoring above the decision threshold is classified as positive and one scoring below it as negative.

[0066] Table 1. Sample prediction accuracy and area under the ROC curve for different models.

[0067]
Model            AUC
CLAM             0.884
DTFD-MIL         0.946
This invention   0.953

[0068] The various embodiments in this specification are described in a related manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so the description is relatively simple; relevant parts can be referred to the descriptions of the method embodiments.

[0069] The above description is merely a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention are included within the scope of protection of the present invention.

Claims

1. A data-efficient multi-scale pathological image classification method, characterized in that it comprises the following steps: S1. segment the image and divide it into patches; S2. extract features from the patched image data; S3. perform a top-k ensemble based on weakly supervised classification of random partial patches. The specific steps of the top-k ensemble in S3 are as follows: in each iteration, a fraction 1/k of the features obtained in S2 is randomly selected from the feature set; all selected features are aligned and concatenated according to the positions of their corresponding image patches, and the result is input into the ABMIL classification model; the resulting logits are passed through a softmax layer to obtain the predicted probability of each class, and the class with the highest probability is the model's prediction for the current sample; this process is repeated m times to obtain m different models, i.e., m base learners, where m and k are hyperparameters with m > k; the performance of the m models is evaluated on the validation set, the models are ranked by error rate, and the k best-performing base learners are selected; the classification results of the k models are integrated, and the area under the ROC curve (AUC) is calculated. During concatenation, all features extracted at 20x magnification and at 5x magnification are aligned and concatenated according to the positions of the corresponding image patches, so that the dimension of the feature matrix changes from Q×1024 to Q×2048, where Q is the number of image patches generated from the whole slide image.

2. The data-efficient multi-scale pathological image classification method according to claim 1, characterized in that the segmentation steps in S1 are as follows: first, the entire slide image is read into memory, and the color space is converted from RGB to HSV; then, the binary segmentation threshold of the image is determined; finally, the approximate contours of the detected foreground objects are filtered based on the region threshold and stored.

3. The data-efficient multi-scale pathological image classification method according to claim 2, characterized in that the process of determining the threshold is as follows: first, determine the proportions of the two classes of pixels in the image, namely the background and the foreground, according to the formula $\omega_i = \frac{N_i}{M \times N}$; then, determine the mean gray values of the two classes of pixels, the background and the foreground, according to the formula $\mu_i = \frac{\mathrm{Sum}_i}{N_i}$; the inter-class variance is determined according to the formula $\sigma^2 = \omega_0(\mu_0 - \mu)^2 + \omega_1(\mu_1 - \mu)^2$; finally, iterate through each gray value T in the image gray-level range 0-255 and find the gray value that maximizes the inter-class variance, which is the threshold L to be obtained; where $\omega_i$ is the proportion of class-i pixels in the image, $N_i$ is the number of foreground or background pixels, M×N is the image size, $\mu_i$ is the mean gray value of the class-i pixels, $\mathrm{Sum}_i$ is the sum of the gray values of the foreground or background pixels, and $\mu$ is the overall mean gray value of the image, with $\mu = \omega_0\mu_0 + \omega_1\mu_1$.

4. The data-efficient multi-scale pathological image classification method according to claim 1, characterized in that the specific steps for patching in S1 are as follows: at a specific magnification, small patches are cropped from the segmented foreground contours using a non-overlapping sliding window, retaining only the patches in which tissue accounts for more than 70% of the area and discarding the rest; the HDF5 hierarchical data format is used to store the patches, their coordinates, and the WSI metadata, with one HDF5 file per slide.

5. The data-efficient multi-scale pathological image classification method according to claim 4, characterized in that the specific magnifications are 20x and 5x, and the patch size is set to 256×256 at 20x magnification and 64×64 at 5x magnification.

6. The data-efficient multi-scale pathological image classification method according to claim 1, characterized in that the S2 feature extraction steps are as follows: the patches obtained in S1 at the two magnifications are converted into 1024-dimensional feature vectors using a ResNet50 model pre-trained on ImageNet, and all feature vectors corresponding to each whole slide image are stored in one HDF5 file.

7. The data-efficient multi-scale pathological image classification method according to claim 1, characterized in that the AUC is obtained by the formula $AUC = \frac{\sum I(P_a, P_b)}{A \times B}$, where $I(P_a, P_b)$ equals 1 when $P_a > P_b$, 0.5 when $P_a = P_b$, and 0 when $P_a < P_b$; $P_a$ is the predicted probability of a positive sample, $P_b$ is the predicted probability of a negative sample, A is the number of positive samples, and B is the number of negative samples; $\sum I(P_a, P_b)$ counts the sample pairs in which the predicted value of the positive sample is greater than that of the negative sample, and a pair whose two predicted values are exactly equal is counted as 0.5.