A method and system for identifying intestinal polyps
By combining feature extraction and fusion of endoscopic and pathological slide images, the problem of single feature recognition for complex intestinal polyps is solved, achieving high-precision intestinal polyp recognition and diagnostic support.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SUZHOU UNIV
- Filing Date
- 2025-12-31
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies for identifying complex intestinal polyps rely on relatively simple feature extraction methods, resulting in low classification accuracy.
By combining endoscopic images and pathological slide images, macroscopic and microscopic feature vectors are obtained through shallow and deep feature extraction. The MS-SCAM module and DCA-Net network are used for feature enhancement and fusion, and finally, a prediction network is used for classification.
It achieves more accurate identification of intestinal polyps, outputs comprehensive conclusions with both macroscopic and microscopic characteristics, provides richer clinical decision-making basis, and improves diagnostic and treatment efficiency.
Smart Images

Figure CN121437516B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image recognition technology, and in particular to a method and system for identifying intestinal polyps.
Background Technology
[0002] Intestinal polyps are raised lesions on the colonic mucosa. They are caused by various factors that lead to colonic mucosal hyperplasia or adenomatous changes, resulting in a raised appearance.
[0003] Current intestinal polyp identification technologies often focus on single-modal information, such as endoscopic images or pathological tissue images. For single-modal medical image data, the extracted features are relatively limited, and the enrichment and emphasis of features vary at different scales. These methods perform well in predicting most simple polyps; however, they suffer from insufficient accuracy for complex polyps. In actual examinations, doctors need to combine multidimensional feature information of the lesions to make more specific and discriminative classifications, which current polyp identification technologies cannot meet.
[0004] In summary, existing identification techniques for complex intestinal polyps rely on relatively simple feature extraction, resulting in low accuracy in subsequent classification.
Summary of the Invention
[0005] Therefore, the technical problem to be solved by the present invention is to overcome the problem that the existing technology for identifying complex intestinal polyps has relatively simple feature extraction, resulting in low accuracy of subsequent classification.
[0006] To address the aforementioned technical problems, this invention provides a method for identifying intestinal polyps, comprising:
[0007] Step S1: Obtain endoscopic images of intestinal polyps and their corresponding pathological slide images;
[0008] Step S2: Perform shallow feature extraction on the endoscopic image to obtain a shallow feature map, and then perform deep feature extraction on the shallow feature map to obtain a macroscopic feature vector of the intestinal polyp;
[0009] Simultaneously, the pathological slide image is cropped into several non-overlapping image blocks, and feature extraction is performed on each image block to obtain the feature vector of the image block. The attention weight of the feature vector of each image block is calculated, and the microscopic feature vector of the intestinal polyp is calculated based on the feature vector of each image block and its corresponding attention weight.
[0010] Step S3: Fuse the macroscopic feature vector and the microscopic feature vector to obtain a fused feature vector;
[0011] Step S4: Predict the category of intestinal polyps by analyzing the fused feature vector.
[0012] In one embodiment of the present invention, step S2 involves extracting shallow features from the endoscopic image to obtain a shallow feature map, and then extracting deep features from the shallow feature map to obtain a macroscopic feature vector of the intestinal polyp. The method includes:
[0013] A shallow feature map is obtained by performing shallow feature extraction on the endoscopic image through the first convolutional layer and the first max pooling layer.
[0014] The shallow feature map is subjected to deep feature extraction by three cascaded residual blocks, and an MS-SCAM module is embedded after each residual block. The MS-SCAM module is used to enhance the features extracted by the residual blocks and maintain the stability of the features.
[0015] The enhanced features extracted by the three residual blocks and their corresponding MS-SCAM modules are compressed into a one-dimensional feature vector through the first global average pooling layer, and this vector is used as the macroscopic feature vector f_A of the intestinal polyp.
[0016] In one embodiment of the present invention, each residual block includes a second convolutional layer, a first batch normalization layer, a first ReLU function, a third convolutional layer, and a second batch normalization layer connected in sequence. The input of each residual block is added element-wise to the output of the second batch normalization layer to obtain a summed feature. The summed feature is then passed through a second ReLU function to obtain the feature map output by the residual block.
[0017] In one embodiment of the present invention, the MS-SCAM module includes a statistical-feature channel attention stage and a multi-scale dilated spatial attention stage, wherein:
[0018] In the statistical-feature channel attention stage, the feature map output by the residual block is processed by a second global average pooling layer and a global max pooling layer, respectively. The second global average pooling layer extracts the mean background information of the feature map, and the global max pooling layer extracts its salient feature responses. The features extracted by the two pooling layers are each passed through a multilayer perceptron to generate corresponding vectors that enhance texture details. The two vectors output by the multilayer perceptrons are added element-wise and passed through the first Sigmoid activation function to generate the channel attention weight vector. Finally, the channel attention weight vector is multiplied with the feature map output by the residual block along the channel dimension to obtain the channel-enhanced feature map;
[0019] In the multi-scale dilated spatial attention stage, max pooling and average pooling are first performed on the channel-enhanced feature map along the channel axis, and the results of channel max pooling and channel average pooling are concatenated to obtain the spatial description map. The spatial description map is input simultaneously into three parallel convolutional branches, each employing a dilated convolutional layer with a different dilation rate. The feature maps output by the three convolutional branches are concatenated, fused with dimensionality reduction by a fourth convolutional layer, and finally passed through a second Sigmoid activation function to generate the spatial attention weight map;
[0020] The spatial attention weight map is multiplied with the channel-enhanced feature map along the spatial dimensions to obtain the spatially weighted feature map. The spatially weighted feature map is then added element-wise to the feature map output by the residual block to obtain the final output feature map of the MS-SCAM module.
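The statistical-feature channel attention stage can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the tensor sizes, the random weights, the hidden width R, and the use of one shared two-layer MLP with a ReLU in between are all assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def shared_mlp(vec, w1, w2):
    """Two-layer MLP: reduce to the hidden width (ReLU), then expand back to C."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def channel_attention(fmap, w1, w2):
    """fmap is a list of C channels, each an H x W list of lists."""
    # Global average pooling: one mean value per channel.
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    # Global max pooling: one peak response per channel.
    mx = [max(map(max, ch)) for ch in fmap]
    z_avg = shared_mlp(avg, w1, w2)
    z_max = shared_mlp(mx, w1, w2)
    # Element-wise addition followed by Sigmoid gives the channel weights.
    weights = [sigmoid(a + m) for a, m in zip(z_avg, z_max)]
    # Channel-wise multiplication gives the channel-enhanced feature map.
    enhanced = [[[w * v for v in row] for row in ch]
                for w, ch in zip(weights, fmap)]
    return weights, enhanced

random.seed(0)
C, H, W, R = 4, 3, 3, 2  # illustrative sizes; R is an assumed hidden width
fmap = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(R)]
w2 = [[random.uniform(-1, 1) for _ in range(R)] for _ in range(C)]
weights, enhanced = channel_attention(fmap, w1, w2)
```

Each channel of the input is scaled by a single weight between 0 and 1, so the spatial layout of the feature map is unchanged at this stage.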
[0021] In one embodiment of the present invention, step S2 involves cropping the pathological slide image into several non-overlapping image blocks, extracting features from each image block to obtain a feature vector, and calculating the attention weight of the feature vector of each image block. The method for calculating the microscopic feature vector of the intestinal polyp based on the feature vector of each image block and its corresponding attention weight includes:
[0022] The pathological slide image is cropped into N non-overlapping image blocks P_i of the same size using a sliding window method;
[0023] The N image blocks P_i are input to the DCA-Net network, which generates a corresponding feature vector f_i for each image block P_i, finally yielding the feature bag F = {f_1, f_2, ..., f_i, ..., f_N};
[0024] The feature vector f_i corresponding to each image block P_i in the feature bag is copied and fed in parallel into a first fully connected layer and a second fully connected layer. The first fully connected layer is followed by a Tanh activation function, and the second fully connected layer is followed by the third Sigmoid activation function. The Hadamard product of the vectors output by the Tanh and Sigmoid activation functions is taken, and the result is passed through a third fully connected layer and a Softmax function to output the attention weight of the feature vector f_i corresponding to each image block P_i;
[0025] The feature vectors f_i are weighted by their corresponding attention weights and summed to obtain the final aggregated microscopic feature vector, which is used as the microscopic feature vector f_B of the intestinal polyp.
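The gated attention aggregation described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the patented implementation: the weight matrices, the toy feature bag, and the omission of bias terms are all hypothetical.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def gated_attention_pool(bag, w_tanh, w_sig, w_score):
    """Return per-block attention weights and the weighted sum of the bag."""
    scores = []
    for f in bag:
        t = [math.tanh(x) for x in matvec(w_tanh, f)]             # first FC + Tanh
        g = [1.0 / (1.0 + math.exp(-x)) for x in matvec(w_sig, f)]  # second FC + Sigmoid
        gated = [a * b for a, b in zip(t, g)]                     # Hadamard product
        scores.append(sum(w * h for w, h in zip(w_score, gated)))  # third FC (scalar)
    attn = softmax(scores)                                        # normalise over blocks
    dim = len(bag[0])
    pooled = [sum(a * f[d] for a, f in zip(attn, bag)) for d in range(dim)]
    return attn, pooled

# Toy feature bag of N = 3 block vectors with 2 features each (made-up values).
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_tanh = [[0.5, -0.2], [0.1, 0.3]]
w_sig = [[0.4, 0.4], [-0.3, 0.2]]
w_score = [1.0, -1.0]
attn, f_B = gated_attention_pool(bag, w_tanh, w_sig, w_score)
```

Because the Softmax is taken across the N blocks, the attention weights sum to 1 and f_B is a convex combination of the block feature vectors.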
[0026] In one embodiment of the present invention, the DCA-Net network first extracts the basic texture features of image block P_i through a fifth convolutional layer, a second max pooling layer, and a third ReLU activation function connected in sequence. Deep feature extraction is then performed on the basic texture features by two cascaded densely connected blocks. The deep features output by the second densely connected block are upsampled and then concatenated along the channel dimension with the features output by the first densely connected block to obtain the fused feature map. Average pooling is performed on the fused feature map along the X-axis and the Y-axis, respectively, to obtain feature vectors in the two directions. The feature vectors in the two directions are concatenated and passed through a sixth convolutional layer, a third batch normalization layer, and a fourth ReLU activation function to generate an intermediate feature map containing spatial orientation information. The intermediate feature map is split into two paths, each of which passes through its own independent seventh convolutional layer to restore the number of channels, and attention weight maps in the X-axis and Y-axis directions are then generated using the fourth Sigmoid activation function. The attention weight maps in the two directions are multiplied element-wise with the fused feature map to recalibrate the features; the recalibrated feature map is then compressed by a third global average pooling layer to output the feature vector f_i.
[0027] In one embodiment of the present invention, step S3, which fuses the macroscopic feature vector and the microscopic feature vector to obtain a fused feature vector, includes the following method:
[0028] The macroscopic feature vector f_A and the microscopic feature vector f_B are concatenated along the feature dimension to form a fused feature vector, represented as follows:
[0029] f_fused = [f_A; f_B];
[0030] where f_fused represents the fused feature vector that combines the macroscopic features of endoscopic images of intestinal polyps with the microscopic features of pathological slide images.
[0031] In one embodiment of the present invention, the method for predicting the category of intestinal polyps by the fused feature vector in step S4 includes:
[0032] The fused feature vector is input into the prediction network, which includes a fourth fully connected layer, a fifth ReLU function, a Dropout layer, a fifth fully connected layer, and a Softmax layer connected in sequence, to output the probability that the intestinal polyp belongs to each category.
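The fusion and prediction steps can be sketched in pure Python as a forward pass at inference time. The vector lengths, weights, and the three-class output are hypothetical values chosen for illustration. Dropout is treated as the identity here, which is its usual behaviour at inference time.

```python
import math

def linear(weights, bias, x):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(xs):
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def predict(f_a, f_b, fc4, b4, fc5, b5):
    fused = f_a + f_b                                        # f_fused = [f_A; f_B]
    hidden = [max(0.0, h) for h in linear(fc4, b4, fused)]   # FC + ReLU
    # Dropout would act here during training; it is the identity at inference.
    return fused, softmax(linear(fc5, b5, hidden))           # FC + Softmax

# Made-up macroscopic (length 2) and microscopic (length 3) feature vectors.
f_a = [0.2, 0.7]
f_b = [0.5, 0.1, 0.9]
fc4 = [[0.1, -0.2, 0.3, 0.0, 0.5], [0.4, 0.1, -0.3, 0.2, 0.0]]
b4 = [0.0, 0.1]
fc5 = [[0.3, -0.1], [0.0, 0.2], [-0.2, 0.4]]  # three hypothetical categories
b5 = [0.0, 0.0, 0.0]
fused, probs = predict(f_a, f_b, fc4, b4, fc5, b5)
```

The fused vector has length len(f_A) + len(f_B), so the first fully connected layer of the prediction network must accept that combined width.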
[0033] In one embodiment of the present invention, in step S2, macroscopic feature vectors of intestinal polyps are extracted through a macroscopic analysis branch network, and microscopic feature vectors of intestinal polyps are extracted through a microscopic analysis branch network.
[0034] The loss function corresponding to the macroscopic analysis branch network is the focal loss function. The formula is:
[0035] $L_{focal} = -\alpha (1 - p_t)^{\gamma} \log(p_t)$;
[0036] where $p_t$ is the model's predicted probability of the true class; $\gamma$ is the modulating factor; and $\alpha$ is the balance factor.
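A minimal sketch of a focal loss for a single sample is given below. It assumes the commonly used form, -alpha * (1 - p_t)^gamma * log(p_t). The default values gamma = 2.0 and alpha = 0.25 are common choices in the literature, not values taken from this document.

```python
import math

def focal_loss(p_t, gamma=2.0, alpha=0.25):
    """Focal loss for one sample, where p_t is the predicted probability
    of the true class. gamma and alpha defaults are assumptions."""
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9)  # confidently correct sample, strongly down-weighted
hard = focal_loss(0.1)  # poorly predicted sample, keeps most of its loss
```

With gamma = 0 and alpha = 1 the expression reduces to the ordinary cross-entropy term -log(p_t), which is a convenient sanity check.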
[0037] The loss function corresponding to the microscopic analysis branch network is the attention sparse entropy regularization loss function. The formula is:
[0038] $L_{ASE} = L_{CE} + \lambda L_{ent}$;
[0039] $L_{CE} = -\sum_{c=1}^{C} y_c \log(p_c)$;
[0040] $L_{ent} = -\sum_{i=1}^{N} a_i \log(a_i)$;
[0041] where $L_{CE}$ is the cross-entropy loss for classification; $C$ is the total number of categories of intestinal polyps; $y_c$ is an indicator variable, set to 1 when category $c$ is the true category and 0 otherwise; $p_c$ is the predicted probability of category $c$; $L_{ent}$ is the entropy loss of the attention weights; $a_i$ is the attention weight of the $i$-th image block; $\lambda$ is the regularization coefficient; and $N$ is the total number of image blocks.
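The following sketch illustrates one plausible reading of the attention sparse entropy regularization loss: a classification cross-entropy plus a weighted entropy of the attention weights. The exact form, the coefficient value lam = 0.1, and the small eps guard are assumptions made for illustration.

```python
import math

def cross_entropy(probs, true_idx):
    """Cross-entropy for a one-hot label: minus log of the true-class probability."""
    return -math.log(probs[true_idx])

def attention_entropy(attn, eps=1e-12):
    """Entropy of the attention weights; eps guards against log(0)."""
    return -sum(a * math.log(a + eps) for a in attn)

def sparse_attention_loss(probs, true_idx, attn, lam=0.1):
    """Assumed total loss: cross-entropy + lam * attention entropy."""
    return cross_entropy(probs, true_idx) + lam * attention_entropy(attn)

probs = [0.7, 0.2, 0.1]            # made-up class probabilities
uniform = [0.25, 0.25, 0.25, 0.25]  # attention spread evenly over 4 blocks
peaked = [0.97, 0.01, 0.01, 0.01]   # attention focused on one block
loss_uniform = sparse_attention_loss(probs, 0, uniform)
loss_peaked = sparse_attention_loss(probs, 0, peaked)
```

Uniform attention has the maximum entropy, log(N), so minimising the entropy term pushes the weights toward a few informative image blocks.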
[0042] To address the aforementioned technical problems, this invention provides an intestinal polyp identification system, comprising:
[0043] Acquisition module: Used to acquire endoscopic images of intestinal polyps and their corresponding pathological slide images;
[0044] Feature extraction module: used to perform shallow feature extraction on the endoscopic image to obtain a shallow feature map, and then perform deep feature extraction on the shallow feature map to obtain a macroscopic feature vector about the intestinal polyp;
[0045] Simultaneously, it is used to crop the pathological slide image into several non-overlapping image blocks, extract features from each image block to obtain the feature vector of the image block, calculate the attention weight of the feature vector of each image block, and calculate the microscopic feature vector of the intestinal polyp based on the feature vector of each image block and its corresponding attention weight.
[0046] Feature fusion module: used to fuse the macroscopic feature vector and the microscopic feature vector to obtain a fused feature vector;
[0047] Prediction module: used to predict the category of intestinal polyps from the fused feature vector.
[0048] To address the aforementioned technical problems, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the intestinal polyp identification method described above.
[0049] To address the aforementioned technical problems, the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, it implements the steps of the intestinal polyp identification method described above.
[0050] Compared with the prior art, the above-described technical solution of the present invention has the following advantages:
[0051] The intestinal polyp identification method of the present invention extracts features from the endoscopic images of intestinal polyps and their corresponding pathological slide images, then fuses the extracted features, and finally predicts the type of intestinal polyp (such as hyperplastic, adenomatous, or cancerous) based on the fused features.
[0052] This invention constructs a more complete information view by integrating macroscopic morphological information from endoscopy (endoscopic images) and microscopic histological information from pathology (pathological slide images), effectively overcoming the shortcomings of insufficient information in a single modality, thus enabling more accurate detection than any single modality method.
[0053] The output of this invention for intestinal polyps is not merely a simple classification label, but a comprehensive conclusion that combines macroscopic and microscopic characteristics (e.g., clearly indicating the polyp type and lesion grade). This provides clinicians with a richer basis for formulating personalized treatment plans (such as the extent of endoscopic resection or whether additional surgery is needed). This invention is expected to provide a high-precision "predictive" pathology result after endoscopic examination and before the pathology report is issued, helping clinicians to perform risk stratification and treatment planning in advance, thereby improving diagnostic and treatment efficiency.
Attached Figure Description
[0054] To make the content of this invention easier to understand, the invention will be further described in detail below with reference to specific embodiments and accompanying drawings.
[0055] Figure 1 is a flowchart of the method of the present invention;
[0056] Figure 2 is a schematic diagram of the macroscopic analysis branch network structure for extracting macroscopic feature vectors of intestinal polyps from endoscopic images in an embodiment of the present invention;
[0057] Figure 3 is a schematic diagram of the residual block structure in an embodiment of the present invention;
[0058] Figure 4 is a schematic diagram of the MS-SCAM module structure in an embodiment of the present invention;
[0059] Figure 5 is a schematic diagram of the microscopic analysis branch network structure for extracting microscopic feature vectors of intestinal polyps from pathological slide images in an embodiment of the present invention;
[0060] Figure 6 is a schematic diagram of the DCA-Net network structure in an embodiment of the present invention;
[0061] Figure 7 is a schematic diagram of the attention aggregation module structure in an embodiment of the present invention;
[0062] Figure 8 is a schematic diagram of the prediction network structure in an embodiment of the present invention.
Detailed Implementation
[0063] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the present invention. However, the embodiments described are not intended to limit the present invention.
[0064] Example 1
[0065] Referring to Figure 1, this invention provides a method for identifying intestinal polyps, comprising:
[0066] Step S1: Obtain endoscopic images of intestinal polyps and their corresponding pathological slide images;
[0067] Step S2: Perform shallow feature extraction on the endoscopic image to obtain a shallow feature map, and then perform deep feature extraction on the shallow feature map to obtain a macroscopic feature vector of the intestinal polyp;
[0068] Simultaneously, the pathological slide image is cropped into several non-overlapping image blocks, and feature extraction is performed on each image block to obtain the feature vector of the image block. The attention weight of the feature vector of each image block is calculated, and the microscopic feature vector of the intestinal polyp is calculated based on the feature vector of each image block and its corresponding attention weight.
[0069] Step S3: Fuse the macroscopic feature vector and the microscopic feature vector to obtain a fused feature vector;
[0070] Step S4: Predict the category of intestinal polyps by analyzing the fused feature vector.
[0071] The following is a detailed description of this embodiment:
[0072] Further, in step S1, the original endoscopic image is cropped to extract the region of interest (ROI), and all ROI images are uniformly scaled to the model input size, such as 224x224 pixels, to obtain the final endoscopic image.
[0073] In step S1, data enhancement is also performed on the original pathological image: color enhancement is applied to improve the local contrast of the image, making the boundary between the cell nucleus and the cytoplasm/matrix clearer and the texture details more prominent.
[0074] Further, referring to Figure 2, in step S2, the method of using a macroscopic analysis branch network to extract shallow features from the endoscopic image to obtain a shallow feature map, and then extracting deep features from the shallow feature map to obtain a macroscopic feature vector of the intestinal polyp, includes:
[0075] Shallow feature maps are obtained by performing shallow feature extraction on the endoscopic image through a first convolutional layer (7x7 convolution) and a first max pooling layer (3x3 max pooling);
[0076] Deep feature extraction is performed on the shallow feature map using three cascaded residual blocks. This embodiment also embeds an improved multi-scale spatial-channel attention module (MS-SCAM module) after each residual block. This MS-SCAM module employs a serial cascaded structure, sequentially performing channel recalibration based on higher-order statistics and spatial filtering based on multi-scale dilated-convolution perception on the feature map, and introducing a global residual connection to maintain the stability of the feature flow.
[0077] Referring to Figure 3, each residual block consists of a second convolutional layer (3x3 convolution), a first batch normalization layer (BatchNorm layer), a first ReLU function, a third convolutional layer (3x3 convolution), and a second batch normalization layer, all connected in sequence. The input of each residual block is added element-wise to the output of the second batch normalization layer to obtain a summed feature. This summed feature is then passed through a second ReLU function to obtain the feature map output by the residual block. The residual block structure designed in this embodiment effectively alleviates the vanishing gradient problem during deep network training.
[0078] The enhanced features extracted by the three residual blocks and their corresponding MS-SCAM modules are compressed into a one-dimensional feature vector through the first global average pooling layer, and this vector is used as the macroscopic feature vector f_A of the intestinal polyp.
[0079] Further, referring to Figure 4, the MS-SCAM module includes a statistical-feature channel attention stage and a multi-scale dilated spatial attention stage, as detailed below:
[0080] (1) The feature map first enters the statistical-feature channel attention stage. To compensate for the shortcoming of traditional global average pooling, which focuses only on background information and ignores locally salient features, this embodiment adopts a dual-path statistical pooling strategy: the feature map output by the residual block is processed by Global Average Pooling (GAP) and Global Max Pooling (GMP). GAP extracts the mean background information of the feature distribution, while GMP extracts the salient response features to preserve key discriminative details. The feature vectors output by the two pooling paths are then fed into a Multilayer Perceptron (MLP) with identical parameters. This MLP captures the nonlinear dependencies between channels through dimensionality reduction and expansion operations. The two output vectors processed by the MLP are added element-wise and passed through the first Sigmoid activation function to generate the channel attention weight vector. Finally, this weight vector is multiplied with the feature map output by the residual block along the channel dimension to obtain the channel-enhanced feature map.
[0081] (2) The channel-enhanced feature map then enters the multi-scale dilated spatial attention stage. Considering the dramatic size changes and diverse morphologies of intestinal polyps under endoscopy, this stage first performs max pooling and average pooling on the channel-enhanced feature map along the channel axis, and the results are concatenated to obtain a spatial description map containing the main feature responses. To achieve adaptive focusing on lesions of different sizes, this spatial description map is fed simultaneously into three parallel convolutional branches, which use dilated convolutional layers with dilation rates of 1, 3, and 5, respectively. The dilated convolution with a dilation rate of 1 captures subtle local features, adapting to small polyps, while the dilated convolutions with dilation rates of 3 and 5 use expanded receptive fields to capture broader contextual features, adapting to larger polyps or diffuse lesions. The output feature maps of the three convolutional branches are concatenated (implemented via a Concat layer), passed through the fourth convolutional layer for dimensionality reduction and fusion, and then passed through a second Sigmoid activation function to generate a spatial attention weight map. The weight map is multiplied with the channel-enhanced feature map along the spatial dimensions to obtain the spatially weighted feature map, which completes the secondary recalibration of the features. Finally, to prevent the loss of original information that may be caused by the attention mechanism and to promote the effective propagation of gradients, this embodiment adopts a global residual connection: the spatially weighted feature map is added element-wise to the feature map output by the residual block to obtain the final output feature map of the MS-SCAM module.
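The effect of the three dilation rates on the receptive field can be checked with a short calculation. The kernel size of the dilated convolutions is not recoverable from this text, so a 3x3 kernel is assumed here purely for illustration. The padding rule shown is the usual one for keeping the output the same size at stride 1 with an odd kernel.

```python
def effective_kernel(k, d):
    """Span covered by a k-tap kernel with dilation d."""
    return k + (k - 1) * (d - 1)

def same_padding(k, d):
    """Padding that preserves spatial size at stride 1 (odd k)."""
    return d * (k - 1) // 2

K = 3  # assumed kernel size; not stated in the source text
for d in (1, 3, 5):
    print(d, effective_kernel(K, d), same_padding(K, d))
# prints:
# 1 3 1
# 3 7 3
# 5 11 5
```

Under this assumption the three branches see 3x3, 7x7, and 11x11 neighbourhoods with the same number of weights, and all three outputs keep the same spatial size, so they can be concatenated directly.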
[0082] Further, referring to Figure 5, in step S2, a microscopic analysis branch network is used to crop the pathological slide image into several non-overlapping image blocks, perform feature extraction on each image block to obtain its feature vector, and calculate the attention weight of the feature vector of each image block. The method for calculating the microscopic feature vector of the intestinal polyp based on the feature vector of each image block and its corresponding attention weight includes:
[0083] The pathological slide image is cropped into N (200 to 2000) non-overlapping image blocks of the same size {P_1, P_2, ..., P_i, ..., P_N} using a sliding window method. The N image blocks P_i are input into a custom improved densely connected coordinate attention network (DCA-Net), which generates a corresponding feature vector f_i for each image block P_i. Finally, a feature bag F = {f_1, f_2, ..., f_i, ..., f_N} is generated for all image blocks.
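The sliding-window cropping into non-overlapping blocks of equal size can be sketched as a coordinate generator. The tile size of 224 and the image dimensions are hypothetical, and discarding partial tiles at the right and bottom borders is an assumption. The text does not say how borders are handled.

```python
def tile_coords(width, height, tile):
    """Return (left, top, right, bottom) boxes for non-overlapping tiles.
    The stride equals the tile size, so tiles never overlap. Partial tiles
    at the right and bottom edges are dropped (an assumption)."""
    return [(x, y, x + tile, y + tile)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

# Hypothetical 1000 x 700 slide region and 224-pixel tiles.
boxes = tile_coords(1000, 700, 224)
```

For this example the grid is 4 columns by 3 rows, giving 12 blocks. A real whole-slide image would be far larger, which is how block counts in the hundreds or thousands arise.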
[0084] In this embodiment, the process of extracting image block features using the microscopic analysis branch network is as follows: the N image blocks P_i obtained by cropping the pathological slide image are input into a custom improved densely connected coordinate attention network (DCA-Net). Referring to Figure 6, the DCA-Net network aims to fully utilize the strong feature transfer capability of densely connected networks and to combine it with a coordinate attention mechanism to enhance the capture of fine polyp textures. The image block P_i first enters the initial processing stage of the DCA-Net network, where it passes through a fifth convolutional layer, a second max pooling layer, and a third ReLU activation function to extract the basic texture features of the image block P_i. This initial processing stage mainly reduces the dimensionality of the high-resolution image block P_i, removes redundant noise, and extracts its preliminary edge contours and texture information, laying the foundation for subsequent deep feature extraction. The basic texture features then enter the DCA-Net backbone network. This backbone consists of two densely connected blocks (Dense Block 1 and Dense Block 2) connected in series, with a transition layer between them. Since Dense Blocks are prior art, they are not described in detail here. The transition layer includes a batch normalization layer, a ReLU activation function, a 1×1 convolutional layer, and a 2×2 average pooling layer connected in sequence. The 1×1 convolutional layer reduces the dimensionality of the feature map output by the preceding densely connected block (Dense Block 1), compressing the number of feature channels to reduce computation and prevent overfitting. The 2×2 average pooling layer downsamples the feature map, halving its spatial size and thereby expanding the receptive field of subsequent network layers. To address the tendency of deep networks to lose microscopic texture details, this embodiment introduces a multi-scale feature fusion strategy. Specifically, the deep features output by the second densely connected block (Dense Block 2) are upsampled to restore their spatial resolution to that of the previous level and are then concatenated along the channel dimension (using a Concat layer) with the features output by the first densely connected block to obtain the fused feature map. This design combines deep, abstract semantic information with shallow positional information, enhancing the network's ability to represent complex polyp structures. The fused feature map is then input into the Coordinate Attention module for feature recalibration. Within this module, the fused feature map is first decomposed and subjected to global average pooling along the X-axis and the Y-axis, respectively, to obtain feature vectors in two directions that can capture long-range dependencies. These two feature vectors are concatenated and fed into a sixth convolutional layer for dimensionality reduction, followed by the third batch normalization layer and the fourth ReLU activation function, to generate an intermediate feature map containing spatial orientation information. The intermediate feature map is then split again into two paths, each of which passes through its own independent seventh convolutional layer to restore the number of channels, and the fourth Sigmoid activation function is used to generate attention weight maps in the X-axis and Y-axis directions. Finally, the weight maps in the two directions are multiplied element-wise with the fused feature map, enabling the network to focus precisely on specific regions of the polyp and thus recalibrating the features. The recalibrated feature map is passed through a third global average pooling layer, which compresses the two-dimensional spatial information into a one-dimensional feature vector f_i that serves as the final feature representation of the image block P_i.
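The directional pooling and recalibration in the Coordinate Attention module can be illustrated with a deliberately simplified single-channel sketch. It keeps only the pooling along each axis, the Sigmoid, and the element-wise reweighting. The sixth and seventh convolutional layers, batch normalization, and ReLU are omitted, so this is not the module as described, only its core idea.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def coordinate_attention(fmap):
    """fmap is an H x W list of lists for one channel (simplified sketch)."""
    h, w = len(fmap), len(fmap[0])
    # Average along the X axis: one value per row (per Y position).
    pooled_rows = [sum(row) / w for row in fmap]
    # Average along the Y axis: one value per column (per X position).
    pooled_cols = [sum(fmap[i][j] for i in range(h)) / h for j in range(w)]
    # The learned convolutions are skipped here; Sigmoid is applied directly.
    a_y = [sigmoid(v) for v in pooled_rows]
    a_x = [sigmoid(v) for v in pooled_cols]
    # Each position is scaled by both its row weight and its column weight.
    return [[fmap[i][j] * a_y[i] * a_x[j] for j in range(w)] for i in range(h)]

fmap = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0]]
out = coordinate_attention(fmap)
```

Because each weight depends on an entire row or column, a position is influenced by responses far from it along both axes, which is the long-range dependency the text refers to.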
[0085] Please see Figure 7 The feature packet F={f1,f2,..,f i ,..,f N The input is fed into the attention aggregation module. Specifically, in this embodiment, each image block P in the feature package is... i The corresponding eigenvector f i The first and second fully connected layers are copied and fed into the attention aggregation module in parallel. The first fully connected layer is connected to the Tanh activation function, and the second fully connected layer is connected to the third Sigmoid activation function. The vectors output by the Tanh and third Sigmoid activation functions are multiplied by Hadamard, and the result of the Hadamard product is then passed through the third fully connected layer and the Softmax function to output each image patch P. i The corresponding eigenvector f i Attention weights;
[0086] The feature vectors $f_i$ are weighted by their corresponding attention weights and summed to obtain the final aggregated microscopic feature vector, which is used as the microscopic feature vector $f_B$ of the intestinal polyp.
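The gated attention aggregation described above can be sketched as follows. This is a minimal illustration that assumes bias-free fully connected layers and random weights; the names `gated_attention_pool`, `w_tanh`, `w_gate`, and `w_score` and all dimensions are hypothetical and do not come from the original text.

```python
import math
import random

def linear(weights, vec):
    """Bias-free fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_attention_pool(features, w_tanh, w_gate, w_score):
    """Return (attention weights, aggregated vector) for a list of feature vectors."""
    scores = []
    for f in features:
        t = [math.tanh(v) for v in linear(w_tanh, f)]                 # first FC + Tanh
        g = [1.0 / (1.0 + math.exp(-v)) for v in linear(w_gate, f)]   # second FC + Sigmoid
        gated = [a * b for a, b in zip(t, g)]                         # Hadamard product
        scores.append(sum(w * v for w, v in zip(w_score, gated)))     # third FC -> scalar
    weights = softmax(scores)  # Softmax over all image blocks
    dim = len(features[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled

random.seed(0)
N, D, HID = 5, 4, 3  # hypothetical: 5 image blocks, 4-dim features, 3 hidden units
features = [[random.random() for _ in range(D)] for _ in range(N)]
w_tanh = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(HID)]
w_gate = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(HID)]
w_score = [random.uniform(-1, 1) for _ in range(HID)]

weights, f_b = gated_attention_pool(features, w_tanh, w_gate, w_score)
```

Since the Softmax is taken across image blocks, the weights sum to 1 and the aggregated vector is a convex combination of the per-block feature vectors.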
[0087] Further, the method in step S3 for fusing the macroscopic feature vector and the microscopic feature vector to obtain a fused feature vector includes:
[0088] The macroscopic feature vector $f_A$ and the microscopic feature vector $f_B$ are concatenated along the feature dimension (implemented through a Concat layer) to form a fused feature vector, represented as:
[0089] $f_{fused} = [f_A; f_B]$;
[0090] where $f_{fused}$ denotes the fused feature vector, which combines the macroscopic features of the endoscopic image of the intestinal polyp with the microscopic features of the pathological slide image.
[0091] Further, referring to Figure 8, the method in step S4 for predicting the category of the intestinal polyp from the fused feature vector includes:
[0092] The fused feature vector is input into a prediction network comprising a fourth fully connected layer, a fifth ReLU function, a Dropout layer, a fifth fully connected layer, and a Softmax layer connected in sequence, which outputs the probability that the polyp belongs to each category (such as hyperplastic, adenomatous, or cancerous).
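As an illustration of steps S3 and S4, the sketch below concatenates two small feature vectors and passes the result through a fully connected layer, ReLU, a second fully connected layer, and Softmax. It models inference only, where Dropout acts as the identity. All vectors, weights, layer sizes, and the choice of three output categories are hypothetical values chosen for the example.

```python
import math

def linear(weights, bias, vec):
    """Fully connected layer: one output per weight row, plus bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(f_a, f_b, w4, b4, w5, b5):
    fused = f_a + f_b  # list concatenation: f_fused = [f_A; f_B]
    hidden = [max(0.0, h) for h in linear(w4, b4, fused)]  # FC + ReLU
    # Dropout is only active during training; at inference it is the identity.
    logits = linear(w5, b5, hidden)
    return fused, softmax(logits)

f_a = [0.2, 0.7]       # hypothetical macroscopic feature vector
f_b = [0.5, 0.1, 0.9]  # hypothetical microscopic feature vector
w4 = [[0.1, -0.2, 0.3, 0.0, 0.5],
      [0.4, 0.1, -0.3, 0.2, -0.1]]
b4 = [0.0, 0.1]
w5 = [[0.3, -0.5], [0.2, 0.4], [-0.6, 0.1]]  # three hypothetical categories
b5 = [0.0, 0.0, 0.0]

fused, probs = predict(f_a, f_b, w4, b4, w5, b5)
```

The fused vector has length equal to the sum of the two input lengths, and `probs` holds one probability per category.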
[0093] Further, in step S2, the macroscopic feature vector of the intestinal polyp is extracted through a macroscopic analysis branch network, and the microscopic feature vector of the intestinal polyp is extracted through a microscopic analysis branch network;
[0094] The loss function corresponding to the macroscopic analysis branch network is the focal loss function, and the loss function corresponding to the microscopic analysis branch network is the attention sparse entropy regularization loss function.
[0095] In intestinal polyp identification, difficult samples (such as flat polyps) are insufficiently mined relative to simple samples (such as raised polyps or background). To address this, the focal loss function is employed. Its formula is:
[0096] $L_{focal} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$
[0097] where $p_t$ is the model's predicted probability of the true class; $\gamma$ is a modulating factor used to reduce the weight of easily classified samples, so that the model focuses on difficult samples with larger training errors; and $\alpha_t$ is a balancing factor.
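To make the effect of the modulating factor concrete, the sketch below evaluates the focal loss in its standard form, $-\alpha (1 - p_t)^{\gamma} \log(p_t)$, for an easy and a hard sample. The default values `gamma=2.0` and `alpha=0.25` are common choices in the focal loss literature and are assumptions here; the original text does not state the values used.

```python
import math

def focal_loss(p_t, gamma=2.0, alpha=0.25):
    """Focal loss for one sample, given the predicted probability of its true class.

    gamma and alpha are assumed defaults, not values taken from the source text.
    """
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

p_easy, p_hard = 0.95, 0.30  # hypothetical predicted probabilities

easy = focal_loss(p_easy)
hard = focal_loss(p_hard)

# Plain cross-entropy for comparison (gamma = 0, alpha = 1 recovers it).
ce_easy = -math.log(p_easy)
ce_hard = -math.log(p_hard)

focal_ratio = hard / easy
ce_ratio = ce_hard / ce_easy
```

With these inputs the hard sample dominates the easy one far more strongly under focal loss than under plain cross-entropy, which is the behavior the paragraph above describes.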
[0098] Because a pathological section is cropped into multiple image patches and only some of these patches contain typical lesion features, this embodiment adds an attention entropy regularization term to the cross-entropy loss, forcing the attention mechanism to concentrate on these few key image patches rather than assigning uniform weights. The formulas are as follows:
[0099] $L_{micro} = L_{CE} + \lambda L_{entropy}$
[0100] $L_{CE} = -\sum_{c=1}^{C} y_c \log(p_c)$
[0101] $L_{entropy} = -\sum_{i=1}^{N} a_i \log(a_i)$
[0102] where $L_{CE}$ is the cross-entropy loss for classification; $C$ is the total number of categories of intestinal polyps; $y_c$ is an indicator variable that equals 1 when category $c$ is the true category and 0 otherwise; $p_c$ is the predicted probability of category $c$; $L_{entropy}$ is the entropy loss of the attention weights; $a_i$ is the attention weight of the $i$-th image patch; $\lambda$ is the regularization coefficient; and $N$ is the total number of image patches. Minimizing the entropy loss of the attention weights makes the attention weight distribution sharper (i.e., sparser), thereby accurately locating the most valuable pathological areas.
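The following sketch shows how an entropy penalty on the attention weights favors a sharp distribution over a uniform one. It assumes the loss is the cross-entropy plus a coefficient times the Shannon entropy of the attention weights; the coefficient `lam=0.1`, the small `eps` guard, and the example values are hypothetical.

```python
import math

def cross_entropy(probs, true_class):
    """Cross-entropy with a one-hot target reduces to -log of the true-class probability."""
    return -math.log(probs[true_class])

def attention_entropy(weights, eps=1e-12):
    """Shannon entropy of the attention weights; eps guards against log(0)."""
    return -sum(a * math.log(a + eps) for a in weights)

def micro_branch_loss(probs, true_class, weights, lam=0.1):
    """Classification loss plus entropy regularization (lam is an assumed value)."""
    return cross_entropy(probs, true_class) + lam * attention_entropy(weights)

probs = [0.7, 0.2, 0.1]            # hypothetical class probabilities
uniform = [0.25, 0.25, 0.25, 0.25]  # attention spread evenly over 4 patches
sharp = [0.97, 0.01, 0.01, 0.01]    # attention concentrated on one patch

h_uniform = attention_entropy(uniform)
h_sharp = attention_entropy(sharp)
loss_uniform = micro_branch_loss(probs, 0, uniform)
loss_sharp = micro_branch_loss(probs, 0, sharp)
```

The uniform distribution attains the maximum entropy, log(4) for four patches, so for identical class probabilities the sharp distribution yields the lower total loss.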
[0103] Example 2
[0104] This embodiment provides a system for identifying intestinal polyps, including:
[0105] Acquisition module: Used to acquire endoscopic images of intestinal polyps and their corresponding pathological slide images;
[0106] Feature extraction module: used to perform shallow feature extraction on the endoscopic image to obtain a shallow feature map, and then perform deep feature extraction on the shallow feature map to obtain a macroscopic feature vector about the intestinal polyp;
[0107] Simultaneously, it is used to crop the pathological slide image into several non-overlapping image blocks, extract features from each image block to obtain the feature vector of the image block, calculate the attention weight of the feature vector of each image block, and calculate the microscopic feature vector of the intestinal polyp based on the feature vector of each image block and its corresponding attention weight.
[0108] Feature fusion module: used to fuse the macroscopic feature vector and the microscopic feature vector to obtain a fused feature vector;
[0109] Prediction module: used to predict the category of intestinal polyps from the fused feature vector.
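The cropping performed by the feature extraction module can be sketched as a sliding window whose stride equals the patch size, so that the resulting blocks do not overlap. The handling of leftover pixels at the right and bottom edges is not specified in the text; this sketch simply discards them, which is an assumption, and the function name and sizes are hypothetical.

```python
def crop_non_overlapping(image, patch):
    """Split a 2D image [H][W] into patch x patch blocks with stride == patch.

    Edge remainders that do not fill a whole block are dropped (an assumption).
    """
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            blocks.append([row[left:left + patch]
                           for row in image[top:top + patch]])
    return blocks

# Hypothetical 5 x 7 "image" whose pixel value encodes its position.
image = [[r * 7 + c for c in range(7)] for r in range(5)]
blocks = crop_non_overlapping(image, patch=2)
```

For a 5 × 7 input and a patch size of 2, this yields 2 × 3 = 6 blocks, and the last row and last column of the input are discarded.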
[0110] Example 3
[0111] This embodiment provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the intestinal polyp identification method described in Embodiment 1.
[0112] Example 4
[0113] This embodiment provides a computer-readable storage medium storing a computer program thereon. When the computer program is executed by a processor, it implements the steps of the intestinal polyp identification method described in Embodiment 1.
[0114] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code. The solutions in the embodiments of this application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.
[0115] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0116] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0117] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0118] Obviously, the above embodiments are merely illustrative examples for clear explanation and are not intended to limit the implementation. Those skilled in the art will recognize that other variations or modifications can be made based on the above description. It is neither necessary nor possible to exhaustively list all possible implementations here. However, obvious variations or modifications derived therefrom are still within the scope of protection of this invention.
Claims
1. A method for identifying intestinal polyps, characterized in that it comprises:
Step S1: Obtain endoscopic images of intestinal polyps and their corresponding pathological slide images;
Step S2: Perform shallow feature extraction on the endoscopic image to obtain a shallow feature map, and then perform deep feature extraction on the shallow feature map to obtain a macroscopic feature vector of the intestinal polyp; simultaneously, crop the pathological slide image into several non-overlapping image blocks, perform feature extraction on each image block to obtain the feature vector of the image block, calculate the attention weight of the feature vector of each image block, and calculate the microscopic feature vector of the intestinal polyp based on the feature vector of each image block and its corresponding attention weight;
wherein the method in step S2 for performing shallow feature extraction on the endoscopic image to obtain a shallow feature map, and then performing deep feature extraction on the shallow feature map to obtain a macroscopic feature vector of the intestinal polyp, includes: performing shallow feature extraction on the endoscopic image through the first convolutional layer and the first max pooling layer to obtain the shallow feature map; performing deep feature extraction on the shallow feature map through three cascaded residual blocks, with an MS-SCAM module embedded after each residual block, the MS-SCAM module being used to enhance the features extracted by the residual block and maintain the stability of the features; and compressing the enhanced features extracted by the three residual blocks and their corresponding MS-SCAM modules into a one-dimensional feature vector through the first global average pooling layer, this vector being used as the macroscopic feature vector $f_A$ of the intestinal polyp;
Step S3: Fuse the macroscopic feature vector and the microscopic feature vector to obtain a fused feature vector;
Step S4: Predict the category of the intestinal polyp from the fused feature vector.
2. The method for identifying intestinal polyps according to claim 1, characterized in that: each residual block comprises a second convolutional layer, a first batch normalization layer, a first ReLU function, a third convolutional layer, and a second batch normalization layer connected in sequence; the input of each residual block is added element-wise to the output of the second batch normalization layer to obtain summed features, and the summed features are then passed through a second ReLU function to obtain the feature map output by the residual block.
3. The method for identifying intestinal polyps according to claim 1, characterized in that: the MS-SCAM module includes a statistical feature channel attention stage and a multi-scale dilated spatial attention stage, wherein:
in the statistical feature channel attention stage, the feature map output by the residual block is processed by a second global average pooling layer and a global max pooling layer, respectively; the second global average pooling layer extracts the mean background information of the feature map, and the global max pooling layer extracts the salient feature responses of the feature map; the features extracted by the second global average pooling layer and the global max pooling layer are each passed through a multilayer perceptron to generate corresponding vectors, so as to enhance texture details; the two vectors output by the multilayer perceptrons are added element-wise and then passed through the first Sigmoid activation function to generate the channel attention weight vector; finally, the channel attention weight vector is multiplied with the feature map output by the residual block along the channel dimension to obtain the channel-enhanced feature map;
in the multi-scale dilated spatial attention stage, max pooling and average pooling are first performed on the channel-enhanced feature map along the channel axis, and the results of channel max pooling and channel average pooling are concatenated to obtain the spatial description map; the spatial description map is input simultaneously into three parallel convolutional branches, each using a different dilated convolutional layer; the feature maps output by the three convolutional branches are concatenated, fused by a fourth convolutional layer for dimensionality reduction, and finally passed through a second Sigmoid activation function to generate the spatial attention weight map; the spatial attention weight map is multiplied with the channel-enhanced feature map along the spatial dimension to obtain the spatially weighted feature map; the spatially weighted feature map is then added element-wise to the feature map output by the residual block to obtain the final output feature map of the MS-SCAM module.
4. The method for identifying intestinal polyps according to claim 1, characterized in that: the method in step S2 for cropping the pathological slide image into several non-overlapping image blocks, extracting features from each image block to obtain its feature vector, calculating the attention weight of the feature vector of each image block, and calculating the microscopic feature vector of the intestinal polyp based on the feature vector of each image block and its corresponding attention weight includes:
cropping the pathological slide image, using a sliding window method, into N non-overlapping image blocks $P_i$ of the same size;
inputting the N image blocks $P_i$ into the DCA-Net network, which processes each image block $P_i$ to generate its corresponding feature vector $f_i$, finally forming the feature package $F = \{f_1, f_2, \ldots, f_i, \ldots, f_N\}$;
copying the feature vector $f_i$ corresponding to each image block $P_i$ in the feature package and feeding it in parallel into the first and second fully connected layers, wherein the first fully connected layer is followed by the Tanh activation function and the second fully connected layer is followed by the third Sigmoid activation function; the vectors output by the Tanh and third Sigmoid activation functions are combined by a Hadamard product, and the result is passed through the third fully connected layer and the Softmax function to output the attention weight of the feature vector $f_i$ corresponding to each image block $P_i$;
weighting the feature vectors $f_i$ by their corresponding attention weights and summing them to obtain the final aggregated microscopic feature vector, which is used as the microscopic feature vector $f_B$ of the intestinal polyp.
5. The method for identifying intestinal polyps according to claim 4, characterized in that: the DCA-Net network first extracts the basic texture features of image block $P_i$ by passing it sequentially through a fifth convolutional layer, a second max pooling layer, and a third ReLU activation function; deep feature extraction is then performed on the basic texture features using two cascaded densely connected blocks; the deep features output by the second densely connected block are upsampled and then concatenated along the channel dimension with the features output by the first densely connected block to obtain the fused feature map; average pooling is performed on the fused feature map along the X axis and the Y axis respectively to obtain feature vectors in the two directions; the feature vectors in the two directions are concatenated and passed through a sixth convolutional layer, a third batch normalization layer, and a fourth ReLU activation function to generate an intermediate feature map containing spatial orientation information; the intermediate feature map is split into two paths, each of which passes through its own independent seventh convolutional layer to restore the number of channels and then through the fourth Sigmoid activation function to generate attention weight maps in the X-axis and Y-axis directions; the weight maps in these two directions are multiplied element-wise with the fused feature map to recalibrate the features; and the recalibrated feature map is compressed through a third global average pooling layer to output the feature vector $f_i$.
6. The method for identifying intestinal polyps according to claim 1, characterized in that: the method in step S3 for fusing the macroscopic feature vector and the microscopic feature vector to obtain the fused feature vector includes: concatenating the macroscopic feature vector $f_A$ and the microscopic feature vector $f_B$ along the feature dimension to form a fused feature vector, represented as $f_{fused} = [f_A; f_B]$, where $f_{fused}$ denotes the fused feature vector, which combines the macroscopic features of the endoscopic image of the intestinal polyp with the microscopic features of the pathological slide image.
7. The method for identifying intestinal polyps according to claim 1, characterized in that: the method in step S4 for predicting the category of the intestinal polyp from the fused feature vector includes: inputting the fused feature vector into the prediction network, which comprises a fourth fully connected layer, a fifth ReLU function, a Dropout layer, a fifth fully connected layer, and a Softmax layer connected in sequence, to output the probability that the intestinal polyp belongs to each category.
8. The method for identifying intestinal polyps according to claim 1, characterized in that: in step S2, the macroscopic feature vector of the intestinal polyp is extracted through a macroscopic analysis branch network, and the microscopic feature vector of the intestinal polyp is extracted through a microscopic analysis branch network;
the loss function corresponding to the macroscopic analysis branch network is the focal loss function, whose formula is: $L_{focal} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$; where $p_t$ is the model's predicted probability of the true class, $\gamma$ is a modulating factor, and $\alpha_t$ is a balancing factor;
the loss function corresponding to the microscopic analysis branch network is the attention sparse entropy regularization loss function, whose formulas are: $L_{micro} = L_{CE} + \lambda L_{entropy}$; $L_{CE} = -\sum_{c=1}^{C} y_c \log(p_c)$; $L_{entropy} = -\sum_{i=1}^{N} a_i \log(a_i)$; where $L_{CE}$ is the cross-entropy loss for classification; $C$ is the total number of categories of intestinal polyps; $y_c$ is an indicator variable that equals 1 when category $c$ is the true category and 0 otherwise; $p_c$ is the predicted probability of category $c$; $L_{entropy}$ is the entropy loss of the attention weights; $a_i$ is the attention weight of the $i$-th image patch; $\lambda$ is the regularization coefficient; and $N$ is the total number of image patches.
9. A system for identifying intestinal polyps, used to implement the method for identifying intestinal polyps as described in any one of claims 1 to 8, characterized in that it comprises:
Acquisition module: used to acquire endoscopic images of intestinal polyps and their corresponding pathological slide images;
Feature extraction module: used to perform shallow feature extraction on the endoscopic image to obtain a shallow feature map, and then perform deep feature extraction on the shallow feature map to obtain a macroscopic feature vector of the intestinal polyp; and simultaneously used to crop the pathological slide image into several non-overlapping image blocks, extract features from each image block to obtain the feature vector of the image block, calculate the attention weight of the feature vector of each image block, and calculate the microscopic feature vector of the intestinal polyp based on the feature vector of each image block and its corresponding attention weight;
Feature fusion module: used to fuse the macroscopic feature vector and the microscopic feature vector to obtain a fused feature vector;
Prediction module: used to predict the category of the intestinal polyp from the fused feature vector.