A method for urine formed element instance segmentation

By integrating the SE attention mechanism into the Mask Scoring R-CNN model, the method addresses the low accuracy and efficiency of existing instance segmentation models on urine formed elements, achieving higher accuracy and efficiency in urine formed element instance segmentation.

CN116778153B · Active · Publication Date: 2026-06-23 · RONGZHEN (CHONGQING) MEDICAL TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
RONGZHEN (CHONGQING) MEDICAL TECHNOLOGY CO LTD
Filing Date
2023-05-17
Publication Date
2026-06-23


Abstract

The application relates to the technical field of image processing and discloses a urine formed element instance segmentation method comprising the following steps: constructing a urine formed element labeled image dataset and dividing it into a training set and a test set; taking a Mask Scoring R-CNN model as the base model, training an instance segmentation model integrated with an SE attention mechanism using the training set; and inputting the urine formed element labeled data in the test set into the trained instance segmentation model to perform instance segmentation and obtain a urine formed element segmentation result. By integrating the SE attention mechanism on the basis of the Mask Scoring R-CNN model to perform instance segmentation on urine formed element images, the method can effectively address missed detection and coarse segmentation of urine formed elements caused by cell adhesion, overlap and impurities, and improves the precision and efficiency of urine formed element instance segmentation.

Description

[0001] Technical Field

[0002] This invention relates to the field of image processing technology, and more specifically, to a method for segmenting formed elements in urine.

Background Art

[0003] Formed elements in urine are a collective term for substances that leak, are excreted, detach, and concentrate and crystallize from the urinary tract. Examination of formed elements in urine is a crucial procedure in urinalysis. With the development of computer vision technology, deep learning has been applied to the segmentation of formed elements in urine. Currently, single deep learning models are commonly used for instance segmentation of formed elements in urine. However, the morphology of formed elements in urine is complex and varied, and the segmentation results of single deep learning models for urine formed elements suffer from low accuracy and low efficiency.

Summary of the Invention

[0004] To overcome the shortcomings of existing technologies, such as low accuracy and low efficiency in urine formed element segmentation, the present invention proposes the following technical solution:

[0005] This invention proposes a method for segmenting formed elements in urine, comprising:

[0006] A urine formed element labeled image dataset is constructed, and the urine formed element labeled image dataset is divided into a training set and a test set.

[0007] Using the Mask Scoring R-CNN model as the base model, an instance segmentation model incorporating the SE attention mechanism is trained using the training set.

[0008] The labeled urine formed element data in the test set are input into the trained instance segmentation model to perform instance segmentation and obtain the urine formed element segmentation results.

[0009] As a preferred technical solution, the instance segmentation model includes a backbone feature network that integrates the SE attention mechanism, a candidate region extraction network (a region proposal network), a masking terminal network (the mask head), and a segmentation quality scoring network.

[0010] The backbone feature network is used to extract feature maps from the input image data and to perform SE attention operations on them, yielding attention-fused feature maps.

[0011] The candidate region extraction network is used to extract target candidate regions of urine formed elements from the input attention-fused feature maps.

[0012] The masking terminal network is used to detect, classify, and segment the target candidate regions to obtain a predicted segmentation mask.

[0013] The segmentation quality scoring network is used to score the quality of the predicted segmentation mask and select the optimal urine formed element segmentation result based on the scoring results.

[0014] As a preferred technical solution, the backbone feature network extracts feature maps from the input image data and performs SE attention operations on the feature maps to obtain feature maps with fused attention, specifically including:

[0015] A residual network and a feature pyramid network fused with the SE attention mechanism are used to extract feature maps from the input image, yielding a first feature map of size [C, H, W]. The feature extraction is expressed as follows:

[0016] $u_c = v_c * X$

[0017] where $u_c$ denotes the c-th channel of the first feature map, $v_c$ denotes the c-th filter kernel of the learned filter kernel set, X denotes the input image, * denotes the convolution operation, and C, H, and W are the number of channels, height, and width of the first feature map, respectively.

[0018] The SE attention operation is performed on the first feature map, specifically including:

[0019] The first feature map is squeezed to obtain a second feature map of size [C, 1, 1], as shown in the following expression:

[0020] $z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$

[0021] where $z_c$ is the c-th element of the second feature map, i.e., the global average of channel $u_c$ over its H × W positions.

[0022] The second feature map is passed through a fully connected network with activation functions (the excitation step) to obtain the channel weights, as shown in the following expression:

[0023] $s_c = \left[\sigma\left(W_2\,\delta(W_1 z)\right)\right]_c$

[0024] where $s_c$ is the weight of channel c, z is the vector of all $z_c$, δ(·) is the intermediate activation (ReLU in the standard SE block), σ(·) is the sigmoid function, and $W_1$ and $W_2$ are the parameters of the two fully connected layers.

[0025] Each channel of the first feature map is multiplied by its corresponding channel weight to obtain the fused SE attention feature map, as shown in the following expression:

[0026] $P_c = s_c \cdot u_c, \quad c = 1, \dots, C$

[0027] where P is the fused SE attention feature map, whose c-th channel $P_c$ is the channel $u_c$ scaled by the scalar weight $s_c$ (· denotes this channel-wise multiplication).
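The squeeze, excitation and scaling steps above can be sketched in NumPy as follows. This is a minimal illustration of a standard SE block, not the patent's implementation: the reduction ratio `r`, the random weights, the omission of bias terms and the use of ReLU for δ are assumptions.

```python
from typing import Tuple

import numpy as np


def se_block(u: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Squeeze-and-excitation on a feature map u of shape (C, H, W).

    w1: (C // r, C) weights of the first FC layer; w2: (C, C // r) weights of the
    second FC layer (bias terms omitted; r is a hypothetical reduction ratio).
    Returns the re-weighted map P of shape (C, H, W) and the channel weights s of shape (C,).
    """
    # Squeeze: global average pooling over H and W gives one value z_c per channel.
    z = u.mean(axis=(1, 2))
    # Excitation: FC -> ReLU (delta) -> FC -> sigmoid (sigma) gives weights in (0, 1).
    hidden = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))
    # Scale: P_c = s_c * u_c, i.e. every channel is multiplied by its own scalar weight.
    p = s[:, None, None] * u
    return p, s


rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
u = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
p, s = se_block(u, w1, w2)
print(p.shape, s.shape)  # (8, 4, 4) (8,)
```

With all-zero weights every channel weight becomes sigmoid(0) = 0.5, which is a quick sanity check of the wiring.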

[0028] As a preferred technical solution, the candidate region extraction network extracts target candidate regions of urine formed elements from the input attention-fused feature maps. The specific steps include:

[0029] A 3×3 convolution is applied to the attention-fused feature map, and two 1×1 fully connected layers then perform classification and regression, respectively, on the convolution output, yielding redundant candidate regions.

[0030] The Soft-NMS method is used to select the target candidate regions from the redundant candidate regions.
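The text does not say which Soft-NMS variant is used. The sketch below assumes the Gaussian variant, with a hypothetical `sigma` of 0.5 and score threshold of 0.001, and boxes given as `[x1, y1, x2, y2]`. Unlike classic NMS, an overlapping box is not discarded outright but has its score decayed according to its overlap with the selected box, which is the usual argument for using it on crowded scenes.

```python
import numpy as np


def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU of one [x1, y1, x2, y2] box against an (M, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def soft_nms(boxes: np.ndarray, scores: np.ndarray, sigma: float = 0.5,
             score_thresh: float = 0.001) -> tuple:
    """Gaussian Soft-NMS: decay, rather than discard, the scores of overlapping boxes.

    Returns (indices of kept boxes in selection order, their final scores).
    """
    scores = scores.astype(float).copy()
    remaining = list(range(len(scores)))
    keep, kept_scores = [], []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])  # highest current score
        remaining.remove(best)
        if scores[best] < score_thresh:
            break  # all other remaining boxes score even lower
        keep.append(best)
        kept_scores.append(float(scores[best]))
        if remaining:
            rest = np.array(remaining)
            # Gaussian penalty: the larger the IoU with the selected box, the stronger the decay.
            scores[rest] *= np.exp(-(iou(boxes[best], boxes[rest]) ** 2) / sigma)
    return keep, kept_scores


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
keep, kept_scores = soft_nms(boxes, scores)
print(keep)  # [0, 2, 1]: the overlapping box survives, but with a decayed score
```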

[0031] As a preferred technical solution, the masking terminal network uses three fully connected networks to perform, respectively, instance detection and classification, instance segmentation, and segmentation scoring on the target candidate region, obtaining the predicted segmentation mask. The loss function $L_{RoI}$ of the masking terminal network is expressed as follows:

[0032] $L_{RoI} = L_{box} + L_{cls} + L_{mask}$

[0033] where $L_{box}$ is the bounding-box detection loss, $L_{cls}$ is the classification loss, and $L_{mask}$ is the segmentation mask loss.

[0034] The bounding-box detection loss $L_{box}$ uses the Smooth L1 loss function, expressed as follows:

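[0035] $L_{box} = \sum \mathrm{smooth}_{L1}(T_{pred} - T_{gt}), \qquad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$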

[0036] where $T_{pred}$ is the prediction of the masking terminal network and $T_{gt}$ is the corresponding ground-truth annotation.
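A minimal sketch of the Smooth L1 loss, assuming the common form with threshold `beta` = 1 and summation over elements; the text does not state whether the loss is summed or averaged, and the `beta` parameter name is illustrative.

```python
import numpy as np


def smooth_l1(pred, target, beta: float = 1.0) -> float:
    """Smooth L1 loss summed over all elements.

    Per element: 0.5 * d**2 / beta if |d| < beta, else |d| - 0.5 * beta,
    where d = pred - target. With beta = 1 this is the Smooth L1 of Fast R-CNN.
    """
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    per_elem = np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta)
    return float(per_elem.sum())


# One small and one large coordinate error: 0.5 * 0.5**2 + (3.0 - 0.5) = 2.625
loss = smooth_l1([0.5, 3.0], [0.0, 0.0])
```

The quadratic region keeps gradients small near zero error, while the linear region limits the influence of large errors.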

[0037] The classification loss $L_{cls}$ is the cross-entropy over all classes, expressed as follows:

[0038] $L_{cls} = -\log \frac{e^{X_{label}}}{\sum_{j=1}^{N} e^{X_j}}$

[0039] where $X_{label}$ is the predicted score of the ground-truth category, label is the index of that category, N is the number of categories, and $X_j$ is the predicted score of the j-th category.
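The cross-entropy above can be computed stably with the log-sum-exp trick. In this sketch `scores` holds the N raw class scores $X_j$ and `label` is the index of the true class; the four-class example mirrors the four output categories used later in the text, but the score values are made up.

```python
import numpy as np


def cross_entropy(scores, label: int) -> float:
    """-log(softmax(scores)[label]) via the log-sum-exp trick for numerical stability."""
    x = np.asarray(scores, dtype=float)
    m = x.max()
    log_sum_exp = m + np.log(np.sum(np.exp(x - m)))
    return float(log_sum_exp - x[label])


# Made-up scores for four classes; the true class has index 3.
loss = cross_entropy([0.5, -1.0, 0.2, 2.0], label=3)
```

Subtracting the maximum score before exponentiating avoids overflow without changing the result.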

[0040] The segmentation mask loss $L_{mask}$ is expressed as follows:

[0041] $L_{mask} = L_{cls} \times L_{maskIoU}$

[0042] where $L_{cls}$ is the classification loss and $L_{maskIoU}$ is the loss of the segmentation quality scoring network.
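Taken literally, the formulas above make the mask term the product of the classification loss and the MaskIoU loss. A trivial sketch of that combination, with made-up loss values:

```python
def roi_loss(l_box: float, l_cls: float, l_mask_iou: float) -> float:
    """L_RoI = L_box + L_cls + L_mask, with L_mask = L_cls * L_maskIoU as written in the text."""
    l_mask = l_cls * l_mask_iou
    return l_box + l_cls + l_mask


total = roi_loss(l_box=0.2, l_cls=0.5, l_mask_iou=0.1)  # 0.2 + 0.5 + 0.5 * 0.1, about 0.75
```

With this formulation the mask term vanishes whenever $L_{cls}$ is zero.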

[0043] As a preferred technical solution, the segmentation quality scoring network scores the predicted segmentation mask and selects the optimal urine formed element segmentation result based on the scoring results. The specific steps include:

[0044] Max pooling is applied to the predicted segmentation mask to obtain a third feature map of size 14×14×1.

[0045] RoIAlign is applied to the features of the candidate region corresponding to the predicted segmentation mask to obtain a fourth feature map of size 14×14×256.

[0046] The third and fourth feature maps are combined (concatenated along the channel dimension) to obtain the fifth feature map.
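The way the third and fourth feature maps are merged can be sketched as below. The sketch assumes a 28 × 28 predicted mask (the Mask R-CNN default) and merges by channel-wise concatenation, which yields 257 channels and matches the 3×3×257 first-layer kernel of the scoring network described in Example 2; the function names and sizes are illustrative.

```python
import numpy as np


def max_pool_2x2(mask: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = mask.shape
    return mask.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def build_maskiou_input(pred_mask: np.ndarray, roi_features: np.ndarray) -> np.ndarray:
    """Stack RoI features and the pooled predicted mask for the scoring network.

    pred_mask: (28, 28) predicted mask (assumed size).
    roi_features: (256, 14, 14) RoIAlign features of the same candidate region.
    Returns a (257, 14, 14) array: 256 feature channels plus 1 mask channel.
    """
    pooled = max_pool_2x2(pred_mask)[None, :, :]  # third feature map, (1, 14, 14)
    return np.concatenate([roi_features, pooled], axis=0)


fifth = build_maskiou_input(np.zeros((28, 28)), np.zeros((256, 14, 14)))
print(fifth.shape)  # (257, 14, 14)
```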

[0047] The fifth feature map is input into the segmentation quality scoring network for classification, and the mask IoU value of the segmentation quality scoring network is calculated based on the classification results.

[0048] Based on the mask IoU value, the predicted segmentation mask is scored for quality, and the optimal urine formed element segmentation result is selected based on the scoring results.

[0049] As a preferred technical solution, the mask IoU value of the segmentation quality scoring network is regressed using an L2 loss function, expressed as follows:

[0050] $L_{maskIoU} = \sum (T_{pred} - T_{gt})^2$

[0051] where $T_{pred}$ is the mask predicted by the network and $T_{gt}$ is the ground-truth annotated mask.
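In the original Mask Scoring R-CNN design, the regression target is the IoU between the binarized predicted mask and its ground-truth mask, and the scoring branch is trained with an L2 loss toward it. The sketch below computes such a target and applies the squared-error form of the formula above; the 0.5 binarization threshold and the toy 2 × 2 masks are assumptions.

```python
import numpy as np


def mask_iou(pred: np.ndarray, gt: np.ndarray, thresh: float = 0.5) -> float:
    """IoU between a soft predicted mask (binarized at thresh) and a binary ground-truth mask."""
    p = np.asarray(pred) >= thresh
    g = np.asarray(gt).astype(bool)
    union = np.logical_or(p, g).sum()
    # Both masks empty: IoU is undefined; 0.0 is returned here by convention.
    return float(np.logical_and(p, g).sum() / union) if union else 0.0


def maskiou_l2_loss(pred_ious, target_ious) -> float:
    """L2 regression loss: sum of squared differences between predicted and target values."""
    d = np.asarray(pred_ious, dtype=float) - np.asarray(target_ious, dtype=float)
    return float(np.sum(d ** 2))


pred = np.array([[0.9, 0.8], [0.2, 0.1]])
gt = np.array([[1, 0], [0, 0]])
target = mask_iou(pred, gt)              # 1 overlapping pixel / 2 pixels in the union = 0.5
loss = maskiou_l2_loss([0.7], [target])  # (0.7 - 0.5) ** 2
```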

[0052] As a preferred technical solution, the specific steps for constructing the urine formed element labeled image dataset include:

[0053] Microscopic images of a specified area from different individuals were collected under different lighting conditions using the RZ1100 fully automated urine formed element analyzer.

[0054] The LABELME software was used to perform preliminary instance segmentation and annotation of different types of cells and formed elements in the microscope images, resulting in a dataset of annotated images of urine formed elements.

[0055] As a preferred technical solution, after dividing the urine formed element labeled image dataset into a training set and a test set, the method further includes: performing augmentation processing on the training set using brightness transformation, saturation transformation and contrast transformation methods.
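One way to implement the three photometric transforms on an RGB image stored as floats in [0, 1] is sketched below. The blending formulas follow a common convention (blend toward grayscale for saturation, toward the mean gray level for contrast), and the factor values are hypothetical. If each transform produces one extra copy per image, the training set grows fourfold, which is consistent with the 194 training images and 776 augmented images reported in Example 3.

```python
import numpy as np

LUMA = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 weights for grayscale


def to_gray(img: np.ndarray) -> np.ndarray:
    """(H, W, 3) RGB image with values in [0, 1] -> (H, W) grayscale."""
    return img @ LUMA


def adjust_brightness(img: np.ndarray, factor: float) -> np.ndarray:
    # Scale every pixel; factor > 1 brightens, factor < 1 darkens.
    return np.clip(img * factor, 0.0, 1.0)


def adjust_saturation(img: np.ndarray, factor: float) -> np.ndarray:
    # Blend each pixel with its own gray value; factor 0 gives a grayscale image.
    gray = to_gray(img)[..., None]
    return np.clip(gray + factor * (img - gray), 0.0, 1.0)


def adjust_contrast(img: np.ndarray, factor: float) -> np.ndarray:
    # Blend with the mean gray level of the whole image; factor 0 gives a flat image.
    mean = to_gray(img).mean()
    return np.clip(mean + factor * (img - mean), 0.0, 1.0)


def augment(img: np.ndarray, brightness=1.2, saturation=1.3, contrast=1.3) -> list:
    """Return three augmented copies of img (the factor values are hypothetical)."""
    return [adjust_brightness(img, brightness),
            adjust_saturation(img, saturation),
            adjust_contrast(img, contrast)]
```

In practice the factors would usually be sampled at random within a range rather than fixed.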

[0056] As a preferred technical solution, the following parameters are set for the instance segmentation model during training:

[0057] Set the number of output categories for the instance segmentation model to 4.

[0058] Set the aspect ratio of the preset anchor points of the candidate region extraction network to [0.5, 1.2].

[0059] Set the size of the anchor box to [32, 64, 128, 256, 384].
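A minimal sketch of how the listed anchor settings expand into box shapes, assuming each anchor keeps an area of size² and that the aspect ratio means height / width (conventions differ between libraries). In an FPN-based model each size is normally assigned to one pyramid level; the sketch ignores that and only enumerates the 5 × 2 shapes.

```python
import numpy as np


def make_anchors(sizes, ratios) -> np.ndarray:
    """Zero-centred anchors [x1, y1, x2, y2] for every (size, ratio) pair.

    Each anchor has area size**2 and height / width equal to ratio
    (an assumed convention; the text does not define the ratio).
    """
    anchors = []
    for size in sizes:
        for ratio in ratios:
            w = size / np.sqrt(ratio)
            h = size * np.sqrt(ratio)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)


anchors = make_anchors([32, 64, 128, 256, 384], [0.5, 1.2])
print(anchors.shape)  # (10, 4)
```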

[0060] The initial learning rate of the instance segmentation model is set to 0.001, the learning rate decay factor is set to 0.0001, and the maximum number of iterations is set to 1000. The learning rate is updated and the instance segmentation model is saved once every 1000 iterations.

[0061] Compared with the prior art, the beneficial effects of the technical solution of the present invention include: Based on the Mask Scoring R-CNN model, the present invention integrates the SE attention mechanism to perform instance segmentation on urine formed element images, which can effectively solve the problems of missed detection of urine formed elements and coarse segmentation of urine formed elements caused by cell adhesion, overlap and impurities, and improve the accuracy and efficiency of urine formed element instance segmentation.

Brief Description of the Drawings

[0062] Figure 1 is a flowchart of the urine formed element segmentation method provided in an embodiment of this application.

[0063] Figure 2 is a schematic diagram of the instance segmentation model in an embodiment of this application.

[0064] Figure 3 is a schematic diagram of the residual network incorporating the SE attention mechanism in an embodiment of this application.

[0065] Figure 4 is a schematic diagram of the process of training the instance segmentation model incorporating the SE attention mechanism in an embodiment of this application.

[0066] Figure 5 is a flowchart of data augmentation in an embodiment of this application.

[0067] Figure 6 is a diagram of the dataset directory structure in an embodiment of this application.

[0068] Figure 7 is the first visualization (I) of urine formed element segmentation results from the traditional Mask Scoring R-CNN model.

[0069] Figure 8 is the second visualization (II) of urine formed element segmentation results from the traditional Mask Scoring R-CNN model.

[0070] Figure 9 is the third visualization (III) of urine formed element segmentation results from the traditional Mask Scoring R-CNN model.

[0071] Figure 10 is the fourth visualization (IV) of urine formed element segmentation results from the traditional Mask Scoring R-CNN model.

[0072] Figure 11 is a visualization of urine formed element segmentation results from the Mask Scoring R-CNN model incorporating the SE attention mechanism in an embodiment of this application.

[0073] Figure 12 is a visualization of urine formed element segmentation results from the Mask Scoring R-CNN model incorporating the SE attention mechanism in an embodiment of this application.

[0074] Figure 13 is a visualization of urine formed element segmentation results from the Mask Scoring R-CNN model incorporating the SE attention mechanism in an embodiment of this application.

[0075] Figure 14 is a visualization of urine formed element segmentation results from the Mask Scoring R-CNN model incorporating the SE attention mechanism in an embodiment of this application.

Detailed Description of the Embodiments

[0076] The embodiments of the present invention will be described below with reference to the accompanying drawings and preferred technical solutions. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be understood that the preferred technical solutions are only for illustrating the present invention and are not intended to limit the scope of protection of the present invention.

[0077] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Therefore, the drawings only show the components related to the present invention and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.

[0078] In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the invention. However, it will be apparent to those skilled in the art that embodiments of the invention may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail to avoid obscuring embodiments of the invention.

[0079] Squeeze-and-Excitation Networks (SE) add an attention mechanism along the channel dimension; their key operations are squeeze and excitation.

[0080] Example 1

[0081] Referring to Figure 1, this embodiment proposes a method for segmenting urine formed elements, including:

[0082] A urine formed element labeled image dataset is constructed, and the urine formed element labeled image dataset is divided into a training set and a test set.

[0083] Using the Mask Scoring R-CNN model as the base model, an instance segmentation model incorporating the SE attention mechanism is trained using the training set.

[0084] The labeled urine formed element data in the test set are input into the trained instance segmentation model to perform instance segmentation and obtain the urine formed element segmentation results.

[0085] The urine formed element instance segmentation method proposed in this application, based on the Mask Scoring R-CNN model, integrates the SE attention mechanism to perform instance segmentation on urine formed element images. It can effectively solve the problems of missed detection of urine formed elements and coarse segmentation of urine formed elements caused by cell adhesion, overlap and impurities, and improve the accuracy and efficiency of urine formed element instance segmentation.

[0086] Example 2

[0087] This embodiment is an improvement on the urine formed element segmentation method proposed in Embodiment 1.

[0088] In this embodiment, the specific steps for constructing a labeled image dataset of urine formed elements include:

[0089] Microscopic images of specified areas from different individuals were acquired using the RZ1100 fully automated urine formed element analyzer under varying lighting conditions. The LABELME software was then used to perform preliminary instance segmentation and annotation of different cell types and formed elements in the microscopic images, resulting in a urine formed element labeled image dataset.

[0090] In the specific implementation process, LABELME software was used to annotate the microscope images to obtain JSON annotation files, and a Python script was used to convert the data into COCO dataset format.
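The conversion script itself is not given. The sketch below shows one way to turn LabelMe polygon annotations into a COCO-style dictionary; it reads the standard LabelMe keys (`imagePath`, `imageHeight`, `imageWidth`, `shapes`, `label`, `points`, `shape_type`), skips non-polygon shapes, and uses hypothetical category ids for the four classes named in Example 3.

```python
import json


def labelme_to_coco(labelme_docs: list, categories: dict) -> dict:
    """Merge LabelMe JSON dicts into a single COCO-style dict.

    categories maps a LabelMe label name to a COCO category id.
    """
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": cid, "name": name} for name, cid in categories.items()],
    }
    ann_id = 1
    for img_id, doc in enumerate(labelme_docs, start=1):
        coco["images"].append({"id": img_id, "file_name": doc["imagePath"],
                               "height": doc["imageHeight"], "width": doc["imageWidth"]})
        for shape in doc["shapes"]:
            if shape.get("shape_type", "polygon") != "polygon":
                continue  # rectangles, circles, etc. are not handled in this sketch
            xs = [pt[0] for pt in shape["points"]]
            ys = [pt[1] for pt in shape["points"]]
            # Shoelace formula for the polygon area (index -1 wraps to the last vertex).
            area = 0.5 * abs(sum(xs[i] * ys[i - 1] - xs[i - 1] * ys[i] for i in range(len(xs))))
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": categories[shape["label"]],
                "segmentation": [[c for pt in shape["points"] for c in pt]],  # flat x1, y1, x2, y2, ...
                "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],  # x, y, w, h
                "area": area,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


# Hypothetical category ids and a one-shape example document.
CATEGORIES = {"Squamous": 1, "WBC": 2, "CaOX": 3, "RBC": 4}
doc = {"imagePath": "example.jpg", "imageHeight": 1200, "imageWidth": 1600,
       "shapes": [{"label": "RBC", "shape_type": "polygon",
                   "points": [[0, 0], [10, 0], [10, 10], [0, 10]]}]}
coco = labelme_to_coco([doc], CATEGORIES)
coco_text = json.dumps(coco)  # ready to write to an annotation file
```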

[0091] In this embodiment, the ratio of the training set to the test set is 7:3. The training set is augmented using brightness transformation, saturation transformation, and contrast transformation methods.
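A minimal sketch of a reproducible 7:3 random split. The seed and the rounding rule are assumptions; Example 3 reports 194 training and 86 test images out of 280, so the exact counts used there do not follow from plain rounding of 280 × 0.7.

```python
import random


def split_dataset(items, train_ratio: float = 0.7, seed: int = 0):
    """Shuffle a copy of items reproducibly and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_dataset(range(10))
print(len(train), len(test))  # 7 3
```

Augmentation should be applied only after the split, so that augmented copies of a test image never leak into the training set.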

[0092] In this embodiment, the following parameters are set for the instance segmentation model during training:

[0093] The number of output categories for the instance segmentation model is set to 4; the aspect ratio of the preset anchor points for the candidate region extraction network is set to [0.5, 1.2]; the size of the anchor boxes is set to [32, 64, 128, 256, 384]; the initial learning rate of the instance segmentation model is set to 0.001, the learning rate decay factor is set to 0.0001, and the maximum number of iterations is set to 1000, with the learning rate updated and the instance segmentation model saved every 1000 iterations. When reading the input image, the size is uniformly scaled to 1333×800.

[0094] In this embodiment, as shown in Figure 2, which is a schematic diagram of the instance segmentation model in the embodiments of this application, the instance segmentation model includes a backbone feature network that integrates the SE attention mechanism, a candidate region extraction network, a masking terminal network, and a segmentation quality scoring network.

[0095] The backbone feature network is used to extract feature maps from the input image data, and performs SE attention operations on the feature maps to obtain feature maps with fused attention, specifically including:

[0096] A residual network and a feature pyramid network fused with the SE attention mechanism are used to extract feature maps from the input image, yielding a first feature map of size [C, H, W]. Figure 3 is a schematic diagram of the residual network structure incorporating the SE attention mechanism in this embodiment. The feature extraction is expressed as follows:

[0097] $u_c = v_c * X$

[0098] where $u_c$ denotes the c-th channel of the first feature map, $v_c$ denotes the c-th filter kernel of the learned filter kernel set, X denotes the input image, * denotes the convolution operation, and C, H, and W are the number of channels, height, and width of the first feature map, respectively.

[0099] The SE attention operation is performed on the first feature map, specifically including:

[0100] The first feature map is squeezed to obtain a second feature map of size [C, 1, 1], as shown in the following expression:

[0101] $z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$

[0102] where $z_c$ is the c-th element of the second feature map, i.e., the global average of channel $u_c$ over its H × W positions.

[0103] The second feature map is passed through a fully connected network with activation functions (the excitation step) to obtain the channel weights, as shown in the following expression:

[0104] $s_c = \left[\sigma\left(W_2\,\delta(W_1 z)\right)\right]_c$

[0105] where $s_c$ is the weight of channel c, z is the vector of all $z_c$, δ(·) is the intermediate activation (ReLU in the standard SE block), σ(·) is the sigmoid function, and $W_1$ and $W_2$ are the parameters of the two fully connected layers.

[0106] Each channel of the first feature map is multiplied by its corresponding channel weight to obtain the fused SE attention feature map, as shown in the following expression:

[0107] $P_c = s_c \cdot u_c, \quad c = 1, \dots, C$

[0108] where P is the fused SE attention feature map, whose c-th channel $P_c$ is the channel $u_c$ scaled by the scalar weight $s_c$ (· denotes this channel-wise multiplication).

[0109] The candidate region extraction network is used to extract target candidate regions of urine formed elements from the input attention-fused feature maps. The specific steps include:

[0110] A 3×3 convolution is applied to the attention-fused feature map, and two 1×1 fully connected layers then perform classification and regression, respectively, on the convolution output, yielding redundant candidate regions.

[0111] The Soft-NMS method is used to select the target candidate regions from the redundant candidate regions.

[0112] The masking terminal network is used to detect, classify, and segment the target candidate regions to obtain a predicted segmentation mask.

[0113] In this embodiment, the masking terminal network uses three fully connected networks to perform, respectively, instance detection and classification, instance segmentation, and segmentation scoring on the target candidate region, obtaining a predicted segmentation mask. The loss function $L_{RoI}$ of the masking terminal network is expressed as follows:

[0114] $L_{RoI} = L_{box} + L_{cls} + L_{mask}$

[0115] where $L_{box}$ is the bounding-box detection loss, $L_{cls}$ is the classification loss, and $L_{mask}$ is the segmentation mask loss.

[0116] The bounding-box detection loss $L_{box}$ uses the Smooth L1 loss function, expressed as follows:

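[0117] $L_{box} = \sum \mathrm{smooth}_{L1}(T_{pred} - T_{gt}), \qquad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$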

[0118] where $T_{pred}$ is the prediction of the masking terminal network and $T_{gt}$ is the corresponding ground-truth annotation.

[0119] The classification loss $L_{cls}$ is the cross-entropy over all classes, expressed as follows:

[0120] $L_{cls} = -\log \frac{e^{X_{label}}}{\sum_{j=1}^{N} e^{X_j}}$

[0121] where $X_{label}$ is the predicted score of the ground-truth category, label is the index of that category, N is the number of categories, and $X_j$ is the predicted score of the j-th category.

[0122] The segmentation mask loss $L_{mask}$ is expressed as follows:

[0123] $L_{mask} = L_{cls} \times L_{maskIoU}$

[0124] where $L_{cls}$ is the classification loss and $L_{maskIoU}$ is the loss of the segmentation quality scoring network.

[0125] The segmentation quality scoring network is used to score the quality of the predicted segmentation mask and to select the optimal urine formed element segmentation result based on the scoring results. The specific steps include:

[0126] Max pooling is applied to the predicted segmentation mask to obtain a third feature map of size 14×14×1.

[0127] RoIAlign is applied to the features of the candidate region corresponding to the predicted segmentation mask to obtain a fourth feature map of size 14×14×256.

[0128] The third and fourth feature maps are combined (concatenated along the channel dimension) to obtain the fifth feature map.

[0129] The fifth feature map is input into the segmentation quality scoring network for classification, and the mask IoU value of the segmentation quality scoring network is calculated based on the classification results.

[0130] Based on the mask IoU value, the predicted segmentation mask is scored for quality, and the optimal urine formed element segmentation result is selected based on the scoring results.

[0131] In this embodiment, the segmentation quality scoring network comprises four convolutional layers and three fully connected layers. In the four convolutional layers, the kernel size of the first layer is 3×3×257, and the kernel size of the other three layers is 3×3×256. In the three fully connected layers, the output of the first two layers is 1024, and the number of output categories in the last layer is 4.

[0132] In this embodiment, the mask IoU value of the segmentation quality scoring network is regressed using an L2 loss function, expressed as follows:

[0133] $L_{maskIoU} = \sum (T_{pred} - T_{gt})^2$

[0134] where $T_{pred}$ is the mask predicted by the network and $T_{gt}$ is the ground-truth annotated mask.

[0135] In this embodiment, the best of the trained instance segmentation models is selected on the validation set and used for testing. According to this embodiment, the added SE attention mechanism only assists the original Mask Scoring R-CNN model in improving performance during the training phase and is not invoked during the testing phase, so fusing the SE attention mechanism does not add extra testing time. When reading test images, the model also uniformly scales them to 1333×800. The urine formed element instance segmentation results on the test set are analyzed statistically, including a comparison between the Mask Scoring R-CNN model and the Mask Scoring R-CNN model with the SE attention mechanism in natural scenes.

[0136] Understandably, this invention integrates the SE attention mechanism on the basis of the Mask Scoring R-CNN model to perform instance segmentation on urine formed element images. It can effectively solve the problems of missed detection of urine formed elements and coarse segmentation of urine formed elements caused by cell adhesion, overlap and impurities, and improve the accuracy and efficiency of urine formed element instance segmentation.

[0137] Example 3

[0138] This embodiment is based on the urine formed element segmentation method proposed in Embodiment 2, and analyzes and compares urine data collected from a tertiary hospital in Beijing.

[0139] Figure 4 is a flowchart of the process of training the instance segmentation model fused with the SE attention mechanism in this embodiment. In this embodiment, 280 images of formed elements in patient urine samples were collected from a top-tier hospital in Beijing, at a resolution of 1600×1200 pixels. The collected urine images were manually labeled with LABELME software; two experts took part in the labeling to ensure the accuracy of the labeled dataset. 194 images were randomly selected as the training set, and the remaining 86 images were used as the test set. To avoid underfitting or overfitting, the training set was augmented to 776 images, and transfer learning was used for model pre-training. Figure 5 is a flowchart of the data augmentation process in this embodiment: three methods, brightness transformation, contrast transformation, and saturation transformation, are used to augment the training dataset. For transfer learning, the weights of a pre-trained model are transferred to the instance segmentation model of this embodiment, enabling satisfactory results with a relatively small training dataset. Through these strategies, the model exhibits better robustness and generalization performance, meeting the needs of practical application scenarios.

[0140] In the specific implementation process, the software environment for training included the Windows 10 operating system, CUDA 9.0, CuDNN 7.4, PyTorch 1.2.0, PyCharm, and Python 3.7. The hardware environment included an AMD Ryzen 5 2600X processor, a GeForce GTX Titan X graphics card, 64.0 GB of RAM, 160.0 GB of hard drive space, and the Windows 11 operating system.

[0141] The directory structure of the dataset used in this invention is shown in Figure 6. A folder named "Dataset" is created in the root directory; it contains two subfolders, "Annotations" and "Image", which store the JSON annotation files and the corresponding images, respectively. Each subfolder is further divided into training and test sets. This directory structure allows quick access to dataset information and supports data preprocessing, model training, and testing.

[0142] In this invention, the Mask Scoring R-CNN model incorporating the SE attention mechanism was trained on 776 images and tested on 86 images. Precision, recall, and F1 score were used as evaluation metrics: precision and recall measure the model's detection and segmentation performance, and the F1 score, their harmonic mean, reflects the balance between them. Higher values of precision, recall, and F1 score indicate better model performance, and together these metrics provide a comprehensive evaluation of the model.

[0143] Table 1. Urine formed element instance segmentation results of the traditional Mask Scoring R-CNN model.

[0144]

[0145] The segmentation results of the traditional Mask Scoring R-CNN model on the test set are shown in Table 1. The model detected 1037 urine formed elements across four categories (Squamous, WBC, CaOX, and RBC), of which 930 were correctly detected, giving a precision of 50.08%, a recall of 91.16%, and an F1 score of 0.6421. In the RBC category, the model detected 401 RBCs, 160 of them correctly, giving a precision of 39.90%, a recall of 98.16%, and an F1 score of 0.5674.
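The metrics can be checked with a few lines of arithmetic. Here precision is taken as correct detections divided by all detections, which reproduces the RBC figures above (160 / 401 ≈ 39.90%); recall needs the number of ground-truth instances, which is not reported here, so the reported recall is used directly.

```python
def precision(tp: int, detected: int) -> float:
    """Fraction of detections that are correct."""
    return tp / detected if detected else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# RBC row reported above: 401 detections, 160 correct, recall 98.16%.
p_rbc = precision(160, 401)
print(round(p_rbc, 4), round(f1_score(0.3990, 0.9816), 4))  # 0.399 0.5674
```

The same formula applied to the precision and recall reported for the SE model (59.70% and 92.09%) gives 0.7244.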

[0146] Visualizations of the urine formed element segmentation results of the traditional Mask Scoring R-CNN model are shown in Figure 7, Figure 8, Figure 9 and Figure 10. The upper-left corner of each detection box is labeled CLS and MS, where CLS is the classification score and MS is the segmentation quality score. Judging from these scores, the traditional Mask Scoring R-CNN model segments urine formed elements reasonably well. However, because of objective factors such as cell adhesion and background impurities, missed detections, false detections, and uneven segmentation occur; these are indicated by red arrows in the figures. For example, Figures 7 to 10 all show missed detections. In Figure 8, besides missed detections, one CaOX instance is shown with two detection boxes. Figure 10 shows uneven segmentation boundaries. The segmentation performance of the Mask Scoring R-CNN model on urine formed elements therefore still has room for improvement.

[0147] Table 2. Urine formed element instance segmentation results of the Mask Scoring R-CNN model integrating SE attention mechanism of this invention.

[0148]

[0149] The traditional Mask Scoring R-CNN model still suffers from poor segmentation quality and low scores, which lead to missed detections. This invention proposes a Mask Scoring R-CNN model incorporating the SE attention mechanism. As shown in Table 2, this model detected 1037 urine formed elements (Squamous, WBC, CaOX, and RBC) and correctly identified 938 of them, achieving a precision of 59.70%, a recall of 92.09%, and an F1 score of 0.7244. Adding the SE attention mechanism significantly improves the visualized segmentation results and reduces missed detections.

[0150] Visualizations of the urine formed element segmentation results of the Mask Scoring R-CNN model integrating the SE attention mechanism of this invention are shown in Figures 11, 12, 13 and 14. As before, the upper-left corner of each detection box is labeled CLS (classification score) and MS (segmentation quality score). Judging by these scores, the model of this invention achieves satisfactory urine formed element segmentation. Compared with the traditional Mask Scoring R-CNN model, adding the SE attention mechanism reduces missed and duplicate detections and produces smoother segmentation boundaries; these improvements are marked with red arrows in the figures. For example, Figures 11, 12, 13 and 14 show that the missed detections of the model without SE attention are reduced; in Figure 12, the CaOX instance has only one detection box, so it is no longer detected and segmented twice; and in Figure 14 the segmentation boundaries are smoother. These results verify the effectiveness of incorporating the SE attention mechanism into the model.

[0151] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples" indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, such illustrative expressions do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, provided they do not contradict each other, those skilled in the art can combine the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0152] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this application, "N" means at least two, such as two, three, etc., unless otherwise explicitly specified.

[0153] Any process or method description in a flowchart or otherwise described herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing custom logic functions or process steps. The scope of the preferred embodiments of this application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of this application pertain.

[0154] It should be understood that the various parts of this application can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they can be implemented with any of the following techniques known in the art, or a combination thereof: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0155] Those skilled in the art will understand that all or part of the steps of the methods described in the above embodiments can be implemented by a program instructing related hardware, and the program can be stored in a computer-readable storage medium. When executed, the program includes one or a combination of the steps of the method embodiments.

[0156] Obviously, the above embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the implementation of the present invention. Those skilled in the art can make other variations or modifications based on the above description. It is neither necessary nor possible to exhaustively describe all embodiments here. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the claims of the present invention.

Claims

1. A method for segmenting formed elements in urine, characterized in that it comprises: constructing a urine formed element labeled image dataset and dividing it into a training set and a test set; using the Mask Scoring R-CNN model as the base model, training an instance segmentation model incorporating the SE attention mechanism with the training set; and inputting the labeled urine formed element data of the test set into the trained instance segmentation model to perform instance segmentation and obtain urine formed element segmentation results. The instance segmentation model includes a backbone feature network integrating the SE attention mechanism, a candidate region extraction network, a mask head network, and a segmentation quality scoring network. The backbone feature network uses a residual network and a feature pyramid network fused with the SE attention mechanism to extract features from the input image, obtaining a first feature map of size [C, H, W]; the feature extraction is expressed as $U = V * X$, where $U$ is the first feature map, $V$ is the filter kernel set, $X$ is the input image, $*$ denotes the convolution operation, and $C$, $H$ and $W$ are the number of channels, the height and the width of the first feature map, respectively. The first feature map then undergoes the SE attention operations, specifically: a compression (squeeze) operation is performed on the first feature map to obtain a second feature map of size [C, 1, 1], expressed for each channel $c$ as $z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$, where $z$ is the second feature map and $u_c$ is the $c$-th channel of $U$; the second feature map is activated and the resulting feature vector is passed through a fully connected neural network to obtain the channel weights, expressed as $s = \sigma\left(W_2\, \delta(W_1 z)\right)$, where $s$ is the channel weight vector of the second feature map, $\sigma$ is the sigmoid function, $\delta$ is the activation function applied between the fully connected layers, and $W_1$ and $W_2$ are the parameters of the fully connected layers; and the first feature map is multiplied by the channel weights to obtain the fused SE attention feature map, expressed as $P = s \cdot U$, where $P$ is the fused SE attention feature map and $\cdot$ denotes the channel-wise (dot) product. The candidate region extraction network extracts target candidate regions of urine formed elements from the input attention-fused feature map. The mask head network detects, classifies and segments the target candidate regions to obtain predicted segmentation masks. The segmentation quality scoring network scores the quality of the predicted segmentation masks and selects the optimal urine formed element segmentation result based on the scores.
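The SE attention operations in claim 1 (squeeze the first feature map by global averaging to [C, 1, 1], pass the result through two fully connected layers ending in a sigmoid to get one weight per channel, then rescale each channel of the first feature map) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the fully connected layers have no bias, ReLU is used as the activation between them, and the reduction ratio `r = 16` follows the original SE-Net design rather than anything stated in the claim; the function and variable names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_block(u, w1, w2):
    """Squeeze-and-excitation on a first feature map u of shape [C, H, W].

    w1: [C // r, C] weights of the first fully connected layer (no bias, assumed)
    w2: [C, C // r] weights of the second fully connected layer (no bias, assumed)
    Returns the fused map p, the squeezed map z and the channel weights s.
    """
    # Squeeze: global average pooling over H and W -> one value per channel,
    # i.e. the [C, 1, 1] second feature map stored as a length-C vector
    z = u.mean(axis=(1, 2))
    # Excitation: FC -> ReLU (assumed activation) -> FC -> sigmoid
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Scale: multiply every channel of u by its weight
    p = u * s[:, None, None]
    return p, z, s


rng = np.random.default_rng(0)
C, H, W, r = 64, 8, 8, 16
u = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
p, z, s = se_block(u, w1, w2)
print(p.shape, z.shape, s.shape)  # (64, 8, 8) (64,) (64,)
```

Because the weights lie in (0, 1), the block can only attenuate channels relative to each other; it does not change the spatial size of the feature map, so it can be inserted into the residual network without altering the downstream layers.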

2. The method for segmenting urine formed elements according to claim 1, characterized in that the candidate region extraction network extracts target candidate regions of urine formed elements from the input attention-fused feature map, specifically: a convolution with a 3×3 kernel is applied to the attention-fused feature map, and two 1×1 fully connected layers are then applied to the convolution result to perform classification and regression, respectively, yielding redundant candidate regions; and the Soft-NMS method is used to select the target candidate regions from the redundant candidate regions.
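Claim 2 filters the redundant candidate regions with Soft-NMS, which decays the scores of boxes that overlap a higher-scoring box instead of discarding them outright. The sketch below uses the Gaussian decay variant; the claim does not say which Soft-NMS variant, `sigma`, or score threshold is used, so those values and the helper names are assumptions.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one box [x1, y1, x2, y2] and an array of boxes [N, 4]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay, rather than discard, scores of overlapping boxes.

    Returns the indices of kept boxes (in selection order) and their final scores.
    """
    scores = scores.astype(float).copy()
    remaining = np.arange(len(scores))
    keep, kept_scores = [], []
    while remaining.size > 0:
        # Select the remaining box with the highest (possibly decayed) score
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        kept_scores.append(scores[best])
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Gaussian decay driven by the overlap with the selected box
        ious = box_iou(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(ious ** 2) / sigma)
        # Drop boxes whose score fell below the threshold
        remaining = remaining[scores[remaining] >= score_thresh]
    return keep, np.array(kept_scores)


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
keep, kept_scores = soft_nms(boxes, scores)
print(keep)  # [0, 2, 1]
```

In the example, the second box overlaps the first heavily, so its score is reduced and it drops behind the non-overlapping third box, but it is still kept. Compared with hard NMS, this plausibly helps with adhering or overlapping cells, where two genuine instances can have strongly overlapping boxes.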

3. The method for segmenting urine formed elements according to claim 1, characterized in that the mask head network uses three fully connected networks to perform, respectively, instance detection and classification, instance segmentation, and segmentation scoring on the target candidate region, to obtain the predicted segmentation mask; the loss function $L$ of the mask head network is expressed as $L = L_{box} + L_{cls} + L_{mask}$, where $L_{box}$ is the border detection loss, $L_{cls}$ is the classification loss, and $L_{mask}$ is the segmentation mask loss. The border detection loss $L_{box}$ is expressed in terms of the mask predicted by the network and the actually annotated mask. The classification loss $L_{cls}$ is the cross-entropy loss over all classes, expressed as $L_{cls} = -\sum_{j=1}^{K} y_j \log p_j$, where $p_j$ is the predicted score for category $j$, $j$ is the category index, $K$ is the total number of categories, and $y_j$ is the true score for category $j$. The segmentation mask loss $L_{mask}$ is expressed as $L_{mask} = L_{c} + L_{MaskIoU}$, where $L_{c}$ is the classification loss and $L_{MaskIoU}$ is the loss of the segmentation quality scoring network.
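The classification term in claim 3 is the cross-entropy over all classes. The minimal NumPy sketch below assumes the predicted category scores come from a softmax and the true category scores are one-hot vectors; the total loss of the mask head network would then be the sum of this term with the border detection and segmentation mask losses. The four-class layout follows the four categories reported in the results (Squamous, WBC, CaOX, RBC); whether a background class is added is not stated, so none is included here, and the function names are hypothetical.

```python
import numpy as np

CLASSES = ["Squamous", "WBC", "CaOX", "RBC"]  # illustrative class order


def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def classification_loss(pred_scores, true_scores, eps=1e-12):
    """Cross-entropy -sum_j y_j * log(p_j), averaged over candidate regions.

    pred_scores: [N, K] predicted category scores p_j (rows sum to 1)
    true_scores: [N, K] true category scores y_j (one-hot rows)
    """
    per_region = -(true_scores * np.log(pred_scores + eps)).sum(axis=1)
    return float(per_region.mean())


# Two candidate regions whose true classes are Squamous and RBC
true_scores = np.eye(len(CLASSES))[[0, 3]]
uniform = softmax(np.zeros((2, len(CLASSES))))
print(round(classification_loss(uniform, true_scores), 4))  # 1.3863 (= ln 4)
```

A uniform prediction over K classes costs ln K per region, which gives a convenient sanity check for the initial loss value at the start of training.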

4. The method for segmenting urine formed elements according to claim 1, characterized in that the segmentation quality scoring network scores the predicted segmentation mask and selects the optimal urine formed element segmentation result based on the score, specifically: max pooling is applied to the predicted segmentation mask to obtain a third feature map of size 14×14×1; the predicted segmentation mask is processed by RoIAlign to obtain a fourth feature map of size 14×14×256; the third and fourth feature maps are added to obtain a fifth feature map; the fifth feature map is input into the segmentation quality scoring network for classification, and the mask IoU value of the segmentation quality scoring network is calculated from the classification result; and the predicted segmentation mask is scored for quality based on the mask IoU value, with the optimal urine formed element segmentation result selected based on the scores.
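The input construction in claim 4 can be illustrated with array shapes. The sketch assumes a 28×28 predicted mask (the usual Mask R-CNN mask resolution), so that 2×2 max pooling with stride 2 yields the 14×14×1 third feature map, and it takes the 14×14×256 fourth feature map as given; the single-channel third map is broadcast over the 256 channels when the two are added, as the claim describes. The function names are hypothetical.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an [H, W] array (H and W even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def build_scoring_input(pred_mask, roi_features):
    """pred_mask: [28, 28] (assumed size); roi_features: [14, 14, 256].

    Returns the fifth feature map, shape [14, 14, 256].
    """
    third = max_pool_2x2(pred_mask)[:, :, None]  # [14, 14, 1]
    return roi_features + third                  # broadcast over the channels


pred_mask = np.zeros((28, 28))
pred_mask[0, 1] = 0.9                    # one active pixel in the top-left 2x2 block
roi_features = np.zeros((14, 14, 256))
fifth = build_scoring_input(pred_mask, roi_features)
print(fifth.shape)                       # (14, 14, 256)
```

For comparison, the published Mask Scoring R-CNN concatenates these two inputs along the channel axis (giving 257 channels); this sketch follows the addition stated in the claim.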

5. The method for segmenting urine formed elements according to claim 4, characterized in that the mask IoU value of the segmentation quality scoring network is regressed with an $L_2$ loss function, the mask IoU being expressed as $\mathrm{MaskIoU} = \frac{|M_{pred} \cap M_{gt}|}{|M_{pred} \cup M_{gt}|}$, where $M_{pred}$ is the mask predicted by the network and $M_{gt}$ is the actually annotated mask.
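The regression target for the segmentation quality scoring network in claim 5 is the IoU between the predicted and annotated masks, and the network's predicted IoU is pulled toward it with an L2 loss. The minimal sketch below assumes predicted mask probabilities are binarized at 0.5 and that the L2 loss is the mean squared error over regions; both choices and the function names are assumptions.

```python
import numpy as np


def mask_iou(pred_mask, gt_mask, threshold=0.5):
    """IoU between a binarized predicted mask and the annotated mask."""
    pred = pred_mask >= threshold
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def maskiou_l2_loss(predicted_iou, target_iou):
    """Mean squared error between predicted and target mask IoU values."""
    predicted_iou = np.asarray(predicted_iou, dtype=float)
    target_iou = np.asarray(target_iou, dtype=float)
    return float(np.mean((predicted_iou - target_iou) ** 2))


pred_mask = np.zeros((4, 4))
pred_mask[0:2, 0:2] = 0.8        # 4 predicted pixels
gt_mask = np.zeros((4, 4))
gt_mask[0:2, 0:3] = 1            # 6 annotated pixels, 4 shared with the prediction
target = mask_iou(pred_mask, gt_mask)
print(round(target, 4))                             # 0.6667
print(round(maskiou_l2_loss([0.5], [target]), 4))   # 0.0278
```

In the published Mask Scoring R-CNN, the predicted mask IoU is multiplied by the classification score to give the final mask score used for ranking, which corresponds to scoring and selecting segmentation results by quality.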

6. The method for segmenting urine formed elements according to claim 1, characterized in that constructing the urine formed element labeled image dataset specifically comprises: collecting microscope images of a specified area from different individuals under different lighting conditions using the RZ1100 fully automated urine formed element analyzer; and using the LABELME software to perform preliminary instance segmentation annotation of the different types of cells and formed elements in the microscope images, obtaining the urine formed element labeled image dataset.
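After annotation, each LabelMe polygon has to be turned into a binary instance mask for training. The sketch below assumes the standard LabelMe JSON layout (`imageHeight`, `imageWidth`, and a `shapes` list whose entries carry `label`, `points` and `shape_type`) and rasterizes polygons with a plain even-odd rule at pixel centres so that it needs only NumPy; in practice a library rasterizer would normally be used. The helper names are hypothetical.

```python
import numpy as np


def polygon_to_mask(points, height, width):
    """Rasterize a polygon [[x, y], ...] with the even-odd rule at pixel centres."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        # Does this edge straddle the horizontal line through each pixel centre?
        crosses = (y1 > py) != (y2 > py)
        with np.errstate(divide="ignore", invalid="ignore"):
            # x where the edge meets that line (inf/nan for horizontal edges,
            # which never "cross" and so are masked out below)
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            # Toggle pixels whose rightward ray crosses the edge
            inside ^= crosses & (px < x_cross)
    return inside.astype(np.uint8)


def labelme_to_instances(annotation):
    """Convert a LabelMe-style dict into a list of (label, binary mask) pairs."""
    h, w = annotation["imageHeight"], annotation["imageWidth"]
    instances = []
    for shape in annotation["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # this sketch only handles polygons
        instances.append((shape["label"], polygon_to_mask(shape["points"], h, w)))
    return instances


annotation = {
    "imageHeight": 6,
    "imageWidth": 6,
    "shapes": [
        {"label": "RBC", "points": [[1, 1], [4, 1], [4, 4], [1, 4]], "shape_type": "polygon"},
    ],
}
label, mask = labelme_to_instances(annotation)[0]
print(label, int(mask.sum()))  # RBC 9
```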

7. The method for segmenting urine formed elements according to claim 1, characterized in that, after dividing the urine formed element labeled image dataset into a training set and a test set, the method further comprises: augmenting the training set using brightness, saturation and contrast transformations.
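Claim 7 augments the training set with brightness, saturation and contrast transformations. The sketch assumes RGB images stored as float arrays in [0, 1] and random factors drawn from [0.8, 1.2); the factor range, the ITU-R BT.601 grey weights, and the function names are assumptions, not values from the claim.

```python
import numpy as np


def adjust_brightness(img, factor):
    # Scale all intensities, then clip back into [0, 1]
    return np.clip(img * factor, 0.0, 1.0)


def adjust_contrast(img, factor):
    # Blend with the mean grey level of the whole image
    mean = img.mean()
    return np.clip(mean + factor * (img - mean), 0.0, 1.0)


def adjust_saturation(img, factor):
    # Blend with the per-pixel greyscale image (BT.601 luma weights)
    grey = (img @ np.array([0.299, 0.587, 0.114]))[..., None]
    return np.clip(grey + factor * (img - grey), 0.0, 1.0)


def augment(img, rng, low=0.8, high=1.2):
    """Apply brightness, saturation and contrast jitter with random factors."""
    img = adjust_brightness(img, rng.uniform(low, high))
    img = adjust_saturation(img, rng.uniform(low, high))
    img = adjust_contrast(img, rng.uniform(low, high))
    return img


rng = np.random.default_rng(42)
image = rng.random((8, 8, 3))
augmented = augment(image, rng)
print(augmented.shape)  # (8, 8, 3)
```

Because these transforms change only pixel intensities and not geometry, the instance masks of an augmented image are identical to those of the original, so the existing annotations can be reused unchanged.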

8. The method for segmenting urine formed elements according to claim 1, characterized in that, when training the instance segmentation model, the following parameters are set: the number of output categories of the instance segmentation model is 4; the aspect ratios of the preset anchors of the candidate region extraction network are [0.5, 1.2]; the anchor box sizes are [32, 64, 128, 256, 384]; the initial learning rate is 0.001, the learning rate decay factor is 0.0001, and the maximum number of iterations is 1000; and the learning rate is updated and the instance segmentation model is saved once every 1000 iterations.
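The anchor settings in claim 8 can be expanded into concrete anchor shapes. The sketch assumes the common convention that an aspect ratio is height divided by width and that every anchor of a given size keeps an area of size squared, which gives 5 sizes × 2 ratios = 10 anchors per feature-map location; the configuration keys are illustrative names, not parameters of any particular framework, and the values are copied from the claim as written.

```python
import math

# Training settings from claim 8 (key names are illustrative)
CONFIG = {
    "num_classes": 4,
    "anchor_aspect_ratios": [0.5, 1.2],
    "anchor_sizes": [32, 64, 128, 256, 384],
    "base_lr": 0.001,
    "lr_decay": 0.0001,
    "max_iter": 1000,
    "checkpoint_period": 1000,
}


def make_anchors(sizes, ratios):
    """Anchor boxes (x1, y1, x2, y2) centred at the origin, one per (size, ratio).

    Assumes ratio = height / width and area = size ** 2 for every anchor.
    """
    anchors = []
    for size in sizes:
        for ratio in ratios:
            w = size / math.sqrt(ratio)
            h = size * math.sqrt(ratio)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


anchors = make_anchors(CONFIG["anchor_sizes"], CONFIG["anchor_aspect_ratios"])
print(len(anchors))  # 10
```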