A multi-modal mirror image segmentation method based on boundary perception
By adding a coarse prediction module, a receptive field-enhancing convolutional layer, and a boundary refinement perception module to the mirror image segmentation model, the method addresses boundary blurring in mirror image segmentation and achieves more precise extraction of mirror regions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- YANGZHOU UNIV
- Filing Date
- 2022-11-10
- Publication Date
- 2026-06-19
AI Technical Summary
Existing mirror image segmentation models do not further refine boundary regions. This leads to blurred boundaries and guides the model toward erroneous regions, which reduces segmentation accuracy.
A boundary-aware multimodal mirror image segmentation method is adopted. A coarse prediction module added to the backbone network extracts deep features, receptive field-enhancing convolutional layers expand the receptive field, and a boundary refinement perception module extracts region maps and boundary maps layer by layer. A loss function is then constructed to update the model and improve segmentation accuracy.
The method improves the accuracy of mirror image segmentation: mirror regions are extracted more completely, with less boundary blurring and fewer erroneous regions.
Figure CN115937515B_ABST
Description
Technical Field
[0001] This invention relates to the fields of computer vision and digital image processing, and specifically to a boundary-aware multimodal mirror image segmentation method.
Background Art
[0002] Image segmentation is a technique and process that divides an image into multiple regions with different properties by classifying each pixel in the image and extracting the target of interest. Image segmentation has been widely used in traffic control systems, face recognition, satellite image analysis and other fields.
[0003] Mirrors are common objects in daily life, yet segmenting them from a scene is a very challenging computer vision task. Because a mirror reflects light, the content shown on its surface in an image is a reflection of other objects, so the system cannot directly distinguish the reflection from real background objects in the image. Existing methods identify mirrors from contextual associations in the image and from the mirror's reflection information. In practice, however, mirrors can be placed anywhere and scenes can contain arbitrary objects, so computer vision systems are easily misled by this information, which leads to failures.
[0004] By reviewing existing image segmentation techniques, we found that mirrors have more distinctive features in depth images. When a ToF (time-of-flight) depth camera captures the depth of points in a scene, it cannot obtain the true depth of a mirror surface as it can for objects of other materials, so the mirror edges appear as obvious discontinuities in the depth image. We therefore feed the depth image into a depth branch of the image segmentation model to help improve segmentation accuracy.
[0005] Existing neural networks learn image segmentation from training images, and methods that fuse boundary features are widely used in segmentation tasks. Applied directly to mirror image segmentation, however, they give unsatisfactory results, because mirror boundaries are often too similar to the background and inaccurate boundaries lead the model to segment incorrect regions. In this invention, we propose a boundary-aware multimodal mirror image segmentation method that can segment mirrors effectively in different scenarios.
Summary of the Invention
[0006] The purpose of this section is to outline some aspects of embodiments of the present invention and to briefly describe some preferred embodiments. Simplifications or omissions may be made in this section, as well as in the abstract and title of this application, to avoid obscuring the purpose of these documents; however, such simplifications or omissions should not be construed as limiting the scope of the invention.
[0007] In view of the above-mentioned problems, the present invention is proposed.
[0008] Therefore, the technical problem solved by the present invention is that existing mirror image segmentation models lack further refinement of the boundary region, which leads to blurred or missing boundaries and can even guide the model to learn the wrong target region.
[0009] To address the aforementioned technical problems, this invention provides the following technical solution: a boundary-aware multimodal mirror image segmentation method, comprising: acquiring a boundary-aware mirror image segmentation model; adding a coarse prediction module that extracts deep features of the backbone network to obtain a coarse prediction map of the mirror; expanding the receptive field of the deep features by adding receptive field-enhancing convolutional layers; extracting multi-layer features of the depth image with a multimodal feature fusion module; extracting region maps and boundary maps layer by layer with a boundary refinement perception module, the boundary maps guiding constraint refinement of the region maps to obtain a mirror image segmentation map; inputting the mirror image segmentation map into the mirror image segmentation model and upsampling it to the original image size; constructing a loss function to calculate the error between the prediction map and the ground truth map, and updating the mirror image segmentation model by backpropagation; and inputting the mirror image segmentation map into the updated mirror image segmentation model to obtain and output the corresponding mirror image segmentation prediction map.
[0010] As a preferred embodiment of the boundary-aware multimodal mirror image segmentation method described in this invention, the use of the backbone network includes:
[0011] The backbone network adopts the ResNet-50 network;
[0012] Remove the last fully connected layer in the ResNet-50 network, and pass the RGB image S0 through the backbone network to obtain the output features S1, S2, S3, S4, and S5 of each layer.
[0013] As a preferred embodiment of the boundary-aware multimodal mirror image segmentation method described in this invention, the extraction of multi-layer features from the depth image includes:
[0014] In the multimodal feature fusion module, the depth image D0 is passed through four convolutional layers to obtain the output features D1, D2, D3, and D4 of each layer;
[0015] The output feature D2 and the second layer output feature S2 of the backbone network are concatenated along the channel dimension, and the concatenated outputs are input into the average pooling branch and the max pooling branch, respectively.
[0016] In the average pooling branch, the concatenated input is passed through a global average pooling layer and two convolutional layers to obtain the average pooling feature. In the max pooling branch, the concatenated input is passed through a global max pooling layer and two convolutional layers to obtain the max pooling feature.
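[0017] The average pooling feature and the max pooling feature are added, passed through a Sigmoid activation layer, and multiplied by the concatenated output to obtain the fused feature f2;
[0018] The fused feature f2 is calculated as
[0019] $f_2 = \mathrm{Sigmoid}(f_{avg} + f_{max}) \otimes \mathrm{Cat}(D_2, S_2)$
[0020] where $f_{avg}$ and $f_{max}$ denote the average pooling feature and the max pooling feature, ⊗ denotes element-wise multiplication, and Cat(·,·) denotes concatenation along the channel dimension;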
[0021] The fused feature f2 is fed into three parallel branches. The outputs of the three branches are concatenated along the channel dimension and passed through a convolutional layer to obtain the multimodal fusion feature F2.
[0022] The calculation of the multimodal fusion feature F2 includes,
[0023]
[0024]
[0025]
[0026]
[0027] where $\mathrm{Conv}_{m\times n,k}(\cdot)$ denotes a convolutional layer with an m×n kernel and k output channels;
[0028] Repeating the procedure used to obtain F2 yields two further multimodal fusion features, F3 and F4.
[0029] As a preferred embodiment of the boundary-aware multimodal mirror image segmentation method described in this invention, the step of expanding the receptive field of deep features includes:
[0030] The output feature S5 is passed through a convolutional layer, split into four equal parts along the channel dimension, and fed into the first, second, third, and fourth branches, respectively.
[0031] The features of the first branch are convolved and output. In the second branch, the output of the first branch is split into two parts, one of which is concatenated with the features of the second branch along the channel dimension and then convolved.
[0032] The channel concatenation and convolution are calculated as follows:
[0033]
[0034] where the two terms denote, respectively, the output of the second branch and one of the two equal halves of the first branch's output.
[0035] The output of the second branch is likewise split into two parts, one of which is concatenated with the features of the third branch along the channel dimension and convolved, while the features of the fourth branch are passed to the output unchanged.
[0036] The outputs of the four branches are concatenated to obtain the receptive field-enhanced feature.
[0037] The receptive field-enhanced feature is calculated as follows:
[0038]
[0039] where the terms denote, respectively, one of the two halves of the first branch's output, one of the two halves of the second branch's output, and the output of the third branch.
[0040] As a preferred embodiment of the boundary-aware multimodal mirror image segmentation method described in this invention, the acquisition of the coarse prediction map of the mirror image includes:
[0041] The fifth layer output S5 is input into the boundary branch and the prediction branch respectively;
[0042] In the boundary branch, S5 is passed through a receptive field-enhancing convolutional layer, and the resulting feature ... is passed through a convolutional layer to obtain the perceptual boundary map E5;
[0043] In the prediction branch, the feature obtained by passing S5 through a convolutional layer is concatenated along the channel dimension with ..., and the result is passed through four convolutional layers to obtain the coarse prediction map R5.
[0044] As a preferred embodiment of the boundary-aware multimodal mirror image segmentation method described in this invention, the extraction of the region map includes:
[0045] In the fourth layer, the perceptual boundary map E5 and the coarse prediction map R5 are upsampled to obtain ... and ...;
[0046] The upsampled coarse prediction map ... is passed through a Sigmoid activation layer; 1 minus the Sigmoid output is multiplied by the output of the multimodal fusion feature F4 after a convolutional layer to obtain the region input feature.
[0047] The region input feature is calculated as follows:
[0048]
[0049] The multimodal fusion feature F4 is passed through a convolutional layer, and the result is concatenated with the coarse prediction map to obtain the boundary input feature.
[0050] The boundary input feature is calculated as follows:
[0051]
[0052] In the region branch, the boundary input feature guides feature extraction from the region input feature: the features ... and ... are concatenated along the channel dimension and passed through two convolutional layers, and the result is added to the coarse prediction map to obtain the predicted region map R4;
[0053] The calculation of the predicted region map R4 includes,
[0054]
[0055] Repeating the procedure used to obtain the predicted region map R4 yields the predicted region maps R2 and R3.
[0056] As a preferred embodiment of the boundary-aware multimodal mirror image segmentation method described in this invention, the boundary map extraction includes:
[0057] In the boundary branch, the boundary input feature is passed through one convolutional layer to obtain the predicted boundary map E4;
[0058] The calculation of the predicted boundary map E4 includes,
[0059]
[0060] Repeating the procedure used to obtain the predicted boundary map E4 yields the predicted boundary maps E2 and E3.
[0061] As a preferred embodiment of the boundary-aware multimodal mirror image segmentation method described in this invention, upsampling to the original image size is calculated as follows:
[0062]
[0063]
[0064] where the two terms denote the predicted region map and the predicted boundary map of the i-th layer, respectively, and UpSample(x, y) denotes upsampling x to the same size as y.
[0065] As a preferred embodiment of the boundary-aware multimodal mirror image segmentation method described in this invention, the calculation of the loss function includes:
[0066] $L_{bce+iou} = l_{bce} + l_{iou}$
[0067] where $l_{bce}$ denotes the binary cross-entropy loss and $l_{iou}$ denotes the intersection-over-union (IoU) loss.
[0068] As a preferred embodiment of the boundary-aware multimodal mirror image segmentation method described in this invention, the predicted region map ... is passed through a Sigmoid activation layer, and the output is the mirror image segmentation prediction map.
[0069] The beneficial effects of this invention are as follows. This invention provides a boundary-aware multimodal mirror image segmentation method. A multimodal feature fusion module extracts multi-layer features from the depth image that accompanies the RGB input image and fuses them with the RGB features. This exploits the distinctive appearance of mirrors in depth images, enhances multi-scale semantic and spatial information, and helps the network extract the mirror region more completely. A coarse prediction module is added to extract deep features from the backbone network and obtain a coarse prediction map of the mirror. Receptive field-enhancing convolutional layers expand the receptive field of the deep features, improving the extraction of multi-scale semantic information. A boundary refinement perception module extracts region maps and boundary maps layer by layer, and the boundary maps guide constraint refinement of the region maps to obtain the mirror image segmentation map, effectively improving segmentation accuracy.
Brief Description of the Drawings
[0070] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. Wherein:
[0071] Figure 1 is a schematic diagram of the overall process of a boundary-aware multimodal mirror image segmentation method provided by the present invention;
[0072] Figure 2 is a schematic diagram of the multimodal feature fusion module in a boundary-aware multimodal mirror image segmentation method provided by the present invention;
[0073] Figure 3 is a schematic diagram of the receptive field-enhancing convolutional layer in a boundary-aware multimodal mirror image segmentation method provided by the present invention;
[0074] Figure 4 is a schematic diagram of the coarse prediction module in a boundary-aware multimodal mirror image segmentation method provided by the present invention;
[0075] Figure 5 is a schematic diagram of the boundary refinement perception module in a boundary-aware multimodal mirror image segmentation method provided by the present invention;
[0076] Figure 6 shows example side outputs of a boundary-aware multimodal mirror image segmentation method provided by this invention: (a) input image; (b) ground-truth annotation; (c) to (f) predicted region maps; (g) predicted boundary map.
Detailed Description of the Embodiments
[0077] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the protection scope of the present invention.
[0078] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.
[0079] Secondly, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor is it a single or selective embodiment that is mutually exclusive with other embodiments.
[0080] This invention is described in detail with reference to the schematic diagrams. When detailing the embodiments of this invention, for ease of explanation, the cross-sectional views illustrating the device structure may be partially enlarged, not adhering to the usual scale. Furthermore, the schematic diagrams are merely examples and should not be construed as limiting the scope of protection of this invention. In actual fabrication, the three-dimensional spatial dimensions of length, width, and depth should be included.
[0081] Furthermore, in the description of this invention, it should be noted that the terms "upper," "lower," "inner," and "outer," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. These terms are used solely for the convenience of describing the invention and for simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on the invention. In addition, the terms "first," "second," or "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.
[0082] Unless otherwise explicitly specified and limited, the terms "installation," "connection," and "joining" in this invention should be interpreted broadly. For example, they can refer to fixed connections, detachable connections, or integral connections; similarly, they can refer to mechanical connections, electrical connections, or direct connections, or indirect connections through an intermediate medium, or internal connections between two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances.
[0083] Example 1
[0084] Referring to Figures 1-6, as one embodiment of the present invention, a boundary-aware multimodal mirror image segmentation method is provided, comprising:
[0085] S1: A boundary-aware mirror image segmentation model is used. A coarse prediction module is added to the backbone network to extract deep features, resulting in a coarse prediction map of the mirror image. It should be noted that:
[0086] The backbone network is a ResNet-50 network. As shown in Figure 1, the last fully connected layer of the ResNet-50 network is removed, and the RGB image S0 is passed through the backbone network to obtain the output features S1, S2, S3, S4, and S5 of each layer.
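To make the S1 to S5 notation concrete, the sketch below computes the (channels, height, width) of each backbone output for a given input size. It assumes a torchvision-style ResNet-50 and assumes that S1 is the 7×7 stem convolution output and S2 to S5 are the outputs of the four residual stages; the text does not spell out this mapping, and the helper names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_feature_shapes(height: int, width: int) -> list[tuple[int, int, int]]:
    """(channels, H, W) of S1..S5 for a torchvision-style ResNet-50 without its FC layer.

    Assumed mapping: S1 = stem convolution output, S2..S5 = outputs of the
    four residual stages.
    """
    h, w = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
    shapes = [(64, h, w)]                                  # S1
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)      # 3x3 max pool, stride 2
    for channels, stride in [(256, 1), (512, 2), (1024, 2), (2048, 2)]:
        if stride == 2:                                    # stride-2 3x3 conv at stage entry
            h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
        shapes.append((channels, h, w))                    # S2..S5
    return shapes


print(resnet50_feature_shapes(384, 384))
# [(64, 192, 192), (256, 96, 96), (512, 48, 48), (1024, 24, 24), (2048, 12, 12)]
```

Under these assumptions S5 has 2048 channels at 1/32 of the input resolution, which is why it is the natural input for the coarse prediction module.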
[0087] S2: A multi-modal feature fusion module extracts multi-layer features from depth images and fuses them with RGB image features, enhancing multi-scale semantic and spatial information. It should be noted that:
[0088] As shown in Figure 1, in the multimodal feature fusion module, the depth image D0 is passed through four convolutional layers to obtain the output features D1, D2, D3, and D4 of each layer, expressed as follows:
[0089] $D_1 = \mathrm{Conv}_{3\times3,64}(D_0)$
[0090] $D_2 = \mathrm{Conv}_{3\times3,64}(D_1)$
[0091] $D_3 = \mathrm{Conv}_{3\times3,64}(D_2)$
[0092] $D_4 = \mathrm{Conv}_{3\times3,64}(D_3)$
[0093] where $\mathrm{Conv}_{m\times n,k}(\cdot)$ denotes a convolutional layer with an m×n kernel and k output channels;
[0094] Furthermore, as shown in Figure 2, the output feature D2 and the second-layer output feature S2 of the backbone network are concatenated along the channel dimension, and the concatenated output is fed into the average pooling branch and the max pooling branch, respectively.
[0095] In the average pooling branch, the concatenated input is passed through a global average pooling layer and two convolutional layers to obtain the average pooling feature, which is calculated as
[0096]
[0097] Where GlobalAvg(·) represents global average pooling, and Cat(·,·) represents concatenation along the channel dimension;
[0098] In the max pooling branch, the concatenated input is passed through a global max pooling layer and two convolutional layers to obtain the max pooling feature, which is calculated as
[0099]
[0100] Where GlobalMax(·) represents global max pooling;
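[0101] Furthermore, the average pooling feature and the max pooling feature are added, passed through a Sigmoid activation layer, and multiplied by the concatenated output to obtain the fused feature f2, which is calculated as
[0102] $f_2 = \mathrm{Sigmoid}(f_{avg} + f_{max}) \otimes \mathrm{Cat}(D_2, S_2)$, where $f_{avg}$ and $f_{max}$ denote the average pooling feature and the max pooling feature and ⊗ denotes element-wise multiplication;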
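The channel-attention fusion of D2 and S2 can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the kernel sizes and widths of the two convolutional layers in each pooling branch are not given, so they are modelled as weight matrices (a 1×1 convolution on a globally pooled 1×1 map is a matrix product), and the ReLU between them is an assumption. The names `fuse_rgb_depth`, `w_avg`, and `w_max` are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def fuse_rgb_depth(d2, s2, w_avg, w_max):
    """Sketch of f2 = Sigmoid(f_avg + f_max) * Cat(D2, S2).

    d2, s2: feature maps of shape (C_d, H, W) and (C_s, H, W).
    w_avg, w_max: each a pair (W1, W2) standing in for the two convolutional
    layers of a pooling branch; ReLU between them is assumed.
    """
    x = np.concatenate([d2, s2], axis=0)       # Cat(D2, S2): (C, H, W)
    avg = x.mean(axis=(1, 2))                  # global average pooling: (C,)
    mx = x.max(axis=(1, 2))                    # global max pooling: (C,)

    def two_convs(v, weights):
        w1, w2 = weights
        return w2 @ np.maximum(w1 @ v, 0.0)    # conv -> ReLU -> conv on a 1x1 map

    attn = sigmoid(two_convs(avg, w_avg) + two_convs(mx, w_max))  # (C,)
    return attn[:, None, None] * x             # reweight each channel of Cat(D2, S2)


rng = np.random.default_rng(0)
d2 = rng.standard_normal((4, 5, 5))
s2 = rng.standard_normal((6, 5, 5))
w = (rng.standard_normal((3, 10)), rng.standard_normal((10, 3)))
print(fuse_rgb_depth(d2, s2, w, w).shape)  # (10, 5, 5)
```

The output keeps the shape of the concatenated input; only the per-channel weighting changes, which is what lets depth cues emphasise or suppress RGB channels.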
[0103] The fused feature f2 is fed into three parallel branches. The outputs of the three branches are concatenated along the channel dimension and passed through a convolutional layer to obtain the multimodal fusion feature F2, which is calculated as
[0104]
[0105]
[0106]
[0107]
[0108] Furthermore, repeating the procedure used to obtain the multimodal fusion feature F2 yields two further multimodal fusion features, F3 and F4.
[0109] S3: By adding and using receptive field-enhancing convolutional layers, the receptive field of deep features is expanded, thereby improving the ability to extract multi-scale semantic information. It should be noted that:
[0110] As shown in Figure 3, the output feature S5 is passed through a convolutional layer and split into four equal parts along the channel dimension, which are fed into the first, second, third, and fourth branches, as shown below:
[0111] $[s_1, s_2, s_3, s_4] = \mathrm{split}(\mathrm{Conv}_{1\times1,64}(S_5))$
[0112] where split(·) denotes splitting into equal parts along the channel dimension;
[0113] The features of the first branch are convolved, and the output is expressed as:
[0114]
[0115] In the second branch, the output of the first branch is split into two equal parts; one part is concatenated with the features of the second branch and convolved, as shown below:
[0116]
[0117] where the two terms denote, respectively, the output of the second branch and one of the two equal halves of the first branch's output.
[0118] The output of the second branch is likewise split into two equal parts; one part is concatenated with the features of the third branch and convolved, as shown below:
[0119]
[0120] The features of the fourth branch are passed to the output unchanged.
[0121] Furthermore, the outputs of the four branches are concatenated to obtain the receptive field-enhanced feature, which is calculated as follows:
[0122]
[0123] where the terms denote, respectively, one of the two halves of the first branch's output, one of the two halves of the second branch's output, and the output of the third branch.
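The split-and-cascade structure of the receptive field-enhancing layer can be sketched in NumPy as below. The text does not fix the kernel sizes of the branch convolutions or which half of each branch output is forwarded, so this hypothetical sketch uses 3×3 branch convolutions, forwards the first half to the next branch, and sends the other half to the final concatenation. Function and weight names are illustrative only.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: x (C_in, H, W), w (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def conv3x3(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """3x3 convolution with zero padding 1: w has shape (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def receptive_field_enhance(s5, w_in, w1, w2, w3):
    x = conv1x1(s5, w_in)                             # Conv_{1x1,64}(S5)
    s1, s2, s3, s4 = np.split(x, 4, axis=0)           # four equal channel groups
    x1 = conv3x3(s1, w1)                              # first branch
    x1_fwd, x1_out = np.split(x1, 2, axis=0)          # one half forwarded, one half output
    x2 = conv3x3(np.concatenate([s2, x1_fwd]), w2)    # second branch
    x2_fwd, x2_out = np.split(x2, 2, axis=0)
    x3 = conv3x3(np.concatenate([s3, x2_fwd]), w3)    # third branch
    return np.concatenate([x1_out, x2_out, x3, s4])   # fourth branch passes s4 unchanged


s5 = np.zeros((2, 9, 9))
s5[:, 4, 4] = 1.0                                     # single impulse at the centre
out = receptive_field_enhance(
    s5, np.ones((64, 2)), np.ones((16, 16, 3, 3)),
    np.ones((16, 24, 3, 3)), np.ones((16, 24, 3, 3)))
print(out.shape)  # (48, 9, 9)
```

Because each branch convolves features that already passed through the previous branch, the third branch sees three stacked 3×3 convolutions: an impulse at the centre spreads over a 7×7 window in its output, while the untouched fourth group keeps the original resolution of detail.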
[0124] S4: The boundary refinement perception module extracts the region map and boundary map layer by layer. The boundary map guides the region map for constraint refinement to obtain the mirror image segmentation map. It should be noted that:
[0125] As shown in Figure 4, the fifth-layer output S5 is fed into both the boundary branch and the prediction branch. In the boundary branch, S5 is passed through the receptive field-enhancing convolutional layer, and the resulting feature ... is passed through a convolutional layer to obtain the perceptual boundary map E5. In the prediction branch, the feature obtained by passing S5 through a convolutional layer is concatenated along the channel dimension with ..., and the result is passed through four convolutional layers to obtain the coarse prediction map R5;
[0126] Furthermore, as shown in Figure 5, in the fourth layer, the perceptual boundary map E5 and the coarse prediction map R5 are upsampled to obtain ... and ..., expressed as:
[0127]
[0128]
[0129] UpSample(x,y) means upsampling x to the same size as y, using bilinear interpolation as the upsampling algorithm;
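UpSample(x, y) with bilinear interpolation can be sketched in NumPy as follows. The align-corners convention used here (the corner pixels of input and output coincide) is an assumption, since the text only names bilinear interpolation; frameworks also offer a half-pixel convention that gives slightly different values. The function names are hypothetical.

```python
import numpy as np


def upsample_bilinear(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize a 2-D map x to (out_h, out_w), align-corners convention."""
    in_h, in_w = x.shape
    ys = np.linspace(0, in_h - 1, out_h)              # source row coordinate per output row
    xs = np.linspace(0, in_w - 1, out_w)              # source column coordinate per output column
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                           # vertical interpolation weights
    wx = (xs - x0)[None, :]                           # horizontal interpolation weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def upsample_like(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """UpSample(x, y): resize x to the spatial size of y."""
    return upsample_bilinear(x, *y.shape[-2:])


print(upsample_bilinear(np.array([[0.0, 1.0], [2.0, 3.0]]), 3, 3))
# [[0.  0.5 1. ]
#  [1.  1.5 2. ]
#  [2.  2.5 3. ]]
```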
[0130] Furthermore, the upsampled coarse prediction map ... is passed through a Sigmoid activation layer; 1 minus the Sigmoid output is multiplied by the output of the multimodal fusion feature F4 after a convolutional layer to obtain the region input feature, which is calculated as
[0131]
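Weighting F4 by 1 minus the Sigmoid of the coarse prediction (a reverse-attention style mask) can be sketched as follows. The sketch assumes this "1 minus Sigmoid" reading, treats the convolutional layer on F4 as a 1×1 convolution, and uses the hypothetical name `region_input`.

```python
import numpy as np


def region_input(r5_up: np.ndarray, f4: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Sketch of (1 - Sigmoid(R5_up)) * Conv(F4).

    r5_up: upsampled coarse prediction logits, shape (H, W).
    f4: multimodal fusion feature, shape (C_in, H, W); w: 1x1 conv weights (C_out, C_in).
    """
    reverse = 1.0 - 1.0 / (1.0 + np.exp(-r5_up))   # large where no mirror is predicted yet
    feat = np.einsum("oc,chw->ohw", w, f4)          # 1x1 convolution of F4
    return reverse[None, :, :] * feat               # broadcast the mask over channels


f4 = np.ones((2, 3, 3))
print(region_input(np.zeros((3, 3)), f4, np.eye(2))[0, 0, 0])  # 0.5
```

Pixels already confidently predicted as mirror (large logits) are suppressed, so the region branch concentrates on areas the coarse map missed.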
[0132] The multimodal fusion feature F4 is passed through a convolutional layer, and the result is concatenated with the coarse prediction map to obtain the boundary input feature, which is calculated as
[0133]
[0134] It should be noted that, in the region branch, the boundary input feature guides feature extraction from the region input feature: the features ... and ... are concatenated along the channel dimension and passed through two convolutional layers, and the result is added to the coarse prediction map to obtain the predicted region map R4, which is calculated as follows:
[0135]
[0136] It should be noted that, in the boundary branch, the boundary input feature is passed through a convolutional layer to obtain the predicted boundary map E4, which is calculated as follows:
[0137]
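The data flow of one refinement level, producing R4 and E4, can be sketched as below. Because the exact operands of the concatenation are not reproduced in the text, this hypothetical sketch simply concatenates the region and boundary input features; all convolutions are taken as 1×1, and the ReLU between the two region-branch convolutions is an assumption.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: x (C_in, H, W), w (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def refine_level(region_in, boundary_in, r_coarse_up, w_a, w_b, w_e):
    """Hypothetical sketch of one boundary-guided refinement level.

    region_in, boundary_in: (C_r, H, W) and (C_b, H, W) features.
    r_coarse_up: upsampled coarse prediction map, (1, H, W).
    w_a, w_b: the two region-branch convolutions; w_e: the boundary-branch convolution.
    """
    fused = np.concatenate([region_in, boundary_in], axis=0)    # channel concatenation
    r = conv1x1(np.maximum(conv1x1(fused, w_a), 0.0), w_b)      # two convolutional layers
    r4 = r + r_coarse_up                                         # add the coarse prediction map
    e4 = conv1x1(boundary_in, w_e)                               # predicted boundary map
    return r4, e4


rng = np.random.default_rng(1)
region_in = rng.standard_normal((4, 5, 5))
boundary_in = rng.standard_normal((3, 5, 5))
r_coarse = rng.standard_normal((1, 5, 5))
r4, e4 = refine_level(region_in, boundary_in, r_coarse,
                      rng.standard_normal((6, 7)), rng.standard_normal((1, 6)),
                      rng.standard_normal((1, 3)))
print(r4.shape, e4.shape)  # (1, 5, 5) (1, 5, 5)
```

The residual addition means each level only has to learn a correction to the coarser map, and repeating the same block at lower layers produces R3, R2 and E3, E2.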
[0138] Furthermore, repeating the procedure used to obtain the predicted region map R4 yields the predicted region maps R2 and R3, and repeating the procedure used to obtain the predicted boundary map E4 yields the predicted boundary maps E2 and E3.
[0139] S5: The mirror image segmentation map is input into the mirror image segmentation model and upsampled to the original image size; a loss function is constructed to calculate the error between the predicted map and the ground truth map, and the mirror image segmentation model is updated by backpropagation. The mirror image segmentation map is then input into the updated mirror image segmentation model to obtain and output the corresponding mirror image segmentation prediction map. It should be noted that:
[0140] The predicted region maps R2-R5 and the predicted boundary maps E2-E5 are upsampled to the original image size as follows:
[0141]
[0142]
[0143] where the two terms denote the predicted region map and the predicted boundary map of the i-th layer, respectively, and UpSample(x, y) denotes upsampling x to the same size as y.
[0144] The calculation of the loss function includes,
[0145] $L_{bce+iou} = l_{bce} + l_{iou}$
[0146] where $l_{bce}$ denotes the binary cross-entropy loss and $l_{iou}$ denotes the intersection-over-union (IoU) loss;
[0147] It should be noted that the calculation of the binary cross-entropy function includes,
[0148] $l_{bce} = -\sum_{(r,c)} \left[ G(r,c)\log P(r,c) + (1 - G(r,c))\log(1 - P(r,c)) \right]$
[0149] Where G(r,c)∈{0,1} represents the pixel value of the true value, and P(r,c) represents the probability map of the mirror prediction region;
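[0150] The IoU loss is calculated as $l_{iou} = 1 - \frac{\sum_{(r,c)} G(r,c)P(r,c)}{\sum_{(r,c)} \left[ G(r,c) + P(r,c) - G(r,c)P(r,c) \right]}$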
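The BCE + IoU loss can be sketched in plain Python on a flattened probability map. Two assumptions are made: the BCE term is averaged over pixels (the normalisation is not stated), and the IoU term takes the common soft form 1 - Σ PG / Σ(P + G - PG). How the losses on the several side outputs R2-R5 and E2-E5 are weighted and summed is not stated, so it is left out; `bce_iou_loss` is a hypothetical name.

```python
import math


def bce_iou_loss(pred, gt, eps=1e-7):
    """L = l_bce + l_iou for a flattened probability map `pred` and binary `gt`."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    bce = 0.0
    inter = 0.0
    union = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, eps), 1.0 - eps)              # clip to avoid log(0)
        bce -= g * math.log(p) + (1 - g) * math.log(1 - p)
        inter += p * g                               # soft intersection
        union += p + g - p * g                       # soft union
    bce /= len(pred)                                 # mean over pixels (assumed)
    return bce + (1.0 - inter / union)


print(round(bce_iou_loss([0.5, 0.5], [1, 0]), 4))  # 1.3598
```

For P = (0.5, 0.5) and G = (1, 0), the BCE term is ln 2 ≈ 0.6931 and the IoU term is 1 - 0.5/1.5 = 2/3, giving about 1.3598.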
[0151] Furthermore, the predicted region map ... is passed through a Sigmoid activation layer, and the output is the mirror image segmentation prediction map.
[0152] It should be noted that this invention provides a boundary-aware multimodal mirror image segmentation method. A multimodal feature fusion module extracts multi-layer features from the depth image that accompanies the RGB input image and fuses them with the RGB features. This exploits the distinctive appearance of mirrors in depth images, enhances multi-scale semantic and spatial information, and helps the network extract the mirror region more completely. A coarse prediction module is added to extract deep features from the backbone network and obtain a coarse prediction map of the mirror. Receptive field-enhancing convolutional layers expand the receptive field of the deep features, improving the extraction of multi-scale semantic information. A boundary refinement perception module extracts region maps and boundary maps layer by layer, and the boundary maps guide constraint refinement of the region maps to obtain the mirror image segmentation map, effectively improving segmentation accuracy.
[0153] Example 2
[0154] This embodiment differs from the first embodiment in that it provides a verification test for a boundary-aware multimodal mirror image segmentation method, to verify and illustrate the technical effectiveness of the method.
[0155] Table 1 lists comparative results for image segmentation methods proposed in recent years. An upward arrow (↑) indicates that higher values of a metric are better, e.g. IoU; a downward arrow (↓) indicates that lower values are better, e.g. MAE and BER.
[0156] ① IoU (intersection over union) measures how accurately the target object is segmented, expressed as
[0157] $IoU = \frac{TP}{TP + FP + FN}$
[0158] where TP, FP, and FN denote the counts of true positives, false positives, and false negatives, respectively;
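[0159] ② $F_\beta$ is a comprehensive metric defined as the weighted harmonic mean of precision and recall, expressed as
[0160] $F_\beta = \frac{(1 + \beta^2)\,\mathrm{Precision} \times \mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}}$
[0161] where $\beta^2$ is usually set to 0.3;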
[0162] ③ MAE is the mean absolute error between the prediction map and the ground truth map, expressed as:
[0163] $MAE = \frac{1}{H \times W} \sum_{x} \sum_{y} \left| P(x,y) - G(x,y) \right|$
[0164] Where H represents the height of the prediction map, W represents the width of the prediction map, (x,y) represents the position of the current pixel, P(x,y) represents the prediction map, and G(x,y) represents the ground truth map.
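[0165] ④ BER (balanced error rate) evaluates segmentation performance when the test set is imbalanced, i.e. positive and negative samples are unevenly represented. It is expressed as:
[0166] $BER = 100 \times \left( 1 - \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right) \right)$, where TN denotes the count of true negatives.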
[0167] In general, a higher IoU and lower MAE and BER indicate better performance.
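The evaluation metrics can be computed for a single prediction map as in the following plain-Python sketch. Binarising the prediction at 0.5 for IoU, F_β, and BER is an assumption, and the parameter `beta2` stands for β² with 0.3 as its default; MAE is computed on the raw probabilities. The function name `evaluate` is hypothetical.

```python
def evaluate(pred, gt, beta2=0.3, threshold=0.5):
    """IoU, F-beta, MAE and BER for a flattened probability map and binary ground truth."""
    tp = fp = fn = tn = 0
    abs_err = 0.0
    for p, g in zip(pred, gt):
        abs_err += abs(p - g)                   # MAE uses the raw probabilities
        b = 1 if p >= threshold else 0          # binarised prediction (assumed threshold)
        if b and g:
            tp += 1
        elif b:
            fp += 1
        elif g:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("BER needs both mirror and non-mirror pixels in gt")
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn)
    denom = beta2 * precision + recall
    f_beta = (1 + beta2) * precision * recall / denom if denom else 0.0
    mae = abs_err / len(pred)
    ber = 100 * (1 - 0.5 * (tp / (tp + fn) + tn / (tn + fp)))
    return {"IoU": iou, "F_beta": f_beta, "MAE": mae, "BER": ber}


print(evaluate([1.0, 1.0, 0.0, 0.0], [1, 0, 1, 0])["BER"])  # 50.0
```

With one pixel in each of TP, FP, FN, and TN, the sketch gives IoU = 1/3, F_β = 0.5, MAE = 0.5, and BER = 50, i.e. chance-level performance on both classes.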
[0168] Table 1: Results of the comparative experiment.
[0169]
[0170] As shown in Table 1, the network model based on our technical solution achieved the best results on almost all metrics, which verifies that it can accurately locate mirrors against similar backgrounds. Moreover, by using the boundary map to guide constraint refinement of the region map, the boundary-aware multimodal mirror image segmentation method obtains a smoother mirror image segmentation map, which effectively improves the accuracy of mirror image segmentation.
[0171] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.
Claims
1. A boundary-aware multi-modal mirror image segmentation method, comprising: using a boundary-aware mirror image segmentation model; adding a coarse prediction module to the backbone network to extract deep features of the backbone network and obtain a coarse prediction map of the mirror; expanding the receptive field of the deep features with receptive-field-enhancing convolutional layers; extracting multi-layer features of the depth image with a multimodal feature fusion module; extracting region maps and boundary maps layer by layer with a boundary refinement perception module, the boundary map guiding the region map through constrained refinement to obtain a mirror image segmentation map; inputting the mirror image segmentation map into the mirror image segmentation model and upsampling it to the original image size; constructing a loss function to compute the error between the predicted map and the ground-truth map, and updating the mirror image segmentation model by backpropagation; and inputting the mirror image segmentation map into the updated mirror image segmentation model to obtain and output the corresponding mirror image segmentation prediction map. The extraction of multi-layer features from the depth image comprises: in the multimodal feature fusion module, passing the depth image through four convolutional layers to obtain the output feature of each layer; concatenating the depth output feature with the second-layer output feature of the backbone network along the channel dimension; and feeding the concatenated output into an average pooling branch and a max pooling branch, respectively.
In the average pooling branch, the concatenated input passes through a global average pooling layer and two convolutional layers to obtain an average-pooled feature; in the max pooling branch, the concatenated input passes through a global max pooling layer and two convolutional layers to obtain a max-pooled feature. The average-pooled feature and the max-pooled feature are added, passed through an activation layer, and multiplied by the concatenated output to obtain the fused feature, the concatenation being performed along the channel dimension. The fused feature is fed into three parallel branches, whose outputs are concatenated along the channel dimension and passed through a convolutional layer to obtain the multimodal fusion feature, each convolutional layer being defined by its kernel size and its number of output channels k. Repeating this acquisition yields two additional multimodal fusion features. The receptive-field-enhancing convolutional layer operates as follows: the output feature is passed through a convolutional layer and divided evenly by channel into four parts, which are input into the first, second, third and fourth branches, respectively. The features of the first branch are convolved and output. In the second branch, the output of the first branch is divided into two equal halves, and one half is concatenated with the features of the second branch and convolved.
In this channel concatenation and convolution, the output of the second branch is obtained from one of the two equal halves of the first branch's output together with the features of the second branch. The output of the second branch is likewise divided into two halves, one of which is concatenated along the channel dimension with the features of the third branch and convolved, while the features of the fourth branch are output unchanged. The outputs of the four branches, namely one half of the first branch's output, one half of the second branch's output, the output of the third branch and the features of the fourth branch, are concatenated along the channel dimension to obtain the receptive-field-enhanced feature. The extraction of the region map comprises: at the fourth layer, upsampling the perceptual boundary map and the coarse prediction map; passing the upsampled coarse prediction map through an activation layer and subtracting the activation output from 1; and multiplying the result by the multimodal fusion feature after one convolutional layer to obtain the region input feature.
The multimodal fusion feature, after a convolutional layer, is concatenated with the coarse prediction map and convolved to obtain the boundary input feature. In the region branch, the boundary input feature guides feature extraction from the region input feature: the two features are concatenated along the channel dimension, passed through two convolutional layers, and added to the coarse prediction map to obtain the predicted region map. Repeating this process yields the updated predicted region maps of the other layers. The extraction of the boundary map comprises: in the boundary branch, passing the boundary input feature through one convolutional layer to obtain the predicted boundary map; repeating this process yields the updated predicted boundary maps.
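The average-pooling/max-pooling fusion step of claim 1 can be illustrated with a small NumPy sketch. It is hypothetical in several respects: every convolution is modeled as a 1x1 convolution, a ReLU is assumed between the two convolutional layers of each pooling branch, the activation layer is assumed to be a sigmoid, and the depth feature is assumed to share the spatial size of the backbone feature it is concatenated with. The three parallel branches that follow are omitted because their kernel sizes are not recoverable from this text. The pattern resembles CBAM-style channel attention, except that each pooling branch here has its own pair of layers.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: (C_in, H, W) feature map, (C_out, C_in) weight -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def pooled_branch(x, pool, w1, w2):
    """Global pooling followed by two 1x1 conv layers, with ReLU in between (assumed)."""
    v = pool(x, axis=(1, 2), keepdims=True)          # (C, 1, 1) channel descriptor
    return conv1x1(np.maximum(conv1x1(v, w1), 0.0), w2)


def fuse_depth_rgb(depth_feat, rgb_feat, params):
    """Channel-attention fusion of a depth feature and a backbone (RGB) feature."""
    cat = np.concatenate([depth_feat, rgb_feat], axis=0)   # concatenate along channels
    avg = pooled_branch(cat, np.mean, params["avg_w1"], params["avg_w2"])
    mx = pooled_branch(cat, np.max, params["max_w1"], params["max_w2"])
    weights = sigmoid(avg + mx)                            # (C, 1, 1), values in (0, 1)
    return weights * cat                                   # reweight each channel


rng = np.random.default_rng(0)
c_depth, c_rgb, hidden = 4, 8, 3
c = c_depth + c_rgb
params = {
    "avg_w1": rng.standard_normal((hidden, c)),
    "avg_w2": rng.standard_normal((c, hidden)),
    "max_w1": rng.standard_normal((hidden, c)),
    "max_w2": rng.standard_normal((c, hidden)),
}
depth = rng.standard_normal((c_depth, 16, 16))
rgb = rng.standard_normal((c_rgb, 16, 16))
print(fuse_depth_rgb(depth, rgb, params).shape)  # (12, 16, 16)
```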
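The channel routing of the four-branch receptive-field-enhancing layer can be sketched in NumPy as below. Which half of each branch output is forwarded to the next branch and which half is kept for the final concatenation is an assumption, as are the helper names and channel counts. All convolutions are 1x1 stand-ins, so this sketch reproduces only the data flow; actually enlarging the receptive field would require larger or dilated kernels, whose sizes this text does not give.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: (C_in, H, W) feature map, (C_out, C_in) weight -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def receptive_field_enhance(x, w_in, w1, w2, w3):
    """Channel routing of the four-branch receptive-field-enhancing layer."""
    x = conv1x1(x, w_in)
    x1, x2, x3, x4 = np.split(x, 4, axis=0)                  # four equal channel groups
    y1 = conv1x1(x1, w1)                                      # branch 1
    y1_fwd, y1_keep = np.split(y1, 2, axis=0)                 # halves of branch 1 output
    y2 = conv1x1(np.concatenate([y1_fwd, x2], axis=0), w2)   # branch 2
    y2_fwd, y2_keep = np.split(y2, 2, axis=0)                 # halves of branch 2 output
    y3 = conv1x1(np.concatenate([y2_fwd, x3], axis=0), w3)   # branch 3
    # Branch 4 passes through unchanged; all outputs are concatenated by channel.
    return np.concatenate([y1_keep, y2_keep, y3, x4], axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
out = receptive_field_enhance(
    x,
    w_in=np.eye(16),
    w1=rng.standard_normal((4, 4)),
    w2=rng.standard_normal((4, 6)),
    w3=rng.standard_normal((4, 6)),
)
print(out.shape)  # (12, 8, 8)
```

Passing only half of each branch output forward lets later branches build on earlier ones (and, with real spatial kernels, see progressively larger regions) while the other half is preserved at its own scale.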
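One layer of the boundary refinement step can be sketched in NumPy as follows. The sketch assumes that the activation is a sigmoid (so the region input is weighted by 1 − sigmoid of the upsampled coarse map, a reverse-attention pattern), that upsampling is nearest-neighbour by a factor of 2, that one convolved multimodal feature feeds both the region and boundary inputs, and that a ReLU separates the two region-branch convolutions. The perceptual boundary map is not used here because its role at this step is not recoverable from the text. All names are hypothetical and all convolutions are 1x1 stand-ins.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: (C_in, H, W) feature map, (C_out, C_in) weight -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def upsample2(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def refine_layer(f_mm, coarse, p):
    """One boundary-guided refinement step (all convolutions are 1x1 stand-ins)."""
    coarse_up = upsample2(coarse)                           # (1, H, W)
    feat = conv1x1(f_mm, p["w_feat"])                       # conv on multimodal feature
    region_in = (1.0 - sigmoid(coarse_up)) * feat           # reverse attention (assumed)
    boundary_in = conv1x1(np.concatenate([feat, coarse_up], axis=0), p["w_bnd"])
    hidden = np.maximum(
        conv1x1(np.concatenate([region_in, boundary_in], axis=0), p["w_r1"]), 0.0
    )
    region_map = conv1x1(hidden, p["w_r2"]) + coarse_up     # residual on coarse map
    boundary_map = conv1x1(boundary_in, p["w_b"])
    return region_map, boundary_map


rng = np.random.default_rng(0)
c = 8
f_mm = rng.standard_normal((c, 8, 8))
coarse = rng.standard_normal((1, 4, 4))
p = {
    "w_feat": rng.standard_normal((c, c)),
    "w_bnd": rng.standard_normal((c, c + 1)),
    "w_r1": rng.standard_normal((c, 2 * c)),
    "w_r2": rng.standard_normal((1, c)),
    "w_b": rng.standard_normal((1, c)),
}
region, boundary = refine_layer(f_mm, coarse, p)
print(region.shape, boundary.shape)  # (1, 8, 8) (1, 8, 8)
```

Adding the upsampled coarse map back to the region branch output means each layer only has to learn a correction to the coarser prediction, which is what lets the refinement proceed layer by layer.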
2. The boundary-aware multi-modal mirror image segmentation method of claim 1, wherein the backbone network is a ResNet-50 network with its last fully connected layer removed, and the RGB image is passed through the backbone network to obtain the output features of each of its five layers.
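The resolutions of the five backbone features can be worked out with a short Python sketch. It assumes the torchvision layout of ResNet-50 (a 7×7 stride-2 stem convolution, a stride-2 max pool before layer1, and stride-2 downsampling in layer2 to layer4) and assumes that the claim's five layers correspond to conv1 and layer1 to layer4; both are assumptions, not statements from the patent.

```python
def ceil_half(n):
    """Spatial size after one stride-2 step with 'same'-style padding (rounds up)."""
    return -(-n // 2)


def resnet50_feature_shapes(height, width):
    """(channels, H, W) of the five feature levels, assuming torchvision's ResNet-50
    with conv1 and layer1-layer4 taken as the five levels."""
    channels = (64, 256, 512, 1024, 2048)
    shapes = []
    h, w = height, width
    for c in channels:
        # conv1 halves the input; the max pool halves it again before layer1;
        # layer2, layer3 and layer4 each halve it once more.
        h, w = ceil_half(h), ceil_half(w)
        shapes.append((c, h, w))
    return shapes


for level, shape in enumerate(resnet50_feature_shapes(224, 224), start=1):
    print(level, shape)
```

For a 224 × 224 input this gives 112, 56, 28, 14 and 7 pixels per side with 64, 256, 512, 1024 and 2048 channels, so the fifth-layer feature used for the coarse prediction is the coarsest and most channel-rich.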
3. The boundary-aware multi-modal mirror image segmentation method according to claim 1 or 2, wherein obtaining the coarse prediction map of the mirror comprises: inputting the fifth-layer output feature into a boundary branch and a prediction branch, respectively; in the boundary branch, passing the feature obtained from the receptive-field-enhancing convolutional layer through a convolutional layer to obtain the perceptual boundary map; and in the prediction branch, concatenating the output feature obtained after one convolutional layer with the corresponding feature along the channel dimension and passing the result through four convolutional layers to obtain the coarse prediction map.
4. The boundary-aware multi-modal mirror image segmentation method of claim 3, wherein upsampling to the original image size is applied to the predicted region map and the predicted boundary map of the i-th layer, using an upsampling operation that resizes x to the same size as y.
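Upsampling each layer's predicted region and boundary map to the original image size can be sketched as follows. Nearest-neighbour resizing is swapped in here for brevity; practical implementations commonly use bilinear interpolation, and the text does not say which is used. `upsample_like(x, y)` is a hypothetical helper that resizes x to the spatial size of y.

```python
import numpy as np


def upsample_like(x, y):
    """Nearest-neighbour resize of map x to the spatial size (last two axes) of y."""
    h_out, w_out = y.shape[-2:]
    h_in, w_in = x.shape[-2:]
    rows = np.arange(h_out) * h_in // h_out     # source row for each output row
    cols = np.arange(w_out) * w_in // w_out     # source column for each output column
    return x[..., rows[:, None], cols[None, :]]


image = np.zeros((3, 4, 4))                     # stand-in for the original RGB image
region_maps = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.ones((1, 1))]
full_size = [upsample_like(m, image) for m in region_maps]
print(full_size[0])
```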
5. The boundary-aware multi-modal mirror image segmentation method of claim 4, wherein the loss function is computed from a binary cross-entropy function and a cross-entropy joint function.
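Only the binary cross-entropy part of the loss can be sketched with confidence, because the exact combination and the "cross-entropy joint function" term are not recoverable from this text. The sketch assumes the maps have already been upsampled to the ground-truth size and are kept as pre-activation scores, and that the loss sums the BCE of every predicted map (deep supervision); both assumptions and the helper names are hypothetical.

```python
import numpy as np


def bce_with_logits(logits, target):
    """Mean binary cross-entropy computed from raw (pre-activation) scores.

    Uses max(s, 0) - s * t + log(1 + exp(-|s|)), which equals
    -[t log sigmoid(s) + (1 - t) log(1 - sigmoid(s))] but avoids overflow.
    """
    s = np.asarray(logits, dtype=np.float64)
    t = np.asarray(target, dtype=np.float64)
    return np.mean(np.maximum(s, 0.0) - s * t + np.log1p(np.exp(-np.abs(s))))


def deep_supervision_bce(pred_maps, gt):
    """Sum the BCE of every full-resolution predicted map against the ground truth (assumed)."""
    return sum(bce_with_logits(m, gt) for m in pred_maps)


gt = np.array([[1.0, 0.0], [0.0, 0.0]])
maps = [np.zeros((2, 2)), np.array([[5.0, -5.0], [-5.0, -5.0]])]
print(deep_supervision_bce(maps, gt))
```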
6. The boundary-aware multi-modal mirror image segmentation method of claim 5, wherein the mirror image segmentation prediction map is obtained by passing the predicted region map through an activation layer and outputting the result.