Image segmentation method considering boundary information, storage medium, and device

By constructing a multi-task learning image segmentation model that combines boundary learning, semantic segmentation, and feature alignment tasks, the problem of poor boundary information processing in existing technologies is solved, achieving image segmentation with higher accuracy and sensitivity.

CN118864841B · Active · Publication Date: 2026-06-16 · CHINA UNIV OF GEOSCIENCES (WUHAN)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
CHINA UNIV OF GEOSCIENCES (WUHAN)
Filing Date
2024-06-28
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing image segmentation techniques suffer from low accuracy and insufficient sensitivity when processing boundary information, especially when target shapes vary and boundaries are blurred, which can easily lead to misclassification.

Method used

A multi-task learning strategy is adopted to construct boundary learning, semantic segmentation, and boundary feature alignment tasks. The image segmentation model is built from a feature extraction network, a feature fusion network (combining non-local attention networks and attention feature fusion networks), and a multi-task joint learning network, and boundary features extracted with the Canny operator are learned and aligned.

Benefits of technology

It improves the accuracy and sensitivity of image segmentation, effectively processes boundary information, and enhances segmentation performance.

Generated by Eureka AI based on patent content.

Figure CN118864841B_ABST

Abstract

The application discloses an image segmentation method considering boundary information, a storage medium, and a device, and relates to the field of image processing. The method comprises the following steps: dividing the images to be processed into a training set, a validation set, and a test set; constructing a boundary learning task, a semantic segmentation task, and a boundary feature alignment task, and building a multi-task boundary optimization image segmentation model that learns the three tasks jointly, where the model comprises a feature extraction network, a feature fusion network based on a non-local attention network and an attention feature fusion network, and a multi-task joint learning network; training the model on the training set to obtain a trained model; evaluating the trained model's weights on the validation set, adjusting its weight parameters according to the evaluation results, and obtaining a validated model; and testing the performance of the validated model on the test set. The application effectively delineates boundary information and improves segmentation performance.

Description

Technical Field

[0001] This invention relates to the field of image processing, and more particularly to an image segmentation method, storage medium, and device that take boundary information into account.

Background Technology

[0002] Image semantic segmentation is a crucial task in computer vision that assigns each pixel in an image to a predefined semantic category, thereby achieving pixel-level understanding and analysis. Compared with object detection and image classification, semantic segmentation works at a finer granularity and provides more detailed scene understanding and information extraction. It helps computers identify the objects or scenes represented by different regions of an image, enabling more accurate scene interpretation. However, most segmentation scenarios face challenges such as varying shapes and blurred boundaries, and accurately segmenting foreground objects while effectively handling boundary information remains difficult.

[0003] In recent years, image segmentation has been widely used in fields such as autonomous driving, medical image analysis, and image editing and enhancement. Traditional segmentation methods mainly include thresholding, edge detection, clustering, and region growing. Although these methods can segment images to a certain extent, they have limitations such as the need for manual threshold selection and sensitivity to noise and texture. With the development of deep learning, convolutional neural networks have become an important tool for image segmentation, achieving more accurate and efficient segmentation than traditional techniques. However, most current neural network models have a large number of parameters and often trade parameter count for accuracy. In many image datasets, target shapes vary, image quality is inconsistent, and the boundaries between the targets and the surrounding tissue are blurred. Because these tissue structures have similar textures, misclassification occurs easily, so current methods for segmenting such images are not sufficiently stable or reliable.

[0004] Therefore, improving the accuracy and sensitivity of image segmentation while optimizing boundary processing capabilities are urgent problems to be solved.

Summary of the Invention

[0005] The purpose of this invention is to address the problem of poor boundary processing capabilities in existing image segmentation techniques by proposing an image segmentation method that considers boundary information, comprising the following steps:

[0006] S1. Obtain the images to be segmented and divide them into a training set, a validation set, and a test set;

[0007] S2. Based on a multi-task learning strategy, construct a boundary learning task, a semantic segmentation task, and a boundary feature alignment task, and build a multi-task boundary optimization image segmentation model that learns the multiple tasks jointly;

[0008] The multi-task boundary optimization image segmentation model includes a feature extraction network, a feature fusion network, and a multi-task joint learning network;

[0009] The feature extraction network extracts feature maps at different scales from the input image; the feature fusion network fuses these multi-scale feature maps; and, based on the input image, the multi-scale feature maps, and the fused feature maps, the multi-task joint learning network generates segmentation feature maps and boundary feature maps and aligns them with each other.

[0010] S3. The multi-task boundary optimization image segmentation model is trained, validated, and tested using the training set, validation set, and test set to obtain the final image segmentation model.

[0011] S4. Input the image to be tested into the final image segmentation model to obtain the image segmentation result.

[0012] Furthermore, the feature extractor includes: a downsampling module and four residual modules;

[0013] The feature fusion network consists of three feature fusion subnetworks; each feature fusion subnetwork includes a nonlocal attention network and an attention feature fusion network.

[0014] The multi-task joint learning network includes an atrous spatial pyramid pooling (ASPP) module, a Canny operator, a 1×1 convolution, three upsampling modules, and one downsampling module;

[0015] The image to be segmented is input into the downsampling module of the feature extractor to obtain a reduced-size image. The reduced-size image is input into the first residual module of the feature extractor to obtain the first feature map. The first feature map is input into the second residual module of the feature extractor to obtain the second feature map. The second feature map is input into the third residual module of the feature extractor to obtain the third feature map. The third feature map is input into the fourth residual module of the feature extractor to obtain the fourth feature map.

[0016] The third and fourth feature maps are input into the non-local attention network of the first feature fusion sub-network to obtain the fifth feature map. The fifth and third feature maps are input into the attention feature fusion network of the first feature fusion sub-network to obtain the first fused feature map. The second feature map and the first fused feature map are input into the non-local attention network of the second feature fusion sub-network to obtain the sixth feature map. The sixth and second feature maps are input into the attention feature fusion network of the second feature fusion sub-network to obtain the second fused feature map. The first feature map and the second fused feature map are input into the non-local attention network of the third feature fusion sub-network to obtain the seventh feature map. The seventh and first feature maps are input into the attention feature fusion network of the third feature fusion sub-network to obtain the third fused feature map.

[0017] Through the boundary learning task, the image to be segmented is processed by the Canny operator of the multi-task joint learning network to extract the boundary features of the image. After the boundary features of the image are processed by the downsampling module of the multi-task joint learning network, they are merged with the reduced-size image obtained by the feature extractor to obtain the merged features. The merged features are processed by the 1×1 convolution of the multi-task joint learning network to obtain the boundary feature map. The boundary feature map is then upsampled by the first upsampling module of the multi-task joint learning network to obtain the boundary prediction result.

[0018] In the semantic segmentation task, the fourth feature map is input into the atrous spatial pyramid pooling (ASPP) module of the multi-task joint learning network to obtain a feature vector. After passing through the second upsampling module of the multi-task joint learning network, the feature vector is merged with the third fused feature map to obtain the semantic segmentation feature map.

[0019] After merging the boundary feature map and the semantic segmentation feature map, the semantic segmentation prediction result is obtained through the third upsampling module of the multi-task joint learning network. The boundary feature alignment task establishes a boundary alignment association between the semantic segmentation prediction result and the boundary prediction result.

[0020] Furthermore, the input to the non-local attention network consists of two feature maps at different scales. The lower-scale feature map is used to compute the Value and the Key: the Value is passed through a 1×1 convolution, ASPP, and the view function to obtain a linear matrix γ, and the Key is passed through a 1×1 convolution, ASPP, and the view function to obtain a linear matrix θ. The higher-scale feature map is used to compute the Query, which is passed through a 1×1 convolution and the view function to obtain the Query linear matrix. The matrix dot product of the Query matrix and θ is normalized with softmax to obtain the spatial location feature attention weights. These weights are then multiplied (matrix dot product) with γ, and the result is linearly transformed into the globally associated feature map.

[0021] Furthermore, the input to the attention feature fusion network is two feature maps x1 and x2 of the same scale. x1 and x2 are added together to obtain feature map x. Local and global attention weights are calculated using local and global feature branches respectively. The result of multiplying x1 with the global attention weight and the result of multiplying x2 with the local attention weight are added together to obtain the final output feature.

[0022] The local feature branch consists of two Conv modules, two BN modules, one ReLU module, and one Sigmoid module connected in series;

[0023] The global feature branch consists of a Pooling module, two Conv modules, two BN modules, a ReLU module, and a Sigmoid module connected in series.
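The fusion rule in [0021] to [0023] can be illustrated with a minimal pure-Python sketch. It assumes a single-channel feature map flattened to a list of pixels, so each 1×1 Conv followed by BN collapses to a scalar affine map a·v + b; the parameter tuples are hypothetical placeholders rather than trained weights, and average pooling is assumed for the global branch's Pooling module. Following [0021] as written, x1 is weighted by the global attention weight and x2 by the local attention weight.

```python
import math
from typing import List, Tuple

Params = Tuple[float, float, float, float]  # hypothetical (a1, b1, a2, b2)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def branch(v: float, params: Params) -> float:
    """Conv -> BN -> ReLU -> Conv -> BN -> Sigmoid for a single channel.

    With one channel, each 1x1 Conv followed by BN reduces to a * v + b."""
    a1, b1, a2, b2 = params
    hidden = max(0.0, a1 * v + b1)       # first Conv + BN, then ReLU
    return sigmoid(a2 * hidden + b2)     # second Conv + BN, then Sigmoid


def attention_feature_fusion(x1: List[float], x2: List[float],
                             local_params: Params = (1.0, 0.0, 1.0, 0.0),
                             global_params: Params = (1.0, 0.0, 1.0, 0.0)) -> List[float]:
    """Fuse two same-scale single-channel maps (flattened to lists of pixels)."""
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same scale")
    x = [a + b for a, b in zip(x1, x2)]                # x = x1 + x2
    local_w = [branch(v, local_params) for v in x]     # local branch: one weight per pixel
    pooled = sum(x) / len(x)                           # global branch: average pooling (assumed)
    global_w = branch(pooled, global_params)           # one weight for the whole channel
    # As stated in [0021]: x1 * global weight + x2 * local weight
    return [a * global_w + b * lw for a, b, lw in zip(x1, x2, local_w)]


print(attention_feature_fusion([1.0, 2.0], [0.5, -1.0]))
```

Because the sigmoid keeps both weights in (0, 1), each output pixel is a soft, learned re-weighting of the two inputs rather than their plain sum.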

[0024] Furthermore, the loss function for the boundary learning task during training of the multi-task boundary optimization image segmentation model is as follows:

[0025]

[0026] weight = pos_weight · pos_mask + neg_weight · neg_mask,

[0027] Here, the boundary learning loss is computed from the following quantities: edge is the prediction result of the boundary learning task, boundary is the ground-truth boundary, weight is the weight matrix, BinaryCrossEntropy is the weighted binary cross-entropy loss function, pos_mask is the binary mask marking boundary positions, neg_mask is the binary mask marking non-boundary positions, pos_weight is the weight matrix of the binary mask marking boundary positions, neg_weight is the weight matrix of the binary mask marking non-boundary positions, and · denotes the matrix dot product.

[0028] Furthermore, the loss function for the semantic segmentation task during training of the multi-task boundary optimization image segmentation model is as follows:

[0029]

[0030] Here, w_i is the weight coefficient of the i-th pixel, y_i is the ground-truth label of the i-th pixel, and p_i is the probability predicted by the model for the i-th pixel; these quantities define the loss function for the semantic segmentation task.

[0031] Furthermore, the loss function for the boundary feature alignment task during training of the multi-task boundary optimization image segmentation model is as follows:

[0032]

[0033] M_l = OhemCrossEntropy(Seg, Target)

[0034] Here, the loss function for the boundary feature alignment task is computed from the following quantities: M_l is the semantic segmentation loss matrix, edge is the boundary prediction result, threshold is a threshold value, and num is the number of boundary pixels. OhemCrossEntropy is a cross-entropy loss function with online hard example mining (OHEM), Seg is the semantic segmentation prediction result, and Target is the ground-truth semantic segmentation label.

[0035] Furthermore, the model training process is as follows:

[0036] Select the optimizer, set the number of training iterations, and train a multi-task boundary optimization image segmentation model using the loss function for the boundary learning task, the loss function for the semantic segmentation task, the loss function for the boundary feature alignment task, and the training set.

[0037] The network parameters of the multi-task boundary optimization image segmentation model are calculated and updated by the optimizer, and the values of the three loss functions are adjusted according to the network parameters.

[0038] The training process is guided by a weight decay strategy and a learning rate reduction method using cosine annealing. The training end time is determined based on the number of training iterations and the value of the loss function.

[0039] The present invention also proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described image segmentation method that takes into account boundary information.

[0040] The present invention also proposes an electronic device, including a processor and a memory, wherein the processor and the memory are interconnected, wherein the memory is used to store a computer program, the computer program including computer-readable instructions, and the processor is configured to invoke the computer-readable instructions to execute the above-described image segmentation method that takes into account boundary information.

[0041] The beneficial effects of the technical solution provided by this invention are:

[0042] The proposed image segmentation method, which takes boundary information into account, captures feature information and improves segmentation performance through two attention mechanisms: the non-local attention network and the attention feature fusion network in each feature fusion sub-network. It also constructs boundary learning, semantic segmentation, and boundary feature alignment tasks to effectively delineate boundary information and improve segmentation performance.

Attached Figure Description

[0043] Figure 1 is a flowchart of an image segmentation method that takes boundary information into account according to an embodiment of the present invention;

[0044] Figure 2 shows the non-local attention network according to an embodiment of the present invention;

[0045] Figure 3 shows the attention feature fusion network according to an embodiment of the present invention;

[0046] Figure 4 shows the multi-task boundary optimization image segmentation model according to an embodiment of the present invention;

[0047] Figure 5 is a block diagram of an electronic device according to an exemplary embodiment of the present invention.

Detailed Implementation

[0048] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention will be further described below with reference to the accompanying drawings.

[0049] Figure 1 shows a flowchart of an image segmentation method considering boundary information according to an embodiment of the present invention. The method includes the following steps:

[0050] S1. Obtain the image to be segmented and divide it into a training set, a validation set, and a test set. Specifically, divide the corresponding labeled files into the training set, validation set, and test set in a 6:2:2 ratio.
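The 6:2:2 split can be sketched as a seeded shuffle followed by slicing. This is a minimal illustration, not the embodiment's actual tooling: the image/mask file names are hypothetical, and reusing the random seed of 1 from [0089] for the split is an assumption. Shuffling (image, mask) pairs keeps each image with its label file.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_6_2_2(items: Sequence[T], seed: int = 1) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle labeled samples and split them into train/val/test at 6:2:2."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)        # seeded, so the split is reproducible
    n = len(shuffled)
    n_train = n * 6 // 10                        # integer arithmetic avoids float rounding
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]            # any remainder goes to the test set
    return train, val, test


# Usage with hypothetical file names:
pairs = [(f"img_{i:03d}.png", f"mask_{i:03d}.png") for i in range(100)]
train, val, test = split_6_2_2(pairs)
print(len(train), len(val), len(test))  # 60 20 20
```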

[0051] In this embodiment of the invention, the image to be segmented is a dental calculus image.

[0052] S2. Construct the boundary learning, semantic segmentation, and boundary feature alignment tasks based on a multi-task learning strategy, and build a multi-task boundary optimization image segmentation model. Multi-task learning is a machine learning approach in which multiple tasks are learned jointly and in parallel, so that the results of each task influence the others.

[0053] The multi-task boundary optimization image segmentation model includes a feature extraction network, a feature fusion network, and a multi-task joint learning network.

[0054] The feature extractor includes: one downsampling module and four residual modules. The residual modules are the main structure for feature extraction, which can effectively extract feature information, improve information flow, accelerate training convergence, solve the gradient vanishing and exploding problems, and build deeper network models.

[0055] The feature fusion network consists of three feature fusion subnetworks; each feature fusion subnetwork includes a nonlocal attention network and an attention feature fusion network.

[0056] The multi-task joint learning network includes an atrous spatial pyramid pooling (ASPP) module, a Canny operator, a 1×1 convolution, three upsampling modules, and one downsampling module.

[0057] The image to be segmented is input into the downsampling module of the feature extractor to obtain a reduced-size image. The reduced-size image is input into the first residual module of the feature extractor to obtain the first feature map. The first feature map is input into the second residual module of the feature extractor to obtain the second feature map. The second feature map is input into the third residual module of the feature extractor to obtain the third feature map. The third feature map is input into the fourth residual module of the feature extractor to obtain the fourth feature map.

[0058] The third and fourth feature maps are input into the non-local attention network of the first feature fusion sub-network to obtain the fifth feature map. The fifth and third feature maps are input into the attention feature fusion network of the first feature fusion sub-network to obtain the first fused feature map. The second feature map and the first fused feature map are input into the non-local attention network of the second feature fusion sub-network to obtain the sixth feature map. The sixth and second feature maps are input into the attention feature fusion network of the second feature fusion sub-network to obtain the second fused feature map. The first feature map and the second fused feature map are input into the non-local attention network of the third feature fusion sub-network to obtain the seventh feature map. The seventh and first feature maps are input into the attention feature fusion network of the third feature fusion sub-network to obtain the third fused feature map.

[0059] Through the boundary learning task, the image to be segmented is processed by the Canny operator of the multi-task joint learning network to extract the boundary features of the image. After the boundary features of the image are processed by the downsampling module of the multi-task joint learning network, they are merged with the reduced-size image obtained by the feature extractor to obtain the merged features. The merged features are processed by the 1×1 convolution of the multi-task joint learning network to obtain the boundary feature map. The boundary feature map is then upsampled by the first upsampling module of the multi-task joint learning network to obtain the boundary prediction result.

[0060] In the semantic segmentation task, the fourth feature map is input into the Atrous Spatial Pyramid Pooling (ASPP) module of the multi-task joint learning network to obtain a feature vector. After the feature vector passes through the second upsampling module of the multi-task joint learning network, it is merged with the third fusion feature map to obtain the semantic segmentation feature map.

[0061] After the boundary feature map and the semantic segmentation feature map are merged, the semantic segmentation prediction result is obtained through the third upsampling module of the multi-task joint learning network. The boundary feature alignment task establishes a boundary alignment association between the semantic segmentation prediction result and the boundary prediction result.

[0062] Specifically, the downsampling module of the feature extractor consists of a Conv module, a BN module, a ReLU module, and a Pooling module connected in series. The Conv module uses 64 convolution kernels of size 7×7 with a stride of 2 and produces a 256×256×64 feature map. The BN module accelerates training and makes the gradients more stable during backpropagation, and ReLU is used as the activation function. The Pooling module applies max pooling with a stride of 2 to produce a 128×128×64 feature map.

[0063] The four residual modules of the feature extractor output feature maps at different scales. In this embodiment, the four residual modules of the feature extractor adopt the Bottleneck structure of the ResNet network. The first residual module of the feature extractor is configured with 64 1×1 convolutional kernels, 64 3×3 convolutional kernels, and 256 1×1 convolutional kernels, outputting data with a feature size of 64×64×256. The second residual module of the feature extractor is configured with 128 1×1 convolutional kernels, 128 3×3 convolutional kernels, and 512 1×1 convolutional kernels, outputting data with a feature size of 32×32×512. The third residual module of the feature extractor is configured with 256 1×1 convolutional kernels, 256 3×3 convolutional kernels, and 1024 1×1 convolutional kernels, outputting data with a feature size of 16×16×1024. The fourth residual module of the feature extractor is configured with 512 1×1 convolutional kernels, 512 3×3 convolutional kernels and 2048 1×1 convolutional kernels, and outputs data with features of 16×16×2048.
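The sizes in [0062] and [0063] can be checked with the standard convolution output-size formula. This is a minimal sketch under assumptions the text does not state: a 512×512 input (inferred from the 512×512 boundary prediction in [0072]), a 3×3 max-pooling window with padding 1, and per-stage strides of 2, 2, 2, 1 carried by each Bottleneck's 3×3 convolution.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int, padding: int, dilation: int = 1) -> int:
    """Output size of a convolution or pooling window along one spatial axis."""
    span = dilation * (kernel - 1) + 1
    return (size + 2 * padding - span) // stride + 1


def extractor_shapes(input_size: int = 512) -> List[Tuple[str, int, int, int]]:
    """(name, height, width, channels) after each stage of the feature extractor."""
    shapes = []
    s = conv_out(input_size, kernel=7, stride=2, padding=3)    # 7x7 conv, 64 kernels, stride 2
    shapes.append(("stem conv", s, s, 64))
    s = conv_out(s, kernel=3, stride=2, padding=1)             # max pooling, stride 2 (3x3 window assumed)
    shapes.append(("stem pool", s, s, 64))
    # (name, stride, output channels); strides are assumptions chosen to match [0063]
    for name, stride, channels in [("res1", 2, 256), ("res2", 2, 512),
                                   ("res3", 2, 1024), ("res4", 1, 2048)]:
        s = conv_out(s, kernel=3, stride=stride, padding=1)    # the 3x3 conv carries the stride
        shapes.append((name, s, s, channels))
    return shapes


for row in extractor_shapes():
    print(row)
```

Reproducing the 64×64 output of the first residual module needs a stride of 2 in that stage, unlike the stride-1 first stage of a standard ResNet; the unchanged 16×16 size of the fourth stage would also be obtained with a dilated 3×3 convolution whose padding equals its dilation.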

[0064] The non-local attention network of the present invention is shown in Figure 2. In the attention mechanism, the Query measures relevance to each Key, the Keys form the basis for computing the attention scores, and the Values are summed with the attention scores as weights to produce the final attention output. The input to the Non-local Attention Mechanism Network (NAMN) of the feature fusion network is two feature maps of different scales, with sizes of … and H×W×C. The lower-scale feature map is used to compute the Value and the Key: the Value is passed through a 1×1 convolution, ASPP, and the view function to obtain a linear matrix γ, and the Key is passed through a 1×1 convolution, ASPP, and the view function to obtain a linear matrix θ. The higher-scale feature map is used to compute the Query, which is passed through a 1×1 convolution and the view function to obtain the Query linear matrix. The matrix dot product of the Query matrix and θ gives the raw map of spatial location feature attention weights, which softmax normalizes into the spatial location feature attention weights. These weights are then multiplied (matrix dot product) with γ to obtain a feature matrix with long-range dependencies, and the view function reshapes this matrix into a globally associated output feature map of size H×W×C.
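The core of the computation above (Query times θ, row-wise softmax, then a weighted sum with γ) can be sketched in pure Python. This is a minimal illustration that assumes the maps are already flattened by view into one row per spatial position; the 1×1 convolutions and ASPP that produce the matrices are omitted, and no 1/√d scaling is applied because the text does not mention one.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)                                # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def non_local_attention(query: Matrix, theta: Matrix, gamma: Matrix) -> Matrix:
    """query: N x d (one row per position of the higher-scale map);
    theta: M x d Key rows and gamma: M x c Value rows (lower-scale map).
    Returns an N x c matrix that view() would reshape back to H x W x c."""
    weights = softmax_rows(matmul(query, transpose(theta)))   # N x M spatial attention weights
    return matmul(weights, gamma)                             # each output row mixes all Values


query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 positions, d = 2
theta = [[1.0, 0.0], [0.0, 1.0]]               # 2 positions, d = 2
gamma = [[2.0], [4.0]]                         # 2 positions, c = 1
out = non_local_attention(query, theta, gamma)
print(out)
```

The output has one row per Query position, which is why the result takes the spatial size of the higher-scale (Query) map.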

[0065] The attention feature fusion network of this embodiment is shown in Figure 3. The Attention Feature Fusion Network (AFFN) takes two feature maps x1 and x2 of the same scale as input and adds them to obtain feature map x. It computes local and global attention weights with the local and global feature branches, respectively, multiplies x1 and x2 by the local and global attention weights, respectively, and adds the two products to obtain the final output feature.

[0066] The local feature branch consists of a Conv module, a BN module, a ReLU module, and a Sigmoid module connected in series; the global feature branch consists of a Pooling module, a Conv module, a BN module, a ReLU module, and a Sigmoid module connected in series.

[0067] The non-local attention network of the first feature fusion sub-network globally correlates and fuses the feature maps output by the third and fourth residual modules of the feature extractor into 16×16×256 data. The attention feature fusion network of the first feature fusion sub-network then fuses this 16×16×256 output with the 16×16×256 data output by the third residual module of the feature extractor, giving the 16×16×256 feature map output by the first feature fusion sub-network.

[0068] The non-local attention network of the second feature fusion sub-network globally correlates and fuses the feature map output by the second residual module of the feature extractor with the feature map output by the first feature fusion sub-network to obtain 32×32×256 data. The attention feature fusion network of the second feature fusion sub-network then fuses this 32×32×256 output with the 32×32×256 data output by the second residual module of the feature extractor, giving the 32×32×256 feature map output by the second feature fusion sub-network.

[0069] The non-local attention network of the third feature fusion sub-network globally correlates and fuses the feature map output by the first residual module of the feature extractor with the feature map output by the second feature fusion sub-network to obtain 64×64×256 data. The attention feature fusion network of the third feature fusion sub-network then fuses this 64×64×256 output with the 64×64×256 data output by the first residual module of the feature extractor, giving the 64×64×256 feature map output by the third feature fusion sub-network.
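The top-down data flow of the three feature fusion sub-networks can be checked with a shape-only sketch. It assumes that the non-local attention output keeps the spatial size of the Query (higher-resolution) input, which matches the sizes given in [0067] to [0069], and it introduces a hypothetical `project` step (a 1×1 convolution to 256 channels), since the text lists 256-channel backbone maps without naming the channel reduction.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]   # (height, width, channels)


def project(x: Shape, channels: int = 256) -> Shape:
    """Hypothetical 1x1 convolution that maps a backbone feature map to `channels` channels."""
    return (x[0], x[1], channels)


def non_local_shape(query_map: Shape, key_value_map: Shape, channels: int = 256) -> Shape:
    """key_value_map sets the number of Key/Value rows, but the output has one row
    per Query position, so it keeps the Query map's spatial size."""
    return (query_map[0], query_map[1], channels)


def aff_shape(x1: Shape, x2: Shape) -> Shape:
    """The attention feature fusion network needs two inputs of the same shape."""
    if x1 != x2:
        raise ValueError(f"AFF inputs differ: {x1} vs {x2}")
    return x1


def fusion_network_shapes(f1: Shape, f2: Shape, f3: Shape, f4: Shape) -> Dict[str, Shape]:
    f5 = non_local_shape(f3, f4)                  # sub-network 1: third + fourth maps
    fused1 = aff_shape(f5, project(f3))
    f6 = non_local_shape(f2, fused1)              # sub-network 2: second map + first fused map
    fused2 = aff_shape(f6, project(f2))
    f7 = non_local_shape(f1, fused2)              # sub-network 3: first map + second fused map
    fused3 = aff_shape(f7, project(f1))
    return {"fused1": fused1, "fused2": fused2, "fused3": fused3}


print(fusion_network_shapes((64, 64, 256), (32, 32, 512), (16, 16, 1024), (16, 16, 2048)))
```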

[0070] The feature fusion network obtains information from different spatial locations of the feature map, suppresses unimportant information, and highlights effective information.

[0071] The atrous spatial pyramid pooling (ASPP) module of the multi-task joint learning network is configured with 256 1×1 convolution kernels, 256 3×3 convolution kernels with a dilation rate of 12, 256 3×3 convolution kernels with a dilation rate of 24, and 256 3×3 convolution kernels with a dilation rate of 36, together with 1×1 adaptive average pooling followed by bilinear interpolation upsampling; it outputs 16×16×256 data.
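The arithmetic behind the dilated branches can be made explicit with a short sketch. A 3×3 kernel with dilation d spans k + (k − 1)(d − 1) pixels, and zero padding equal to d keeps the 16×16 input size at stride 1. The use of zero "same" padding is an assumption; the text does not state the padding.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span of a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def same_padding(kernel: int, dilation: int) -> int:
    """Zero padding that keeps the spatial size unchanged at stride 1 (odd kernels)."""
    return dilation * (kernel - 1) // 2


def out_size(size: int, kernel: int, dilation: int, padding: int, stride: int = 1) -> int:
    return (size + 2 * padding - effective_kernel(kernel, dilation)) // stride + 1


for d in (12, 24, 36):
    pad = same_padding(3, d)
    print(d, effective_kernel(3, d), pad, out_size(16, 3, d, pad))
```

On a 16×16 map, every off-centre tap of a 3×3 kernel with dilation 24 or 36 lands 24 or 36 pixels away and therefore in the zero padding, so for this input size those two branches effectively see only the centre tap.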

[0072] In the boundary learning task, the boundary features extracted by the Canny operator are downsampled by a factor of 4 and merged with the output of the downsampling module of the feature extractor. The merged data is processed by a Conv module with 256 convolution kernels of size 1×1 and a stride of 1 to obtain a 128×128×256 boundary feature map rich in boundary information, which is then upsampled by a factor of 4 to obtain a 512×512×256 boundary prediction result.
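The boundary branch bookkeeping can be sketched as follows. The Canny edge map itself would normally come from an image library (for example OpenCV's `cv2.Canny`) and is not reimplemented here. Several choices are assumptions, not stated in the text: a single-channel edge map, max pooling as the ×4 downsampling (it keeps 1-pixel-wide edges visible, whereas averaging would dilute them), and "merged" read as channel concatenation.

```python
from typing import List, Tuple

Grid = List[List[int]]


def max_pool_binary(edges: Grid, factor: int) -> Grid:
    """Downsample a 0/1 edge map with non-overlapping factor x factor max pooling."""
    h, w = len(edges), len(edges[0])
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by the factor")
    return [[max(edges[r][c]
                 for r in range(i, i + factor)
                 for c in range(j, j + factor))
             for j in range(0, w, factor)]
            for i in range(0, h, factor)]


def boundary_branch_shapes(size: int = 512, stem_channels: int = 64, edge_channels: int = 1,
                           out_channels: int = 256, factor: int = 4
                           ) -> Tuple[int, Tuple[int, int, int], Tuple[int, int, int]]:
    """Channel count after merging, boundary feature map shape, and prediction shape."""
    low = size // factor                               # 512 / 4 = 128, same as the stem output
    merged_channels = stem_channels + edge_channels    # "merged" read as channel concatenation
    feature_map = (low, low, out_channels)             # 1x1 conv, stride 1, 256 kernels
    prediction = (low * factor, low * factor, out_channels)  # upsampled by a factor of 4
    return merged_channels, feature_map, prediction


diagonal = [[1 if r == c else 0 for c in range(8)] for r in range(8)]
print(max_pool_binary(diagonal, 4))   # [[1, 0], [0, 1]]
print(boundary_branch_shapes())       # (65, (128, 128, 256), (512, 512, 256))
```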

[0073] In the semantic segmentation task, the output of the ASPP module is upsampled by a factor of 8 to obtain 128×128×256 data, which is then merged with the output of the third feature fusion sub-network of the feature fusion network to produce a 128×128×256 semantic segmentation feature map. After the semantic segmentation feature map is merged with the boundary feature map, upsampling yields the semantic segmentation prediction result, completing the final semantic segmentation prediction.

[0074] In the boundary feature alignment task, the boundary feature alignment loss is calculated from the semantic segmentation prediction result, the boundary prediction result, and the ground-truth label Target to optimize boundary alignment.

[0075] Corresponding loss functions are constructed for the boundary learning, semantic segmentation, and boundary feature alignment tasks, and the three tasks are trained and optimized jointly.

[0076] Based on the boundary prediction result, the loss function for the boundary learning task used when training the multi-task boundary optimization image segmentation model is constructed as follows:

[0077]

[0078] weight = pos_weight · pos_mask + neg_weight · neg_mask,

[0079] Here, the boundary learning loss is computed from the following quantities: edge is the prediction result of the boundary learning task, i.e., the boundary prediction result; boundary is the ground-truth boundary; weight is the weight matrix; BinaryCrossEntropy is the weighted binary cross-entropy loss function; num is the number of boundary pixels; pos_mask is the binary mask marking boundary positions; neg_mask is the binary mask marking non-boundary positions; pos_weight is the weight matrix of the binary mask marking boundary positions; neg_weight is the weight matrix of the binary mask marking non-boundary positions; and · denotes the matrix dot product.
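A weighted binary cross-entropy of this form can be sketched in pure Python. The exact equation in [0077] did not survive, so this is one plausible reading under stated assumptions: "·" is taken as an element-wise product, pos_weight and neg_weight follow a common class-balancing rule (each class weighted by the other class's frequency), and the loss is averaged over all pixels; the original may normalize by num instead.

```python
import math
from typing import List


def boundary_loss(edge: List[float], boundary: List[int], eps: float = 1e-7) -> float:
    """Weighted binary cross-entropy for the boundary learning task.

    edge: predicted boundary probabilities (flattened); boundary: 0/1 labels.
    The class-balancing rule for pos_weight / neg_weight is an assumption."""
    pos_mask = [1 if b == 1 else 0 for b in boundary]
    neg_mask = [1 - m for m in pos_mask]
    num = sum(pos_mask)                      # number of boundary pixels
    total = len(boundary)
    if 0 < num < total:
        pos_weight = (total - num) / total   # rare boundary pixels get the larger weight
        neg_weight = num / total
    else:
        pos_weight = neg_weight = 1.0        # all-boundary or no-boundary map: plain BCE
    # weight = pos_weight * pos_mask + neg_weight * neg_mask, element-wise
    weight = [pos_weight * p + neg_weight * n for p, n in zip(pos_mask, neg_mask)]
    loss = 0.0
    for w, p, y in zip(weight, edge, boundary):
        p = min(max(p, eps), 1.0 - eps)      # clamp to keep log() finite
        loss -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / total


labels = [1, 0, 0, 0]
print(boundary_loss([0.99, 0.01, 0.01, 0.01], labels))
print(boundary_loss([0.01, 0.99, 0.99, 0.99], labels))
```

Without the balancing weights, the many easy background pixels would dominate the loss and thin boundaries would contribute little to the gradient.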

[0080] The loss function for semantic segmentation during the training of a multi-task boundary optimization image segmentation model is as follows:

[0081]

[0082] Here, w_i is the weight coefficient of the i-th pixel, y_i is the ground-truth label of the i-th pixel, and p_i is the probability predicted by the model for the i-th pixel; these quantities define the loss function for the semantic segmentation task.
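The equation in [0081] did not survive, but a pixel-weighted cross-entropy over w_i, y_i, and p_i is commonly written as L = −(1/N) Σ_i w_i [y_i log p_i + (1 − y_i) log(1 − p_i)]. The sketch below implements that form, assuming a binary foreground/background task (such as dental calculus versus background) and averaging over the N pixels; it is an illustration, not the patent's exact formula.

```python
import math
from typing import Sequence


def weighted_pixel_bce(w: Sequence[float], y: Sequence[int], p: Sequence[float],
                       eps: float = 1e-7) -> float:
    """-(1/N) * sum_i w_i * [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]"""
    if not (len(w) == len(y) == len(p)):
        raise ValueError("w, y and p must have the same length")
    total = 0.0
    for wi, yi, pi in zip(w, y, p):
        pi = min(max(pi, eps), 1.0 - eps)          # clamp so log() stays finite
        total += wi * (yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi))
    return -total / len(p)


print(weighted_pixel_bce([1.0, 1.0], [1, 0], [0.5, 0.5]))   # ln 2, about 0.6931
```

Setting w_i = 0 removes a pixel from the loss entirely, and raising w_i for hard or boundary pixels increases their share of the gradient.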

[0083] The loss function for boundary feature alignment during the training of a multi-task boundary optimization image segmentation model is as follows:

[0084]

[0085] M_l = OhemCrossEntropy(Seg, Target)

[0086] Here, the loss function for the boundary feature alignment task is computed from the following quantities: M_l is the semantic segmentation loss matrix, edge is the boundary prediction result, threshold is a threshold value, and num is the number of boundary pixels. OhemCrossEntropy is a cross-entropy loss function with online hard example mining (OHEM), Seg is the semantic segmentation prediction result, and Target is the ground-truth semantic segmentation label.
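The alignment loss equation in [0084] did not survive, so the following is only a sketch of one plausible reading of the listed quantities: M_l is a per-pixel OHEM cross-entropy matrix, and the alignment loss averages M_l over the num pixels whose boundary prediction edge exceeds threshold. The OHEM rule (keep pixels whose true-class probability is at most a cutoff, always keeping at least `min_kept` hardest pixels) and the default values `thresh=0.7`, `min_kept=2`, and `threshold=0.5` are hypothetical.

```python
import math
from typing import List


def ohem_loss_matrix(probs: List[List[float]], target: List[int],
                     thresh: float = 0.7, min_kept: int = 2, eps: float = 1e-7) -> List[float]:
    """Per-pixel cross-entropy M_l (flattened) with online hard example mining.

    probs[i][k] is the predicted probability of class k at pixel i. Pixels whose
    true-class probability exceeds the cutoff are treated as easy and get 0 loss;
    at least `min_kept` of the hardest pixels are always kept."""
    true_p = [row[t] for row, t in zip(probs, target)]
    cutoff = thresh
    k = min(min_kept, len(true_p))
    if k > 0:
        cutoff = max(thresh, sorted(true_p)[k - 1])    # guarantee min_kept hard pixels
    return [-math.log(max(p, eps)) if p <= cutoff else 0.0 for p in true_p]


def boundary_alignment_loss(loss_matrix: List[float], edge: List[float],
                            threshold: float = 0.5) -> float:
    """Average M_l over the pixels the boundary branch predicts as boundary."""
    mask = [1 if e > threshold else 0 for e in edge]
    num = sum(mask)                                    # number of boundary pixels
    if num == 0:
        return 0.0
    return sum(loss * m for loss, m in zip(loss_matrix, mask)) / num


probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]   # 4 pixels, 2 classes
target = [0, 1, 0, 0]
m_l = ohem_loss_matrix(probs, target)
print(boundary_alignment_loss(m_l, edge=[0.1, 0.2, 0.9, 0.95]))
```

Restricting the average to predicted boundary pixels is what ties the segmentation errors to the boundary branch in this reading.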

[0087] The multi-task boundary optimization image segmentation model in the embodiments of the invention is shown in Figure 4.

[0088] The boundary learning task uses the boundary information extracted by the Canny operator together with the low-scale feature information of the encoder to learn boundary information effectively; the semantic segmentation task uses the features learned by the boundary learning task to refine the segmentation at boundaries; and the boundary feature alignment task establishes a boundary alignment association between the two preceding tasks. Through joint multi-task optimization, the boundary information is effectively delineated and the segmentation accuracy is improved.
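The data flow of the boundary learning branch (as recited in claim 1: Canny edge map, downsampling, merging with the reduced-size image, 1×1 convolution, upsampling) can be traced with the toy sketch below on nested lists of shape [C][H][W]. Assumptions: the Canny edge map is precomputed and binary (0/1; OpenCV's cv2.Canny returns 0/255 and would need scaling), downsampling is 2×2 average pooling, "merging" is channel concatenation, and upsampling is nearest-neighbour. All function names are hypothetical.

```python
def downsample2x(fmap):
    """2x2 average pooling on a [C][H][W] nested list (H and W assumed even)."""
    return [[[(ch[2 * i][2 * j] + ch[2 * i][2 * j + 1]
               + ch[2 * i + 1][2 * j] + ch[2 * i + 1][2 * j + 1]) / 4.0
              for j in range(len(ch[0]) // 2)]
             for i in range(len(ch) // 2)]
            for ch in fmap]


def conv1x1(fmap, weights, bias):
    """1x1 convolution: a per-pixel linear mix of channels; weights is [C_out][C_in]."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[[sum(w_c * fmap[c][i][j] for c, w_c in enumerate(w_row)) + b
              for j in range(width)]
             for i in range(height)]
            for w_row, b in zip(weights, bias)]


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a [C][H][W] nested list by an integer factor."""
    return [[[ch[i // factor][j // factor] for j in range(len(ch[0]) * factor)]
             for i in range(len(ch) * factor)]
            for ch in fmap]


def boundary_branch(canny_edges, reduced_image, weights, bias, up_factor):
    """Canny edges -> downsample -> concatenate -> 1x1 conv -> upsample (sketch)."""
    edge_feat = downsample2x(canny_edges)             # match the reduced image's size
    merged = edge_feat + reduced_image                # channel concatenation
    boundary_map = conv1x1(merged, weights, bias)     # boundary feature map
    return upsample_nearest(boundary_map, up_factor)  # boundary prediction result
```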

[0089] Specifically, this invention uses the OHEM loss function for training and stochastic gradient descent (SGD) as the optimizer, with an initial learning rate of 0.01 and a momentum of 0.9. A weight decay strategy with a decay coefficient of 0.0001 is applied, and the random seed is set to 1. The learning rate is decreased by cosine annealing, which guides the training process and helps the model reach a better convergence state. Training runs for 40,000 iterations so that the model is trained sufficiently. The preset accuracy is measured by the mean intersection over union (mIoU) and can be set to 80%.
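The OHEM (online hard example mining) cross-entropy named above can be sketched in a simplified form: only "hard" pixels, whose predicted probability for the true class falls below a threshold, contribute to the loss, with a minimum number of pixels always kept. The threshold 0.7 and min_kept=2 are illustrative assumptions (the text gives no values), and ohem_cross_entropy is a hypothetical name; framework implementations differ in detail.

```python
import math


def ohem_cross_entropy(probs, target, thresh=0.7, min_kept=2, eps=1e-7):
    """Simplified OHEM cross-entropy over a flat list of pixels (sketch)."""
    # Probability the model assigns to each pixel's ground-truth class.
    gt_prob = [max(p[t], eps) for p, t in zip(probs, target)]
    losses = [-math.log(q) for q in gt_prob]
    # Hardest pixels first (lowest ground-truth probability).
    order = sorted(range(len(gt_prob)), key=lambda i: gt_prob[i])
    hard = [i for i in order if gt_prob[i] < thresh]
    if len(hard) < min_kept:
        hard = order[:min_kept]  # always keep at least min_kept pixels
    if not hard:
        return 0.0
    return sum(losses[i] for i in hard) / len(hard)
```

Focusing the loss on poorly classified pixels suits boundary-heavy tasks, since blurred boundary pixels are typically the ones the model gets wrong.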

[0090] S3. The multi-task boundary optimization image segmentation model is trained, validated, and tested using the training set, validation set, and test set to obtain the final image segmentation model.

[0091] The model training process is as follows:

[0092] Select an optimizer, set the number of training iterations, and train the multi-task boundary optimization image segmentation model on the training set using the loss functions for the boundary learning, semantic segmentation, and boundary feature alignment tasks. The optimizer calculates and updates the network parameters of the model, and the values of the three loss functions are updated according to the network parameters. The training process is guided by a weight decay strategy and a cosine annealing learning rate reduction method, and whether training has ended is determined from the number of training iterations and the values of the loss functions.
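The training procedure can be illustrated with a toy loop that uses the hyperparameters from [0089] (initial learning rate 0.01, momentum 0.9, weight decay 0.0001, 40,000 iterations, cosine annealing). This is a sketch under assumptions, not the patented training code: a single scalar parameter stands in for the network, the three task losses are assumed to be combined by an unweighted sum, the cosine schedule is assumed to decay to 0, weight decay is applied as in classic SGD (added to the gradient), and the early-stop tolerance tol is a hypothetical choice. In PyTorch the same optimizer and schedule correspond to torch.optim.SGD and torch.optim.lr_scheduler.CosineAnnealingLR.

```python
import math


def train(loss_fns, grad_fns, param, max_iters=40_000, base_lr=0.01,
          momentum=0.9, weight_decay=1e-4, tol=1e-8):
    """Toy joint training loop on one scalar parameter (illustration only).

    loss_fns / grad_fns: one loss function and its gradient per task.
    Returns the final parameter and the number of iterations run.
    """
    velocity = 0.0
    prev_total = math.inf
    for it in range(max_iters):
        # Cosine annealing from base_lr towards 0 over max_iters iterations.
        lr = 0.5 * base_lr * (1 + math.cos(math.pi * it / max_iters))
        # Assumed: the three task losses are combined by an unweighted sum.
        total = sum(f(param) for f in loss_fns)
        # SGD with momentum; weight decay adds weight_decay * param to the gradient.
        grad = sum(g(param) for g in grad_fns) + weight_decay * param
        velocity = momentum * velocity + grad
        param -= lr * velocity
        # Stop early once the total loss has stopped changing.
        if abs(prev_total - total) < tol:
            return param, it + 1
        prev_total = total
    return param, max_iters
```

For example, with three quadratic toy losses (p - 1)², (p - 2)², and (p - 3)², the loop settles near p = 2, the minimiser of their sum, long before the 40,000-iteration cap.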

[0093] The model weights of the trained model are validated using a validation set to generate evaluation results. Based on these results, the model weight parameters of the trained model are adjusted to obtain the validated model. Specifically:

[0094] Input the validation set into the trained model, calculate the mean intersection over union (mIoU) and the loss function value of the trained model, and plot the loss curve. If the loss curve converges, stop training the multi-task boundary optimization image segmentation model.

[0095] If the loss curve does not converge, continue training the multi-task boundary optimization image segmentation model.

[0096] The validated model is tested using the test set to evaluate its image segmentation performance. Specifically:

[0097] The validated model is tested using the test set. If the test accuracy is less than the preset accuracy threshold, the multi-task boundary optimization image segmentation model is trained again. If the test accuracy is not less than the preset accuracy threshold, training ends and the weight file of the validated model is output.
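The validation and test checks in [0094] to [0097] can be sketched as three small helpers: mean intersection over union over flattened label maps, a convergence test on the loss curve, and the decision taken after testing against the 80% preset from [0089]. The convergence criterion (comparing the means of two consecutive windows of the loss curve, with window=5 and rel_tol=1e-3) is a hypothetical choice, since the text does not say how convergence is judged; all function names are likewise hypothetical.

```python
PRESET_MIOU = 0.80  # preset accuracy from [0089]


def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def has_converged(loss_curve, window=5, rel_tol=1e-3):
    """Hypothetical convergence test: relative change of the windowed mean loss."""
    if len(loss_curve) < 2 * window:
        return False
    prev = sum(loss_curve[-2 * window:-window]) / window
    curr = sum(loss_curve[-window:]) / window
    return abs(prev - curr) <= rel_tol * max(abs(prev), 1e-12)


def after_test(test_miou, preset=PRESET_MIOU):
    """Retrain below the preset mIoU; otherwise output the weight file."""
    return "retrain" if test_miou < preset else "export_weights"
```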

[0098] S4. Input the image to be tested into the final image segmentation model to obtain the image segmentation result.

[0099] The images used for testing in this invention are images of dental calculus. The segmentation results are shown in Figure 4.

[0100] In one exemplary embodiment, a computer-readable storage medium is provided that stores a computer program which, when executed by a processor, implements the image segmentation method that takes boundary information into account described above.

[0101] Referring to Figure 5, in one exemplary embodiment, an electronic device is further provided, which includes at least one processor, at least one memory, and at least one communication bus.

[0102] The memory stores a computer program, which includes computer-readable instructions. The processor calls the computer-readable instructions stored in the memory through the communication bus to execute the image segmentation method that takes into account boundary information.

[0103] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An image segmentation method that takes into account boundary information, characterized in that it comprises the following steps: S1. Obtain the images to be segmented and divide them into a training set, a validation set, and a test set; S2. Based on a multi-task learning strategy, construct a boundary learning task, a semantic segmentation task, and a boundary feature alignment task, and construct a multi-task boundary optimization image segmentation model to learn the multiple tasks jointly; the multi-task boundary optimization image segmentation model includes a feature extraction network, a feature fusion network, and a multi-task joint learning network; the feature extraction network is used to extract feature maps of different scales from the image input to the model; the feature fusion network is used to fuse the extracted feature maps of different scales; based on the image input to the model, the feature maps of different scales, and the fused feature maps, the multi-task joint learning network generates segmentation feature maps and boundary feature maps and aligns the segmentation feature maps with the boundary feature maps; S3. Train, validate, and test the multi-task boundary optimization image segmentation model using the training set, validation set, and test set to obtain the final image segmentation model; S4. Input the image to be tested into the final image segmentation model to obtain the image segmentation result. The feature extraction network includes one downsampling module and four residual modules. The feature fusion network consists of three feature fusion sub-networks; each feature fusion sub-network includes a non-local attention network and an attention feature fusion network.
The multi-task joint learning network includes an atrous spatial pyramid pooling (ASPP) module, a Canny operator, a 1×1 convolution, three upsampling modules, and one downsampling module. The image to be segmented is input into the downsampling module of the feature extraction network to obtain a reduced-size image. The reduced-size image is input into the first residual module of the feature extraction network to obtain the first feature map. The first feature map is input into the second residual module to obtain the second feature map. The second feature map is input into the third residual module to obtain the third feature map. The third feature map is input into the fourth residual module to obtain the fourth feature map. The third and fourth feature maps are input into the non-local attention network of the first feature fusion sub-network to obtain the fifth feature map. The fifth and third feature maps are input into the attention feature fusion network of the first feature fusion sub-network to obtain the first fused feature map. The second feature map and the first fused feature map are input into the non-local attention network of the second feature fusion sub-network to obtain the sixth feature map. The sixth and second feature maps are input into the attention feature fusion network of the second feature fusion sub-network to obtain the second fused feature map. The first feature map and the second fused feature map are input into the non-local attention network of the third feature fusion sub-network to obtain the seventh feature map. The seventh and first feature maps are input into the attention feature fusion network of the third feature fusion sub-network to obtain the third fused feature map. In the boundary learning task, the image to be segmented is processed by the Canny operator of the multi-task joint learning network to extract the boundary features of the image.
After the boundary features of the image are processed by the downsampling module of the multi-task joint learning network, they are merged with the reduced-size image obtained by the feature extraction network to obtain the merged features. The merged features are processed by the 1×1 convolution of the multi-task joint learning network to obtain the boundary feature map, which is then upsampled by the first upsampling module of the multi-task joint learning network to obtain the boundary prediction result. In the semantic segmentation task, the fourth feature map is input into the ASPP module of the multi-task joint learning network to obtain a feature vector. After passing through the second upsampling module of the multi-task joint learning network, the feature vector is merged with the third fused feature map to obtain the semantic segmentation feature map. The boundary feature map and the semantic segmentation feature map are merged and passed through the third upsampling module of the multi-task joint learning network to obtain the semantic segmentation prediction result. The boundary feature alignment task establishes a boundary alignment association between the semantic segmentation prediction result and the boundary prediction result. The input to the non-local attention network consists of two feature maps of different scales. The lower-scale feature map is used to compute Value and Key: Value is passed through a 1×1 convolution, ASPP, and a view function to obtain a linear matrix γ, and Key is passed through a 1×1 convolution, ASPP, and a view function to obtain a linear matrix θ. The higher-scale feature map is used to compute Query, which is passed through a 1×1 convolution and a view function to obtain a linear matrix φ. After the dot product operation, softmax is applied to obtain the spatial location feature attention weights.
These spatial location feature attention weights are then multiplied by ..., and after the matrix multiplication, a linear transformation yields the globally associated feature map.

2. The image segmentation method considering boundary information according to claim 1, characterized in that the input to the attention feature fusion network is two feature maps of the same scale, which are added together to obtain the feature map x. Local and global attention weights are then calculated by a local feature branch and a global feature branch, respectively. The result of multiplying ... by the global attention weights and the result of multiplying ... by the local attention weights are added together to obtain the final output features. The local feature branch consists of two Conv modules, two BN modules, one ReLU module, and one Sigmoid module connected in series; the global feature branch consists of a Pooling module, two Conv modules, two BN modules, a ReLU module, and a Sigmoid module connected in series.

3. The image segmentation method considering boundary information according to claim 1, characterized in that the loss function for the boundary learning task during training of the multi-task boundary optimization image segmentation model is as follows, in which the boundary learning loss is defined in terms of: edge, the prediction result of the boundary learning task; boundary, the true boundary; weight, the weight matrix; num, the number of boundary pixels; BinaryCrossEntropy, the weighted binary cross-entropy loss function; pos_mask, a binary mask marking the boundary positions; neg_mask, a binary mask marking the non-boundary positions; pos_weight, the weight of the binary mask marking the boundary positions; neg_weight, the weight of the binary mask marking the non-boundary positions; and ·, which denotes element-wise multiplication of matrices.

4. The image segmentation method considering boundary information according to claim 3, characterized in that the loss function for the semantic segmentation task during training of the multi-task boundary optimization image segmentation model is as follows, in which the semantic segmentation loss is defined in terms of w_i, the weight coefficient of the i-th pixel; y_i, the true label of the i-th pixel; and p_i, the probability value predicted by the model.

5. The image segmentation method considering boundary information according to claim 4, characterized in that the loss function for the boundary feature alignment task during training of the multi-task boundary optimization image segmentation model is as follows, in which the boundary feature alignment loss is defined in terms of: M_l, the semantic segmentation loss matrix; edge, the boundary prediction result; threshold, a threshold value; num, the number of boundary pixels; OhemCrossEntropy, a cross-entropy loss function; Seg, the semantic segmentation prediction result; and Target, the true label of the semantic segmentation.

6. The image segmentation method considering boundary information according to claim 5, characterized in that the model training process is as follows: select the optimizer, set the number of training iterations, and train the multi-task boundary optimization image segmentation model using the loss function for the boundary learning task, the loss function for the semantic segmentation task, the loss function for the boundary feature alignment task, and the training set. The network parameters of the multi-task boundary optimization image segmentation model are calculated and updated by the optimizer, and the values of the three loss functions are adjusted according to the network parameters. The training process is guided by a weight decay strategy and a cosine annealing learning rate reduction method. The end of training is determined based on the number of training iterations and the values of the loss functions.

7. A computer-readable storage medium storing a computer program, characterized in that: When the computer program is executed by a processor, it implements the method as described in any one of claims 1-6.

8. An electronic device, characterized in that, The device includes a processor and a memory, the processor being interconnected with the memory, wherein the memory is used to store a computer program, the computer program including computer-readable instructions, and the processor is configured to invoke the computer-readable instructions to perform the method as described in any one of claims 1-6.
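The cross-scale non-local attention recited in claim 1 can be sketched on flattened feature maps (one row per spatial position, one column per channel). This is a minimal sketch under assumptions: the 1×1 convolution, ASPP, and view steps are collapsed into plain linear projections w_q, w_k, w_v, the attention weights are assumed to be multiplied by the Value matrix γ (the claim text elides the operand), and w_o stands for the final linear transformation. All names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of nested lists: [n][k] x [k][m] -> [n][m]."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def non_local_attention(high, low, w_q, w_k, w_v, w_o):
    """Cross-scale non-local attention over flattened feature maps (sketch).

    high: [N_high][C] higher-scale map, flattened by position (gives Query)
    low:  [N_low][C]  lower-scale map, flattened by position (gives Key and Value)
    w_q, w_k, w_v: [C][D] stand-ins for the 1x1 convolution (and ASPP) projections
    w_o: [D][C] final linear transformation
    """
    phi = matmul(high, w_q)   # Query -> linear matrix phi,   [N_high][D]
    theta = matmul(low, w_k)  # Key   -> linear matrix theta, [N_low][D]
    gamma = matmul(low, w_v)  # Value -> linear matrix gamma, [N_low][D]
    attn = softmax_rows(matmul(phi, transpose(theta)))  # [N_high][N_low], rows sum to 1
    out = matmul(matmul(attn, gamma), w_o)              # [N_high][C]
    return out, attn
```

Because Query comes from the higher-scale map, the output keeps that map's spatial resolution while every position aggregates context from all positions of the lower-scale map, which is consistent with the claim's attention feature fusion step taking two feature maps of the same scale.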