An image segmentation processing method, control device, and readable storage medium

By combining the HRNetV2, ASPP, and PointRend networks with multi-stage feature extraction and feature concatenation, the method addresses the poor performance on small objects and the slow speed of existing image segmentation networks, achieving accurate and efficient image segmentation.

CN116645509B · Active · Publication Date: 2026-06-30 · GUANGZHOU YUNCONG ARTIFICIAL INTELLIGENCE TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GUANGZHOU YUNCONG ARTIFICIAL INTELLIGENCE TECH CO LTD
Filing Date
2023-05-24
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing image segmentation networks perform poorly when dealing with small objects, and high-resolution image segmentation is slow, making them difficult to apply in real-world projects.

Method used

The HRNetV2 network is used for feature extraction, and the ASPP and PointRend networks are combined for image segmentation. Multi-stage feature extraction and concatenation preserve high-resolution feature information, and the network is optimized using cross-entropy and class imbalance loss functions.

Benefits of technology

It improves image segmentation accuracy, especially for small categories that were previously segmented poorly, without compromising speed.

✦ Generated by Eureka AI based on patent content.

  • Figure CN116645509B_ABST

Abstract

This invention relates to the field of image processing and provides an image segmentation processing method, a control device, and a readable storage medium, aiming to solve the problem of how to segment fine categories in images more accurately. To this end, the image segmentation processing method of this invention includes: extracting features from the image to be segmented to obtain extracted features and concatenated features, wherein the concatenated features are obtained by further feature extraction from the extracted features; and, according to preset conditions, performing image segmentation using either the concatenated features alone or the concatenated features together with the extracted features, thereby obtaining a segmentation category map. This solution preserves the high resolution of the image and improves segmentation accuracy. During segmentation, in addition to the concatenated features output by the feature extraction network, feature information from the feature extraction stages is also incorporated, providing more feature information to draw on when optimizing the segmentation results and thus further refining them.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and specifically provides an image segmentation processing method, a control device, and a readable storage medium.

Background

[0002] Semantic segmentation, a crucial task in computer vision, classifies every pixel and therefore has numerous applications in autonomous driving and road video surveillance. In these applications, many fine categories, such as lane lines, need to be segmented, yet most segmentation networks perform poorly on such small objects. During feature extraction, much of the important information about small objects is lost, so the segmentation results for some fine categories are imprecise.

[0003] To achieve better segmentation results, it is often necessary to combine specialized decoders with feature maps of different scales and resolutions. Images from highway surveillance cameras typically have high resolution, and adding such decoders to the network makes inference on these high-resolution images slow. Although real-time speed is not required, such slow speeds still pose significant challenges for practical applications.

[0004] Accordingly, there is a need in the field for a new image segmentation processing scheme to solve the above problems.

Summary of the Invention

[0005] To overcome the above-mentioned shortcomings, this invention is proposed to provide an image segmentation processing method that solves, or at least partially solves, the technical problem of how to achieve more accurate segmentation of small categories in images.

[0006] In a first aspect, the present invention provides an image segmentation processing method, comprising:

[0007] Feature extraction is performed on the image to be segmented to obtain extracted features and concatenated features, wherein the concatenated features are obtained by further feature extraction from the extracted features, and the resolution of the concatenated features is not less than that of the extracted features;

[0008] Based on preset conditions, image segmentation is performed using either the concatenated features alone or the concatenated features together with the extracted features, thereby obtaining a segmentation category map.

[0009] In one embodiment, the step of extracting features from the image to be segmented to obtain extracted features and concatenated features includes:

[0010] The image to be segmented is downsampled to obtain downsampled features;

[0011] Feature extraction is performed on the downsampled features to obtain first extracted features;

[0012] Feature extraction is performed on the first extracted features to obtain second extracted features;

[0013] Feature extraction is performed on the second extracted features to obtain third extracted features;

[0014] The third extracted features are concatenated to obtain the concatenated features.

[0015] In one embodiment, the feature extraction is implemented using the HRNetV2 network.

[0016] In one embodiment, performing image segmentation using the concatenated features includes:

[0017] The concatenated features are input into a first image segmentation network to obtain a first segmentation result.

[0018] In one embodiment, the first image segmentation network is an ASPP network.

[0019] In one embodiment, performing image segmentation using the concatenated features and the extracted features includes:

[0020] The extracted features and the concatenated features are input into a second image segmentation network to refine the first segmentation result, thereby obtaining a second segmentation result.

[0021] In one embodiment, the second image segmentation network is a PointRend network.

[0022] In one embodiment, inputting the extracted features and the concatenated features into the second image segmentation network to refine the first segmentation result and obtain the second segmentation result includes:

[0023] The first extracted feature, the third extracted feature, the concatenated feature, and the first segmentation result are all input into the second image segmentation network;

[0024] Based on the first segmentation result, select the coarsely segmented pixels whose prediction confidence is below a preset threshold;

[0025] Obtain the feature information corresponding to the coarsely segmented pixels from the first extracted features, the third extracted features, and the concatenated features;

[0026] The feature information corresponding to the coarsely segmented pixels is fused to obtain the fused feature;

[0027] The coarsely segmented pixels are re-predicted based on the fused features to obtain a second segmentation result.

[0028] In one embodiment, obtaining the segmentation category map includes:

[0029] The segmentation category map is obtained based on the segmentation results.

[0030] In one embodiment, selecting, according to preset conditions, whether to perform image segmentation using the concatenated features or using both the concatenated features and the extracted features includes:

[0031] Determine the required image segmentation accuracy based on the scenario or user needs;

[0032] Depending on the required image segmentation accuracy, image segmentation is performed using either the concatenated features alone or the concatenated features together with the extracted features.

[0033] In one embodiment, the method further includes:

[0034] The neural network consisting of the feature extraction network, the first image segmentation network, and the second image segmentation network is trained, wherein the loss functions used include the cross-entropy loss function and the class imbalance loss function.

[0035] In a second aspect, a control device is provided, comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and executed by the processor to perform the image segmentation processing method described in any of the above-described technical solutions.

[0036] In a third aspect, a computer-readable storage medium is provided, wherein a plurality of program codes are stored therein, the program codes being adapted to be loaded and run by a processor to perform the image segmentation processing method described in any of the above-described technical solutions.

[0037] The above-described technical solutions of the present invention have at least one or more of the following beneficial effects:

[0038] By implementing the technical solution of this invention, the high resolution of the image is retained and the accuracy of image segmentation is improved. During segmentation, in addition to the high-resolution concatenated features output by the feature extraction network, feature information from the feature extraction stages is also added, so that more feature information can be referenced when optimizing the segmentation results, further ensuring their refinement.

[0039] Meanwhile, during the model application phase, the output of the optimized first image segmentation network alone can be selected as the final segmentation result, significantly improving both the accuracy and the speed of image segmentation. In the network optimization phase, in addition to the conventional cross-entropy loss function, a class-imbalanced cross-entropy loss is also employed to alleviate the problem of sample imbalance.

Brief Description of the Drawings

[0040] The disclosure of this invention will become more readily understood with reference to the accompanying drawings. It will be readily understood by those skilled in the art that these drawings are for illustrative purposes only and are not intended to limit the scope of protection of this invention. Furthermore, similar numbers in the drawings are used to denote similar components, wherein:

[0041] Figure 1 is a schematic flowchart of the main steps of an image segmentation processing method according to an embodiment of the present invention;

[0042] Figure 2 is a schematic diagram of the main structure of a neural network model according to an embodiment of the present invention;

[0043] Figure 3 is a schematic flowchart of the main steps of a method for obtaining the extracted and concatenated features of an image to be segmented according to an embodiment of the present invention;

[0044] Figure 4 is a schematic flowchart of the main steps of an image segmentation method based on preset conditions according to an embodiment of the present invention;

[0045] Figure 5 is a schematic flowchart of the main steps of a method for refining a first segmentation result using a second image segmentation network to obtain a second segmentation result according to an embodiment of the present invention;

[0046] Figure 6 is a schematic diagram of the main structure of a control device according to an embodiment of the present invention.

Detailed Description of Embodiments

[0047] Some embodiments of the present invention will now be described with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are merely illustrative of the technical principles of the present invention and are not intended to limit the scope of protection of the present invention.

[0048] In the description of this invention, "module" and "processor" can include hardware, software, or a combination of both. A module can include hardware circuitry, various suitable sensors, communication ports, and memory, and may also include software components, such as program code, or a combination of software and hardware. A processor can be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing capabilities. The processor can be implemented in software, in hardware, or a combination of both. Non-transitory computer-readable storage media include any suitable medium capable of storing program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, and random access memory. The term "A and/or B" means all possible combinations of A and B, such as only A, only B, or A and B. The terms "at least one A or B" or "at least one of A and B" have a meaning similar to "A and/or B" and can include only A, only B, or A and B. The singular terms "a" or "this" can also include plural forms.

[0049] Referring to Figure 1, which is a schematic flowchart of the main steps of an image segmentation processing method according to an embodiment of the present invention, the method mainly includes the following steps S10 and S20.

[0050] S10, feature extraction is performed on the image to be segmented to obtain extracted features and concatenated features, wherein the concatenated features are obtained by further feature extraction from the extracted features, and the resolution of the concatenated features is not less than that of the extracted features.

[0051] In this embodiment, the image to be segmented undergoes multiple stages of feature extraction in sequence. In addition to the concatenated features output at the end of the feature extraction network, the extracted features from each stage are also preserved. Throughout feature extraction, the high-resolution branch of the main path keeps its high resolution. Each feature extraction stage adds a new, lower-resolution branch derived from the previous stage and then fuses the features of this new branch with those of the other branches. As a result, the feature map of the high-resolution branch contains feature information from multiple resolutions.
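The repeated multi-scale fusion described above can be illustrated with a minimal NumPy sketch. It assumes branches stored as [H, W, C] arrays whose resolution halves from one branch to the next; nearest-neighbour upsampling, average pooling, and plain projection matrices standing in for HRNet's 1×1 and strided convolutions are simplifications chosen here, not details taken from the patent. The names `fuse`, `upsample_nearest`, and `avg_pool` are hypothetical.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of an [H, W, C] array by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def avg_pool(x, factor):
    """Average pooling of an [H, W, C] array (a stand-in for strided convolutions)."""
    h, w, c = x.shape
    return x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def fuse(branches, projections):
    """One multi-scale fusion step across parallel branches.

    branches[i] has shape [H / 2**i, W / 2**i, C_i]; projections[i][j] is a
    (C_j, C_i) matrix mapping branch j's channels to branch i's channels, so
    every output branch receives information from every input branch.
    """
    fused = []
    for i, target in enumerate(branches):
        total = target.astype(float)  # identity path for the branch itself
        for j, source in enumerate(branches):
            if j == i:
                continue
            if j > i:  # lower-resolution source: upsample to branch i
                resized = upsample_nearest(source, 2 ** (j - i))
            else:      # higher-resolution source: downsample to branch i
                resized = avg_pool(source, 2 ** (i - j))
            total = total + resized @ projections[i][j]
        fused.append(np.maximum(total, 0.0))  # ReLU after summation
    return fused


rng = np.random.default_rng(0)
channels = [32, 64, 128]
branches = [rng.standard_normal((16 // 2 ** b, 16 // 2 ** b, c)) for b, c in enumerate(channels)]
projections = [[rng.standard_normal((c_j, c_i)) * 0.1 for c_j in channels] for c_i in channels]
fused = fuse(branches, projections)
print([f.shape for f in fused])  # [(16, 16, 32), (8, 8, 64), (4, 4, 128)]
```

Each branch keeps its own resolution and channel count after fusion, which is what lets the high-resolution branch accumulate information from every scale over repeated stages.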

[0052] In one embodiment, as shown in Figure 3, the step of extracting features from the image to be segmented to obtain extracted features and concatenated features includes steps S101-S105:

[0053] S101, downsample the image to be segmented to obtain downsampled features;

[0054] S102, extract features from the downsampled features to obtain the first extracted features;

[0055] S103, perform feature extraction on the first extracted feature to obtain the second extracted feature;

[0056] S104, perform feature extraction on the second extracted feature to obtain the third extracted feature;

[0057] S105, the third extracted features are concatenated to obtain the concatenated features.

[0058] In one specific embodiment, the feature extraction is implemented using the HRNetV2 network. This allows the resulting feature map to fuse features from more levels while maintaining high resolution. Although the resolution is high, the number of channels is relatively small, so the overall loss of speed is not significant. Specifically, in the downsampling stage, the image to be segmented is downsampled to 1/4 of its original size. Each feature extraction stage adds one low-resolution branch compared with the previous stage; the output features of this new branch have half the resolution and twice the number of channels of the previous lowest-resolution branch.

[0059] Based on this, the above steps S101-S105 specifically include:

[0060] Downsampling stage: the main path downsamples the image to be segmented with two 3×3 convolutions of stride 2, reducing its height H and width W to H/4 and W/4, respectively. Four Basic blocks or Bottleneck blocks then process the result. The resulting downsampled high-resolution features (size [H/4, W/4, 256]) are input into the first extraction stage.

[0061] First extraction stage: a low-resolution branch is generated from the previous stage. Each branch (one high-resolution branch and one low-resolution branch) uses four Basic blocks or Bottleneck blocks for feature extraction. Finally, repeated multi-scale fusion yields the final features of the first extraction stage. The resulting outputs (sizes [H/4, W/4, 32] and [H/8, W/8, 64]) are input into the second extraction stage.

[0062] Second extraction stage: a low-resolution branch is generated from the first extraction stage. Each branch then uses four Basic blocks or Bottleneck blocks for feature extraction. Finally, repeated multi-scale fusion yields the final features of the second extraction stage. The resulting outputs (sizes [H/4, W/4, 32], [H/8, W/8, 64], and [H/16, W/16, 128]) are input into the third extraction stage.

[0063] Third extraction stage: a low-resolution branch is generated from the second extraction stage. Each branch then uses four Basic blocks or Bottleneck blocks for feature extraction. Finally, repeated multi-scale fusion yields the final features of the third extraction stage. The resulting outputs (sizes [H/4, W/4, 32], [H/8, W/8, 64], [H/16, W/16, 128], and [H/32, W/32, 256]) are input into the fourth (concatenation) stage.

[0064] Fourth stage: the outputs of the three parallel low-resolution subnetworks are upsampled to the size of the high-resolution subnetwork. The four same-sized branch outputs are then concatenated. Finally, a 1×1 convolution maps the concatenated result to the number of semantic segmentation categories, yielding the final result.
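The shape bookkeeping in paragraphs [0060] to [0064] follows a simple rule, which the sketch below reproduces for an assumed 1024×1024 input (the resolution used in the speed test later). The channel widths 32/64/128/256 come from the text; the helper names `conv_out_size` and `hrnet_branch_shapes` are hypothetical.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def hrnet_branch_shapes(height, width, base_channels=32, num_stages=3):
    """[H, W, C] of every branch after each extraction stage.

    The highest-resolution branch sits at 1/4 of the input; each stage adds one
    branch with half the resolution and twice the channels of the previous one.
    """
    stages = []
    for stage in range(1, num_stages + 1):
        branches = []
        for b in range(stage + 1):  # stage k has k + 1 parallel branches
            scale = 4 * 2 ** b
            branches.append((height // scale, width // scale, base_channels * 2 ** b))
        stages.append(branches)
    return stages


# Downsampling stage: two 3x3, stride-2 convolutions take 1024 -> 512 -> 256 (= H/4).
print(conv_out_size(conv_out_size(1024)))  # 256
for k, branches in enumerate(hrnet_branch_shapes(1024, 1024), start=1):
    print(f"stage {k}:", branches)
# The fourth stage concatenates the four stage-3 branches: 32 + 64 + 128 + 256 channels.
print(sum(c for _, _, c in hrnet_branch_shapes(1024, 1024)[-1]))  # 480
```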

[0065] In this embodiment, the image to be segmented is first downsampled to obtain downsampled high-resolution features (in the high-resolution subnetwork branch). Feature extraction then begins: at the start of each stage, a low-resolution subnetwork branch obtained from the high-resolution branches through downsampling is added, and before the end of each stage, repeated multi-scale fusion ensures that the feature information spans multiple resolutions. Finally, in the fourth stage, the outputs of all parallel low-resolution subnetworks from the third extraction stage are upsampled to the size of the high-resolution branch, and the representations of all subnetworks are concatenated to obtain an information-rich high-resolution representation.
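The fourth (concatenation) stage can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: it uses nearest-neighbour upsampling for brevity (HRNetV2 itself uses bilinear upsampling), random weights, and an assumed 7-class output matching the seven labelled categories; `concat_stage` is a hypothetical name.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of an [H, W, C] array by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def concat_stage(branches, weight, bias):
    """Upsample all branches to the highest resolution, concatenate along
    channels, then apply a 1x1 convolution (a per-pixel matrix product)."""
    target_h = branches[0].shape[0]
    resized = [upsample_nearest(b, target_h // b.shape[0]) for b in branches]
    concatenated = np.concatenate(resized, axis=-1)  # [H/4, W/4, sum(C_i)]
    logits = concatenated @ weight + bias           # [H/4, W/4, num_classes]
    return concatenated, logits


rng = np.random.default_rng(0)
h4 = 16  # H/4 for an assumed 64x64 input
branches = [rng.standard_normal((h4 // 2 ** b, h4 // 2 ** b, 32 * 2 ** b)) for b in range(4)]
weight = rng.standard_normal((480, 7)) * 0.01
bias = np.zeros(7)
concatenated, logits = concat_stage(branches, weight, bias)
print(concatenated.shape, logits.shape)  # (16, 16, 480) (16, 16, 7)
```

Because the concatenation happens at H/4, the channel count (480) rather than the spatial size grows, which is consistent with the text's remark that the high resolution costs relatively little speed.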

[0066] S20, based on preset conditions, image segmentation is performed using either the concatenated features alone or both the concatenated features and the extracted features, thereby obtaining a segmentation category map.

[0067] In one embodiment, as shown in Figure 4, the step of selecting, according to preset conditions, whether to perform image segmentation using the concatenated features or using both the concatenated features and the extracted features includes steps S21-S22:

[0068] S21, Determine the required image segmentation accuracy based on the scenario or user needs;

[0069] S22, depending on the required image segmentation accuracy, select whether to perform image segmentation using the concatenated features or using both the concatenated features and the extracted features.

[0070] In this embodiment, the required image segmentation accuracy varies with the segmentation scenario. If higher accuracy is required, image segmentation using both the concatenated and the extracted features is chosen. If the accuracy requirement is modest but faster processing is needed, image segmentation using only the concatenated features is chosen. The choice is flexible and depends on the user's actual needs and the requirements of the scenario.
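The preset-condition check in steps S21 and S22 amounts to a branch at inference time. The sketch below assumes the condition is expressed as a required mIoU compared against an accuracy measured offline for the first network alone; both that metric and the function name `select_segmentation_path` are hypothetical, since the patent leaves the form of the preset condition open.

```python
def select_segmentation_path(required_miou: float, first_network_miou: float) -> str:
    """Choose which features drive segmentation.

    required_miou: accuracy demanded by the scenario or user (hypothetical).
    first_network_miou: accuracy measured offline for the first network alone.
    """
    if required_miou > first_network_miou:
        # Higher accuracy needed: refine with the second network, which uses
        # both the concatenated and the extracted features.
        return "concatenated+extracted"
    # Lower accuracy acceptable: the first network alone is faster.
    return "concatenated"


print(select_segmentation_path(0.90, 0.85))  # concatenated+extracted
print(select_segmentation_path(0.80, 0.85))  # concatenated
```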

[0071] In one embodiment, as shown in Figure 2, performing image segmentation using the concatenated features includes step S201:

[0072] S201, the concatenated features are input into the first image segmentation network to obtain the first segmentation result.

[0073] In this embodiment, the concatenated features are input into a first image segmentation network for segmentation. This first image segmentation network employs a spatial pyramid pooling network, ensuring that the features generated by convolution are no longer single-scale. This achieves multi-scale feature extraction, and the fusion of these multi-scale features enhances the model's ability to recognize segmented targets of different sizes, thereby improving segmentation accuracy.

[0074] Preferably, the first image segmentation network is an ASPP network. This can increase the receptive field without changing the feature map size, increasing parameters, or losing information.
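The statement that ASPP enlarges the receptive field without changing the feature-map size or adding parameters follows from atrous (dilated) convolution: a 3×3 kernel with rate d spans 3 + 2(d − 1) pixels yet still has nine weights, and padding by d keeps the output size. The single-channel NumPy sketch below illustrates this. The rates 6, 12, and 18 and the 1×1 and image-pooling branches follow the common DeepLabV3 configuration and are assumptions, since the patent does not state them; all function names are hypothetical.

```python
import numpy as np


def effective_kernel_size(kernel, rate):
    """Span of a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel + (kernel - 1) * (rate - 1)


def atrous_conv2d(x, kernel, rate):
    """Single-channel 3x3 atrous convolution (cross-correlation) with 'same' padding."""
    k = kernel.shape[0]
    pad = rate * (k - 1) // 2            # equals `rate` for k = 3
    xp = np.pad(x, pad)                  # zero padding
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Each kernel tap reads the input shifted by a multiple of `rate`.
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out


def aspp(x, point_weight, kernels, rates=(6, 12, 18)):
    """Stack a 1x1 branch, one atrous branch per rate, and an image-pooling branch."""
    branches = [point_weight * x]
    branches += [atrous_conv2d(x, kern, r) for kern, r in zip(kernels, rates)]
    branches.append(np.full_like(x, x.mean()))
    return np.stack(branches, axis=-1)   # [H, W, 2 + len(rates)]


x = np.arange(64, dtype=float).reshape(8, 8)
rng = np.random.default_rng(0)
out = aspp(x, 0.5, [rng.standard_normal((3, 3)) for _ in range(3)])
print(out.shape)                                          # (8, 8, 5)
print([effective_kernel_size(3, r) for r in (6, 12, 18)])  # [13, 25, 37]
```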

[0075] In addition, as shown in Figure 2, performing image segmentation using the concatenated features and the extracted features includes step S202:

[0076] S202, the extracted features and the concatenated features are input into the second image segmentation network to refine the first segmentation result and obtain the second segmentation result.

[0077] In one embodiment, as shown in Figure 5, inputting the extracted features and the concatenated features into the second image segmentation network to refine the first segmentation result and obtain the second segmentation result includes steps S2021-S2025:

[0078] S2021, the first extracted features, the third extracted features, the concatenated features, and the first segmentation result are all input into the second image segmentation network;

[0079] S2022, based on the first segmentation result, select the coarsely segmented pixels whose prediction confidence is below the preset threshold;

[0080] S2023, obtain the feature information corresponding to the coarsely segmented pixels from the first extracted features, the third extracted features, and the concatenated features;

[0081] S2024, fuse the feature information corresponding to the coarsely segmented pixels to obtain the fused feature;

[0082] S2025, based on the fused features, re-predict the coarsely segmented pixels to obtain the second segmentation result.

[0083] In this embodiment, feature maps generated at different extraction stages are input into the second image segmentation network, which refines the points that the first image segmentation network finds difficult to classify. For this purpose, features from three stages are selected: the first extracted features, the third extracted features, and the concatenated features. Through repeated downsampling, feature extraction, and fusion, the third extracted features carry information from more resolutions. However, for points that lose much of their feature information during downsampling and are therefore hard to classify, even this multi-resolution information may be insufficient. The first extracted features, which have undergone fewer downsampling steps, are therefore also used as a reference for re-learning. Compared with using only the concatenated features, drawing on features from multiple extraction stages provides more information for these hard points and thus enables more refined segmentation.
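Steps S2022 to S2024 can be sketched in NumPy as below, under explicit assumptions: "below the preset threshold" is read as the pixel's top-class softmax probability falling below the threshold, feature maps of different resolutions are sampled at the nearest corresponding location (PointRend itself uses bilinear sampling), and fusion is channel concatenation. The function names are hypothetical.

```python
import numpy as np


def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def select_coarse_points(coarse_logits, threshold):
    """S2022: pixels of the first result whose top-class probability < threshold."""
    confidence = softmax(coarse_logits).max(axis=-1)   # [H, W]
    rows, cols = np.nonzero(confidence < threshold)
    return rows, cols


def gather_and_fuse(feature_maps, rows, cols, coarse_hw):
    """S2023-S2024: look up each point in every feature map and concatenate."""
    h, w = coarse_hw
    per_map = []
    for fmap in feature_maps:          # e.g. first, third, and concatenated features
        fh, fw, _ = fmap.shape
        per_map.append(fmap[(rows * fh) // h, (cols * fw) // w])  # [N, C_i]
    return np.concatenate(per_map, axis=-1)                     # [N, sum(C_i)]


coarse = np.zeros((4, 4, 3))
coarse[0, 0, 0] = 10.0                 # one confident pixel, the rest uniform
rows, cols = select_coarse_points(coarse, threshold=0.5)
first = np.ones((4, 4, 2))
third = np.arange(12, dtype=float).reshape(2, 2, 3)   # lower-resolution map
concat = np.zeros((4, 4, 4))
fused = gather_and_fuse([first, third, concat], rows, cols, coarse.shape[:2])
print(len(rows), fused.shape)          # 15 (15, 9)
```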

[0084] Preferably, the second image segmentation network is a PointRend network, which further optimizes the pixels that are difficult to classify. The PointRend network first selects, from the first segmentation result, the N pixels that are most difficult to classify. It then obtains the feature information corresponding to these N points from the features of the first extraction stage, the features of the third extraction stage, and the concatenated features, and fuses the three. A multilayer perceptron (MLP) re-learns on the fused features, and its predictions are used to refine the first segmentation result, yielding the second segmentation result.
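The PointRend-style refinement can be sketched as follows. It uses PointRend's usual uncertainty measure (the negated gap between the two highest class scores) to pick the N hardest pixels, feeds each point's fused features together with its coarse prediction through a small point-wise MLP, and writes the re-predicted scores back into a copy of the coarse map. Layer sizes, random weights, and all function names are illustrative assumptions.

```python
import numpy as np


def most_uncertain_points(coarse_logits, n):
    """(rows, cols) of the n pixels with the smallest top-1 / top-2 score gap."""
    h, w, k = coarse_logits.shape
    top2 = np.sort(coarse_logits.reshape(-1, k), axis=-1)[:, -2:]
    uncertainty = -(top2[:, 1] - top2[:, 0])
    idx = np.argsort(-uncertainty, kind="stable")[:n]
    return np.unravel_index(idx, (h, w))


def mlp(x, layers):
    """Point-wise MLP: ReLU on hidden layers, linear output layer."""
    for i, (weight, bias) in enumerate(layers):
        x = x @ weight + bias
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


def refine(coarse_logits, point_features, rows, cols, layers):
    """Re-predict the selected points and overwrite them in a copy of the coarse result."""
    refined = coarse_logits.copy()
    inputs = np.concatenate([point_features, coarse_logits[rows, cols]], axis=-1)
    refined[rows, cols] = mlp(inputs, layers)
    return refined


coarse = np.array([[[5.0, 0.0], [0.1, 0.0]],
                   [[3.0, 0.0], [1.0, 0.0]]])
rows, cols = most_uncertain_points(coarse, n=2)
print(rows.tolist(), cols.tolist())    # [0, 1] [1, 1]
rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 3))    # fused features of the 2 selected points
layers = [(rng.standard_normal((5, 4)), np.zeros(4)),
          (rng.standard_normal((4, 2)), np.zeros(2))]
second = refine(coarse, feats, rows, cols, layers)
```

Only the selected points change; every other pixel keeps its first-network prediction, which is why the refinement adds comparatively little computation.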

[0085] In one embodiment, obtaining the segmentation category map includes obtaining it based on the segmentation result. Specifically, as shown in Figure 2, the first segmentation result output by the first image segmentation network is input into the classification network to obtain the segmentation category map; alternatively, the second segmentation result output by the second image segmentation network is input into the classification network to obtain the segmentation category map.

[0086] In this embodiment, the obtained category map has the same width and height as the original image, and its number of channels equals the number of categories, each channel giving the probability that a pixel belongs to the corresponding category. Taking the maximum probability yields the category of each pixel. After the entire image is traversed, all pixels belonging to a given category can be extracted to form the category map for that category. Since each pixel belongs to exactly one category, all categories can be placed in a single map, or each category can be stored separately. Image segmentation is then completed based on the obtained category maps.
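Turning the per-pixel class probabilities into a category map is an argmax over the channel axis. A minimal NumPy sketch follows. The class order is an assumption (the patent lists the seven categories but not their indices), and the helper names are hypothetical.

```python
import numpy as np

CLASS_NAMES = ["background", "road", "emergency lane", "solid line",
               "dashed line", "guide line", "median strip"]  # assumed index order


def category_map(probabilities):
    """[H, W, K] class probabilities -> [H, W] map of the most likely class index."""
    return probabilities.argmax(axis=-1)


def per_class_masks(cat_map, num_classes):
    """One boolean mask per class; since each pixel has exactly one class,
    the masks are disjoint and together cover the whole image."""
    return {k: cat_map == k for k in range(num_classes)}


probs = np.array([[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
                  [[0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]])
cmap = category_map(probs)
print(cmap.tolist())                   # [[0, 1], [2, 0]]
masks = per_class_masks(cmap, num_classes=3)
print(masks[0].tolist())               # [[True, False], [False, True]]
```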

[0087] Furthermore, for linear categories such as solid and dashed lines, the skeleton is extracted and fitted with a straight line or curve for the subsequent judgment of various vehicle behaviors. In one optional example, the Zhang-Suen thinning method is used to extract the skeleton, reducing a long, thin region to a one-pixel-wide set of points along a line; a straight line or third-order curve is then fitted to the resulting point set.
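A compact, unoptimized Zhang-Suen thinning followed by polynomial fitting is sketched below. It assumes a binary mask for one line category with a zero border, and it fits the column as a function of the row, which suits lane markings that run roughly from top to bottom of the image. `np.polyfit` and `np.poly1d` are standard NumPy; `zhang_suen_thin` and `fit_skeleton` are hypothetical names.

```python
import numpy as np


def zhang_suen_thin(mask):
    """Zhang-Suen thinning of a binary mask (nonzero = foreground).

    Border pixels are never examined, so the mask should have a zero border.
    """
    img = (np.asarray(mask) > 0).astype(np.uint8)
    h, w = img.shape
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_remove = []
            for r in range(1, h - 1):
                for c in range(1, w - 1):
                    if not img[r, c]:
                        continue
                    # Neighbours P2..P9, clockwise starting from north.
                    p = [int(v) for v in (img[r - 1, c], img[r - 1, c + 1], img[r, c + 1],
                                          img[r + 1, c + 1], img[r + 1, c], img[r + 1, c - 1],
                                          img[r, c - 1], img[r - 1, c - 1])]
                    b = sum(p)  # number of foreground neighbours
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))  # 0->1 transitions
                    if step == 0:   # P2*P4*P6 == 0 and P4*P6*P8 == 0
                        ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:           # P2*P4*P8 == 0 and P2*P6*P8 == 0
                        ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_remove.append((r, c))
            for r, c in to_remove:  # delete simultaneously after the scan
                img[r, c] = 0
            changed = changed or bool(to_remove)
    return img


def fit_skeleton(skeleton, degree=3):
    """Fit col = f(row) through the skeleton pixels (degree 1: line, 3: cubic)."""
    rows, cols = np.nonzero(skeleton)
    return np.poly1d(np.polyfit(rows, cols, degree))


mask = np.zeros((12, 7), dtype=np.uint8)
mask[2:10, 2:5] = 1                    # a 3-pixel-wide vertical stroke
skeleton = zhang_suen_thin(mask)
line = fit_skeleton(skeleton, degree=1)
print(np.nonzero(skeleton))            # remaining (row, col) pixel coordinates
```

For dashed lines, one would typically thin each dash separately or fit across the points of all dashes together; the patent does not specify which, so that choice is left open here.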

[0088] In one embodiment, a neural network model is trained prior to step S10. The neural network model during the training phase includes a feature extraction network, a first image segmentation network, and a second image segmentation network.

[0089] In this embodiment, when the neural network is trained, the first segmentation result output by the first image segmentation network serves only as input to the second image segmentation network. That is, after the extracted features and concatenated features are obtained through the feature extraction network, the training steps are similar to steps S2021-S2025, except that the inputs are labeled training samples.

[0090] The first and second image segmentation networks work together to optimize the overall neural network, ensuring the accuracy of the overall neural network model in image processing. Because the second image segmentation network was used during training, the overall accuracy of the neural network was significantly improved, which in turn improved the accuracy of the segmentation category map obtained from the first segmentation result output by the first image segmentation network. This also makes it possible to use the optimized first image segmentation network alone for image segmentation in certain application scenarios (step S201). In other words, when image segmentation accuracy requirements are not very high, the category map of the image to be segmented can be obtained based on the first segmentation result to complete the image segmentation. Thus, in the application stage, both a certain level of segmentation accuracy and image segmentation speed are guaranteed.

[0091] In one embodiment, the neural network consisting of the feature extraction network, the first image segmentation network, and the second image segmentation network is trained using loss functions that include a cross-entropy loss and a class imbalance loss. That is, in addition to the conventional cross-entropy loss, a class imbalance loss function (OHEMCrossEntropyloss) is introduced. This loss mainly selects hard samples, i.e., those that differ greatly from the ground truth or have high loss values, and uses them to optimize the network, which helps address the uneven distribution of samples.

[0092] In one specific embodiment, a training image set is created for the neural network model. Using highway scenes as the training setting, 5000 highway surveillance images are first collected from the Internet and from surveillance video. Different regions of the images are labeled as background, road, emergency lane, solid line, dashed line, guide line, or median strip. The labeled images are then processed with operations including, but not limited to, image enhancement, random rotation, blurring, and color changes. After common processing steps such as splitting, the images keep their original aspect ratio and are used for training.
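The combination of ordinary cross-entropy with an OHEM-style term can be sketched as follows, with pixels flattened to N rows. In this NumPy illustration the OHEM term averages the loss only over "hard" pixels whose ground-truth probability is below `thresh`, keeping at least `min_kept` of the highest-loss pixels. The default `thresh=0.7`, the small `min_kept`, and the equal weighting of the two terms are assumptions, not values stated in the patent; the function names are hypothetical.

```python
import numpy as np


def pixel_cross_entropy(logits, labels):
    """Per-pixel cross-entropy. logits: [N, K]; labels: [N] integer class ids."""
    z = logits - logits.max(axis=1, keepdims=True)             # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def ohem_cross_entropy(logits, labels, thresh=0.7, min_kept=4):
    """Average the loss over hard pixels only (ground-truth probability < thresh),
    falling back to the `min_kept` highest-loss pixels when too few are hard."""
    losses = pixel_cross_entropy(logits, labels)
    hard = np.exp(-losses) < thresh                             # exp(-CE) = p(ground truth)
    if hard.sum() < min_kept:
        keep = np.argsort(-losses, kind="stable")[:min_kept]
        hard = np.zeros_like(hard)
        hard[keep] = True
    return losses[hard].mean()


def total_loss(logits, labels, ohem_weight=1.0):
    """Conventional cross-entropy plus a weighted OHEM term (the weight is an assumption)."""
    return (pixel_cross_entropy(logits, labels).mean()
            + ohem_weight * ohem_cross_entropy(logits, labels))


logits = np.array([[10.0, 0.0], [10.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
labels = np.array([0, 0, 0, 0])        # only the last pixel is misclassified
print(round(pixel_cross_entropy(logits, labels).mean(), 3))      # 2.5
print(round(ohem_cross_entropy(logits, labels, min_kept=1), 3))  # 10.0
```

The example shows the intended effect: plain cross-entropy dilutes the one hard pixel across many easy ones, while the OHEM term concentrates the gradient signal on it.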
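One common way to keep the original aspect ratio when feeding images of mixed sizes to a fixed-size network is to scale by the limiting side and pad the remainder ("letterboxing"). The patent does not say how the aspect ratio is preserved, so the sketch below, including the 1024×1024 target borrowed from the speed test and the name `letterbox_params`, is an assumption.

```python
def letterbox_params(src_w, src_h, dst_w=1024, dst_h=1024):
    """Scale factor, resized size, and (left, top, right, bottom) padding that fit
    a src_w x src_h image into dst_w x dst_h without distorting its aspect ratio."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = dst_w - new_w, dst_h - new_h
    padding = (pad_x // 2, pad_y // 2, pad_x - pad_x // 2, pad_y - pad_y // 2)
    return scale, (new_w, new_h), padding


scale, size, padding = letterbox_params(1920, 1080)
print(size, padding)  # (1024, 576) (0, 224, 0, 224)
```

When applying this to training pairs, the label masks would be resized with nearest-neighbour interpolation so that class indices are not blended.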

[0093] We ran practical experiments with the trained neural network model, collecting and creating training, validation, and test sets covering 7 categories. We compared our approach with the conventional segmentation networks DeeplabV3+, DANet, and PointRend combined with DeeplabV3+; the results are shown in Table 1 below.

[0094] Table 1. Comparison of segmentation results (IoU, %)

[0095]

| Category | DeeplabV3+ | DANet | PointRend + DeeplabV3+ | Ours |
|---|---|---|---|---|
| Road | 86.81 | 90.04 | 91.66 | 96.49 |
| Solid line | 64.65 | 67.31 | 68.39 | 73.85 |
| Dashed line | 62.6 | 65.73 | 66.48 | 72.19 |
| Emergency lane | 82.15 | 85.17 | 85.75 | 90.43 |
| Guide line | 87.37 | 87.55 | 86.99 | 89.04 |
| Median strip | 80.01 | 81.23 | 80.89 | 85.87 |
| Background | 94.31 | 96.76 | 96.43 | 97.56 |
| mIoU | 79.7 | 81.97 | 82.37 | 86.49 |

[0096] As shown in the table above, on the self-made test set the mIoU of the second segmentation result is about 5% higher in relative terms than that of the strongest baseline (86.49 versus 82.37 for PointRend + DeeplabV3+), and the accuracy of judging various vehicle behaviors also improved significantly in actual use. In terms of speed, inference on 1024×1024 images reaches 13 fps with ONNX and 22 fps with TensorRT (TRT).

[0097] By implementing the technical solution of this invention, the high resolution of the image is retained and the accuracy of image segmentation is improved. During segmentation, in addition to the high-resolution concatenated features output by the feature extraction network, feature information from the feature extraction stages is also added, so that more feature information can be referenced when optimizing the segmentation results, further ensuring their refinement.
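For reference, per-class IoU is TP / (TP + FP + FN), and mIoU is its mean over classes. The short NumPy sketch below computes both from a confusion matrix and checks two figures in Table 1: the mIoU row is the arithmetic mean of the seven per-class values, and the gain over the strongest baseline (82.37 to 86.49) is about 5% relative, or 4.12 percentage points. `iou_per_class` is a hypothetical helper name.

```python
import numpy as np


def iou_per_class(pred, target, num_classes):
    """Per-class IoU = TP / (TP + FP + FN) from integer label arrays."""
    cm = np.bincount(num_classes * target.ravel() + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm)                 # rows: ground truth, columns: prediction
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    return tp / np.maximum(tp + fp + fn, 1)


print(iou_per_class(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1]), 2))

# Per-class IoU (%) of the proposed method, in the row order of Table 1.
ours = [96.49, 73.85, 72.19, 90.43, 89.04, 85.87, 97.56]
print(round(sum(ours) / len(ours), 2))          # 86.49
print(round((86.49 - 82.37) / 82.37 * 100, 1))  # 5.0
```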

[0098] Furthermore, in the application phase of the model, in scenarios where high accuracy is not required but processing speed is, only the output of the first image segmentation network optimized using the above training method can be selected as the final segmentation result. This significantly improves both accuracy and speed in image segmentation. During network optimization, in addition to the conventional cross-entropy loss function, class-imbalanced cross-entropy loss is also employed to alleviate the problem of imbalanced samples.

[0099] It should be noted that although the steps in the above embodiments are described in a specific order, those skilled in the art will understand that in order to achieve the effects of the present invention, different steps do not necessarily have to be executed in such an order. They can be executed simultaneously (in parallel) or in other orders, and these variations are all within the scope of protection of the present invention.

[0100] Those skilled in the art will understand that all or part of the processes in the method of the above embodiment of the present invention can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable file, or some intermediate form. The computer-readable storage medium can include any entity or device capable of carrying the computer program code, a medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, and a software distribution medium, etc. It should be noted that the content included in the computer-readable storage medium can be appropriately added or removed according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable storage medium does not include electrical carrier signals and telecommunication signals.

[0101] Furthermore, the present invention also provides a control device. In one embodiment of the control device according to the present invention, the control device includes a processor and a storage device. The storage device can be configured to store a program for executing the methods of the above-described method embodiments, and the processor can be configured to execute the program in the storage device. The program includes, but is not limited to, a program for executing the image segmentation processing method of the above-described method embodiments. For ease of explanation, only the parts related to the embodiments of the present invention are shown; for specific technical details not disclosed, please refer to the method section of the embodiments of the present invention. The control device can be a device formed by various electronic devices.

[0102] Furthermore, the present invention also provides a computer-readable storage medium. In one embodiment of the computer-readable storage medium according to the present invention, the computer-readable storage medium can be configured to store a program for performing the methods of the above-described method embodiments, the program being loaded and run by a processor to implement the above-described image segmentation processing method. For ease of explanation, only the parts related to the embodiments of the present invention are shown; for specific technical details not disclosed, please refer to the method section of the embodiments of the present invention. The computer-readable storage medium can be a storage device comprising various electronic devices; optionally, in the embodiments of the present invention, the computer-readable storage medium is a non-transitory computer-readable storage medium.

[0103] Furthermore, it should be understood that since the various modules are only provided to illustrate the functional units of the device of the present invention, the physical devices corresponding to these modules may be the processor itself, or a part of the processor's software, hardware, or a combination of software and hardware. Therefore, the number of modules shown in the figures is merely illustrative.

[0104] Those skilled in the art will understand that the various modules in the device can be adaptively split or combined. Such splitting or combining of specific modules will not cause the technical solution to deviate from the principles of the present invention; therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.

[0105] The technical solution of the present invention has been described above with reference to the preferred embodiments shown in the accompanying drawings. However, it will be readily understood by those skilled in the art that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will all fall within the scope of protection of the present invention.

Claims

1. An image segmentation processing method, characterized by comprising:
performing feature extraction on an image to be segmented to obtain extracted features and stitched features, wherein the stitched features are obtained by further feature extraction on the extracted features, and the resolution of the stitched features is not less than that of the extracted features;
according to a preset condition, performing image segmentation either using the stitched features or using both the stitched features and the extracted features, thereby obtaining a segmentation category map;
wherein the step of performing feature extraction on the image to be segmented to obtain the extracted features and the stitched features comprises:
downsampling the image to be segmented to obtain downsampled features;
performing feature extraction on the downsampled features to obtain first extracted features;
performing feature extraction on the first extracted features to obtain second extracted features;
performing feature extraction on the second extracted features to obtain third extracted features;
concatenating the third extracted features to obtain the stitched features;
wherein the feature extraction is implemented by an HRNetV2 network, so that the obtained feature maps fuse features from more levels while maintaining high resolution;
the image segmentation using the stitched features comprises:
inputting the stitched features into a first image segmentation network to obtain a first segmentation result, the first image segmentation network being an ASPP network;
the image segmentation using both the stitched features and the extracted features comprises:
inputting the extracted features and the stitched features into a second image segmentation network to refine the first segmentation result, thereby obtaining a second segmentation result, the second image segmentation network being a PointRend network;
wherein inputting the extracted features and the stitched features into the second image segmentation network to refine the first segmentation result and obtain the second segmentation result comprises:
inputting the first extracted features, the third extracted features, the stitched features, and the first segmentation result into the second image segmentation network;
selecting, based on the first segmentation result, coarsely segmented pixels whose values are less than a preset threshold;
obtaining feature information corresponding to the coarsely segmented pixels from the first extracted features, the third extracted features, and the stitched features;
fusing the feature information corresponding to the coarsely segmented pixels to obtain fused features;
re-predicting the coarsely segmented pixels based on the fused features to obtain the second segmentation result;
and wherein the step of selecting, according to the preset condition, to perform image segmentation using the stitched features or using both the stitched features and the extracted features comprises:
determining the required image segmentation accuracy based on the application scenario or user requirements;
performing, depending on the required image segmentation accuracy, the image segmentation using either the stitched features or both the stitched features and the extracted features.
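The refinement branch and the accuracy-dependent selection in claim 1 can be illustrated with a small NumPy sketch. It is a simplified illustration under stated assumptions, not the patented implementation: the "value" compared with the preset threshold is taken to be the per-pixel maximum softmax probability of the first segmentation result; features are sampled by nearest-neighbour lookup (PointRend itself uses bilinear interpolation at point coordinates); fusion is concatenation, with the coarse class scores included as in PointRend; a single linear layer stands in for PointRend's MLP point head; and the category map is the per-pixel argmax. The functions `sample_nearest`, `refine`, `segment`, the parameters `weight`, `bias`, `threshold` and the `high_accuracy` flag are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = 0) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sample_nearest(feat, ys, xs, out_h, out_w):
    """Gather feature vectors from feat (C, h, w) at pixel coordinates given at
    the output resolution (out_h, out_w), using nearest-neighbour lookup."""
    _, h, w = feat.shape
    fy = np.minimum((ys * h) // out_h, h - 1)
    fx = np.minimum((xs * w) // out_w, w - 1)
    return feat[:, fy, fx].T  # (N, C)


def refine(coarse_logits, feats, weight, bias, threshold=0.6):
    """Point-wise refinement of the first segmentation result.

    coarse_logits: (K, H, W) first segmentation result (class scores).
    feats: list of (C_i, h_i, w_i) maps, e.g. [first extracted, third extracted, stitched].
    weight: (K, K + sum(C_i)) and bias: (K,) form a hypothetical linear point head.
    """
    _, H, W = coarse_logits.shape
    confidence = softmax(coarse_logits, axis=0).max(axis=0)  # (H, W)
    ys, xs = np.nonzero(confidence < threshold)  # coarsely segmented pixels
    refined = coarse_logits.astype(float)  # astype returns a copy by default
    if ys.size:
        parts = [coarse_logits[:, ys, xs].T]  # coarse class scores, (N, K)
        parts += [sample_nearest(f, ys, xs, H, W) for f in feats]
        fused = np.concatenate(parts, axis=1)  # fused features, (N, K + sum(C_i))
        refined[:, ys, xs] = (fused @ weight.T + bias).T  # re-predicted scores
    return refined, refined.argmax(axis=0)  # second result, category map


def segment(coarse_logits, feats, weight, bias, high_accuracy, threshold=0.6):
    """Preset condition: the fast path keeps only the first segmentation result."""
    if not high_accuracy:
        return coarse_logits.argmax(axis=0)
    return refine(coarse_logits, feats, weight, bias, threshold)[1]
```

In the claimed method the coarse scores come from the trained ASPP head and the point head is the trained PointRend network; here both are stand-ins so that the data flow (select uncertain pixels, gather multi-level features, fuse, re-predict) can be followed end to end.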

2. The method according to claim 1, characterized in that obtaining the segmentation category map comprises: obtaining the segmentation category map based on the segmentation result.

3. The method according to claim 1, characterized in that the method further comprises: training a neural network composed of a feature extraction network, the first image segmentation network, and the second image segmentation network, wherein the loss functions used include a cross-entropy loss function and a class-imbalance loss function.

4. A control device comprising a processor and a memory, the memory storing a program, characterized in that, when the processor executes the program, the method of any one of claims 1 to 3 is implemented.

5. A computer-readable storage medium storing a program, characterized in that, when the program is executed by a processor, the method of any one of claims 1 to 3 is implemented.