Method, electronic device and computer program product for image segmentation

By selecting the category with the lowest recall rate for the image segmentation model to discard and adjusting the weights during training, the computational requirements and accuracy issues in class-imbalanced image segmentation are resolved, achieving efficient and accurate image segmentation results.

CN116993754B · Active · Publication Date: 2026-06-19 · DELL PROD LP

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
DELL PROD LP
Filing Date
2022-04-22
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to achieve efficient and accurate segmentation results in image segmentation tasks with imbalanced class composition, especially when the proportions of target classes are significantly different. Existing loss function methods cannot effectively address the class imbalance problem.

Method used

By training an image segmentation model, the category with the lowest recall rate is selected as the category to be discarded, and only the remaining categories are processed. The weights are dynamically adjusted during training to optimize the model's recall rate.

Benefits of technology

It reduces the computational requirements for image processing, improves processing speed and accuracy, is suitable for image segmentation on edge devices, and achieves higher security and lower latency image segmentation results.

✦ Generated by Eureka AI based on patent content.

Abstract

Embodiments of this disclosure relate to methods, electronic devices, and computer program products for image segmentation. The method can be performed by a trained image segmentation model. The method includes acquiring an image to be processed, which includes targets of multiple categories. The method further includes selecting a category to be discarded from the image to be processed based on the recall rate of each of the pre-acquired multiple categories. The method also includes processing the image to be processed based on multiple remaining categories other than the category to be discarded, to obtain a segmented image. This method significantly reduces the computational resources required for segmentation tasks, decreases the amount of image data processed, and improves image processing speed.

Description

Technical Field

[0001] Embodiments of this disclosure relate to the field of image processing, and more specifically, to methods, electronic devices, and computer program products for image segmentation of class-imbalanced images.

Background Technology

[0002] In image processing tasks (e.g., image segmentation), class imbalance has always been a challenging problem. Class imbalance refers to a significant difference in the proportion (quantity ratio, size ratio) of samples from different classes. It is usually caused by the difficulty of collecting samples or by the small size of the sample instances. For example, in autonomous driving tasks, objects such as pedestrians and obstacles are very small in an image compared to objects such as roads. Similarly, for rare species, the number of samples that can be collected is relatively small because of their scarcity. Therefore, how to identify these imbalanced classes is a crucial problem in current computer vision image processing tasks.

Summary of the Invention

[0003] Embodiments of this disclosure provide a method, electronic device, and computer program product for image segmentation, and embodiments of this disclosure also provide a method, electronic device, and computer program product for training an image segmentation model.

[0004] According to a first aspect of this disclosure, a method for image segmentation is provided, the method being performed by a trained image segmentation model. The method includes acquiring an image to be processed, the image to be processed including targets of multiple categories. The method further includes selecting a category to be discarded in the image to be processed based on the recall rate of each of the pre-acquired multiple categories. The method further includes processing the image to be processed based on multiple remaining categories other than the category to be discarded, to obtain a segmented image.

[0005] According to a second aspect of this disclosure, an electronic device is provided. The electronic device includes at least one processor; and a memory coupled to the at least one processor and having instructions stored thereon, the instructions causing the device to perform actions when executed by the at least one processor, the actions including: acquiring an image to be processed, the image to be processed including targets of a plurality of categories; selecting a category to be discarded in the image to be processed based on the recall rate of each of the plurality of categories pre-acquired; and processing the image to be processed based on a plurality of remaining categories other than the category to be discarded, to obtain a segmented image.

[0006] According to a third aspect of this disclosure, a computer program product is provided, which is tangibly stored on a non-volatile computer-readable medium and includes machine-executable instructions that, when executed, cause a machine to perform the steps of the method in the first aspect of this disclosure.

[0007] According to a fourth aspect of this disclosure, a method for training an image segmentation model is provided. The method includes: acquiring a set of sample images, the set of sample images including targets of multiple categories, wherein the multiple categories include imbalanced categories. The method further includes: using the image segmentation model to acquire the recall change of each of the multiple categories. The method further includes: selecting a category to be adjusted in weights based on the recall change of each category. The method further includes: adjusting the weights of the category to be adjusted in weights, and training the image segmentation model to obtain a trained image segmentation model.

[0008] According to a fifth aspect of this disclosure, an electronic device is provided. The electronic device includes at least one processor; and a memory coupled to the at least one processor and having instructions stored thereon, the instructions causing the device to perform actions when executed by the at least one processor, the actions including: acquiring a set of sample images, the set of sample images including targets of multiple categories, wherein the multiple categories include imbalanced categories; using an image segmentation model to acquire a recall change for each of the multiple categories; selecting a category to be weighted based on the recall change for each category; adjusting the weights of the category to be weighted; and training the image segmentation model to obtain a trained image segmentation model.

[0009] According to a sixth aspect of this disclosure, a computer program product is provided, which is tangibly stored on a non-volatile computer-readable medium and includes machine-executable instructions that, when executed, cause a machine to perform the steps of the method in the fourth aspect of this disclosure.

Brief Description of the Drawings

[0010] The above and other objects, features and advantages of this disclosure will become more apparent from the accompanying drawings, in which like reference numerals generally denote like parts.

[0011] Figure 1 shows a schematic diagram of an example environment 100 in which devices and/or methods according to embodiments of the present disclosure may be implemented;

[0012] Figure 2 shows a flowchart of a method 200 for image segmentation according to an embodiment of the present disclosure;

[0013] Figure 3 shows an exemplary diagram of the image to be processed and the segmented image according to the present disclosure;

[0014] Figure 4 shows a schematic flowchart of a training method 400 for training an image segmentation model according to an embodiment of the present disclosure;

[0015] Figure 5 shows a schematic flowchart of a method 500 for determining a category whose weights are to be adjusted according to an embodiment of the present disclosure;

[0016] Figure 6 shows a schematic block diagram of an example device 600 suitable for implementing embodiments of the present disclosure.

[0017] In the various figures, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

[0018] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

[0019] In the description of embodiments of this disclosure, the term "comprising" and similar terms should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "at least partially based on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", etc., may refer to different or the same objects. Other explicit and implicit definitions may also be included below.

[0020] In image segmentation tasks, handling class imbalance is typically achieved through designing an equalization loss function. There are three main types of loss function equalization methods: region-based equalization, statistical equalization, and performance-based equalization. Region-based equalization methods primarily use IoU as the evaluation parameter. However, considering the "false positives" and "false negatives" inherent in class imbalance, current designs often achieve high average accuracy even with low average IoU, making them unsuitable for class imbalance tasks. Statistical equalization methods adjust weights using inverse frequency cross-entropy loss. This method sometimes requires network modifications and optimization of the iteration process, making it unsuitable for image segmentation. Performance-based equalization loss functions are mainly used in object detection tasks, and research shows that this method is not very successful in other image processing tasks.

[0021] To address at least the aforementioned and other potential problems, embodiments of this disclosure propose a method for image segmentation, performed by a trained image segmentation model. The method includes acquiring an image to be processed, which includes targets of multiple categories. The method further includes selecting a category to be discarded from the image to be processed based on the recall rate of each of the pre-acquired multiple categories. The method also includes processing the image to be processed based on multiple remaining categories other than the category to be discarded, to obtain a segmented image. This method reduces the amount of image data required for processing, lowers the computational demands of the task, and enables more accurate segmentation results for class-imbalanced image processing tasks.

[0022] The embodiments of this disclosure will now be described in further detail with reference to the accompanying drawings, wherein Figure 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented.

[0023] The example environment 100 includes a computing device 120, which includes an image segmentation model 122, which is a trained image segmentation model. The computing device 120 is used to receive an image 110 to be processed and to segment the image 110 to be processed using the image segmentation model 122 to obtain a segmented image 130.

[0024] In environment 100, the image to be processed 110 can be obtained through various types of image acquisition devices, which can be integrated with or separate from computing device 120. The image to be processed 110 may include images acquired in real time by the image acquisition device integrated in computing device 120, images received via a network or other transmission medium, or images read by accessing storage media. This disclosure does not limit the source of the image to be processed 110. In one embodiment, the image to be processed 110 includes targets of multiple categories.

[0025] In one embodiment, the image segmentation model 122 is trained, and the sample image set used to train the image segmentation model 122 includes targets of multiple categories, and these multiple categories include imbalanced categories, that is, the proportions (quantity proportions, size proportions) of samples of different categories in the sample image set are relatively disparate.

[0026] The computing device 120 includes, but is not limited to, personal computers, server computers, handheld or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), media players, etc.), multiprocessor systems, consumer electronics, wearable electronic devices, smart home devices, minicomputers, mainframe computers, edge computing devices, and distributed computing environments that include any one of the above systems or devices.

[0027] The computing device 120 receives an image 110 to be processed. The computing device 120 includes an image segmentation model 122. The image segmentation model 122 is trained to select a category to be discarded from the image 110 based on the recall rate of each of a pre-acquired plurality of categories. Then, the image segmentation model 122 processes the image 110 based on the remaining categories other than the category to be discarded, to obtain a segmented image 130.

[0028] This method reduces the amount of image data required for processing, lowering the computational demands of the task. Therefore, the image segmentation method according to embodiments of this disclosure can also be deployed in edge devices, enabling image segmentation with higher security, lower latency, and higher reliability. Furthermore, the trained image segmentation model according to embodiments of this disclosure can achieve more accurate segmentation for class-imbalanced image processing tasks.

[0029] The above describes, with reference to Figure 1, a block diagram of an example system 100 in which embodiments of the present disclosure can be implemented. The following describes, with reference to Figure 2, a flowchart of a method 200 for image segmentation according to embodiments of the present disclosure. Method 200 can be executed at computing device 120 in Figure 1 or at any other suitable computing device.

[0030] At box 202, computing device 120 can acquire the image 110 to be processed, wherein the image 110 includes targets of multiple categories. For example, Figure 3A shows an example of an image 310 to be processed. This example image contains targets of multiple categories, such as cars, roads, traffic lines on roads, road shoulders, and obstacles. It should be understood that the example in Figure 3A is merely illustrative and does not limit the specific content of the image 110 to be processed.

[0031] At box 204, computing device 120 can select the category to be discarded from the image to be processed 110 based on the recall rate of each of the multiple categories obtained in advance.

[0032] Recall rate represents the proportion of samples correctly predicted as positive out of all positive samples. A lower recall rate indicates a lower proportion of samples in the corresponding category being predicted as positive out of all positive samples. In one embodiment, for an image to be processed that includes imbalanced categories, computing device 120 can select a category to be discarded from the image 110 based on the recall rate of each of a plurality of pre-obtained categories. In one embodiment, computing device 120 can determine the category with the lowest recall rate as the category to be discarded.

[0033] In one embodiment, the computing device 120 may acquire multiple recall rates, each being the recall rate of one of the multiple categories in the last iteration of the training phase. The computing device 120 determines the category with the lowest recall rate among the multiple recall rates and selects that category as the category to be discarded.
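The lowest-recall selection described above can be sketched as follows. This is a minimal illustration, assuming the recall of each category in the last training iteration is derived from true-positive and false-negative pixel counts; the function names and all counts are hypothetical and do not come from the disclosure.

```python
def per_class_recall(tp, fn):
    """Recall per category: TP / (TP + FN); 0.0 if a category has no positives."""
    recall = {}
    for name in tp:
        positives = tp[name] + fn[name]
        recall[name] = tp[name] / positives if positives else 0.0
    return recall


def select_discard_category(recall):
    """Return the category with the lowest recall (ties: first encountered)."""
    return min(recall, key=recall.get)


# Hypothetical counts from the last training iteration, invented for illustration.
tp = {"car": 900, "road": 300, "lane_line": 80, "obstacle": 45}
fn = {"car": 100, "road": 700, "lane_line": 20, "obstacle": 5}

recall = per_class_recall(tp, fn)
discard = select_discard_category(recall)
remaining = [c for c in recall if c != discard]
```

With these made-up counts the road category has the lowest recall, so it is the one selected for discarding and the other three categories remain.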

[0034] In an alternative embodiment, the computing device 120 may acquire multiple recall rates, each being the recall rate of one of the multiple categories in the last iteration of the training phase. The computing device 120 may also acquire multiple precision rates, each being the precision of one of the multiple categories in the last iteration of the training phase. After acquiring the multiple recall rates and the multiple precision rates, the computing device 120 determines the category with the lowest recall rate and the highest precision rate, and selects that category as the category to be discarded.

[0035] Precision rate indicates how many of the samples predicted as positive are actually positive, reflecting the model's ability to predict positive samples accurately. By considering both recall and precision, computing device 120 can select, as categories to be discarded, low-precision categories for which the model is overconfident in its positive predictions. This allows categories with low correlation to the target classification to be selected more accurately in imbalanced classification, thereby reducing the amount of image data processed, lowering the computational requirements of the task, and improving the speed of image processing.
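To make the two quantities concrete, the sketch below computes per-category recall and precision from a pixel-level confusion matrix. The matrix layout (rows are true categories, columns are predicted categories), the function name, and the numbers are assumptions for illustration; the sketch only produces the inputs to the selection and does not fix how an embodiment combines them.

```python
def precision_recall(confusion, names):
    """confusion[t][p] = number of pixels of true category t predicted as p."""
    n = len(names)
    stats = {}
    for k, name in enumerate(names):
        tp = confusion[k][k]
        actual = sum(confusion[k])                           # TP + FN (row sum)
        predicted = sum(confusion[t][k] for t in range(n))   # TP + FP (column sum)
        stats[name] = {
            "recall": tp / actual if actual else 0.0,
            "precision": tp / predicted if predicted else 0.0,
        }
    return stats


# Hypothetical confusion matrix, invented for illustration.
names = ["car", "road", "obstacle"]
confusion = [
    [80, 15, 5],
    [30, 60, 10],
    [10, 5, 35],
]
stats = precision_recall(confusion, names)
```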

[0036] After selecting the category to be discarded, the image segmentation model 122 can discard the weights, connections, and computations associated with the category to be discarded during prediction, thus avoiding segmentation of the category to be discarded.
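One way to picture discarding the weights, connections, and computations of a category is to remove that category's output row from the final classifier, so its score is never computed. The sketch below assumes a per-pixel linear classification head with one weight vector and bias per category; the disclosure does not specify the architecture, and `prune_head`, `classify_pixel`, and all values are hypothetical.

```python
def prune_head(head_weights, discard):
    """Drop the output row (weights and bias) of the discarded category."""
    return {name: wb for name, wb in head_weights.items() if name != discard}


def classify_pixel(features, head_weights):
    """Linear per-pixel classifier: argmax over the categories still in the head."""
    scores = {
        name: sum(w * x for w, x in zip(weights, features)) + bias
        for name, (weights, bias) in head_weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical head: category -> (weight vector, bias).
head = {
    "car": ([0.9, -0.2], 0.0),
    "road": ([0.1, 0.8], 0.1),
    "obstacle": ([-0.3, 0.5], 0.0),
}
pruned = prune_head(head, "road")

pixel = [0.2, 1.0]                      # hypothetical feature vector of one pixel
before = classify_pixel(pixel, head)    # all categories scored
after = classify_pixel(pixel, pruned)   # road row removed, never scored
```

In the pruned head, a pixel that would have been labeled as road is assigned to the best remaining category, and one fewer dot product is evaluated per pixel.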

[0037] Taking the image 310 to be processed in Figure 3A as an example, the selection is based on the recall rate of each category obtained in advance (e.g., the recall rate of each category in the image 310 to be processed in Figure 3A during the last iteration of the training phase). Assuming that the recall rate of the road category is the lowest, the computing device 120 can select the road category as the category to be discarded. In this way, when the image segmentation model 122 performs segmentation processing on the image 310 to be processed, the weights, connections, and computations associated with the road category are discarded.

[0038] Similarly, when precision is also taken into consideration, again taking the image 310 to be processed in Figure 3A as an example, the selection is based on the recall rate of each category obtained in advance (e.g., the recall rate of each category in the image 310 to be processed in Figure 3A during the last iteration of the training phase) and the precision of each category obtained in advance (e.g., the precision of each category in the image 310 to be processed in Figure 3A during the last iteration of the training phase). Assuming that the road category has the lowest recall and the lowest precision, the image segmentation model 122 can select the road category as the category to be discarded, and will discard the weights, connections, and computations associated with the road category when it performs segmentation processing on the image 310 to be processed.

[0039] At box 206, computing device 120 can segment the image to be processed based on multiple remaining categories other than the category to be discarded among multiple categories in the image to be processed 110, thereby obtaining the segmented image 130.

[0040] Taking the image 310 to be processed in Figure 3A as an example, the image segmentation model 122 performs segmentation processing based on the categories other than the road category in the image 310 to be processed, thereby obtaining the segmented image 330, as shown in Figure 3B.

[0041] The image segmentation method 200 according to embodiments of the present disclosure significantly reduces the computational resources required for segmentation processing, decreases the amount of image data processed, and improves image processing speed. Therefore, the image segmentation method according to embodiments of the present disclosure is also suitable for deployment in edge devices, enabling image segmentation processing with higher security, lower latency, and higher reliability. Furthermore, the trained image segmentation model according to embodiments of the present disclosure can achieve more accurate segmentation for image processing tasks with imbalanced categories.

[0042] In one embodiment, in order to further reduce the amount of data processed by the image segmentation model 122 and improve the image processing speed, the image segmentation method 200 according to the present disclosure may further include determining, based on confidence, categories that are not to be labeled (unlabeled categories), and not labeling those categories in the segmented image, thereby improving processing efficiency.

[0043] For example, in one embodiment, the computing device 120 can acquire the confidence score of each remaining category during the prediction process and compare these multiple confidence scores with a confidence threshold. Based on the comparison results between the multiple confidence scores and the confidence threshold, the computing device 120 can classify the remaining categories corresponding to confidence scores below the confidence threshold as unlabeled categories. Furthermore, in the segmented image, the computing device 120 can choose not to label these unlabeled categories; that is, the computing device 120 can label the segmented categories other than unlabeled categories and categories to be discarded.
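The confidence comparison can be sketched as below. The sketch assumes one confidence score per remaining category and treats a score equal to the threshold as labeled; how the per-category score is obtained is not specified in the disclosure, and the function names, the threshold of 0.5, and the scores are hypothetical.

```python
def split_by_confidence(confidence, threshold):
    """Partition the remaining categories into labeled and unlabeled lists."""
    labeled = [c for c, s in confidence.items() if s >= threshold]
    unlabeled = [c for c, s in confidence.items() if s < threshold]
    return labeled, unlabeled


def label_mask(predicted, labeled):
    """Keep a label only for labeled categories; None marks a pixel left unlabeled."""
    keep = set(labeled)
    return [[c if c in keep else None for c in row] for row in predicted]


# Hypothetical confidence scores of the remaining categories.
confidence = {"car": 0.92, "lane_line": 0.81, "road_shoulder": 0.40}
labeled, unlabeled = split_by_confidence(confidence, threshold=0.5)

# Hypothetical 2x2 prediction map over the remaining categories.
predicted = [["car", "road_shoulder"], ["lane_line", "car"]]
masked = label_mask(predicted, labeled)
```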

[0044] By further selecting unlabeled categories based on confidence levels, the majority class in the imbalanced categories can be identified. Typically, for image processing of imbalanced categories, these majority classes can be considered as categories with low correlation to the target category in the imbalanced classification. By not labeling these majority classes, the segmented image can more prominently display the target category, thus facilitating subsequent processing, further reducing the amount of data, and improving image processing speed and efficiency.

[0045] In one embodiment, the image segmentation model 122 according to this disclosure is a trained image segmentation model, trained on a sample image set that includes multiple categories, including imbalanced categories. Furthermore, the image segmentation model is trained based on the recall rate change of each of the multiple categories during the training phase, wherein the recall rate change can include the difference in the recall rate of each category between two iterations of the training phase. The training process of the image segmentation model 122 will be described in detail below with reference to the accompanying drawings. It should be understood that by using a sample image set including imbalanced categories and training the image segmentation model 122 based on the recall rate change of each of the multiple categories during the training phase, the image segmentation model 122 according to this disclosure can obtain more accurate segmentation results for image processing tasks with imbalanced categories.

[0046] The following describes, with reference to Figure 4, a flowchart of a training method 400 for the image segmentation model 122 according to an embodiment of the present disclosure. The training method 400 can be executed at computing device 120 in Figure 1, or at any other suitable computing device. This disclosure does not limit the computing device used to execute the training method 400. For the sake of brevity, the device used to execute the training method 400 will be referred to as a training device in the following description, and it should be understood that the training device may include the computing device 120 in Figure 1 and may also include any other suitable computing device.

[0047] At box 402, the training device acquires a set of sample images containing targets of multiple categories, including imbalanced categories. That is, the proportions (quantity ratio, size ratio) of samples of different categories in the sample image set differ significantly.

[0048] At box 404, the training device uses an image segmentation model to obtain the recall variation for each of the multiple categories.

[0049] In one embodiment, at predetermined training rounds, the training device can obtain the recall rates $M_{i-1,j}$ and $M_{i-2,j}$ of each category in the two iterations (the (i-2)th iteration and the (i-1)th iteration) preceding the current iteration (the i-th iteration), where j is a positive integer greater than 1 representing the corresponding category.

[0050] The predetermined number of training epochs can be determined based on training needs. In one embodiment, the predetermined number of training epochs can be 5 epochs.

[0051] The training device can determine the recall change as the difference in recall between the previous two iterations. That is, for the j-th category, the difference in recall is:

[0052] $\Delta M_j = M_{i-1,j} - M_{i-2,j}$ (Equation 1)

[0053] In an alternative embodiment, the training device may also take the average of the recall changes over the N iterations prior to the current iteration as the recall difference of the corresponding category.
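Equation 1 and the averaged alternative can be sketched as follows. The sketch assumes a per-category list of recall values for the completed iterations, most recent last, and reads the alternative embodiment as the mean of the last N successive changes; that reading, the function names, and the sample values are assumptions, not part of the disclosure.

```python
def recall_change(history, category):
    """Equation 1: recall at iteration i-1 minus recall at iteration i-2."""
    values = history[category]
    if len(values) < 2:
        raise ValueError("at least two completed iterations are required")
    return values[-1] - values[-2]


def mean_recall_change(history, category, n):
    """Average of the last n successive recall changes (alternative embodiment)."""
    values = history[category]
    if n < 1 or len(values) < n + 1:
        raise ValueError("n changes require n + 1 recorded recall values")
    deltas = [values[k] - values[k - 1] for k in range(len(values) - n, len(values))]
    return sum(deltas) / n


# Hypothetical recall values of completed iterations, most recent last.
history = {
    "car": [0.70, 0.74, 0.80, 0.82],
    "pedestrian": [0.20, 0.30, 0.45, 0.55],
}

delta_car = recall_change(history, "car")
delta_pedestrian = recall_change(history, "pedestrian")
mean_delta_car = mean_recall_change(history, "car", 3)
```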

[0054] At box 406, the training device can select the category whose weights need to be adjusted based on the recall changes of each category.

[0055] At box 408, the training device can adjust the weights of the categories whose weights are to be adjusted and train the image segmentation model to obtain the trained image segmentation model.

[0056] In one embodiment, the training device selects the category whose weights are to be adjusted based on the recall change $\Delta M_j$ of each category j. The following describes in detail, with reference to Figure 5, a method 500 for selecting the category whose weights are to be adjusted. Method 500 can be executed at computing device 120 in Figure 1, or at any other suitable computing device. For the sake of brevity, the device executing method 500 will be referred to as the training device in the following description, and it should be understood that the training device may include the computing device 120 in Figure 1 and may also include any other suitable computing device.

[0057] At box 502, the training device can determine the difference in recall for each category to obtain multiple differences.

[0058] In one embodiment, the training device can obtain the recall difference $\Delta M_j$ for each category j using Equation 1 as described above. This yields multiple differences, each corresponding to a category.

[0059] At box 504, the training device can sort multiple differences to form a first difference sequence and a second difference sequence.

[0060] In one embodiment, the training device can arrange the differences in recall in descending order to obtain a first sequence S1.

[0061] $S1 = \operatorname{Argmax}(M_{i-1,u} - M_{i-2,u}),\ (M_{i-1,2} - M_{i-2,2}),\ \ldots,\ (M_{i-1,n} - M_{i-2,n})$

[0062] The training device can arrange the differences in recall in ascending order to obtain the second sequence S2.

[0063] $S2 = \operatorname{Argmin}(M_{i-1,v} - M_{i-2,v}),\ (M_{i-1,2} - M_{i-2,2}),\ \ldots,\ (M_{i-1,n} - M_{i-2,n})$

[0064] It should be understood that the above illustrations of the first sequence S1 and the second sequence S2 are for illustrative purposes only. The differences in the first sequence S1 are sorted in descending order, and the differences in the second sequence S2 are sorted in ascending order.
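The two orderings at box 504 amount to sorting the same set of recall differences twice. A minimal sketch, assuming the differences are held in a dictionary keyed by category (the function name and values are hypothetical):

```python
def sort_differences(deltas):
    """Return (S1, S2): (category, difference) pairs in descending and ascending order."""
    s1 = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    s2 = sorted(deltas.items(), key=lambda kv: kv[1])
    return s1, s2


# Hypothetical recall differences from Equation 1.
deltas = {"car": 0.02, "road": -0.01, "pedestrian": 0.10, "obstacle": 0.05}
s1, s2 = sort_differences(deltas)

largest_category = s1[0][0]    # head of S1: largest recall difference
smallest_category = s2[0][0]   # head of S2: smallest recall difference
```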

[0065] At box 506, the training device can select at least one first difference from the first difference sequence.

[0066] In one embodiment, the training device selects at least one first difference from the first difference sequence S1. For example, the training device selects one or more differences from the first difference sequence S1 in front-to-back order. In one embodiment, the training device selects a first difference $\Delta M_u$ from the first difference sequence S1 in front-to-back order. This difference is the largest of all the recall differences.

[0067] At box 508, the training device can select at least one second difference from the second difference sequence.

[0068] In one embodiment, the training device selects at least one second difference from the second difference sequence S2. For example, one or more differences are selected from the second difference sequence S2 in front-to-back order. In one embodiment, the training device selects a second difference $\Delta M_v$ from the second difference sequence S2 in front-to-back order. This difference is the smallest of all the recall differences.

[0069] At box 510, the training device can use the category corresponding to at least one first difference as the category of the first weight to be adjusted, and use the category corresponding to at least one second difference as the category of the second weight to be adjusted.

[0070] Continuing with the example above, after selecting the first difference and the second difference at boxes 506 and 508 respectively, the training device uses the category u corresponding to the first difference $\Delta M_u$ as the first category whose weight is to be adjusted, and the category v corresponding to the at least one second difference $\Delta M_v$ as the second category whose weight is to be adjusted.

[0071] In an alternative embodiment, the training device may sort the differences to form a single difference sequence (e.g., the first difference sequence S1 or the second difference sequence S2). The training device may then select at least one first difference from the difference sequence in a first order (e.g., from front to back) and at least one second difference from the same sequence in a second order (e.g., from back to front). Similarly, the training device may use the category corresponding to the first difference as the first category whose weight is to be adjusted and the category corresponding to the second difference as the second category whose weight is to be adjusted.

[0072] At box 512, of the first and second categories whose weights are to be adjusted, the training device can determine the category with the larger difference as the category whose weight is to be increased, and the category with the smaller difference as the category whose weight is to be decreased.
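Boxes 506 to 512 can be sketched with the single-sequence variant: sort once in descending order, take the front of the sequence as the categories whose weights are increased and the back as those whose weights are decreased. The parameter `k` (how many categories to take from each end), the function name, and the values are hypothetical.

```python
def select_categories(deltas, k=1):
    """Return (increase, decrease): the k categories with the largest recall
    differences and the k categories with the smallest ones."""
    if k < 1 or 2 * k > len(deltas):
        raise ValueError("k must be at least 1 and at most half the number of categories")
    ordered = sorted(deltas, key=deltas.get, reverse=True)
    increase = ordered[:k]           # front of the sequence: largest differences
    decrease = ordered[-k:][::-1]    # back of the sequence: smallest differences first
    return increase, decrease


# Hypothetical recall differences from Equation 1.
deltas = {"car": 0.02, "road": -0.01, "pedestrian": 0.10, "obstacle": 0.05}

increase, decrease = select_categories(deltas)           # one category from each end
increase2, decrease2 = select_categories(deltas, k=2)    # two categories from each end
```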

[0073] For categories with larger differences, their weights can be adjusted during training (e.g., increasing weights) to "reward" them, thus increasing their recall rate more rapidly with each training iteration, which is more beneficial for improving the image segmentation model's recall rate in the prediction phase. Conversely, for categories with smaller differences, their improvement during training can be considered limited. Their weights can be reduced to decrease their proportion during training, thereby further improving the image segmentation model's recall rate in the prediction phase.

[0074] In one embodiment, after determining the first and second categories whose weights are to be adjusted, the training device can adjust the weights of those categories to train the image segmentation model and obtain the trained image segmentation model.

[0075] The following describes in detail how the training device adjusts the weights.

[0076] In one embodiment, the training device can increase the weight of the category whose weight is to be increased by a predetermined value and decrease the weight of the category whose weight is to be decreased by the same predetermined value to obtain updated weights. For example, the training device can adjust the weights according to Equations 2 and 3 below:

[0077] $W_{i,u} = W_{i-1,u} + \beta$ (Equation 2)

[0078] $W_{i,v} = W_{i-1,v} - \beta$ (Equation 3)

[0079] Here, $W_{i,u}$ is the weight of the category whose weight is to be increased, and $W_{i,v}$ is the weight of the category whose weight is to be decreased. $\beta$ represents the rate of change of the weights; in one embodiment, $\beta$ can be set to 0.2.

[0080] The training device can then train the image segmentation model using the adjusted weights $W_{i,u}$ and $W_{i,v}$.
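As a minimal sketch, the update of Equations 2 and 3 can be written as below. The function name `adjust_weights` and the dictionary representation of per-category weights are assumptions for illustration; only the additive step and the example value 0.2 for β come from the text.

```python
def adjust_weights(weights, increase, decrease, beta=0.2):
    """Apply Equations 2 and 3 to a dict of per-category weights.

    weights  : category name -> weight from the previous step (W_{i-1})
    increase : category u whose weight is to be increased
    decrease : category v whose weight is to be decreased
    beta     : rate of change of the weights (0.2 in one embodiment)
    Returns a new dict (W_i); the input dict is left unchanged.
    """
    updated = dict(weights)
    updated[increase] = weights[increase] + beta  # Equation 2
    updated[decrease] = weights[decrease] - beta  # Equation 3
    return updated


# Example with made-up starting weights.
previous = {"car": 1.0, "road": 1.0, "sign": 1.0}
current = adjust_weights(previous, increase="car", decrease="road")
```

The sketch applies no lower bound to the reduced weight, because the text does not state one; a practical implementation might choose to add such a bound.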

[0081] In one embodiment, at intervals of a predetermined number of training rounds, the training device can select the categories whose weights are to be adjusted according to method 500 described in connection with Figure 5, and adjust the weights according to box 408 of Figure 4 described above, to train the image segmentation model.
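The periodic trigger and the recall differences it operates on might look like the sketch below. The helper names `recall_changes` and `is_adjustment_round`, the list-of-dicts recall history, the sign convention (latest minus previous), and the interval value are assumptions for illustration and are not taken from the disclosure.

```python
def recall_changes(recall_history):
    """Per-category recall difference between the two most recent iterations.

    recall_history is a list with one dict per completed iteration, each
    mapping category name -> recall (hypothetical format).
    """
    if len(recall_history) < 2:
        raise ValueError("need recall values from at least two iterations")
    previous, latest = recall_history[-2], recall_history[-1]
    # Sign convention (latest minus previous) is an assumption.
    return {category: latest[category] - previous[category] for category in latest}


def is_adjustment_round(round_index, interval):
    """True when the weights should be re-selected and adjusted."""
    return round_index > 0 and round_index % interval == 0


# Example with made-up recall values and an assumed interval of 5 rounds.
history = [{"car": 0.40, "road": 0.90}, {"car": 0.50, "road": 0.91}]
if is_adjustment_round(round_index=10, interval=5):
    deltas = recall_changes(history)
```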

[0082] By dynamically adjusting weights during training, categories with larger differences in recall are "rewarded" by increasing their weights during training. This allows the recall of such categories to improve faster with each training iteration, thus enhancing the image segmentation model's recall during the prediction phase. Conversely, categories with smaller differences in recall can have their weights reduced during training, further improving the image segmentation model's recall during the prediction phase.

[0083] In one embodiment, the training device can obtain a first error $E^{(training)}$ after the sample image set has been processed by the image segmentation model. The trained image segmentation model is then used to process the image to be processed, and a second error $E^{(testing)}$ corresponding to the image to be processed is obtained. In response to determining that the difference between the first error and the second error is less than an error change threshold (i.e., $EC = |E^{(testing)} - E^{(training)}|$ and $EC < EC_{th}$), the training device completes the training of the image segmentation model. In one embodiment, the error change threshold $EC_{th}$ can be equal to 0.1.
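The error-change criterion reduces to a single comparison, sketched below. The function name `error_change_converged` is hypothetical; the absolute difference and the example threshold of 0.1 follow the paragraph above.

```python
def error_change_converged(train_error, test_error, threshold=0.1):
    """Return True when EC = |E_testing - E_training| is below EC_th."""
    error_change = abs(test_error - train_error)
    return error_change < threshold


# Example with made-up error values.
done = error_change_converged(train_error=0.30, test_error=0.35)      # gap 0.05
not_done = error_change_converged(train_error=0.30, test_error=0.45)  # gap 0.15
```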

[0084] In addition, in one embodiment, the training device may also use the trained image segmentation model to process the image to be processed and obtain the error $E^{(testing)}$ corresponding to the image to be processed. If the training device detects that the error $E^{(testing)}$ has not improved within a predetermined number of consecutive rounds, the training device completes the training of the image segmentation model. In one embodiment, the predetermined number can be $K_E = 20$.

[0085] In addition, in one embodiment, the training device may also end the training of the image segmentation model after a predetermined number of rounds (e.g., 240 rounds of training).
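The two stopping rules in paragraphs [0084] and [0085] can be combined in one small monitor, sketched below. The class name `StopMonitor` and the reading of "improvement" as a strictly lower test error are assumptions; the defaults of 20 consecutive rounds and 240 total rounds are the example values from the text.

```python
class StopMonitor:
    """Tracks the test error per round and signals when training should end."""

    def __init__(self, patience=20, max_rounds=240):
        self.patience = patience      # consecutive rounds without improvement
        self.max_rounds = max_rounds  # hard cap on the number of rounds
        self.best_error = float("inf")
        self.stale_rounds = 0
        self.rounds = 0

    def update(self, test_error):
        """Record one round; return True if training should stop."""
        self.rounds += 1
        if test_error < self.best_error:  # assumed meaning of "improvement"
            self.best_error = test_error
            self.stale_rounds = 0
        else:
            self.stale_rounds += 1
        return self.stale_rounds >= self.patience or self.rounds >= self.max_rounds


# Example with a shortened patience and made-up errors.
monitor = StopMonitor(patience=3, max_rounds=100)
decisions = [monitor.update(e) for e in [0.5, 0.4, 0.4, 0.41, 0.42]]
# decisions == [False, False, False, False, True]
```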

[0086] This method reduces the amount of image data required for processing, lowering the computational demands of the task. Therefore, embodiments of this disclosure can also be deployed in edge devices, enabling image segmentation with higher security, lower latency, and higher reliability. Furthermore, the trained image segmentation model according to embodiments of this disclosure can achieve more accurate segmentation for class-imbalanced image processing tasks.

[0087] Figure 6 shows a schematic block diagram of an example device 600 that can be used to implement embodiments of the present disclosure. The computing device 120 of Figure 1 can be implemented using device 600. As shown, device 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to computer program instructions stored in read-only memory (ROM) 602 or loaded from storage unit 608 into random access memory (RAM) 603. RAM 603 can also store various programs and data required for the operation of device 600. CPU 601, ROM 602, and RAM 603 are interconnected via bus 604. Input/output (I/O) interface 605 is also connected to bus 604.

[0088] Multiple components in device 600 are connected to I/O interface 605, including: input unit 606, such as a keyboard, mouse, etc.; output unit 607, such as various types of monitors, speakers, etc.; storage unit 608, such as a magnetic disk, optical disk, etc.; and communication unit 609, such as a network card, modem, wireless transceiver, etc. Communication unit 609 allows device 600 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.

[0089] The various processes and procedures described above, such as image segmentation method 200, image segmentation model training method 400, and related processes 500, can be executed by processing unit 601. For example, in some embodiments, image segmentation method 200, image segmentation model training method 400, and related processes 500 can be implemented as computer software programs tangibly contained in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program can be loaded and / or installed on device 600 via ROM 602 and / or communication unit 609. When the computer program is loaded into RAM 603 and executed by CPU 601, one or more actions of image segmentation method 200, image segmentation model training method 400, and related processes 500 described above can be performed.

[0090] This disclosure can be a method, apparatus, system, and / or computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for performing various aspects of this disclosure.

[0091] Computer-readable storage media can be tangible devices capable of holding and storing instructions for use by an instruction execution device. Computer-readable storage media can be, for example, but are not limited to, electrical storage devices, magnetic storage devices, optical storage devices, electromagnetic storage devices, semiconductor storage devices, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. The computer-readable storage media used herein are not to be construed as transitory signals themselves, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.

[0092] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing / processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and / or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing / processing device.

[0093] Computer program instructions used to perform the operations of this disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing the state information of the computer-readable program instructions to implement various aspects of this disclosure.

[0094] Various aspects of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.

[0095] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner. Thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0096] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to perform the functions / actions specified in one or more boxes of a flowchart and / or block diagram.

[0097] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0098] The various embodiments of this disclosure have been described above. These descriptions are exemplary and not exhaustive, and are not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles, practical applications, or technical improvements to the technology in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. An image segmentation method, the method being performed by a trained image segmentation model, the method comprising: obtaining an image to be processed, the image including targets of multiple categories; selecting, based on a pre-acquired recall rate of each of the multiple categories, a category to be discarded from the image to be processed; and processing the image to be processed based on the remaining categories of the multiple categories other than the category to be discarded, to obtain a segmented image; wherein selecting the category to be discarded from the image to be processed, based on the pre-acquired recall rate of each of the multiple categories, includes: obtaining multiple recall rates, each corresponding to the recall rate of one of the multiple categories in at least one specific iteration during the training phase; determining a category based on the relative values of the multiple recall rates; and selecting the determined category as the category to be discarded.

2. The image segmentation method according to claim 1, wherein: the plurality of recall rates correspond to the recall rates of the plurality of categories in the last iteration of the training phase; and determining the category includes identifying the category with the lowest recall rate among the plurality of recall rates.

3. The image segmentation method according to claim 1, wherein: the plurality of recall rates correspond to the recall rates of the plurality of categories in the last iteration of the training phase; the image segmentation method further includes obtaining a plurality of precision rates, each corresponding to the precision rate of one of the plurality of categories in the last iteration of the training phase; and determining the category includes identifying the category with the lowest recall rate among the plurality of recall rates and the highest precision rate among the plurality of precision rates.

4. The image segmentation method according to claim 1, further comprising: obtaining multiple confidence levels, each confidence level representing a confidence level for one of the multiple remaining categories; comparing each of the multiple confidence levels with a confidence threshold; and classifying, based on the comparison of the multiple confidence levels with the confidence threshold, the remaining categories whose confidence levels are less than the confidence threshold as unlabeled categories.

5. The method according to claim 4, further comprising: labeling, in the segmented image, the segmented categories other than the unlabeled categories and the categories to be discarded.

6. The image segmentation method of claim 1, wherein the trained image segmentation model is trained based on a set of sample images including the multiple categories, and based on the recall rate change of each of the multiple categories during the training phase.

7. The image segmentation method of claim 6, wherein the recall rate change includes the difference in recall rate for each category between two iterations during the training phase.

8. The image segmentation method of claim 6, wherein the multiple categories in the sample image set include imbalanced categories.

9. A method for training an image segmentation model, the method comprising: obtaining a set of sample images, the set of sample images including targets of multiple categories, wherein the multiple categories include imbalanced categories; obtaining, using the image segmentation model, the recall rate change of each of the multiple categories; selecting, based on the recall rate change of each category, the category whose weight is to be adjusted; and adjusting the weight of the category whose weight is to be adjusted, and training the image segmentation model to obtain a trained image segmentation model; wherein obtaining the recall rate change of each of the multiple categories includes: obtaining the recall rate of each category in each of multiple iterations during the training phase; and determining, for each of the multiple categories, the corresponding difference in recall rate for that category across the multiple iterations during the training phase; and wherein selecting the category whose weight is to be adjusted based on the recall rate change of each category includes: selecting the category based on the difference determined for each of the multiple categories.

10. The method of claim 9, wherein obtaining the recall rate change of each of the plurality of categories comprises: obtaining, at intervals of a predetermined number of training rounds, the recall rate of each category in the two iterations preceding the current iteration; and taking the difference in recall rate between those two iterations as the recall rate change.

11. The method of claim 10, wherein selecting, based on the recall rate change of each category, the category whose weight is to be adjusted includes: determining the difference in recall rate for each category to obtain multiple differences; sorting the multiple differences to form a first difference sequence and a second difference sequence; selecting at least one first difference from the first difference sequence; selecting at least one second difference from the second difference sequence; and taking the category corresponding to the at least one first difference as the first category whose weight is to be adjusted, and the category corresponding to the at least one second difference as the second category whose weight is to be adjusted.

12. The method of claim 11, wherein selecting, based on the recall rate change of each category, the category whose weight is to be adjusted includes: determining, among the first and second categories whose weights are to be adjusted, the category with the larger difference as the category whose weight is to be increased; and determining, among the first and second categories whose weights are to be adjusted, the category with the smaller difference as the category whose weight is to be decreased.

13. The method of claim 12, wherein adjusting the weight of the category whose weight is to be adjusted comprises: increasing the weight of the category whose weight is to be increased by a predetermined value; and decreasing the weight of the category whose weight is to be decreased by the predetermined value; wherein the predetermined value represents the rate of change of the weight of the category whose weight is to be adjusted.

14. The training method according to claim 9, further comprising: obtaining a first error after the sample image set has been processed by the image segmentation model; processing the image to be processed using the trained image segmentation model to obtain a second error corresponding to the image to be processed; and in response to determining that the difference between the first error and the second error is less than an error change threshold, completing the training of the image segmentation model.

15. An electronic device comprising: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions, when executed by the at least one processor, causing the electronic device to perform the method according to any one of claims 1 to 8.

16. An electronic device comprising: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions, when executed by the at least one processor, causing the electronic device to perform the method according to any one of claims 9 to 14.

17. A computer program product tangibly stored on a non-volatile computer-readable medium and comprising machine-executable instructions that, when executed, cause a machine to perform the steps of the method according to any one of claims 1 to 8.

18. A computer program product tangibly stored on a non-volatile computer-readable medium and comprising machine-executable instructions that, when executed, cause a machine to perform the steps of the method according to any one of claims 9 to 14.