Model training method, image segmentation method, terminal device, and storage medium

By customizing the target size and network depth, and combining reparameter blocks with multi-CPU and multi-GPU parallel processing, the U-Net image segmentation algorithm is optimized, solving the inefficiency problem in small target segmentation tasks and achieving efficient image segmentation and large-scale data processing.

CN116824289B · Active · Publication Date: 2026-06-23 · SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI
Filing Date
2022-03-18
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing image segmentation algorithms are inefficient when processing images of different resolutions and target sizes, especially for small target segmentation tasks, and do not support parallel processing with multiple CPUs and GPUs, resulting in slow processing speeds.

Method used

By customizing the target size and network depth, constructing convolutional networks using reparameter blocks, and combining multi-GPU and multi-CPU parallel processing, the U-Net image segmentation algorithm is optimized to achieve a pipelined processing flow.

Benefits of technology

It improves the processing speed of small object segmentation tasks, reduces memory usage, supports large-scale image analysis, is suitable for cell segmentation of TB-level or even PB-level data, and improves the efficiency of computing resource utilization.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a model training method, an image segmentation method, a terminal device and a storage medium. The model training method comprises: acquiring an image to be segmented; acquiring a target size to be segmented in the image to be segmented; training a target segmentation model by using the target size as a model parameter, to obtain a network model for segmenting an image target of the target size. In this way, the model training method is suitable for automatic model generation for small target segmentation tasks, and supports custom target size for an image to be processed.

Description

Technical Field

[0001] This application relates to the field of image processing technology, and in particular to a model training method, an image segmentation method, a terminal device, and a storage medium.

Background Technology

[0002] With the development of bioimaging technology, current techniques can perform micron-level imaging of whole-brain samples from rats and monkeys. Segmenting each individual cell in the imaging data can provide important information for studies such as cell category analysis, morphological analysis, and distribution statistics.

[0003] In deep learning, convolutional neural networks (CNNs or ConvNets) are a class of artificial neural networks most commonly used to analyze visual images. They are also known as translation-invariant or space-invariant artificial neural networks, because they are based on a shared-weight structure of convolutional kernels or filters that slide along the input features and produce translation-equivariant responses, i.e., feature maps.

[0004] Image segmentation algorithms can be divided into semantic segmentation and instance segmentation. Semantic segmentation classifies all pixels in an image according to semantic categories, while instance segmentation requires further differentiation between different individuals. For cell image analysis tasks, if the cell density is low and the cells are spaced apart, the semantic segmentation result is approximately the same as the instance segmentation result. If the cell density is high and many cells are closely packed together, then an instance segmentation algorithm is required.

[0005] However, the resolution of input images and the size of the targets to be segmented vary from image to image, which affects the performance of the segmentation algorithm. General segmentation algorithms such as Cellpose assume a fixed size for the target to be segmented. When an image is input, they first predict the size of the target and then scale the input image so that the target matches the size assumed during model training. When the targets in the input image are small, the image is magnified, which sharply increases the amount of data to be processed.

Summary of the Invention

[0006] This application provides a model training method, an image segmentation method, a terminal device, and a storage medium.

[0007] This application provides a model training method, the model training method comprising:

[0008] Obtain the image to be segmented;

[0009] Obtain the size of the target to be segmented in the image to be segmented;

[0010] Using the target size as a model parameter, the target segmentation model is trained to obtain a network model for segmenting image targets of the target size.

[0011] The model training method further includes, after using the target size as model parameters, the following:

[0012] Obtain the training dataset;

[0013] The image targets in the training dataset are adjusted to the target size, and then the training dataset is input into the target segmentation model for training.

[0014] The model training method further includes:

[0015] Get custom parameters input by the user;

[0016] Set the number of network channels and / or network depth of the target segmentation model according to the custom parameters.

[0017] This application also provides an image segmentation method, the image segmentation method comprising:

[0018] Obtain a target segmentation model, wherein the target segmentation model is trained by the model training method described above;

[0019] The image to be segmented is input into the target segmentation model to obtain the semantic segmentation image of the image to be segmented;

[0020] The semantic segmentation image is used to output the instance segmentation result of the image to be segmented.

[0021] The step of using the semantic segmentation image to output the instance segmentation result of the image to be segmented includes:

[0022] The semantic segmentation image is used to obtain the semantic segmentation result and gradient vector field;

[0023] The mask of the instance segmentation result is recovered based on the semantic segmentation result and the gradient vector field;

[0024] Perform any one or more of the following processes on the mask of the instance segmentation result: remove masks with quality below a preset threshold, fill mask holes, and write the mask to disk;

[0025] The instance segmentation result of the image to be segmented is obtained based on the processed mask.

[0026] The following steps are processed in parallel on multiple GPUs:

[0027] The semantic segmentation image is used to obtain the semantic segmentation result and gradient vector field, and the mask of the instance segmentation result is recovered based on the semantic segmentation result and the gradient vector field;

[0028] The following steps are processed in parallel on multiple CPUs:

[0029] Remove masks with quality below a preset threshold, fill mask holes, and write the mask to disk.

[0030] The image segmentation method further includes:

[0031] The CPU node main program scans for the unprocessed mask obtained by the GPU node main program and assigns the unprocessed mask to a sub-thread of the CPU node for processing.

[0032] Write the mask processed by the sub-thread of the CPU node to disk.

[0033] Before inputting the image to be segmented into the target segmentation model to obtain the semantic segmentation image of the image to be segmented, the image segmentation method further includes:

[0034] Each convolution in the training-time reparameter blocks of the target segmentation model, together with its parallel convolutional branch and identity mapping branch, is converted into a single equivalent convolution.

[0035] This application also provides a terminal device, which includes a processor and a memory, wherein the memory stores program data, and the processor is used to execute the program data to implement the model training method and / or image segmentation method as described above.

[0036] This application also provides a computer-readable storage medium for storing program data, which, when executed by a processor, is used to implement the above-described model training method and / or image segmentation method.

[0037] The beneficial effects of this application are: the terminal device acquires an image to be segmented; the size of the target to be segmented in the image to be segmented is acquired; the target size is used as a model parameter to train a target segmentation model to obtain a network model for segmenting image targets of the specified size. Through the above method, the model training method is suitable for automatic model generation for small target segmentation tasks, and supports custom target sizes for the images to be processed.

Attached Figure Description

[0038] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. Wherein:

[0039] Figure 1 is a flowchart of an embodiment of the model training method provided in this application;

[0040] Figure 2 is a flowchart of an embodiment of the image segmentation method provided in this application;

[0041] Figure 3 is a schematic diagram of the network structure of the RepVGG architecture provided in this application;

[0042] Figure 4 is a schematic diagram of the network structure of the U-Net model provided in this application;

[0043] Figure 5 is a schematic diagram of the process of generating a vector flow representation from the instance mask provided in this application;

[0044] Figure 6 is a schematic diagram of the processing pipeline provided in this application;

[0045] Figure 7 is a schematic diagram of the structure of an embodiment of the terminal device provided in this application;

[0046] Figure 8 is a schematic diagram of an embodiment of the computer-readable storage medium provided in this application.

Detailed Implementation

[0047] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0048] When faced with the need for large-scale 3D cell image analysis, such as TB or even PB level, general deep learning algorithms are often unable to be applied due to processing speed limitations. This invention is based on the U-Net network segmentation algorithm and has made several optimizations when applying it to large-scale images, which greatly improves its processing speed. It can be applied to large-scale image analysis, such as cell instance segmentation in whole brain imaging data of rat brain or even monkey brain.

[0049] Please refer to Figure 1, which is a flowchart of an embodiment of the model training method provided in this application.

[0050] The model training method and / or image segmentation method of this application are applied to a terminal device, which can be a server or a system consisting of a server and a local terminal working together. Accordingly, the various parts of the terminal device, such as units, sub-units, modules, and sub-modules, can be all located in the server, or they can be located separately in the server and the local terminal.

[0051] Furthermore, the aforementioned server can be either hardware or software. When the server is hardware, it can be implemented as a distributed server cluster consisting of multiple servers, or as a single server. When the server is software, it can be implemented as multiple software programs or software modules, such as software or software modules used to provide distributed servers, or as a single software program or software module; no specific limitation is made here. In some possible implementations, the model training method and / or image segmentation method of the embodiments of this application can be implemented by a processor calling computer-readable instructions stored in memory.

[0052] Specifically, as shown in Figure 1, the model training method in this embodiment of the application includes the following steps:

[0053] Input images differ in resolution and in the size of the targets to be segmented, which affects the performance of the segmentation algorithm. General segmentation algorithms such as Cellpose assume a fixed size for the target to be segmented: they predict the target size of each input and then scale the input image so that the target matches the size assumed during model training. When the targets in the input image are small, the image is magnified, significantly increasing the amount of data to be processed. To address this, this application proposes an automatic model generation method suitable for small target segmentation tasks, as detailed in steps S11 to S13:

[0054] Step S11: Obtain the image to be segmented.

[0055] Step S12: Obtain the size of the target to be segmented in the image to be segmented.

[0056] In this process, the terminal device acquires the size of the target to be segmented in the image to be segmented. Taking a cell image as an example, the terminal device needs to acquire the size of the cells in the cell image.

[0057] Specifically, the terminal device can obtain the size of user-annotated cells through manual annotation or directly obtain the target size input by the user. Alternatively, the terminal device can utilize other pre-trained image recognition models to identify cells in the image to be segmented and obtain the target size of the cells based on the identified bounding boxes.
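As a rough illustration of the bounding-box route described above, the sketch below estimates a single target size as the median of the box diameters. The box format `(x_min, y_min, x_max, y_max)`, the function name `estimate_target_size`, and the use of the median are assumptions made for illustration; the application does not specify how the size is derived from the boxes.

```python
from statistics import median


def estimate_target_size(boxes):
    """Return a representative target diameter in pixels.

    Assumption: each box is an (x_min, y_min, x_max, y_max) tuple produced
    by some pre-trained detector. The diameter of a box is taken as the mean
    of its width and height; the median over all boxes damps outliers.
    """
    diameters = [((x1 - x0) + (y1 - y0)) / 2.0 for x0, y0, x1, y1 in boxes]
    if not diameters:
        raise ValueError("at least one bounding box is required")
    return median(diameters)
```

The returned value could then be passed to training as the preset target size.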

[0058] Step S13: Use the target size as a model parameter to train the target segmentation model to obtain a network model for segmenting image targets of the target size.

[0059] In this process, the terminal device uses the size of the target to be segmented as the model parameter of the target segmentation model to be trained. That is, the target segmentation model trained according to the target size can directly segment targets of the same size without the need for image scaling or other operations.
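To make the cost of rescaling concrete, the following sketch computes how much the data volume grows when an image is magnified so that its targets match a size assumed at training time. The function name and the example diameters are hypothetical; the only fact relied on is that scaling every axis by a factor s multiplies the pixel count by s^2 in 2D and the voxel count by s^3 in 3D.

```python
def data_growth(actual_diameter, assumed_diameter, ndim=3):
    """Factor by which the pixel/voxel count grows after rescaling.

    Every axis is scaled by assumed_diameter / actual_diameter so that the
    targets reach the size the model was trained on.
    """
    if actual_diameter <= 0 or assumed_diameter <= 0:
        raise ValueError("diameters must be positive")
    scale = assumed_diameter / actual_diameter
    return scale ** ndim


# Hypothetical numbers: cells about 10 px wide, model trained on 30 px targets.
growth_2d = data_growth(10, 30, ndim=2)  # each axis is enlarged 3x, two axes
growth_3d = data_growth(10, 30, ndim=3)  # each axis is enlarged 3x, three axes
```

Under these assumed numbers the 3D volume grows 27-fold before inference, which is the overhead that training directly at the target size avoids.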

[0060] Specifically, the terminal device can also customize the number of network channels and network depth in the model through parameters, and the program generates the corresponding network model based on the customized parameters. Simultaneously, the terminal device specifies a preset target size and training dataset. The default dataset consists of publicly available image datasets, to which users can add their own training data. When processing input images, the program adjusts the target size in the dataset to the preset size for training, thus obtaining a model suitable for the data to be predicted, avoiding situations where the improvement in results is not significant while the computational load increases.
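The parameters described in this paragraph can be pictured as a small configuration object from which the network layout and the training-time rescaling are derived. This is only a sketch: the class name `ModelConfig`, its field names, and the rule that the channel count doubles at each level are assumptions (the doubling is common U-Net practice, not something the application states).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    target_size: float       # preset target diameter in pixels
    base_channels: int = 32  # channels at the first encoder level
    depth: int = 3           # number of encoder levels

    def channels_per_level(self):
        """Channel width per encoder level (assumed to double per level)."""
        return [self.base_channels * 2 ** i for i in range(self.depth)]


def training_scale(config, annotated_size):
    """Resize factor that brings a training image's targets to the preset size."""
    if annotated_size <= 0:
        raise ValueError("annotated_size must be positive")
    return config.target_size / annotated_size
```

For example, with a preset target size of 8 pixels, a public dataset whose cells are 16 pixels wide would be shrunk by a factor of 0.5 before training.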

[0061] Furthermore, the network design for small target segmentation tasks can be simplified to a certain extent. For example, small targets do not require a large receptive field, so the network depth can be reduced. This can increase the network speed and reduce memory usage without affecting the results.
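The link between network depth and receptive field can be checked with a short calculation. The sketch below assumes a U-Net style encoder in which every level applies two 3x3 convolutions and levels are separated by 2x2 pooling with stride 2; this layout and the function name are assumptions, since the application does not fix the layer arrangement.

```python
def encoder_receptive_field(depth, convs_per_level=2, kernel=3, pool=2):
    """Receptive field (in input pixels) at the deepest encoder level.

    Uses the standard recurrence: each layer adds (kernel - 1) * jump, where
    jump is the product of the strides of all earlier layers.
    """
    rf, jump = 1, 1
    for level in range(depth):
        for _ in range(convs_per_level):
            rf += (kernel - 1) * jump
        if level < depth - 1:  # pooling sits between levels only
            rf += (pool - 1) * jump
            jump *= pool
    return rf


fields = [encoder_receptive_field(d) for d in (1, 2, 3, 4)]
```

Under these assumptions a two-level encoder already covers 14 pixels while a four-level encoder covers 68, so for targets only about ten pixels wide the deeper levels mostly add cost.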

[0062] This application adds the ability to preset the target size, the number of network channels, and the network depth of the model, avoiding situations where the improvement in results is minimal while the computational load grows sharply. After these parameters are specified, the network is trained on a predefined training dataset, to which users can add their own training data.

[0063] In this embodiment, the terminal device acquires an image to be segmented; acquires the size of the target to be segmented in the image; and uses the target size as a model parameter to train a target segmentation model to obtain a network model for segmenting image targets of the specified size. Through this method, the model training approach is suitable for automatic model generation for small target segmentation tasks, and supports custom target sizes and network structures for the image to be processed.

[0064] The main drawbacks of existing technologies also include slow processing speed and a lack of support for parallel operation on multiple CPUs and GPUs in computing clusters. This application optimizes several of the factors that slow down processing. For details, please refer to Figure 2, which is a flowchart of an embodiment of the image segmentation method provided in this application.

[0065] Specifically, as shown in Figure 2, the image segmentation method in this embodiment of the application includes the following steps:

[0066] Step S21: Obtain the target segmentation model.

[0067] The target segmentation model can be trained using the model training method described in the above embodiments, and the training process will not be repeated here.

[0068] Specifically, in order to further improve the performance of the target segmentation model, this application also proposes a method to replace the convolution operation with a reparameter block, which can effectively save GPU memory and improve the processing speed of the target segmentation model.

[0069] Specifically, the RepVGG work proposed training a multi-branch model and running inference with a single-branch model, which improves running speed and reduces memory usage while maintaining high model performance. Figure 3 shows the RepVGG architecture: the left part is the ResNet module, the middle part is the RepVGG module during training, and the right part is the RepVGG module during inference.

[0070] The training-time reparameter block contains two 3x3 convolutional layers. Each 3x3 convolution has a parallel 1x1 convolutional branch and an identity mapping branch, followed by a non-linear activation function, forming a RepVGG Block. During inference, based on the additivity of convolutions, each 3x3 convolution in the training-time reparameter block is merged with its parallel 1x1 convolutional branch and identity mapping branch into a single equivalent 3x3 convolution. This application uses reparameter blocks to construct segmentation network models, improving model efficiency.
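The additivity argument can be verified numerically. The sketch below is a deliberately simplified, single-channel version written in plain Python: it omits batch normalization (which RepVGG folds into each branch before merging) and multi-channel kernels, and all function names are hypothetical. It shows that adding the 1x1 weight and the identity weight at the center of the 3x3 kernel gives the same output as summing the three branches.

```python
def conv2d_same(image, kernel):
    """3x3 cross-correlation with zero padding on a single-channel image."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    iy, ix = y + ky - 1, x + kx - 1
                    if 0 <= iy < h and 0 <= ix < w:
                        acc += kernel[ky][kx] * image[iy][ix]
            out[y][x] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time output: 3x3 branch + 1x1 branch (scalar k1) + identity."""
    y3 = conv2d_same(image, k3)
    return [[y3[y][x] + k1 * image[y][x] + image[y][x]
             for x in range(len(image[0]))] for y in range(len(image))]


def fuse_branches(k3, k1):
    """Inference-time kernel: the 1x1 weight and the identity (weight 1.0)
    are both zero-padded to 3x3, i.e. added at the center position."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + 1.0
    return fused
```

Calling `conv2d_same(image, fuse_branches(k3, k1))` reproduces `multi_branch(image, k3, k1)` up to floating-point rounding, while performing only one convolution.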

[0071] This allows us to simultaneously leverage the advantages of multi-branch model training (high performance) and the benefits of single-path model inference (fast speed, low memory usage). The key here clearly lies in the construction and transformation methods of this multi-branch model.

[0072] The implementation of this application involves adding parallel 1x1 convolutional branches and identity mapping branches to each 3x3 convolutional layer during training, forming a RepVGG Block. This design is inspired by ResNet, but the difference is that ResNet adds a branch every two or three layers, while this application adds one to every layer.

[0073] In this application embodiment, referring to the work of RepVGG, this application uses reparameter blocks in the convolution operation of the network. Compared with the original convolution operation, the reparameter blocks occupy less memory, have less computation, and run faster when making predictions.

[0074] Step S22: Input the image to be segmented into the target segmentation model to obtain the semantic segmentation image of the image to be segmented.

[0075] Step S23: Output the instance segmentation result of the image to be segmented using the semantic segmentation image.

[0076] This application is based on the U-Net image segmentation algorithm, which has excellent performance in biomedical image segmentation. U-Net, or U-shaped network, is a fully convolutional network whose structure is shown in Figure 4.

[0077] The first half of the network consists of convolution and pooling operations, while the second half consists of convolution and up-convolution. Skip-connections are also added to connect intermediate results at the same resolution.

[0078] Building on the U-Net image segmentation algorithm and referencing the work of Cellpose, this application adds gradient mapping channels in the X and Y directions to the network output, as shown in Figure 5; these channels are used to restore the instance segmentation result. The gradient here is the gradient of a vector field within the target. The vector field can be understood as the heat field within each cell, assuming a heat source point inside the cell. The same approach is also applied to 3D image segmentation: the two-dimensional planes in the XY, YZ, and XZ directions are processed independently, the network outputs are averaged to obtain the 3D vector gradient, and post-processing is finally run to restore the instance segmentation result.
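One plausible reading of the 3D averaging step is sketched below for a single voxel. The application only states that the outputs of the three plane-wise passes are averaged; the pairing used here (each axis appears in two of the three planes, so each 3D component is the mean of its two estimates) and the function name are assumptions.

```python
def combine_plane_flows(flow_xy, flow_yz, flow_xz):
    """Average per-plane 2D gradients into one 3D gradient for a voxel.

    Assumed layout: flow_xy = (gx, gy), flow_yz = (gy, gz), flow_xz = (gx, gz).
    """
    gx = (flow_xy[0] + flow_xz[0]) / 2.0
    gy = (flow_xy[1] + flow_yz[0]) / 2.0
    gz = (flow_yz[1] + flow_xz[1]) / 2.0
    return gx, gy, gz
```

In a full implementation the same averaging would be applied to whole arrays rather than to one voxel at a time.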

[0079] This application also adds parallel interfaces for CPU and GPU computing, which can run in parallel on multiple specified GPUs and CPUs, making it easier for the program to be expanded when applied to large-scale data.

[0080] Specifically, large-scale data processing is often performed on computing clusters, requiring data processing methods to have the ability to process in parallel on multiple GPUs and CPUs. This invention uses the ProcessPoolExecutor from the Python standard library concurrent.futures to perform parallel task processing based on serial algorithms. Each subprocess's task can specify the GPU to use through parameters, thereby achieving parallel execution of tasks on multiple CPUs and GPUs.
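A minimal sketch of this arrangement follows. `ProcessPoolExecutor` and the `CUDA_VISIBLE_DEVICES` environment variable are real; the helper names, the round-robin assignment, and the placeholder worker body are assumptions, since the application does not show how the GPU parameter is passed or applied.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def assign_gpus(tasks, gpu_ids):
    """Pair each task with a GPU id in round-robin order (assumed policy)."""
    return [(task, gpu_ids[i % len(gpu_ids)]) for i, task in enumerate(tasks)]


def run_task(args):
    """Worker executed in a subprocess.

    Setting CUDA_VISIBLE_DEVICES before any GPU library is initialised makes
    that library see only the assigned device.
    """
    task, gpu_id = args
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # Placeholder: the serial segmentation routine would process `task` here.
    return task, gpu_id


def run_parallel(tasks, gpu_ids, max_workers=None):
    """Run the serial routine on every task across several processes."""
    workers = max_workers or len(gpu_ids)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_task, assign_gpus(tasks, gpu_ids)))
```

When used from a script, `run_parallel(["tile0", "tile1", "tile2"], gpu_ids=[0, 1])` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.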

[0081] By leveraging the parallel processing capabilities of multiple CPUs and GPUs, this application also proposes a pipelined functionality, which pipelines the entire data processing flow to increase the efficiency of computing resource utilization.

[0082] Specifically, the processing from the input to the instance segmentation result can be divided into two parts. The first part is the network model, whose output yields the semantic segmentation result and the gradient vector field. The second part is post-processing, which involves recovering the instance segmentation masks from the network output, removing low-quality masks, filling mask holes, and writing the masks to disk.
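Two of the post-processing steps named above can be sketched in plain Python. Hole filling is implemented here as a flood fill of the background from the image border, which is one standard way to do it; the application does not say which method it uses, nor how mask quality is scored, so the `scores` dictionary and both function names are hypothetical.

```python
from collections import deque


def fill_holes(mask):
    """Fill background regions of a binary mask that do not touch the border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and mask[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    while queue:  # flood fill the background reachable from the border
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and mask[ny][nx] == 0 and not outside[ny][nx]):
                outside[ny][nx] = True
                queue.append((ny, nx))
    # Everything not reachable from the border is foreground or a hole.
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]


def keep_good_masks(masks, scores, threshold):
    """Drop masks whose (hypothetical) quality score is below the threshold."""
    return {k: m for k, m in masks.items() if scores[k] >= threshold}
```

Both functions are pure CPU work on one mask at a time, which is why this stage parallelizes naturally across CPU workers.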

[0083] The network model output and the mask recovery are computed more efficiently on the GPU, while the remaining parts mainly use the CPU. To decouple the GPU and CPU processing parts and fully utilize computing resources, this application designs the processing pipeline shown in Figure 6. After the GPU node main process obtains an unprocessed mask, it hands this intermediate result to an asynchronous saver, which writes it to disk in .npy format. The asynchronous saver allows the GPU main process to return immediately and continue with the next input. The CPU node main program scans the cache folder and assigns newly written intermediate results to sub-threads in a thread pool. After receiving an intermediate result, a sub-thread executes the CPU-side post-processing flow and then writes the final result to disk.
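The decoupling described in this paragraph can be sketched with the standard library alone. This is an illustration under several assumptions: JSON files stand in for the .npy files, the class and function names are hypothetical, the write-then-rename step is an added safeguard so that the scanner never picks up a half-written file, and the post-processing body is a stand-in.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


class AsyncSaver:
    """Writes intermediate results in a background thread so that the
    producer (the GPU stage) can return to the next input immediately."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self._pool = ThreadPoolExecutor(max_workers=1)

    def save(self, name, data):
        return self._pool.submit(self._write, name, data)

    def _write(self, name, data):
        final = os.path.join(self.cache_dir, name + ".json")
        tmp = final + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(data, fh)
        os.replace(tmp, final)  # atomic rename hides partial files
        return final

    def close(self):
        self._pool.shutdown(wait=True)


def post_process(path, out_dir):
    """Stand-in for the CPU stage: load, process, write the final result."""
    with open(path) as fh:
        data = json.load(fh)
    result = {"name": data["name"], "n_pixels": sum(map(sum, data["mask"]))}
    out_path = os.path.join(out_dir, os.path.basename(path))
    with open(out_path, "w") as fh:
        json.dump(result, fh)
    return out_path


def scan_and_dispatch(cache_dir, out_dir, pool, seen):
    """Submit every not-yet-seen intermediate file to the thread pool."""
    futures = []
    for fname in sorted(os.listdir(cache_dir)):
        if fname.endswith(".json") and fname not in seen:
            seen.add(fname)
            path = os.path.join(cache_dir, fname)
            futures.append(pool.submit(post_process, path, out_dir))
    return futures
```

In a long-running deployment `scan_and_dispatch` would be called in a loop by the CPU node main program, with `seen` preserving which intermediate results have already been handed out.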

[0084] Specifically, the terminal device performs mask recovery using gradient flow tracking. After the image has been processed in blocks, the output of the neural network is a set of three maps: the horizontal gradient, the vertical gradient, and the pixel probability. The next step is to recover the masks from these maps.

[0085] First, the terminal device thresholds the pixel probability map, considering only pixels with a probability above 0.5. For each such pixel, the terminal device runs a dynamical system starting from that pixel location and following the spatial derivatives specified by the horizontal and vertical gradient maps. The terminal device uses finite differences with a step size of 1 and runs 200 iterations per pixel, moving one step in each iteration along the gradient direction at the nearest grid location. After convergence, the pixels can be clustered easily according to their final locations on the grid.
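The procedure in this paragraph can be sketched as follows in plain Python. The threshold of 0.5, the step size of 1, the 200 iterations, and the nearest-grid lookup come from the text; the function name, the clamping at the image border, and the labeling of clusters in scan order are assumptions.

```python
def follow_flows(prob, dy, dx, n_iter=200, threshold=0.5):
    """Label pixels by the grid location their gradient trajectory ends at.

    prob, dy, dx are 2D lists of equal shape: pixel probability, vertical
    gradient and horizontal gradient. Background pixels keep label 0.
    """
    h, w = len(prob), len(prob[0])
    labels = [[0] * w for _ in range(h)]
    sinks = {}  # final (row, col) -> cluster label
    for y in range(h):
        for x in range(w):
            if prob[y][x] <= threshold:
                continue  # background pixel
            py, px = float(y), float(x)
            for _ in range(n_iter):
                iy = min(max(int(round(py)), 0), h - 1)  # nearest grid location
                ix = min(max(int(round(px)), 0), w - 1)
                # One step of size 1 along the gradient, clamped to the image.
                py = min(max(py + dy[iy][ix], 0.0), h - 1.0)
                px = min(max(px + dx[iy][ix], 0.0), w - 1.0)
            key = (int(round(py)), int(round(px)))
            labels[y][x] = sinks.setdefault(key, len(sinks) + 1)
    return labels
```

On a synthetic field in which every foreground pixel points toward the center of its own cell, all pixels of a cell end at the same location and therefore receive the same label.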

[0086] For robustness, the terminal device also extends the clusters along regions where pixels converge at high density. For example, if a high-density peak occurs at location (x, y), the terminal device iteratively merges into the cluster those of the eight neighboring locations that contain at least three converged pixels, until no location around the clustered region qualifies. This handles cases in which the network has created a region of very low gradient where it estimates the center to be.

[0087] Those skilled in the art will understand that, in the above-described method of the specific implementation, the order in which each step is written does not imply a strict execution order and does not constitute any limitation on the implementation process. The specific execution order of each step should be determined by its function and possible internal logic.

[0088] In this embodiment, the terminal device can apply a more efficient deep learning network model to the data to be analyzed by specifying parameters such as the network structure and the preset target size. This reduces operations such as scaling and convolution that are time-consuming but have little effect on accuracy, thereby improving model inference efficiency. Building the convolutional network from reparameter blocks reduces memory usage and improves inference efficiency. The inference program supports parallel operation on multiple CPUs and GPUs, so it can be deployed on more computing resources. Finally, based on an analysis of the program's execution flow, execution time, and computing resources, the processing flow is divided into stages and pipelined, which increases the utilization of computing resources and further reduces execution time.

[0089] To implement the model training method and / or image segmentation method of the above embodiments, this application also proposes a terminal device. Please refer to Figure 7, which is a schematic diagram of the structure of an embodiment of the terminal device provided in this application.

[0090] The terminal device 400 of this application embodiment includes a memory 41 and a processor 42, wherein the memory 41 and the processor 42 are coupled together.

[0091] The memory 41 is used to store program data, and the processor 42 is used to execute the program data to implement the model training method and / or image segmentation method described in the above embodiments.

[0092] In this embodiment, processor 42 can also be referred to as a CPU (Central Processing Unit). Processor 42 may be an integrated circuit chip with signal processing capabilities. Processor 42 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The general-purpose processor can be a microprocessor, or processor 42 can be any conventional processor.

[0093] To implement the model training method and / or image segmentation method of the above embodiments, this application also provides a computer-readable storage medium. As shown in Figure 8, the computer-readable storage medium 500 is used to store program data 51, which, when executed by a processor, is used to implement the model training method and / or image segmentation method as described in the above embodiments.

[0094] This application also provides a computer program product, wherein the computer program product includes a computer program operable to cause a computer to perform the model training method and / or image segmentation method as described in the embodiments of this application. The computer program product may be a software installation package.

[0095] The model training method and / or image segmentation method described in the above embodiments of this application, when implemented as software functional units and sold or used as independent products, can be stored in a device, such as a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0096] The above description is merely an embodiment of this application and does not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.

Claims

1. An image segmentation method, characterized in that the image segmentation method includes:
obtaining a target segmentation model, wherein the target segmentation model is trained by a model training method;
inputting the image to be segmented into the target segmentation model to obtain the semantic segmentation image of the image to be segmented; and
outputting the instance segmentation result of the image to be segmented using the semantic segmentation image;
wherein the model training method includes: acquiring an image to be segmented; acquiring the size of the target to be segmented in the image to be segmented; and using the target size as a model parameter to train a target segmentation model to obtain a network model for segmenting image targets of the target size;
wherein the step of outputting the instance segmentation result of the image to be segmented using the semantic segmentation image includes: obtaining a semantic segmentation result and a gradient vector field using the semantic segmentation image; recovering a mask of the instance segmentation result based on the semantic segmentation result and the gradient vector field; performing any one or more of the following processing on the mask of the instance segmentation result: removing masks with quality below a preset threshold, filling mask holes, and writing the mask to disk; and obtaining the instance segmentation result of the image to be segmented based on the processed mask;
wherein multi-GPU parallel processing is used for the following steps: obtaining the semantic segmentation result and the gradient vector field from the semantic segmentation image, and recovering the mask of the instance segmentation result based on the semantic segmentation result and the gradient vector field;
wherein multiple CPUs are used to process the following steps in parallel: removing masks with quality below a preset threshold, filling mask holes, and writing the mask to disk; and
wherein the image segmentation method further includes: scanning, by the CPU node main program, the unprocessed mask obtained by the GPU node main program, and assigning the unprocessed mask to the sub-thread of the CPU node for processing; and writing the mask processed by the sub-thread of the CPU node to disk.

2. The image segmentation method according to claim 1, characterized in that, before inputting the image to be segmented into the target segmentation model to obtain the semantic segmentation image of the image to be segmented, the image segmentation method further includes: converting each convolution in the training-time reparameter blocks of the target segmentation model, together with its parallel convolutional branch and identity mapping branch, into a single equivalent convolution.

3. The image segmentation method according to claim 1, characterized in that, after using the target size as a model parameter, the model training method further includes: obtaining a training dataset; and adjusting the image targets in the training dataset to the target size, and then inputting the training dataset into the target segmentation model for training.

4. The image segmentation method according to claim 1 or 3, characterized in that the model training method further includes: obtaining custom parameters input by the user; and setting the number of network channels and / or network depth of the target segmentation model according to the custom parameters.

5. A terminal device, characterized in that the terminal device includes a processor and a memory, the memory storing program data, and the processor executing the program data to implement the image segmentation method as described in any one of claims 1-4.

6. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store program data, which, when executed by a processor, is used to implement the image segmentation method according to any one of claims 1-4.