Breast lesion segmentation device, model training method and electronic device
By introducing self-attention units and boundary loss training into the U-Net network, the problem of inaccurate lesion segmentation in breast ultrasound images was solved, achieving higher accuracy in breast lesion segmentation and supporting early diagnosis of breast cancer.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGZHOU INNOVATION RES INST OF BEIJING UNIV OF AERONAUTICS & ASTRONAUTICS
- Filing Date
- 2022-02-14
- Publication Date
- 2026-06-23
AI Technical Summary
Existing breast ultrasound image lesion region segmentation networks still need to improve segmentation accuracy when faced with problems such as noise and artifacts, especially in the early diagnosis of breast cancer, where the problems of blurred lesion boundaries and inaccurate segmentation have not been effectively solved.
A U-Net network combined with self-attention units is employed, and training using self-attention mechanism and boundary loss improves the accuracy of lesion segmentation. The self-attention unit connects downsampling and upsampling units in the U-Net network, utilizing local fine-grained and global coarse-grained feature learning, combined with boundary loss training on lesion contour images, to enhance attention to lesion edge features.
It improves the accuracy of breast lesion segmentation, effectively suppresses background noise interference, enhances the learning of lesion edge features, and improves the segmentation accuracy for early diagnosis of breast cancer.
Figure CN114494230B_ABST
Description
Technical Field
[0001] This application relates to the field of image processing technology, and more specifically, to a breast lesion segmentation device, a model training method, and an electronic device.
Background Technology
[0002] Breast cancer is a common malignant tumor in women. Although its malignancy was not originally considered high, women are being diagnosed at increasingly younger ages and the disease is progressing more rapidly, making it a major killer. Early detection, diagnosis, and treatment of breast cancer can help improve patients' annual survival rate and quality of life. The increasing sophistication of magnetic resonance imaging (MRI) technology has played a valuable role in the early diagnosis and treatment of breast cancer. In particular, accurate segmentation of breast lesions in ultrasound images is crucial for the early diagnosis and treatment of breast cancer.
[0003] Segmentation of breast lesions in ultrasound images has been extensively studied in the field. With the continuous development of computer vision, numerous deep learning-based image segmentation methods have emerged, and attempts to apply these methods to medical images are increasingly common. For example, many traditional semantic segmentation networks such as FCN, SegNet, and U-Net have been widely used for segmenting ultrasound lesions.
[0004] Although existing segmentation networks have improved the accuracy of lesion segmentation in breast ultrasound images to some extent, the low imaging quality of ultrasound images, including severe noise, artifacts, and blurred lesion boundaries, still limits the segmentation accuracy of neural networks. In other words, the segmentation accuracy of current segmentation networks still needs to be improved.
Summary of the Invention
[0005] This application provides a breast lesion segmentation device, a model training method, an electronic device, and a readable storage medium, which can obtain a highly accurate breast lesion segmentation mask by utilizing a breast lesion segmentation device that includes a self-attention unit trained by boundary loss based on the output image of the self-attention unit and the corresponding sample contour image.
[0006] The embodiments of this application can be implemented as follows:
[0007] In a first aspect, embodiments of this application provide a breast lesion segmentation device, the device comprising a connected U-Net network and at least one self-attention unit, the U-Net network comprising multiple downsampling units and multiple upsampling units.
[0008] The U-Net network is used to obtain a segmentation mask for breast lesions based on the breast ultrasound image to be analyzed.
[0009] Each of the self-attention units is used to obtain a third target feature map based on the first target feature map output by the connected downsampling unit and the second target feature map input to the connected upsampling unit, and input the third target feature map to the connected upsampling unit; the downsampling unit and the connected upsampling unit connected to any one of the self-attention units are located in the same layer; each of the self-attention units is trained based on the boundary loss of the sample contour image and the output image of the self-attention unit.
[0010] In a second aspect, embodiments of this application provide a model training method for training a breast lesion segmentation device, the method comprising:
[0011] Obtain multiple sample breast ultrasound images and corresponding sample mask images and sample lesion contour images for each of the multiple sample breast ultrasound images;
[0012] The sample breast ultrasound image is input into a preset neural network model to obtain at least one third target feature map generated by the neural network model and a mask image to be analyzed output by the neural network model. The neural network model includes a U-Net network and at least one self-attention unit. The U-Net network includes multiple downsampling units and multiple upsampling units. Each self-attention unit is used to obtain a third target feature map based on the self-attention mechanism, according to the first target feature map output by the connected downsampling unit and the second target feature map input to the connected upsampling unit, and input the third target feature map to the connected upsampling unit. The downsampling unit and the connected upsampling unit of any self-attention unit are located in the same layer.
[0013] The total loss is calculated based on the third target feature map, the mask image to be analyzed, the corresponding sample mask image, and the corresponding sample lesion contour image. The total loss includes the boundary loss between the third target feature map and the corresponding sample lesion contour image.
[0014] The neural network model is adjusted based on the total loss to train the breast lesion segmentation device.
[0015] In a third aspect, embodiments of this application provide an electronic device, including a processor and a memory, wherein the memory stores machine-executable instructions that can be executed by the processor, and the processor can execute the machine-executable instructions to implement the breast lesion segmentation device described in the foregoing embodiments.
[0016] In a fourth aspect, embodiments of this application provide a readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the breast lesion segmentation device as described in the foregoing embodiments.
[0017] In the breast lesion segmentation device, model training method, and electronic device provided in the embodiments of this application, the breast lesion segmentation device includes a U-Net network connected to at least one self-attention unit. The U-Net network comprises multiple downsampling units and multiple upsampling units, and is used to obtain a breast lesion segmentation mask based on the breast ultrasound image to be analyzed. Each self-attention unit is used to obtain a third target feature map based on a self-attention mechanism, using a first target feature map output by the connected downsampling unit and a second target feature map input to the connected upsampling unit, and then to input the third target feature map into the connected upsampling unit. The downsampling unit and the upsampling unit connected to any self-attention unit are located in the same layer. Each self-attention unit is trained based on the boundary loss between the sample contour image and the output image of the self-attention unit. Thus, a breast lesion segmentation device including a self-attention unit that pays more attention to the edge features of the lesion can obtain highly accurate lesion segmentation results.
Attached Figure Description
[0018] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0019] Figure 1 is a block diagram of an electronic device provided in an embodiment of this application;
[0020] Figure 2 is a first block diagram of the breast lesion segmentation device provided in the embodiments of this application;
[0021] Figure 3 is a schematic diagram of the structure of the breast lesion segmentation device provided in the embodiments of this application;
[0022] Figure 4 is a second block diagram of the breast lesion segmentation device provided in the embodiments of this application;
[0023] Figure 5 is a schematic diagram of the self-attention unit in Figure 4;
[0024] Figure 6 is a schematic diagram of the process by which the self-attention unit obtains the third target feature map;
[0025] Figure 7 is a schematic diagram of the block compression (patch embedding) operation provided in an embodiment of this application;
[0026] Figure 8 is a schematic diagram of the processing procedure of the attention calculation subunit;
[0027] Figure 9 is a schematic flowchart of the model training method provided in an embodiment of this application;
[0028] Figure 10 is a schematic diagram of acquiring a sample mask image and a sample lesion contour image provided in an embodiment of this application;
[0029] Figure 11 is a flowchart of the sub-steps included in step S230 in Figure 9;
[0030] Figure 12 is a schematic diagram of the process of obtaining the boundary loss;
[0031] Figure 13 is a block diagram of the model training device provided in an embodiment of this application.
[0032] Reference numerals: 100 - Electronic device; 110 - Memory; 120 - Processor; 130 - Communication unit; 200 - Breast lesion segmentation device; 210 - U-Net network; 211 - Downsampling network; 213 - Upsampling network; 220 - Self-attention unit; 221 - Preprocessing subunit; 222 - Attention calculation subunit; 223 - Processing subunit; 230 - Preprocessing unit; 300 - Model training device; 310 - Image acquisition module; 320 - Training module.
Detailed Implementation
[0033] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. The components of the embodiments of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.
[0034] Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.
[0035] It should be noted that relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0036] The following detailed description of some embodiments of this application is provided in conjunction with the accompanying drawings. Unless otherwise specified, the following embodiments and features can be combined with each other.
[0037] Please refer to Figure 1, which is a block diagram of an electronic device 100 provided in an embodiment of this application. The electronic device 100 may be, but is not limited to, a computer, a server, etc. The electronic device 100 may include a memory 110, a processor 120, and a communication unit 130. The memory 110, processor 120, and communication unit 130 are electrically connected to each other directly or indirectly to achieve data transmission or interaction. For example, these components can be electrically connected to each other through one or more communication buses or signal lines.
[0038] The memory 110 is used to store programs or data. The memory 110 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
[0039] The processor 120 is used to read/write data or programs stored in the memory 110 and execute corresponding functions. For example, the memory 110 stores a breast lesion segmentation device and/or a model training device, each of which may include at least one software function module stored in the memory 110 in the form of software or firmware. By running the software programs and modules stored in the memory 110, such as the breast lesion segmentation device and/or the model training device in the embodiments of this application, the processor 120 executes various functional applications and data processing, that is, it trains the breast lesion segmentation device and/or accurately segments lesions from breast ultrasound images.
[0040] The communication unit 130 is used to establish a communication connection between the electronic device 100 and other communication terminals through the network, and to send and receive data through the network.
[0041] It should be understood that the structure shown in Figure 1 is only a schematic diagram of the electronic device 100. The electronic device 100 may include more or fewer components than those shown in Figure 1, or have a configuration different from that shown in Figure 1. The components shown in Figure 1 can be implemented using hardware, software, or a combination thereof.
[0042] Please refer to Figure 2, which is a first block diagram of a breast lesion segmentation device 200 provided in an embodiment of this application. The breast lesion segmentation device 200 may include a connected U-Net network 210 and at least one self-attention unit 220. The U-Net network 210 is used to obtain a breast lesion segmentation mask based on the breast ultrasound image to be analyzed. The foreground portion of the breast lesion segmentation mask represents the breast lesion.
[0043] Please refer to Figure 3, which is a schematic diagram of the structure of the breast lesion segmentation device 200 provided in this application embodiment. The U-Net network 210 may include a downsampling network 211 and an upsampling network 213. The downsampling network 211 includes multiple downsampling units, which are used to perform convolution and downsampling on the breast ultrasound image to obtain multiple first feature maps. The upsampling network 213 includes multiple upsampling units, which are used to process the deep features through convolution and upsampling to obtain multiple second feature maps. The second feature map finally output by the upsampling network 213 is the breast lesion segmentation mask. Each upsampling unit generates its own output image based on the output image of the previous layer and the output image of the downsampling unit in the same layer. "Same layer" means that an upsampling unit and a downsampling unit are at the same level of the U-Net network, that is, the layers are symmetrical. For example, downsampling unit 2 and upsampling unit 3 in Figure 3 are in the same layer.
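The same-layer pairing can be illustrated with a short Python sketch. It assumes the numbering used in the Figure 3 example (downsampling units numbered from top to bottom, upsampling units numbered from bottom to top, four of each); the function name is hypothetical and not taken from the application.

```python
def same_layer_upsampling_index(down_index: int, num_levels: int) -> int:
    """Return the upsampling unit that shares a layer with a downsampling unit.

    Assumption: downsampling units are numbered 1..num_levels from top to
    bottom, and upsampling units 1..num_levels from bottom to top.
    """
    if not 1 <= down_index <= num_levels:
        raise ValueError("down_index must be between 1 and num_levels")
    return num_levels + 1 - down_index


# Pairing for a U-Net with four levels, as assumed for the Figure 3 example.
pairs = {d: same_layer_upsampling_index(d, 4) for d in range(1, 5)}
print(pairs)
```

Under these assumptions, downsampling unit 2 pairs with upsampling unit 3, and the bottom downsampling unit 4 pairs with upsampling unit 1.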
[0044] As shown in Figure 3, each downsampling unit may include a convolutional layer, a normalization layer, and a ReLU layer (including the ReLU activation function), and some downsampling units (e.g., downsampling units 2-4 in Figure 3) may also include a max-pooling layer, which is connected to the output of the previous layer. Each upsampling unit may include a convolutional layer, a normalization layer, and a ReLU layer (including the ReLU activation function). Some upsampling units (e.g., upsampling units 2-4 in Figure 3) may also include an upsampling layer, a convolutional layer, a normalization layer for layer normalization, and a ReLU layer (including the ReLU activation function). The upsampling layer may also be connected to the output of the previous layer.
[0045] Each of the self-attention units 220 is used to obtain a third target feature map based on a self-attention mechanism, according to the first target feature map output by the connected downsampling unit and the second target feature map input to the connected upsampling unit, and input the third target feature map to the connected upsampling unit so that the connected upsampling unit obtains a fourth target feature map based on the second target feature map and the third target feature map. The multiple second feature maps generated by the upsampling network 213 include the fourth target feature map. Therefore, the breast lesion segmentation mask is obtained based on at least one of the fourth target feature maps.
[0046] Each self-attention unit is connected to a downsampling unit and an upsampling unit located in the same layer. Each self-attention unit is trained based on the boundary loss of the sample contour image and the output image of that self-attention unit. The sample contour image is used to indicate boundaries, and the boundary loss corresponding to a self-attention unit is obtained based on the difference between the boundary in the sample contour image and the boundary in the output image of that self-attention unit. That is, a self-attention unit is trained based on the boundary in the output image of that self-attention unit and the boundary in the sample contour image. It can be understood that the sample contour image and the output image corresponding to a self-attention unit are obtained from the same image including the breast lesion; the sample contour image is essentially a label image of the image including the breast lesion, and includes the contour line of the breast lesion.
[0047] Optionally, the breast lesion segmentation device 200 may include one self-attention unit 220 or multiple self-attention units 220, the specific number of which can be set according to actual needs. Optionally, at most one self-attention unit 220 may be set at each cascade of the U-Net network.
[0048] As an optional implementation, as shown in Figure 3, one self-attention unit 220 is located at the lowest cascade position of the U-Net network. The following describes how the breast lesion segmentation mask is obtained and how the boundary loss is calculated, taking as an example the breast lesion segmentation device 200 of Figure 3, which includes one self-attention unit 220 located at the bottom cascade position of the U-Net network.
[0049] In Figure 3, the U-Net network includes four downsampling units (i.e., downsampling units 1, 2, 3, and 4 in Figure 3, arranged from top to bottom) and four upsampling units (i.e., upsampling units 1, 2, 3, and 4 in Figure 3, arranged from bottom to top). A self-attention unit 220 is located at the bottom cascade position of the U-Net network, that is, the self-attention unit 220 is connected to the downsampling unit 4 in the downsampling network 211 and the upsampling unit 1 in the upsampling network 213.
[0050] Assume that the image output by downsampling unit 4 is the first target feature map a1, and the image input to upsampling unit 1 is the second target feature map a2. The self-attention unit 220 can then obtain the third target feature map a3 based on the first target feature map a1 and the second target feature map a2. Upsampling unit 1 can concatenate the second target feature map a2 and the third target feature map a3 to obtain a concatenated map, and then perform convolution and normalization on this concatenated map to obtain a fourth target feature map a4. Afterwards, upsampling unit 1 can generate another second feature map based on the fourth target feature map a4. This process continues until upsampling unit 4, in the last layer, generates the breast lesion segmentation mask.
[0051] During training, the boundary loss can be calculated based on the lesion boundary in the third target feature map a3 and the lesion boundary in the corresponding sample contour image, and then the self-attention unit 220 in the breast lesion segmentation device 200 can be trained based on the boundary loss.
[0052] Please refer to Figure 3 and Figure 4. Figure 4 is a second block diagram of the breast lesion segmentation device 200 provided in an embodiment of this application. In this embodiment, the breast lesion segmentation device 200 may further include at least one preprocessing unit 230.
[0053] The preprocessing unit 230 is used to sum the first target feature map and the second target feature map to obtain the image to be processed. Optionally, the preprocessing unit 230 may include a summation layer and a ReLU layer (including a ReLU activation function). The self-attention unit 220 is used to obtain the third target feature map based on the image to be processed. The self-attention unit 220 and the preprocessing unit 230 have a one-to-one correspondence; that is, one self-attention unit 220 is connected to one preprocessing unit 230.
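The summation followed by ReLU described for the preprocessing unit 230 can be sketched in plain Python as follows. This is a minimal illustration that assumes single-channel feature maps of identical size represented as nested lists; the function name and the toy values are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]


def preprocess(first_map: FeatureMap, second_map: FeatureMap) -> FeatureMap:
    """Element-wise sum of two equally sized feature maps, followed by ReLU."""
    same_shape = len(first_map) == len(second_map) and all(
        len(r1) == len(r2) for r1, r2 in zip(first_map, second_map)
    )
    if not same_shape:
        raise ValueError("feature maps must have the same shape")
    return [
        [max(0.0, v1 + v2) for v1, v2 in zip(r1, r2)]
        for r1, r2 in zip(first_map, second_map)
    ]


# Toy stand-ins for the first and second target feature maps.
a1 = [[1.0, -2.0], [0.5, 3.0]]
a2 = [[-3.0, 1.0], [0.5, -1.0]]
image_to_process = preprocess(a1, a2)
print(image_to_process)  # [[0.0, 0.0], [1.0, 2.0]]
```

Negative sums are clipped to zero by the ReLU step, so only positions where the combined response is positive are passed on to the self-attention unit.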
[0054] Please refer to Figure 5 and Figure 6. Figure 5 is a schematic diagram of the self-attention unit 220 in Figure 4. Figure 6 is a schematic diagram illustrating the process by which the self-attention unit 220 obtains the third target feature map. In this embodiment, the self-attention unit 220 may include a preprocessing subunit 221, an attention calculation subunit 222, and a processing subunit 223.
[0055] The preprocessing subunit 221 is used to segment the image to be processed into multiple image blocks of the same size and obtain the feature vector of each image block.
[0056] Optionally, in this embodiment, the preprocessing subunit 221 can obtain the feature vector of each image block through a block compression (patch embedding) operation and layer normalization.
[0057] As shown in Figure 7, the patch embedding operation converts the original 2D image into a series of 1D patch embeddings. An embedding is a feature extracted from the original data, namely a low-dimensional vector obtained by mapping through a neural network. As the name suggests, the operation consists of two parts: patching and compression. First, patching: to better take the global information in ultrasound images into account, the entire image is divided into small blocks, each called a patch. Then, compression: each patch is compressed into a vector of a fixed length. These vectors serve as input for subsequent processing, so that the global information of the entire image can be considered. Performing block compression on the image to be processed yields the original feature vector of each image patch.
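The two steps of patch embedding (patching, then compression) can be sketched as below. This is an illustrative sketch only: it assumes a single-channel image, non-overlapping square patches, and a fixed, hand-written projection matrix standing in for the learned compression layer; all names and values are hypothetical.

```python
from typing import List


def split_into_patches(image: List[List[float]], patch: int) -> List[List[float]]:
    """Cut a 2D image into non-overlapping patch x patch blocks, each flattened row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches


def compress(patch_vector: List[float], weights: List[List[float]]) -> List[float]:
    """Project a flattened patch to a fixed-length vector (one weight row per output value)."""
    return [sum(w * x for w, x in zip(row, patch_vector)) for row in weights]


# 4x4 toy image with values 0..15, split into four 2x2 patches.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)

# Hypothetical projection from 4 pixels to 2 features: patch mean and top-left pixel.
weights = [[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]]
embeddings = [compress(p, weights) for p in patches]
print(embeddings)
```

In a trained model the projection weights would be learned rather than fixed; the sketch only shows how a 2D image becomes a sequence of 1D vectors.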
[0058] Layer normalization normalizes all features of each sample (i.e., image patch); that is, it normalizes the original feature vector of each image patch obtained through patch embedding. When training the self-attention unit 220, this makes the self-attention unit 220 more stable and acts as a regularizer. The normalization process uses Z-score normalization.
[0059] The specific calculation formula is as follows:
[0060] $$\mu = \frac{1}{H}\sum_{i=1}^{H} x_i$$
[0061] $$\sigma = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(x_i - \mu\right)^2}$$
[0062] $$\hat{x}_i = \frac{x_i - \mu}{\sigma}$$
[0063] where $\mu$ represents the mean of the original feature vector for each image patch, $\sigma$ represents the standard deviation of the original feature vector for each image patch, $i$ is the sample number, $H$ is the total number of samples, $x_i$ represents the original feature vector of image patch $i$, and $\hat{x}_i$ represents the feature vector of image block $i$.
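The Z-score normalization described above (subtract the mean, divide by the standard deviation) can be sketched as follows. The sketch assumes the statistics are computed over the entries of a single patch's feature vector and uses the population standard deviation; the handling of a constant vector is an added assumption, not something the application specifies.

```python
import math
from typing import List


def layer_normalize(vector: List[float]) -> List[float]:
    """Z-score normalize one feature vector with its own mean and standard deviation."""
    count = len(vector)
    mean = sum(vector) / count
    variance = sum((value - mean) ** 2 for value in vector) / count
    std = math.sqrt(variance)
    if std == 0.0:
        # Assumption: a constant vector is mapped to zeros to avoid division by zero.
        return [0.0] * count
    return [(value - mean) / std for value in vector]


# Toy feature vector with mean 5 and standard deviation 2.
normalized = layer_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(normalized)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After normalization the vector has zero mean and unit standard deviation, which is what keeps the inputs to the attention calculation on a comparable scale.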
[0064] The attention calculation subunit 222 is used to obtain a third initial target feature map from the feature vector of each image block according to the self-attention mechanism.
[0065] As a possible implementation, the attention calculation subunit 222 can obtain the third initial target feature map based on the focal self-attention mechanism.
[0066] Focal Self Attention is a novel self-attention mechanism. Considering that visual dependencies between nearby regions are often stronger than those between distant regions, Focal Self Attention performs fine-grained self-attention locally and coarse-grained globally. During attention, the closer a region is to the query, the finer the granularity and the more feature information it learns; conversely, the farther away, the coarser the granularity and the less feature information it learns. Compared to full self-attention, this method effectively covers the entire high-resolution feature map while introducing significantly fewer tokens in the self-attention computation. Therefore, it effectively captures both short-range and long-range visual dependencies.
[0067] The principle of focal self-attention is shown in Figure 8: fine-grained self-attention is used locally, while coarse-grained self-attention is used globally. A complete input image can be considered at L levels. The lower the level, the smaller the scope considered and the finer the granularity; the higher the level, the larger the scope considered and the coarser the granularity. Accordingly, the input image $x$ (the image targeted by the self-attention calculation, i.e., the feature map, with dimensions H×W) can be cut into grids (for example, 2×2 grids), where each grid cell is treated as a sub-window. Next, for each sub-window, sub-window pooling is performed at each level using a simple linear layer, thereby expanding the receptive field. The receptive field area corresponding to a sub-window differs at different pooling levels; that is, the area targeted by sub-window pooling varies. This process can be represented by the following formulas:
[0068] $$\hat{x} = \mathrm{Reshape}(x)$$
[0069] $$x^{l} = f_{p}^{l}(\hat{x})$$
[0070] where $\hat{x}$ represents the image after being divided into grids at the different levels, $x^{l}$ represents the image after sub-window pooling at level $l$, and $f_{p}^{l}$ represents the linear layer used for the pooling operation at level $l$.
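Sub-window pooling with a linear layer can be sketched as a weighted sum over each sub-window. The sketch below assumes a single-channel feature map, non-overlapping sub-windows, and uniform weights by default (which reduces the linear layer to average pooling); in the described device the weights would be learned. The function name is hypothetical.

```python
from typing import List, Optional


def sub_window_pool(
    feature_map: List[List[float]],
    window: int,
    weights: Optional[List[float]] = None,
) -> List[List[float]]:
    """Pool each non-overlapping window x window sub-window with one linear layer."""
    height, width = len(feature_map), len(feature_map[0])
    if height % window or width % window:
        raise ValueError("feature map size must be divisible by the window size")
    if weights is None:
        # Assumption: uniform weights, i.e. the linear layer acts as average pooling.
        weights = [1.0 / (window * window)] * (window * window)
    pooled = []
    for top in range(0, height, window):
        row = []
        for left in range(0, width, window):
            values = [
                feature_map[top + r][left + c]
                for r in range(window)
                for c in range(window)
            ]
            row.append(sum(w * v for w, v in zip(weights, values)))
        pooled.append(row)
    return pooled


feature_map = [[float(4 * r + c) for c in range(4)] for r in range(4)]
level_1 = sub_window_pool(feature_map, 1)  # fine level: tokens are kept as they are
level_2 = sub_window_pool(feature_map, 2)  # coarse level: one token per 2x2 sub-window
print(level_2)  # [[2.5, 4.5], [10.5, 12.5]]
```

A larger pooling window produces fewer, coarser tokens, which is how higher levels cover a wider area at lower cost.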
[0071] In self-attention, the query is computed only from the lowest-level tokens, while the key and value are obtained by concatenating the tokens from all levels of the images after sub-window pooling and then passing them through a linear layer. This process can be represented as:
[0072] $$Q = f_{q}(x^{1})$$
[0073] $$K = f_{k}(\{x^{1}, \ldots, x^{L}\})$$
[0074] $$V = f_{v}(\{x^{1}, \ldots, x^{L}\})$$
[0075] where $x^{l}$ denotes the tokens at level $l$ after sub-window pooling, $f_{q}$ is the linear layer corresponding to Q, $f_{k}$ is the linear layer corresponding to K, and $f_{v}$ is the linear layer corresponding to V.
[0076] Then, the output image can be obtained based on the multi-head attention mechanism.
[0077] Optionally, a positional offset is added in the final self-attention calculation, which further biases the attention map so that it focuses more on key regions. The attention calculation subunit uses the following formula when calculating the output image:
[0078] $$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V$$
[0079] where $Q$ represents the query matrix, $K$ represents the key matrix, $V$ represents the value matrix, $B$ indicates the position offset, and $d$ is the dimension of the key vectors.
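The attention calculation with a position offset can be sketched for a single head using plain lists. The sketch assumes the standard scaled dot-product form, with the offset added to the scores before the softmax; the function names and the toy inputs are hypothetical, and multi-head splitting is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix, bias: Matrix) -> Matrix:
    """Single-head attention: softmax(Q K^T / sqrt(d) + B) V."""
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for query, bias_row in zip(queries, bias):
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale + offset
            for key, offset in zip(keys, bias_row)
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return outputs


# One query attending to two key/value tokens.
queries = [[0.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
no_bias = attention(queries, keys, values, bias=[[0.0, 0.0]])
biased = attention(queries, keys, values, bias=[[0.0, 2.0]])
print(no_bias, biased)
```

With a zero offset the two tokens receive equal weight; a positive offset on the second token shifts the attention toward it, which is the biasing effect the text describes.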
[0080] In this embodiment, the attention calculation subunit 222 is specifically used for: dividing the third feature map into multiple grids of the same size, wherein the third feature map is an image determined based on the feature vector of each image block; treating each grid as a sub-window, and performing sub-window pooling at different levels for each sub-window to obtain pooling results at each level, wherein the level is positively correlated with the receptive field size and pooling window size corresponding to the sub-window pooling; calculating a query vector for each sub-window, and calculating a key vector and a value vector based on the pooling results at each level corresponding to the sub-window; and obtaining the third initial target feature map based on the query vector, key vector, and value vector corresponding to each sub-window using a multi-head attention mechanism.
[0081] For example, as shown in Figure 8, the input image can be divided into multiple 2×2 grids. The different levels include Level 1 and Level 2. Taking the gray grid in Figure 8 as an example, sub-window pooling can be performed on the receptive field region corresponding to level 1 of the gray grid to obtain a 4×4 pooling result. In Figure 8, $s_{w}^{l}$ represents the window size, which is the window size used when pooling sub-windows, and is 1×1; $s_{r}^{l}$ indicates the number of sub-windows within the entire region (region size).
[0082] In sub-window pooling, the pooling level is positively correlated with the size of the receptive field and the pooling window. For example, as shown in Figure 8, at level 1, the receptive field size is 4×4 and the pooling window size is 1×1; at level 2, the receptive field size is 6×6 and the pooling window size is 2×2.
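The effect of these level settings on the number of key/value tokens can be worked through with a little arithmetic. The sketch below assumes that each level contributes (receptive field size / pooling window size) tokens per side and that the sizes divide evenly; the dictionary layout and function name are hypothetical.

```python
from typing import Dict, List

# Level settings taken from the Figure 8 example in the text.
levels: List[Dict[str, int]] = [
    {"level": 1, "receptive_field": 4, "pool_window": 1},
    {"level": 2, "receptive_field": 6, "pool_window": 2},
]


def pooled_tokens_per_side(receptive_field: int, pool_window: int) -> int:
    """Number of pooled tokens along one side of the receptive field."""
    if receptive_field % pool_window:
        raise ValueError("receptive field must be divisible by the pooling window")
    return receptive_field // pool_window


tokens_per_level = [
    pooled_tokens_per_side(cfg["receptive_field"], cfg["pool_window"]) ** 2
    for cfg in levels
]
total_keys = sum(tokens_per_level)

# For comparison: attending to the whole 6x6 region at full resolution.
full_resolution_keys = 6 * 6

print(tokens_per_level, total_keys, full_resolution_keys)  # [16, 9] 25 36
```

Under these assumptions, each query attends to 16 fine tokens plus 9 coarse tokens instead of 36 full-resolution tokens for the same 6×6 area, and the saving would grow as further, coarser levels are added.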
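[0083] The calculation formula used to obtain the third initial target feature map is as follows:
[0084] $$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V$$
[0085] where $Q$ represents the query matrix, $K$ represents the key matrix, $V$ represents the value matrix, $B$ indicates the position offset, and $d$ is the dimension of the key vectors.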
[0086] The processing subunit 223 is used to upsample the third initial target feature map to obtain the third target feature map.
[0087] Because the image to be processed has undergone block compression (that is, the image is cut into patches in the patch embedding operation and each patch is compressed into a vector to take global information into account), the third initial target feature map is smaller than the image to be processed. Therefore, as shown in Figure 6, the third initial target feature map is upsampled to restore the image size, thereby obtaining the third target feature map. The size of the third target feature map is the same as the size of the image to be processed. Optionally, nearest neighbor linear interpolation can be used to restore the complete image size.
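Restoring the image size by upsampling can be sketched as follows. The sketch uses plain nearest-neighbor repetition with an integer scale factor, which is only one possible reading of the interpolation mentioned in the text; the function name and the toy values are hypothetical.

```python
from typing import List


def upsample_nearest(feature_map: List[List[float]], factor: int) -> List[List[float]]:
    """Enlarge a 2D feature map by repeating every value factor times along both axes."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    upsampled = []
    for row in feature_map:
        widened = [value for value in row for _ in range(factor)]
        upsampled.extend(list(widened) for _ in range(factor))
    return upsampled


# A 2x2 map (one value per 2x2 patch) restored to the original 4x4 size.
coarse = [[1.0, 2.0], [3.0, 4.0]]
restored = upsample_nearest(coarse, 2)
for row in restored:
    print(row)
```

Each coarse value fills the block of positions its patch originally covered, so the restored map has the same size as the image to be processed.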
[0088] Currently, due to the presence of noise and shadows in two-dimensional breast ultrasound images, and the various shortcomings of ultrasound images such as unclear boundaries between lesion areas and background, irregular shapes of breast lesion areas, and uneven distribution within breast lesions, lesion segmentation in two-dimensional breast ultrasound images is a relatively difficult task compared to natural images.
[0089] Furthermore, breast ultrasound images contain many regions outside the lesion area whose appearance resembles that of breast lesions. However, the surroundings of these lesion-like regions differ significantly from the surroundings of actual lesions. Combining information from the area around the lesion provides long-range, non-local feature learning for breast lesion segmentation in ultrasound images, thereby effectively filtering out false lesions in the segmentation results. Most previous solutions expanded the receptive field and learned global information through operations such as dilated convolution and pooling. However, while acquiring global information, these methods lose local information, and local information is crucial for accurate lesion segmentation.
[0090] Furthermore, while self-attention mechanisms can be used to consider global information, these methods often fuse global features in a fine-grained manner. For the task of ultrasound lesion segmentation, indiscriminate fusion of global features introduces noise and artifacts from the background region, which interferes with the network's learning of lesion features.
[0091] In this embodiment, the self-attention unit based on focal self-attention can capture the fine features of the lesion region using local fine-grained self-attention, while simultaneously using global coarse-grained self-attention to obtain global feature information about the area surrounding the lesion and to suppress noise interference from the background region. Furthermore, during the training of the self-attention unit, the actual contour information of the lesion is used to constrain the attention unit, making it pay more attention to the features of the lesion edge, thereby achieving better segmentation results.
[0092] Please refer to Figure 9, which is a schematic flowchart of the model training method provided in an embodiment of this application. The aforementioned breast lesion segmentation device 200 can be trained using this model training method. The method can be applied to the aforementioned electronic device 100. The method may include steps S210 to S240.
[0093] Step S210: Obtain multiple sample breast ultrasound images and corresponding sample mask images and sample lesion contour images for each of the multiple sample breast ultrasound images.
[0094] The foreground region in the sample mask image represents the breast lesion region, and the contour image of the sample lesion shows the outline of the breast lesion region.
[0095] Please refer to Figure 10, which is a schematic diagram illustrating the acquisition of sample mask images and sample lesion contour images provided in an embodiment of this application. The acquisition process can be as follows: 1. Collect ultrasound cross-sectional images that include breast lesions as raw breast ultrasound images; 2. Have a doctor outline the contour of each lesion in each raw breast ultrasound image; 3. Generate, from the contour lines, a binary mask with the lesion as the foreground, which serves as the raw mask image; 4. Use the Canny edge detection algorithm to obtain the contour lines of the lesion region edges, which gives the raw lesion contour image. The sizes of the above three types of images (raw breast ultrasound images, raw mask images, and raw lesion contour images) can be unified to a preset size (e.g., 512×512) to obtain multiple sample breast ultrasound images and their corresponding sample mask images and sample lesion contour images, i.e., to obtain the training dataset.
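To make the relationship between a mask image and a lesion contour image concrete, the sketch below derives a contour from a binary mask. Note that it does not implement Canny edge detection, which the text names. Instead it substitutes a much simpler rule (a foreground pixel is on the contour if any of its four neighbors is background or lies outside the image), purely as an illustrative stand-in. The function name `mask_to_contour` and the toy mask are hypothetical.

```python
def mask_to_contour(mask):
    """Mark foreground pixels that touch background in a 4-neighborhood.

    Stand-in for an edge detector such as Canny; not the same algorithm.
    """
    height, width = len(mask), len(mask[0])

    def is_background(r, c):
        outside = r < 0 or r >= height or c < 0 or c >= width
        return outside or mask[r][c] == 0

    contour = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            neighbors = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if mask[r][c] and any(is_background(nr, nc) for nr, nc in neighbors):
                contour[r][c] = 1
    return contour


# Toy 5x5 mask with a 3x3 lesion in the middle.
sample_mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
               for r in range(5)]
sample_contour = mask_to_contour(sample_mask)
```

For this toy mask the contour is the one-pixel ring around the lesion, and the interior pixel is excluded.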
[0096] Step S220: Input the sample breast ultrasound image into a preset neural network model to obtain at least one third target feature map generated by the neural network model and the mask image to be analyzed output by the neural network model.
[0097] The architecture of the neural network model can be pre-defined. As shown in Figure 3, the neural network model includes a U-Net network and at least one self-attention unit. The U-Net network includes multiple downsampling units and multiple upsampling units. Each self-attention unit is used to obtain a third target feature map based on a self-attention mechanism, according to a first target feature map output by the connected downsampling unit and a second target feature map input to the connected upsampling unit, and then inputs the third target feature map to the connected upsampling unit. The downsampling unit and the upsampling unit connected to any self-attention unit are located in the same layer.
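The way a self-attention unit receives its input can be sketched under simplifying assumptions. Claim 3 of this application states that the first and second target feature maps are summed to form the image to be processed, which is then segmented into equally sized image blocks. The sketch below shows only that data flow for a single-channel map, with each block flattened into a vector. The function names, the single-channel simplification, and the flattening are assumptions for illustration.

```python
def add_feature_maps(first, second):
    """Element-wise sum of two same-sized 2D feature maps."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]


def split_into_patches(image, patch):
    """Cut a 2D map into patch x patch blocks, each flattened to a vector."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch) for dx in range(patch)])
    return patches


# Toy encoder (first) and decoder (second) maps from the same layer.
first_map = [[1.0] * 4 for _ in range(4)]
second_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

image_to_process = add_feature_maps(first_map, second_map)
patch_vectors = split_into_patches(image_to_process, patch=2)
```

Because the two maps come from units in the same layer, they share a spatial size, which is what makes the element-wise sum well defined.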
[0098] With a training dataset available, the neural network model can be trained using this dataset to obtain the breast lesion segmentation device 200. During training, the sample breast ultrasound images can be input into the neural network model to obtain the third target feature map generated by each attention unit in the neural network model and the mask image to be analyzed output by the neural network model.
[0099] Step S230: Calculate the total loss based on the third target feature map, the mask image to be analyzed, the corresponding sample mask image, and the corresponding sample lesion contour image.
[0100] The total loss includes the boundary loss of each third target feature map and the corresponding sample lesion contour image. This enhances the learning of lesion edge features by the self-attention unit.
[0101] Please refer to Figure 11, which is a flowchart of the sub-steps included in step S230 of Figure 9. In this embodiment, step S230 may include sub-steps S231 to S234.
[0102] Sub-step S231: Process each of the third target feature maps to obtain a single-channel feature map, and downsample the sample lesion contour image corresponding to the sample breast ultrasound image to the size of the single-channel feature map to obtain the sample feature map.
[0103] Sub-step S232: Calculate the total boundary loss based on the single-channel feature map corresponding to each group and the sample feature map.
[0104] As shown in Figure 12, for each of the third target feature maps (i.e., the feature maps output by the attention units), channel compression and a sigmoid function are applied to obtain a single-channel feature map. The corresponding sample lesion contour image is then downsampled to the size of the single-channel feature map to obtain a sample feature map. Then, based on the corresponding single-channel feature map and the sample feature map, the boundary loss is calculated. Afterwards, the boundary losses corresponding to each attention unit are summed to obtain the total boundary loss.
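The procedure above can be sketched end to end, with several loudly stated assumptions: channel compression is modeled as a weighted sum over channels (as a 1×1 convolution would compute), the contour is downsampled by max pooling so that thin contour lines are not lost, and the per-unit boundary loss is taken to be binary cross-entropy. None of these three choices is confirmed by the source, and all function names and weights are hypothetical.

```python
import math


def compress_channels(feature_map, weights):
    """Weighted sum over channels followed by a sigmoid (assumed compression)."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [[1.0 / (1.0 + math.exp(-sum(w * ch[r][c]
                                        for w, ch in zip(weights, feature_map))))
             for c in range(width)]
            for r in range(height)]


def downsample_contour(contour, out_h, out_w):
    """Max-pool the contour image down to out_h x out_w (assumed method)."""
    fh, fw = len(contour) // out_h, len(contour[0]) // out_w
    return [[max(contour[r * fh + dy][c * fw + dx]
                 for dy in range(fh) for dx in range(fw))
             for c in range(out_w)]
            for r in range(out_h)]


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy, used here as a stand-in boundary loss."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def total_boundary_loss(feature_maps, weights_per_unit, contour):
    """Sum the boundary loss over the outputs of all attention units."""
    loss = 0.0
    for feature_map, weights in zip(feature_maps, weights_per_unit):
        single = compress_channels(feature_map, weights)
        target = downsample_contour(contour, len(single), len(single[0]))
        loss += binary_cross_entropy(single, target)
    return loss


# Two attention units: a 2-channel 2x2 map and a 2-channel 4x4 map, all zeros.
contour_image = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
unit_outputs = [
    [[[0.0] * 2 for _ in range(2)] for _ in range(2)],
    [[[0.0] * 4 for _ in range(4)] for _ in range(2)],
]
unit_weights = [[0.5, 0.5], [0.5, 0.5]]
boundary_total = total_boundary_loss(unit_outputs, unit_weights, contour_image)
```

With all-zero feature maps every compressed value is 0.5, so each unit contributes a loss of ln 2 regardless of the contour, which gives a simple sanity check on the summation.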
[0105] Sub-step S233: Based on the mask image to be analyzed and the corresponding sample mask image, calculate the cross-entropy loss and accuracy loss.
[0106] Sub-step S234: Calculate the total loss based on the total boundary loss, cross-entropy loss, and accuracy loss.
[0107] The total loss consists of three parts: cross-entropy loss, Dice loss (the accuracy loss referred to in sub-steps S233 and S234), and boundary loss. The formulas for calculating each loss are as follows:
[0108]
[0109]
[0110]
[0111]
[0112] where $\hat{Y}$ denotes the lesion region determined from the mask image to be analyzed, i.e., the lesion region obtained by the neural network model; $Y$ denotes the lesion region determined from the sample mask image; $|Y \cap \hat{Y}|$ denotes the area where the two lesion regions overlap; $|\hat{Y}|$ denotes the area of the lesion region predicted by the neural network model; and $|Y|$ denotes the area of the lesion region under the true label. $P$ denotes the lesion boundary contour map under the true label, i.e., the sample lesion contour image; $\hat{P}$ denotes the above single-channel feature map, i.e., the lesion boundary contour map obtained after the feature map output by the self-attention unit has been processed by channel compression and other operations. The remaining symbol denotes the number of self-attention units used in the entire neural network model.
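As a hedged illustration of how the three parts could be combined, the sketch below uses the common textbook forms: mean binary cross-entropy, and a Dice loss equal to one minus twice the overlap area divided by the sum of the two areas. The unweighted sum in `total_loss`, the smoothing constant `eps`, and all function names are assumptions; the application's exact formulas and any weighting factors are not recoverable from the text.

```python
import math


def flatten(image):
    """Flatten a 2D image into a list of pixel values."""
    return [value for row in image for value in row]


def cross_entropy_loss(pred_mask, true_mask, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and labels."""
    pairs = list(zip(flatten(pred_mask), flatten(true_mask)))
    total = 0.0
    for p, y in pairs:
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(pairs)


def dice_loss(pred_mask, true_mask, eps=1e-7):
    """1 - 2 * overlap / (predicted area + true area), with smoothing."""
    pred, true = flatten(pred_mask), flatten(true_mask)
    overlap = sum(p * y for p, y in zip(pred, true))
    return 1.0 - (2.0 * overlap + eps) / (sum(pred) + sum(true) + eps)


def total_loss(pred_mask, true_mask, boundary_loss):
    """Unweighted sum of the three parts (the weighting is an assumption)."""
    return (cross_entropy_loss(pred_mask, true_mask)
            + dice_loss(pred_mask, true_mask)
            + boundary_loss)


true_mask = [[1, 0], [0, 1]]
good_pred = [[0.9, 0.1], [0.1, 0.9]]
bad_pred = [[0.1, 0.9], [0.9, 0.1]]
good_total = total_loss(good_pred, true_mask, boundary_loss=0.0)
bad_total = total_loss(bad_pred, true_mask, boundary_loss=0.0)
```

A prediction that overlaps the true lesion well scores lower on both the cross-entropy and Dice terms, so `good_total` comes out smaller than `bad_total`.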
[0113] Step S240: Adjust the neural network model according to the total loss to train the breast lesion segmentation device.
[0114] After obtaining the total loss, the parameters in the neural network model can be adjusted based on the total loss, and training can continue to obtain the breast lesion segmentation device. Optionally, during training, the network training optimization method can be the stochastic gradient descent method.
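The parameter adjustment can be illustrated with a minimal gradient-descent update. The toy quadratic loss below merely stands in for the total loss, its gradient is computed exactly rather than from a random mini-batch, and the names `sgd_step` and `toy_loss_gradient` as well as the learning rate are hypothetical.

```python
def sgd_step(params, grads, learning_rate):
    """Return parameters moved one step against the gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def toy_loss_gradient(params):
    """Gradient of the toy loss sum((p - 3)^2), standing in for the total loss."""
    return [2.0 * (p - 3.0) for p in params]


params = [0.0, 10.0]
for _ in range(100):
    params = sgd_step(params, toy_loss_gradient(params), learning_rate=0.1)
```

In actual training the gradient would come from backpropagating the total loss on a mini-batch of sample images, but the update rule has the same shape.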
[0115] The ultrasound lesion segmentation network (Focal U-Net) provided in this application, which is based on a focal self-attention mechanism and boundary loss, combines the focal self-attention module with the traditional semantic segmentation network U-Net. This effectively captures both local fine-grained and global coarse-grained features, enabling feature learning of the lesion region and its surrounding area while suppressing background noise interference. Furthermore, in this embodiment, boundary loss is used to constrain the focal self-attention module, enhancing its learning of lesion edge features and effectively improving the segmentation accuracy of ultrasound lesions.
[0116] The focal self-attention module attends to the image at different levels. At each level, the receptive field and the granularity of the fused features differ. Fine-grained local attention effectively captures the detailed features of the lesion region, while coarse-grained global attention captures features from the area surrounding the lesion and suppresses background noise. Fine-grained local feature learning improves lesion segmentation accuracy, while coarse-grained global feature learning filters out false lesions from the background and reduces background noise interference.
[0117] To perform the corresponding steps in the above embodiments and various possible implementations, an implementation of a model training device 300 is given below. Optionally, the model training device 300 can adopt the device structure of the electronic device 100 shown in Figure 1. Further, please refer to Figure 13, which is a block diagram of the model training device provided in this embodiment. It should be noted that the basic principle and technical effects of the model training device 300 provided in this embodiment are the same as those in the above embodiments. For the sake of brevity, for any parts not mentioned in this embodiment, reference can be made to the corresponding content in the above embodiments. The model training device 300 may include an image acquisition module 310 and a training module 320.
[0118] The image acquisition module 310 is used to acquire multiple sample breast ultrasound images and the sample mask image and sample lesion contour image corresponding to each of the multiple sample breast ultrasound images.
[0119] The training module 320 is used to input the sample breast ultrasound image into a preset neural network model to obtain at least one third target feature map generated by the neural network model and a mask image to be analyzed output by the neural network model. The neural network model includes a U-Net network and at least one self-attention unit. The U-Net network includes multiple downsampling units and multiple upsampling units. Each self-attention unit is used to obtain a third target feature map based on a self-attention mechanism, according to the first target feature map output by the connected downsampling unit and the second target feature map input to the connected upsampling unit, and then input the third target feature map into the connected upsampling unit. The downsampling unit and the connected upsampling unit connected to any self-attention unit are located in the same layer.
[0120] The training module 320 is further configured to calculate the total loss based on each of the third target feature maps, the mask image to be analyzed, the corresponding sample mask image, and the corresponding sample lesion contour image. The total loss includes the boundary loss between the third target feature map and the corresponding sample lesion contour image.
[0121] The training module 320 is also used to adjust the neural network model according to the total loss in order to train the breast lesion segmentation device.
[0122] Optionally, the above modules can be stored, in the form of software or firmware, in the memory 110 shown in Figure 1, or embedded in the operating system (OS) of the electronic device 100, and can be executed by the processor 120 shown in Figure 1. Meanwhile, the data and program code required to execute the above modules can be stored in the memory 110.
[0123] This application embodiment also provides a readable storage medium storing a computer program thereon, which, when executed by a processor, implements the breast lesion segmentation device 200 or the model training device 300.
[0124] In summary, this application provides a breast lesion segmentation device, a model training method, and an electronic device. A U-Net network is connected to at least one self-attention unit. The U-Net network includes multiple downsampling units and multiple upsampling units. The U-Net network is used to obtain a breast lesion segmentation mask based on the breast ultrasound image to be analyzed. Each attention unit is used to obtain a third target feature map based on a self-attention mechanism, using a first target feature map output by the connected downsampling unit and a second target feature map input to the connected upsampling unit, and then inputting the third target feature map into the connected upsampling unit. The downsampling unit and the connected upsampling unit connected to any self-attention unit are located in the same layer. Each attention unit is trained based on the boundary loss between the sample contour image and the output image of the self-attention unit. Thus, a breast lesion segmentation device including self-attention units that pay more attention to lesion edge features can obtain highly accurate lesion segmentation results.
[0125] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram and / or flowchart, and combinations of blocks in block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0126] In addition, the functional modules in the various embodiments of this application can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.
[0127] If the aforementioned functions are implemented as software functional modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0128] The above description is merely an optional embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
1. A breast lesion segmentation device, characterized in that the device includes a connected U-Net network and at least one self-attention unit. The U-Net network includes multiple downsampling units and multiple upsampling units. The U-Net network is used to obtain a segmentation mask for breast lesions based on the breast ultrasound image to be analyzed. Each of the self-attention units is used to obtain a third target feature map based on the first target feature map output by the connected downsampling unit and the second target feature map input to the connected upsampling unit, and input the third target feature map to the connected upsampling unit; the downsampling unit and the upsampling unit connected to any self-attention unit are located in the same layer; each self-attention unit is trained based on the boundary loss between the sample contour image and the output image of the self-attention unit; the total loss used in the training process of the breast lesion segmentation device is calculated from the total boundary loss, the cross-entropy loss, and the accuracy loss. The cross-entropy loss and the accuracy loss are calculated based on the sample mask image corresponding to the sample breast ultrasound image and on the mask image to be analyzed that the breast lesion segmentation device outputs for the sample breast ultrasound image during training. The total boundary loss is obtained as follows: each of the third target feature maps is processed to obtain a single-channel feature map; the sample lesion contour image corresponding to the sample breast ultrasound image is downsampled to the size of the single-channel feature map to obtain the sample feature map; the total boundary loss is calculated based on the single-channel feature map corresponding to each group and the sample feature map.
2. The apparatus according to claim 1, characterized in that, One of the self-attention units is located at the bottom cascade position of the U-Net network.
3. The apparatus according to claim 1 or 2, characterized in that, The device further includes at least one preprocessing unit, and the self-attention unit includes a preprocessing subunit, an attention calculation subunit, and a processing subunit. The preprocessing unit is used to sum the first target feature map and the second target feature map to obtain the image to be processed; The preprocessing subunit is used to segment the image to be processed into multiple image blocks of the same size and obtain the feature vector of each image block; The attention calculation subunit is used to obtain a third initial target feature map based on the feature vector of each image block, using a self-attention mechanism. The processing subunit is used to upsample the third initial target feature map to obtain the third target feature map.
4. The apparatus according to claim 3, characterized in that, The attention calculation subunit is used to obtain the third initial target feature map based on a self-attention mechanism. Specifically, the attention calculation subunit is used for: The third feature map is divided into multiple grids of the same size, wherein the third feature map is an image determined based on the feature vector of each image block; Each grid is treated as a sub-window. For each sub-window, pooling is performed at different levels to obtain the pooling results at each level. The level is positively correlated with the size of the receptive field and the pooling window size corresponding to the sub-window pooling. For each sub-window, a query vector is calculated based on that sub-window, and a key vector and a value vector are calculated based on the pooling results of each level corresponding to that sub-window. Based on the multi-head attention mechanism, the third initial target feature map is obtained according to the query vector, key vector and value vector corresponding to each sub-window.
5. The apparatus according to claim 4, characterized in that the calculation formula used when obtaining the third initial target feature map is: $$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V$$ where $Q$ denotes the query matrix, $K$ denotes the key matrix, $V$ denotes the value matrix, and $B$ denotes the position offset.
6. The apparatus according to claim 3, characterized in that, The preprocessing subunit is specifically used for: Obtain the original feature vector for each image patch; The original feature vector of each image block is normalized to obtain the feature vector of each image block.
7. A model training method, characterized in that the method is used for training a breast lesion segmentation device and includes: Obtain multiple sample breast ultrasound images and corresponding sample mask images and sample lesion contour images for each of the multiple sample breast ultrasound images; The sample breast ultrasound image is input into a preset neural network model to obtain at least one third target feature map generated by the neural network model and a mask image to be analyzed output by the neural network model. The neural network model includes a U-Net network and at least one self-attention unit. The U-Net network includes multiple downsampling units and multiple upsampling units. Each self-attention unit is used to obtain a third target feature map based on the self-attention mechanism, according to the first target feature map output by the connected downsampling unit and the second target feature map input to the connected upsampling unit, and input the third target feature map to the connected upsampling unit. The downsampling unit and the upsampling unit connected to any self-attention unit are located in the same layer. Based on the third target feature maps, the mask image to be analyzed, the corresponding sample mask image, and the corresponding sample lesion contour image, the total loss is calculated. The total loss includes the total boundary loss obtained from the boundary losses of the third target feature maps and the corresponding sample lesion contour images, the cross-entropy loss, and the accuracy loss. The cross-entropy loss and the accuracy loss are calculated based on the mask image to be analyzed and the corresponding sample mask image.
The total boundary loss is obtained as follows: The third target feature maps are processed to obtain single-channel feature maps; the sample lesion contour image corresponding to the sample breast ultrasound image is downsampled to the size of the single-channel feature map to obtain the sample feature map; the total boundary loss is calculated based on the corresponding single-channel feature maps and the sample feature map. The neural network model is adjusted based on the total loss to train the breast lesion segmentation device.
8. The method according to claim 7, characterized in that, The processing includes channel compression, and the calculation of the total loss based on each of the third target feature maps, the mask image to be analyzed, the corresponding sample mask image, and the corresponding sample lesion contour image includes: Based on the mask image to be analyzed and the corresponding sample mask image, the cross-entropy loss and accuracy loss are calculated. The total loss is calculated based on the total boundary loss, cross-entropy loss, and accuracy loss.
9. An electronic device, characterized in that, It includes a processor and a memory, the memory storing machine-executable instructions that can be executed by the processor, the processor executing the machine-executable instructions to implement the breast lesion segmentation device according to any one of claims 1-6.
10. A readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by the processor, it implements the breast lesion segmentation device as described in any one of claims 1-6.