Self-supervised learning for image processing
By training an image processing neural network with a self-contrastive learning method that optimizes the model parameters, the insufficient reconstruction quality of undersampled MRI images in existing techniques is addressed, producing higher-resolution or denoised output images and improving image processing performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANGHAI UNITED IMAGING INTELLIGENCE CO LTD
- Filing Date
- 2022-08-03
- Publication Date
- 2026-06-23
AI Technical Summary
Existing deep learning-based image processing techniques rely only on high-quality gold-standard images (positive examples) for training, and the resulting output images can still be of suboptimal quality, especially in the reconstruction of undersampled MRI images, where the images are often blurry or excessively noisy.
A self-contrastive learning method is used to train an image processing neural network. In a first iteration, the difference between the predicted image and the gold standard image is minimized. In a second iteration, the difference between the newly predicted image and the image predicted in the first iteration is maximized. The model parameters are optimized using an L1 norm, L2 norm, or hinge loss together with a triplet loss function to generate high-quality output images.
It improves the reconstruction quality of undersampled MRI images, generates higher resolution or denoised output images, and enhances the image processing effect.
Abstract
Description
Technical Field
[0001] This application relates to the field of medical image processing.
Background Technology
[0002] Machine learning, especially deep learning, has been widely used in image processing tasks such as image reconstruction, image denoising, image super-resolution, and motion estimation. Conventional deep learning-based image processing techniques typically rely on annotated or high-quality images as the gold standard for training image processing models. However, predictions from trained models (e.g., output images) may still be of suboptimal quality when compared to the gold standard image. For example, a magnetic resonance (MR) image reconstructed by a trained model from undersampled k-space data may be blurry compared to a fully sampled gold standard image. Similar problems (e.g., excessive image noise) exist in the other image processing tasks mentioned above when models are trained using only positive examples (e.g., the gold standard image). Therefore, it may be desirable to leverage additional information in conjunction with the gold standard image to train models for image processing tasks.
Summary of the Invention
[0003] Described herein are neural network-based systems, methods, and apparatuses associated with medical image processing. In examples, the systems, methods, and apparatuses may be implemented using a processor and / or a storage medium including an executable computer program for implementing a model that generates an output image based on an input image using machine learning techniques. An image processing neural network system (e.g., using one or more artificial neural networks, which may include convolutional neural networks) can be trained to receive an input image of an anatomical structure (e.g., myocardium, cortex, cartilage, etc.) generated by a medical imaging modality and to generate an output image based on the input image. The image processing neural network system can be configured to implement a model for generating the output image based on the input image. The model can be learned through a training process during which parameters associated with the model are adjusted to maximize the difference between a first image (e.g., the output image) predicted using first parameter values of the model and a second image predicted using second parameter values of the model, and to minimize the difference between the second image and a gold standard image.
[0004] An image processing neural network system can be trained according to a process including a first iteration and a second iteration. A first image can be predicted during the first iteration, and a second image can be predicted during the second iteration. The first parameter values of the model can be obtained during the first iteration by minimizing the difference between the first image and the gold standard image. The second parameter values of the model can be obtained during the second iteration by maximizing the difference between the first image and the second image. The first iteration of the training process can be performed under different training settings than the second iteration of the training process.
[0005] In an embodiment, the image processing neural network system can determine the difference between a first image and a second image, or the difference between a second image and a gold standard image, based on the L1 norm, L2 norm, or hinge loss. In another embodiment, a triplet loss function can be used during training to maximize the difference between the first image and the second image and minimize the difference between the second image and the gold standard image.
[0006] In one embodiment, the medical imaging modality can be a magnetic resonance imaging (MRI) scanner, and the input image can be an MRI image. In another embodiment, the input image can be an undersampled MRI image of an anatomical structure (e.g., myocardium, cortex, cartilage, etc.), and the output image can be a fully sampled MRI image of the anatomical structure.
[0007] In an embodiment, the output image may be a higher resolution version of the input image, or the output image may be a denoised version of the input image.
Attached Figure Description
[0008] This disclosure will be more fully understood from the detailed description given below and from the accompanying drawings of various embodiments thereof. However, the drawings should not be construed as limiting this disclosure to the particular embodiments, but are for explanation and understanding only.
[0009] Figure 1 shows a simplified block diagram of an example image processing neural network system as described herein.
[0010] Figure 2 shows a simplified block diagram of an example system described herein that can be used to perform image processing tasks.
[0011] Figure 3 shows a simplified diagram illustrating the effect of image processing performed by a neural network system as described herein.
[0012] Figure 4 shows a simplified diagram illustrating the use of a triplet loss function to learn a neural network model as described herein.
[0013] Figure 5 shows a flowchart of an example method for training a neural network for image processing as described herein.
[0014] Figure 6 shows a simplified block diagram of an example neural network system for performing image processing as described herein.
Detailed Implementation
[0015] Figure 1 shows a simplified block diagram of an example image processing neural network system as described herein. The neural network system may include an artificial neural network (ANN) 100 (such as a deep convolutional neural network (DCNN)), which may include multiple layers, such as an input layer for receiving data input (e.g., input image 102), an output layer for generating an output (e.g., output image 108), and one or more hidden layers. The hidden layers may include one or more convolutional layers, one or more pooling layers, and / or one or more fully connected layers. Each convolutional layer may include multiple convolutional kernels or filters configured to extract specific features from the input image 102. The convolution operation may be followed by batch normalization and / or non-linear activation, and the features extracted by the convolutional layers (e.g., in the form of one or more feature maps) may be downsampled (e.g., using a 2×2 window and a stride of 2) by pooling layers and / or fully connected layers to reduce feature redundancy and / or size (e.g., by a factor of 2).
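The downsampling step mentioned above (a 2×2 window with a stride of 2) can be sketched as follows. This is only an illustration: the text does not say which pooling operator is used, so max pooling is assumed here, and the function name `max_pool_2x2` and the sample feature map are hypothetical.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D feature map with a 2x2 window and a stride of 2.

    Max pooling is an assumption; the source only specifies window and stride.
    A trailing odd row or column is dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map; each spatial dimension is halved.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 4, 4],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

Each output value summarizes one non-overlapping 2×2 block, which is why the height and width are both reduced by a factor of 2.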
[0016] In one embodiment, input image 102 may include images of anatomical structures (e.g., myocardium, cortex, cartilage, etc.) generated by a medical imaging modality. In another embodiment, this medical imaging modality may include a magnetic resonance imaging (MRI) scanner, such that input image 102 may include MRI images. In another embodiment, input image 102 may include undersampled MRI images of the anatomical structures, and output image 108 may include fully sampled or otherwise adapted MRI images of the anatomical structures (e.g., higher resolution, less noise, etc.). In another embodiment, input image 102 may be derived from undersampled MRI data (e.g., k-space data) by applying a Fourier transform (e.g., Fast Fourier Transform or FFT) to the undersampled MRI data.
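To illustrate why an image derived from undersampled k-space data is degraded, the sketch below works on a one-dimensional profile instead of a 2D image and uses a naive inverse discrete Fourier transform from the standard library rather than an FFT. The profile values, the keep-every-other-sample pattern, and all names are assumptions made for the example, not details from the source.

```python
import cmath


def dft(samples, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a 1D sequence."""
    n = len(samples)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(
            samples[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        result.append(total / n if inverse else total)
    return result


# Hypothetical 1D intensity profile standing in for one image row.
profile = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]

k_space = dft(profile)  # fully sampled k-space data

# Undersample: keep every other k-space sample and zero-fill the rest.
undersampled = [v if i % 2 == 0 else 0j for i, v in enumerate(k_space)]

full_image = [abs(v) for v in dft(k_space, inverse=True)]
aliased_image = [abs(v) for v in dft(undersampled, inverse=True)]

print([round(v, 3) for v in full_image])
print([round(v, 3) for v in aliased_image])
```

With this sampling pattern the zero-filled result contains the profile folded onto a copy of itself shifted by half the field of view, at half amplitude. An artifact of this kind is what the network would be trained to remove.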
[0017] Artificial neural network 100 can be configured to implement an image processing model for generating an output image 108 based on an input image 102, and this model can be learned through a contrastive training process utilizing one or more positive images 104 and one or more negative images 106. Positive image 104 can refer to a gold standard image obtained for training the image processing model, while negative image 106 can refer to an image generated using preliminary (e.g., coarse) parameters of the image processing model. Training can be performed in a contrastive manner, for example, by maximizing the difference between the image predicted by artificial neural network 100 (e.g., output image 108) and negative image 106 and minimizing the difference between the predicted image and positive image 104. For example, training of neural network 100 (e.g., of the image processing model implemented by neural network 100) can include multiple rounds or iterations. In the first round or iteration, neural network 100 can be trained to predict an output image (e.g., an adapted version of input image 102) that is similar to the gold standard image. Neural network 100 can do this, for example, by tuning its parameters to minimize the difference between the predicted image and the gold standard image. In the second round or iteration of training, neural network 100 can be further trained using the image predicted during the first round or iteration of training as the negative image 106 and the gold standard image as the positive image 104. Neural network 100 can further adjust its parameters by minimizing the difference between the image predicted by the neural network and the positive image 104 and by maximizing the difference between the image predicted by the neural network and the negative image 106.
[0018] In an embodiment, the neural network may be configured to implement (e.g., learn) a first model during a first round or iteration of training, and to implement (e.g., learn) a second model (e.g., different from the first model) during a second round or iteration of training. The first round or iteration of training may be based on a first loss function such as an L1 or L2 loss function (e.g., to minimize the difference between the predicted image and the gold standard image), and the second round or iteration of training may be based on a second loss function such as a triplet loss function (e.g., to minimize the difference between the predicted image and the positive image or the gold standard image and to maximize the difference between the predicted image and the negative image).
[0019] In an embodiment, neural network 100 can be configured to implement (e.g., learn) the same model through a first round or iteration of training and a second round or iteration of training. Neural network 100 can tune the model's parameters based on a triplet loss function during the first and second rounds or iterations of training. As described herein, neural network 100 can use outputs from previous iterations of training as negative examples during subsequent iterations of training (e.g., to steer the neural network away from the negative examples). Thus, at the start of training (e.g., when there are no previous iterations), neural network 100 can use randomly generated images (or blank / empty images) as negative examples.
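The choice of negative example described above can be sketched as a small helper: reuse the previous iteration's output when one exists, and otherwise fall back to a randomly generated image. The function name `pick_negative`, the nested-list image representation, and the uniform pixel distribution are assumptions made for illustration.

```python
import random


def pick_negative(previous_output, shape, rng):
    """Return the negative example for the next training iteration.

    previous_output: image predicted in the previous iteration, or None
    shape: (rows, cols) of the image, used only for the random fallback
    rng: random.Random instance, so the fallback is reproducible
    """
    if previous_output is not None:
        return previous_output
    rows, cols = shape
    # No earlier iteration exists yet: use a random image (assumed uniform in [0, 1)).
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
first_negative = pick_negative(None, (2, 3), rng)           # start of training
later_negative = pick_negative([[0.2, 0.4]], (1, 2), rng)   # a previous output exists
print(len(first_negative), len(first_negative[0]))
print(later_negative)
```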
[0020] Images predicted by neural network 100 during earlier rounds or iterations of training can be used as negative examples to guide training, since the quality of such initially generated images may be unsatisfactory (e.g., the images may be blurry compared to a gold standard). By forcing neural network 100 to move away from these negative examples and towards positive examples during later rounds or iterations of training, the parameters of neural network 100 can be further optimized. This training / learning process can be called a self-contrastive training / learning process because the output generated by neural network 100 during earlier rounds or iterations of training (e.g., using the same model) can be used as negative images 106 during later rounds or iterations of training. This self-contrastive training / learning process is described further below with reference to Figure 3 and Figure 4.
[0021] As shown in Figure 1, neural network 100 is an illustrative network that generates output image 108 based on input image 102. Embodiments of this disclosure are not limited to a specific type of image processing task. For example, input image 102 can be any image that can benefit from some form of image processing, such as image reconstruction, image denoising, image super-resolution, motion estimation, etc.
[0022] Figure 2 shows a simplified block diagram of an example system (e.g., device) 200 described herein that can be used to perform image processing tasks. System 200 may be a standalone computer system or a networked computing resource implemented in a computing cloud, and may include a processing unit 202 and a storage unit 204, wherein the storage unit 204 may be communicatively coupled to the processing unit 202. The processing unit 202 may include one or more processors, such as a central processing unit (CPU), a graphics processing unit (GPU), or accelerator circuitry. The storage unit 204 may be a memory device, hard disk, or cloud storage device connected to the processing unit 202 via a network interface card (NIC) (not shown in Figure 2). The processing unit 202 can be programmed via instructions 208 to implement the pre-learned image processing model described herein (e.g., instructions 208 can implement an artificial neural network that implements the pre-learned model).
[0023] Processing device 202 can execute instructions 208 and perform the following operations: at 210, receiving an input image (e.g., input image 102 of Figure 1) of an anatomical structure produced by a medical imaging modality such as an MRI scanner, such that the input image may include an MRI image; and at 212, using an artificial neural network (e.g., one trained to implement an image processing model) to generate an output image (e.g., output image 108 of Figure 1) corresponding to an adapted version of the input image (e.g., reconstructed, higher resolution, less noisy, etc.). As described above, an artificial neural network trained using the techniques described herein can be used for various image processing tasks with respect to the input image, including, for example, image reconstruction, super-resolution, denoising, etc.
[0024] As described herein, the artificial neural network can be trained through contrastive learning to generate output images based on input images (e.g., the image processing model can be learned using the self-contrastive learning described in conjunction with Figure 1). For example, the model can be learned through a training process during which parameters associated with the model are adjusted to maximize the difference between a first image (e.g., a negative image) predicted using first or preliminary parameter values of the model and a second image predicted using second or refined parameter values of the model, and to minimize the difference between the second image and a gold standard image (e.g., a positive image). As described above, in embodiments, the training process may include a first iteration and a second iteration, during which the first and second images are respectively predicted. The first parameter values may be obtained during the first iteration (e.g., guided only by the gold standard image) by minimizing the difference between the first image and the gold standard image. The second parameter values may be obtained during the second iteration (e.g., guided by a negative image and / or a positive image) by maximizing the difference between the first and second images and / or minimizing the difference between the second image and the gold standard image.
[0025] Figure 3 shows a simplified diagram illustrating the effect of image processing 300 performed by a neural network system as described herein. The shaded region 302 in Figure 3 can represent the portion of the image space R^n associated with the output of the neural network system (e.g., output image 108 of Figure 1). The non-shaded region 304 in Figure 3 can represent the portion of the image space R^n associated with the desired output (e.g., positive image 104 of Figure 1). As shown in the figure, self-contrastive training / learning can have the following effect: it forces the neural network to refine its parameters, causing the output generated by the neural network to be pushed away from negative results (e.g., the portion of space 302, denoted as “N”, located outside the desired space 304) and towards positive results (e.g., the portion of space 302, denoted as “P”, located inside the desired space 304).
[0026] In embodiments, the output image, the positive image, and the negative image can be represented in the image space R^n by corresponding feature maps or feature vectors associated with the respective images. In embodiments, the distance in the image space R^n between the representation of the output image and the representation of the positive image P, and the distance between the representation of the output image and the representation of the negative image N, can be measured according to a specified loss function. In embodiments, the loss function can be based on, for example, the L1 norm, L2 norm, or hinge loss. In some embodiments, such as the one shown in Figure 3, the loss function may include a triplet loss function designed to maximize the distance in the image space R^n between the output of the neural network and the representation of the negative image N, and to minimize the distance in the image space R^n between the output of the neural network and the representation of the positive image P.
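As a minimal sketch of how such distances could be measured, the functions below compute L1-norm and L2-norm distances between feature vectors. The three-element vectors are made-up stand-ins for representations in the image space, and the function names are hypothetical. A hinge-based variant is not shown because the text does not specify its form.

```python
import math


def l1_distance(a, b):
    """Sum of absolute element-wise differences (L1 norm of a - b)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance (L2 norm of a - b)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical feature vectors for the output, positive (P), and negative (N) images.
output = [0.2, 0.8, 0.5]
positive = [0.0, 1.0, 0.5]
negative = [0.6, 0.4, 0.5]

print(l1_distance(output, positive), l1_distance(output, negative))
print(l2_distance(output, positive), l2_distance(output, negative))
```

In this example the output is already closer to P than to N under both measures. Training as described would aim to widen that gap.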
[0027] Then, the parameters associated with the neural network system can be updated based on maximization and minimization, such that the output representation is pushed away from the position in the output image space containing the representation of the negative image N (which may be a previously predicted suboptimal output image) and pulled closer to the position in the output image space containing the representation of the positive image P (e.g., the position where the output image space 302 intersects with the sharp image space 304).
[0028] Figure 4 shows a simplified diagram 400 illustrating an example of using the triplet loss function described herein to learn an image processing model (e.g., to learn the parameters of an image processing neural network).
[0029] Output 1 can be a representation of a first predicted image (e.g., output image 108 of Figure 1) in the image space (e.g., R^n of Figure 3), while output 2 can be a representation of a second predicted image (e.g., another output image 108 of Figure 1) in the image space. Output 1 may be generated, for example, during the first iteration of training, and output 2 may be generated, for example, during the second iteration of training. During the training / learning process of the neural network described herein (e.g., with respect to neural network 100 of Figure 1 and system 200 of Figure 2), the parameters (e.g., weights) of the neural network can be adjusted based on the triplet loss function, such that later outputs (e.g., output 2) are closer to P (e.g., the positive image) and farther away from N (e.g., the negative image) compared to earlier outputs (e.g., output 1).
[0030] The triplet loss function can be expressed as follows:
[0031] L = max(d(output1,P)-d(output1,N)+margin,0)
[0032] Here, the margin can be a configurable parameter that forces the distance d(output 1, N) to exceed the distance d(output 1, P) by at least the margin. The loss is zero only when d(output 1, N) is at least d(output 1, P) + margin, so minimizing the triplet loss function pushes d(output 1, P) toward 0 and pushes d(output 1, N) to at least d(output 1, P) + margin. Therefore, after training the model, output 2, corresponding to the subsequent predicted output image, can be closer to P and farther from N than the previous output 1.
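The formula in [0031] can be evaluated directly. The sketch below assumes the squared Euclidean distance for d, since the text leaves the distance function open, and uses made-up two-element vectors for P, N, and the two outputs.

```python
def squared_l2(a, b):
    """Squared Euclidean distance; an assumed choice for d."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(output, positive, negative, margin=1.0, distance=squared_l2):
    """L = max(d(output, P) - d(output, N) + margin, 0)."""
    return max(distance(output, positive) - distance(output, negative) + margin, 0.0)


# Hypothetical representations in the image space.
positive = [1.0, 1.0]   # P: gold standard
negative = [0.0, 0.0]   # N: earlier, suboptimal prediction
output_1 = [0.4, 0.4]   # earlier output, still close to N
output_2 = [0.9, 0.9]   # later output, close to P

loss_1 = triplet_loss(output_1, positive, negative)
loss_2 = triplet_loss(output_2, positive, negative)
print(loss_1)  # about 1.4: d(P) = 0.72, d(N) = 0.32
print(loss_2)  # 0.0: d(N) already exceeds d(P) + margin
```

The loss stays positive for output 1 and reaches zero for output 2, matching the description that later outputs end up closer to P and farther from N.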
[0033] Figure 5 shows a flowchart of an example method 500 for training a neural network for image processing as described herein. Method 500 can be executed by a processing device that may include hardware (e.g., circuitry, special-purpose logic), computer-readable instructions (e.g., running on a general-purpose computer system or a special-purpose machine), or a combination of both.
[0034] Method 500 can start, and at 502, the operating parameters of the neural network (e.g., weights associated with one or more layers of the neural network) can be initialized. For example, the parameters can be initialized based on samples drawn from one or more probability distributions, or based on parameter values of another neural network with a similar architecture.
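The two initialization options mentioned at 502 might look like the sketch below. The Gaussian distribution, its standard deviation, the flat per-layer weight lists, and both function names are assumptions chosen for illustration.

```python
import random


def init_from_distribution(layer_sizes, rng, std=0.01):
    """Draw every weight from a zero-mean Gaussian (an assumed distribution)."""
    return [[rng.gauss(0.0, std) for _ in range(size)] for size in layer_sizes]


def init_from_network(other_params):
    """Copy parameter values from another network with the same layer sizes."""
    return [list(layer) for layer in other_params]


rng = random.Random(0)
fresh_params = init_from_distribution([3, 2], rng)   # two layers with 3 and 2 weights
copied_params = init_from_network(fresh_params)      # independent copy of the values
print([len(layer) for layer in fresh_params])
print(copied_params == fresh_params)
```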
[0035] At 504, the neural network can receive input images of anatomical structures (e.g., training images). The input images can be generated by a medical imaging modality or simulated using one or more computing devices. As described above, the medical imaging modality can include an MRI scanner, such that the input images of the anatomical structures include MRI images.
[0036] At 506, the neural network can engage in a first iterative process (e.g., the first round of training or the first stage of the training process), wherein the neural network can generate an output image corresponding to an adapted version of the input image (e.g., reconstructed, higher resolution, less blurry, etc.), determine the difference between the output image and the gold standard image based on a loss function (e.g., an L1 loss function, L2 loss function, hinge loss function, etc.), and adjust the parameters of the neural network based on gradient descent associated with the loss function. The neural network can repeat the above operations multiple times (e.g., a pre-configured number of times) or until it fully converges, and this iterative process can constitute the first round or iteration of training.
[0037] At 508, the neural network can perform one or more of the following operations. The neural network can receive an input training image (e.g., as described herein), generate an output image corresponding to an adapted version of the input training image (e.g., reconstructed, higher resolution, less blurry, etc.), and determine the respective differences between the output image and a negative image, and between the output image and a positive image, based on a loss function (e.g., a triplet loss function). The negative image can be generated based on the input training image and the first parameter values learned at 506 (e.g., by feeding the input training image to the model learned in the first round of training), while the positive image can be the gold standard image. The neural network can then tune its parameters based on the loss function (e.g., through gradient descent on the loss function) to maximize the difference between the output image and the negative image and to minimize the difference between the output image and the positive image.
[0038] At 510, the neural network can determine whether one or more training termination criteria are met. For example, if the neural network has completed a predetermined number of training iterations, or if the difference between the output image predicted by the network and the gold standard image is below a predetermined threshold, the neural network can determine that the training termination criterion is met. If it is determined at 510 that the training termination criterion is not met, the system can return to 508. If it is determined at 510 that the training termination criterion is met, method 500 can end. The operations associated with 508 and 510 can constitute a second iteration of training (e.g., a second round of training or a second phase of the training process).
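The flow of 502 through 510 can be sketched end to end on a deliberately tiny problem. Everything concrete here is an assumption made for illustration: the "network" is a single learnable weight that scales each pixel, the distance is the squared L2 distance, gradients are derived by hand, and the learning rate, step counts, margin, and threshold are arbitrary.

```python
def predict(weight, image):
    """Toy 'network': scale every pixel by one learnable weight."""
    return [weight * pixel for pixel in image]


def sq_dist(a, b):
    """Squared L2 distance between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_two_rounds(image, gold, lr=0.01, round1_steps=5,
                     round2_max_steps=200, margin=1.0, threshold=1e-3):
    weight = 0.0  # 502: initialize parameters

    # 506: first round, minimize the squared L2 distance to the gold standard.
    for _ in range(round1_steps):
        output = predict(weight, image)
        grad = sum(2 * (o - g) * x for o, g, x in zip(output, gold, image))
        weight -= lr * grad
    first_weight = weight

    # Negative image: prediction made with the first-round parameters.
    negative = predict(first_weight, image)

    # 508 and 510: second round, triplet loss with termination checks.
    for _ in range(round2_max_steps):
        output = predict(weight, image)
        if sq_dist(output, gold) < threshold:
            break  # 510: output is close enough to the gold standard
        loss = sq_dist(output, gold) - sq_dist(output, negative) + margin
        if loss <= 0:
            break  # the max(..., 0) term is inactive, so the gradient is zero
        # d(loss)/d(weight) for this toy model reduces to sum(2 * (n - g) * x).
        grad = sum(2 * (n - g) * x for n, g, x in zip(negative, gold, image))
        weight -= lr * grad
    return first_weight, weight


image = [1.0, 2.0, 3.0]   # hypothetical input training image (flattened)
gold = [2.0, 4.0, 6.0]    # hypothetical gold standard; the ideal weight is 2.0
first_weight, second_weight = train_two_rounds(image, gold)
print(first_weight, second_weight)
```

In this toy setting the second-round weight ends up nearer the ideal value than the first-round weight, because the triplet loss keeps pushing the output away from the first-round prediction and toward the gold standard. A margin chosen too large could overshoot the target, which is one reason the margin is described as configurable.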
[0039] The first and second rounds of training described herein can be performed under the same settings (e.g., the same number of epochs) or under different settings (e.g., different numbers of epochs). The first and second rounds of training described herein can be based on the same model (e.g., the model parameters can be adjusted between the first and second rounds) or on different models (e.g., one or more models different from the model trained in the second round can be used to generate negative images to facilitate the second round of training). If multiple models are used to generate negative images, these models can be trained under the same set of settings or under different sets of settings.
[0040] For the sake of simplicity, the operations of Method 500 are depicted and described in a specific order herein. However, it should be understood that these operations can occur in various orders, simultaneously, and / or with other operations not presented or described herein. Furthermore, it should be noted that not all operations that a neural network system can perform are depicted and described herein, and not all operations exemplified by Method 500 need to be performed by the system.
[0041] Figure 6 shows a simplified block diagram of an example neural network system 600 for performing image processing as described herein. In embodiments, the neural network system 600 may be connected (e.g., via a network such as a local area network (LAN), intranet, extranet, or the Internet) to other computer systems. The neural network system 600 may operate as a server or client computer in a client-server environment, or as a peer computer in a peer-to-peer or distributed network environment. The neural network system 600 may be provided by a personal computer (PC), tablet PC, set-top box (STB), personal digital assistant (PDA), cellular phone, network device, server, network router, switch, or bridge, or any device capable of executing a set of instructions (sequential or otherwise) specifying the actions to be taken by the device. Further, the term "computer" should include any collection of computers that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methods described herein.
[0042] Furthermore, the neural network system 600 may include a processing device 602 (e.g., processing device 202 of Figure 2), volatile memory 604 (e.g., random access memory (RAM)), non-volatile memory 606 (e.g., read-only memory (ROM) or electrically erasable programmable ROM (EEPROM)), and a data storage device 616 (e.g., storage device 204 of Figure 2), which can communicate with each other via bus 608. The processing device 602 can be provided by one or more processors, such as a general-purpose processor (e.g., a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a microprocessor that implements other types of instruction sets, or a microprocessor that implements a combination of multiple types of instruction sets) or a special-purpose processor (e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), or a network processor).
[0043] The neural network system 600 may further include a network interface device 622, a video display unit 610 (e.g., an LCD), an alphanumeric input device 612 (e.g., a keyboard), a cursor control device 614 (e.g., a mouse), a data storage device 616, and / or a signal generation device 620. The data storage device 616 may include a non-transitory computer-readable storage medium 624 on which instructions 626 encoding any one or more of the image processing methods or functions described herein may be stored. The instructions 626 may also reside wholly or partially within volatile memory 604 and / or processing device 602 during execution by the neural network system 600, so volatile memory 604 and processing device 602 may also constitute machine-readable storage media.
[0044] Although computer-readable storage medium 624 is shown as a single medium in the illustrative example, the term "computer-readable storage medium" should include a single medium or multiple media (e.g., a centralized or distributed database, and / or associated caches and servers) that store one or more sets of executable instructions. The term "computer-readable storage medium" should also include any tangible medium capable of storing or encoding a set of instructions executable by a computer, which causes the computer to perform any one or more of the methods described herein. The term "computer-readable storage medium" should include, but is not limited to, solid-state memory, optical media, and magnetic media.
[0045] The methods, components, and features described herein may be implemented by discrete hardware components or integrated into the functionality of other hardware components such as ASICs, FPGAs, DSPs, or similar devices. Alternatively, the methods, components, and features may be implemented by firmware modules or functional circuitry within a hardware device. Furthermore, the methods, components, and features may be implemented as any combination of hardware devices and computer program components, or as a computer program.
Claims
1. An apparatus for performing an image processing task, comprising one or more processors and one or more storage devices, the one or more storage devices being configured to store instructions that, when executed by the one or more processors, cause the one or more processors to: receive an input image of an anatomical structure, the input image being generated by a medical imaging modality; and use one or more artificial neural networks to generate an output image based on the input image, wherein: the one or more artificial neural networks are configured to implement a model for generating the output image based on the input image; the model is learned through a training process during which parameters associated with the model are adjusted to maximize the difference between a first image predicted using a first parameter value and a second image predicted using a second parameter value, and to minimize the difference between the second image and a gold standard image; and the training process includes a first iteration and a second iteration, wherein the first image is predicted during the first iteration, the second image is predicted during the second iteration, the first parameter value of the model is obtained during the first iteration by minimizing the difference between the first image and the gold standard image, and the second parameter value of the model is obtained during the second iteration by maximizing the difference between the first image and the second image.
2. The apparatus according to claim 1, wherein the first iteration of the training process is performed under different training settings than the second iteration of the training process.
3. The apparatus according to claim 1, wherein the difference between the first image and the second image or the difference between the second image and the gold standard image is determined based on at least one of the following: L1 norm, L2 norm, or hinge loss.
4. The apparatus according to claim 1, wherein, during the training process, a triplet loss function is used to maximize the difference between the first image and the second image, and to minimize the difference between the second image and the gold standard image.
5. The apparatus according to claim 1, wherein the medical imaging modality includes a magnetic resonance imaging (MRI) scanner, the input image includes an undersampled MRI image, and the output image includes a reconstructed MRI image.
6. The apparatus according to claim 5, wherein the reconstructed MRI image has the quality of a fully sampled MRI image, or the output image includes a higher resolution version relative to the input image, or the output image includes a denoised version of the input image, or the one or more artificial neural networks include convolutional neural networks.
7. A method for image processing implemented by a neural network system, the method comprising: receiving, by the neural network system, an input image of an anatomical structure, the input image being generated by a medical imaging modality; and using, by the neural network system, one or more artificial neural networks to generate an output image based on the input image, wherein: the one or more artificial neural networks are configured to implement a model for generating the output image based on the input image; the model is learned through a training process during which parameters associated with the model are adjusted to maximize the difference between a first image predicted using a first parameter value and a second image predicted using a second parameter value, and to minimize the difference between the second image and a gold standard image; and the training process includes a first iteration and a second iteration, wherein the first image is predicted during the first iteration, the second image is predicted during the second iteration, the first parameter value of the model is obtained during the first iteration by minimizing the difference between the first image and the gold standard image, and the second parameter value of the model is obtained during the second iteration by maximizing the difference between the first image and the second image.
8. A method for training a neural network for image processing, the method comprising: initializing the parameters of the neural network; during a first round of training: receiving a first training image; predicting a first output image based on the first training image; and adjusting the parameters of the neural network to minimize the difference between the first output image and a first gold standard image; and during a second round of training: receiving a second training image; predicting a second output image based on the second training image; and adjusting the parameters of the neural network to maximize the difference between the second output image and a negative image and to minimize the difference between the second output image and a second gold standard image, wherein the negative image is obtained based on the second training image using the parameters of the neural network learned during the first round of training.
9. The method according to claim 8, wherein each of the first training image and the second training image includes an undersampled magnetic resonance imaging (MRI) image, and each of the first output image and the second output image includes a reconstructed MRI image.