Generating and using synthetic training data

Synthetic training data is generated by replacing parts of medical images with content from natural images and then degrading the result. This addresses the shortage of training data, enables efficient training of machine learning algorithms for image quality improvement, and is applicable to various medical imaging modalities.

CN122228518A · Pending · Publication Date: 2026-06-16 · KONINKLIJKE PHILIPS NV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: KONINKLIJKE PHILIPS NV
Filing Date: 2024-11-07
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

In existing technologies, it is difficult to collect training data for medical imaging modalities. In particular, the training data for high-resolution CT systems is limited by X-ray radiation dose restrictions and privacy regulations, resulting in insufficient training data for machine learning algorithms and making it difficult to achieve high-quality image processing.

Method used

By generating synthetic training data, replacing parts of medical images with natural images and degrading the image quality, training pairs are formed to train machine learning algorithms, especially convolutional neural networks.

Benefits of technology

It provides a large amount of high-quality training data, which improves the robustness of machine learning algorithms and image processing performance, reduces reliance on high-dose X-rays, and protects patient privacy.

Figure CN122228518A_ABST

Abstract

A computer-implemented method for generating synthetic training data for training a machine learning algorithm (1) of a software module, wherein the software module is configured for image processing of images acquired by a medical imaging modality, the method comprising the steps of: (a) receiving a ground truth image (5) generated by receiving a medical image obtained by the medical imaging modality or another medical imaging modality and replacing at least a portion of the medical image with at least one image portion taken from at least one natural image (3); (b) artificially degrading the image quality of the ground truth image (5) to obtain an input training image (6); and (c) providing the input training image (6) together with the ground truth image (5) as a training pair of training data.

Description

Technical Field

[0001] The present invention relates to a computer-implemented method for generating synthetic training data, a method for generating training weights, training weights, a software module, the use of the software module, and an image processing apparatus.

Background Technology

[0002] Computed tomography (CT) is a widely used medical imaging modality that typically images cross-sections of a patient's body. During image acquisition, X-rays are used to irradiate the patient, and then the X-rays are measured by a detector unit after passing through the patient. Recovering information about the scanned volume and reconstructing the image are important factors, for example, when aiming to reduce the X-ray dose level applied to the patient and improve the final image quality. To improve these processes, machine learning algorithms, such as those based on convolutional neural networks (CNNs), are applied. However, machine learning algorithms must be trained using training data during the training phase. In particular, large amounts of training data are often required to achieve good results.

[0003] The demand for training data often hinders development because the amount of available CT image data is limited, for example due to privacy regulations regarding access to medical data, such as the EU's General Data Protection Regulation (GDPR). Furthermore, the generation of CT image data is further restricted due to the negative effects of X-ray radiation on the human body. Especially on high-resolution CT systems, the X-ray dose must be significantly increased to achieve a sufficient signal-to-noise ratio (SNR), which is only possible to a limited extent. Therefore, training data from high-resolution CT systems is often limited in quality, i.e., limited in resolution or SNR. In addition, other types of image defects (such as motion) can reduce the value of such training data.

[0004] Therefore, collecting diverse training datasets requires significant effort, and for some problems, existing technological methods may simply be insufficient to obtain a sufficient quantity and quality of training data. This problem not only concerns CT training data but also, in particular, other imaging modalities that employ potentially harmful radiation, such as X-rays. Furthermore, the difficulty in acquiring sufficient training data (e.g., limitations due to privacy concerns) can be universally applied to image processing related to any imaging modality.

Object of the Present Invention

[0005] Therefore, the object of this invention is to provide a method that better solves the above-mentioned problems. Furthermore, it is desirable to find a solution that allows machine learning algorithms to be fully trained and that provides fully trained machine learning algorithms. In particular, it is desirable to find a method that can provide more training data.

Summary of the Invention

[0006] To better achieve this objective, a method according to claim 1, a computer program according to claim 11, a method according to claim 12, and a data processing system according to claim 14 are provided. Advantageous embodiments are set forth in the dependent claims. Any features, advantages, or alternative embodiments described herein with respect to the claimed methods also apply to the other claim categories, and vice versa.

[0007] According to a first aspect, a computer-implemented method is provided for generating synthetic training data for training a machine learning algorithm of a software module. The software module is configured to perform image processing on images acquired via a medical imaging modality. The medical imaging modality may, in particular, be an X-ray-based medical imaging modality, such as computed tomography (CT) imaging. The method includes the following steps: (a) receiving a ground truth image (5) generated by receiving a medical image obtained by the medical imaging modality or another medical imaging modality, and replacing at least a portion of the medical image with at least one image portion taken from at least one natural image (3); (b) artificially degrading the image quality of the ground truth image (5) to obtain an input training image (6); and (c) providing the input training image (6) together with the ground truth image (5) as a training pair of training data.
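Steps (a) to (c) can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `TrainingPair` and `make_training_pair`, the boolean `region` array, and the use of additive Gaussian noise as the degradation are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class TrainingPair:
    input_image: np.ndarray   # degraded image, reference sign (6)
    ground_truth: np.ndarray  # ground truth image, reference sign (5)


def make_training_pair(
    medical: np.ndarray,
    natural: np.ndarray,
    region: np.ndarray,
    degrade: Callable[[np.ndarray], np.ndarray],
) -> TrainingPair:
    """Build one training pair from a medical image and a natural image.

    `region` is a boolean array marking the pixels to be replaced.
    All three arrays are assumed to have the same shape.
    """
    # (a) ground truth: medical image with part of it replaced by natural content
    ground_truth = np.where(region, natural, medical).astype(np.float64)
    # (b) artificially degrade the ground truth to obtain the network input
    input_image = degrade(ground_truth)
    # (c) provide both images together as a training pair
    return TrainingPair(input_image=input_image, ground_truth=ground_truth)


# Toy data standing in for real images (hypothetical values).
rng = np.random.default_rng(0)
medical = np.zeros((8, 8))
natural = rng.random((8, 8))
region = np.zeros((8, 8), dtype=bool)
region[2:6, 2:6] = True

pair = make_training_pair(
    medical, natural, region,
    degrade=lambda img: img + rng.normal(0.0, 0.1, img.shape),
)
```

Passing the degradation as a callable keeps the pairing logic independent of the degradation model, so noise, blur, or projection-space degradation could be swapped in without changing this function.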

[0008] Advantageously, the method of the present invention allows for the provision of training data pairs with images of high quality (i.e., the ground truth image) and lower quality (i.e., the input training image). The method of the present invention is applicable to various medical imaging modalities; that is, the training data can be used to train a machine learning algorithm to improve the quality of images obtained through any imaging modality. The method may be particularly useful for imaging modalities involving the application of X-ray radiation to a subject or patient (e.g., X-ray imaging or computed tomography (CT)), because the amount of available training data for these imaging modalities is often particularly limited due to the adverse effects of X-ray radiation on the human body, which ethically restricts the possibility of acquiring image data. However, the method is also applicable to other medical imaging modalities, such as magnetic resonance imaging (MRI), ultrasound imaging, or positron emission tomography (PET). The method can also be used to obtain training data for training image-guided therapy (IGT) methods.

[0009] In the context of this application, the term "synthetic training data" should be understood broadly. This term generally refers to training data generated at least in part by means other than medical imaging scans utilizing an imaging modality, particularly training data generated by software modules. For example, synthetic training data may also include images that are partially synthetic and partially obtained by an imaging modality. Furthermore, synthetic training data may include images derived from images obtained through medical imaging scans, but which have been processed, for example, to artificially degrade image quality or to alter the overall appearance of the images. Training pairs can be provided so that they can be used for training, particularly supervised training. Alternatively, training pairs can be provided for validating trained machine learning algorithms. In the context of this invention, this alternative may also be included by the term "training data." Therefore, the term "training data" may also include its use for validation purposes.

[0010] Training data can be used to train machine learning algorithms for the software module. Machine learning algorithms can be, for example, neural networks (NNs), particularly convolutional neural networks (CNNs). In this context, the term "software module" must be understood broadly, as it can generally describe a component of a software architecture that includes one or more software modules. Optionally, a software module can also be the only software module in the software architecture. The software module is configured to perform image processing on images acquired through a medical imaging modality. Therefore, the software module can include program code configured to perform image processing. In particular, the software module includes machine learning algorithms.

[0011] In this context, the term "medical image processing" must be understood broadly. This term generally describes the processing of medical image data, such as image reconstruction, image quality enhancement, and artifact removal.

[0012] The ground truth image is obtained at least partially from image data acquired by an improved variant of the medical imaging modality or by another technique besides the medical imaging modality. "Partially obtained" can be specifically understood as a portion of the image being acquired by the corresponding imaging modality or technique. Therefore, a portion or all of the ground truth image can be obtained from the corresponding modality or technique. An improved variant of the medical imaging modality can be understood as an improvement in at least one aspect of the imaging modality. For example, an improved variant may include a better scanner, such as one capable of scanning at higher resolution, producing fewer image artifacts, achieving better temporal resolution, and / or producing images with lower noise levels. For example, real CT images can be acquired using a high-resolution scanner and then processed using AI denoising to obtain the ground truth image. Advantageously, in this example, the high-resolution scanner acquiring these images does not need to be of the same type as the scanner of the imaging modality for which the software module will later be used. An improved variant can also be another model of the same type of imaging modality that is improved in one aspect. In other respects, the other model can be equivalent or even worse, for example, in aspects that are not critical for image quality or for the particular aspect of image quality of interest. For instance, when the goal is to achieve low noise levels by training a machine learning algorithm with training data, temporal resolution may not be critical for the process. Another technique can be another imaging modality. For example, one imaging modality could be a CT-based modality, while another could be an MRI-based modality. Another technique can also be understood as another technical variation using the same imaging modality.

[0013] The ground truth image can be obtained entirely or partially from natural image data. The term "natural image data" should be understood broadly in the context of this invention, and it generally describes images that humans might observe in the real world. For example, a natural image can depict a landscape, object, or animal, or parts thereof. For example, a natural image could be a depiction of sand or a depiction of a part of a tiger. In particular, natural image data can be or include optical images captured using an optical camera, which may include an infrared camera, or taken from a video stream. Thus, natural images can also include frames extracted from optical / natural video. Therefore, the term "natural image data" can describe image data not acquired through a medical imaging modality, particularly not through an imaging modality of the same kind as the medical imaging modality for which the training data will be used. Surprisingly, it has been found that training data generated from natural images can be used to achieve feasible training results, thus making the provision of actual medical images unnecessary. Therefore, when using natural image data, access to high-resolution imaging data, such as CT projection data acquired using high X-ray doses, is not necessarily required. Advantageously, natural image data is often readily available and has high resolution and very little noise. Therefore, it is relatively easy to provide many training pairs with very good image quality, particularly high resolution and low noise. The high image resolution achievable using natural images, which is typically higher than that possible with X-ray-based medical imaging modalities (e.g., due to dose limitations), can be a fundamental advantage, enabling machine learning algorithms to improve not only perceived image sharpness but also the actual physical spatial resolution. Thus, in some cases, even beyond the advantage of having more training data, training performance can be improved by utilizing natural image data, due to the quality of the training data itself. Before natural images are applied, they can be converted to a specific color format, such as grayscale, corresponding to the typical color format of the medical imaging modality. Additionally or alternatively, training data based on natural image data can also be used to supplement training data based on images obtained through the medical imaging modality.

[0014] To obtain the input training images, the image quality of the ground truth image is degraded. Therefore, the input training images are degraded versions of the ground truth image. Degradation can generally be understood as any kind of reduction in image quality and / or perceived image information. The degradation can be selected to correspond to one or more types of image degradation typically associated with a medical imaging modality. Therefore, the degradation can be selected based on the medical imaging modality and / or the quality level of the medical imaging modality to be used. Thus, the degradation can be chosen such that the image quality of the degraded image corresponds to the typical image quality available with the medical imaging modality. The training pair can therefore be adapted to train a machine learning algorithm to improve the image quality of images from the medical imaging modality. Advantageously, the input training images can thus be generated directly from the ground truth image.

[0015] According to embodiments, degrading image quality includes adding noise to image data, adding image artifacts to image data, and / or reducing the resolution of image data. When adding image artifacts, training pairs can be adapted to train machine learning algorithms to reduce or remove image artifacts. In CT-related cases, image artifacts can be, for example, cone-beam artifacts. Adding noise can be used to generate training pairs for training machine learning algorithms to reduce or remove noise. For example, the added noise can have statistical characteristics typical of the imaging modality (e.g., statistical characteristics typical of CT images). Reducing resolution can be used to generate training pairs for training machine learning algorithms to improve image resolution and / or to deblur and sharpen images. In particular, as described herein, the reduction of resolution can be applied directly to image data or to projected image data. For example, a loss of resolution or blurring can be achieved by applying a system point spread function (PSF). Additionally or alternatively, further degradation can be applied. For example, motion can be added to the image data in addition to or instead of the degradation described above. For example, motion can be generated by using dynamic content from natural video. Degradation can include applying only one degradation method or a combination of degradation methods.
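As an illustration of image-space degradation, the sketch below combines the three operations named above: blurring with a PSF, reducing resolution, and adding noise. It is a simplified example under stated assumptions, not the patent's implementation: the PSF is assumed to be Gaussian, the noise is assumed to be additive white Gaussian noise (real CT noise is not white), and all function names and default parameter values are hypothetical.

```python
import numpy as np


def gaussian_psf_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Blur with a Gaussian PSF of standard deviation `sigma` (in pixels).

    The blur is applied in the Fourier domain, which implies periodic
    boundary conditions (an assumption made to keep the sketch short).
    """
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    # Fourier transform of a normalized Gaussian: exp(-2 * pi^2 * sigma^2 * f^2)
    otf = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * otf))


def downsample_mean(img: np.ndarray, factor: int) -> np.ndarray:
    """Reduce resolution by averaging factor x factor pixel blocks."""
    h, w = img.shape
    h2, w2 = h - h % factor, w - w % factor  # crop to a multiple of factor
    blocks = img[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor)
    return blocks.mean(axis=(1, 3))


def degrade(img, sigma=1.5, factor=2, noise_std=0.05, seed=0):
    """Blur, downsample, then add noise (one possible combination)."""
    rng = np.random.default_rng(seed)
    out = gaussian_psf_blur(img, sigma)
    out = downsample_mean(out, factor)
    return out + rng.normal(0.0, noise_std, out.shape)


ground_truth = np.random.default_rng(1).random((16, 16))
blurred = gaussian_psf_blur(ground_truth, 1.5)
input_image = degrade(ground_truth)
```

Note that after downsampling the input image is smaller than the ground truth. Whether it is interpolated back to the original grid before training depends on the network design, which the source does not specify.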

[0016] According to one embodiment, degrading the image quality includes: forward projecting the image data into the projection space of an artificially designed imaging modality; degrading the image quality of the projected image data in the projection space; and reconstructing the projected and degraded image data into the image space.

[0017] Preferably, the artificially designed imaging modality is designed to provide higher image quality than the medical imaging modality. For example, the artificially designed imaging modality can have a higher spatial resolution, which can specifically correspond to a smaller pixel / voxel size in physical space. In the context of this invention, the term "projection space" is to be understood broadly. The projection space can describe any kind of data space corresponding to the raw image data obtained through the artificially designed imaging modality, particularly image data prior to image reconstruction. The projection space can also be referred to as the projection domain. For example, in the case of CT imaging, the projection space can be the space of the X-ray projections detected by the X-ray detector of a CT system. A higher spatial resolution of the CT system can correspond to smaller detector pixels. The projection space is thus designed to simulate the projection space of the corresponding imaging modality. It has been found that degrading image quality in the projection space can produce more realistic image degradation than degrading directly in the image space. This is because the degradation is applied to a more "raw" state of the data and therefore better corresponds to real image degradation. After the data is reconstructed back into the image space, the artificial loss of quality is less "obvious" to image improvement algorithms. Therefore, the algorithm must learn to handle real-world imaging problems. For example, more realistic resolution loss or more realistic noise can be simulated. Thus, the training data obtained through this embodiment can lead to better training results, such as a more robust trained algorithm that can handle real-world problems particularly well.
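The forward projection, degradation, and reconstruction sequence can be sketched as follows. This is a toy example under strong assumptions: a 2D parallel-beam geometry, nearest-neighbour rotation, and a plain ramp-filtered back projection, none of which is prescribed by the source. The function names are hypothetical, and a production pipeline would use a dedicated tomography library and a geometry matching the real scanner.

```python
import numpy as np


def rotate_nn(img: np.ndarray, theta: float) -> np.ndarray:
    """Rotate a square image about its centre (nearest-neighbour sampling)."""
    n = img.shape[0]
    c = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n]
    x0, y0 = xs - c, ys - c
    # Inverse mapping: source coordinates for every output pixel.
    xi = np.rint(np.cos(theta) * x0 + np.sin(theta) * y0 + c).astype(int)
    yi = np.rint(-np.sin(theta) * x0 + np.cos(theta) * y0 + c).astype(int)
    valid = (xi >= 0) & (xi < n) & (yi >= 0) & (yi < n)
    out = np.zeros((n, n))
    out[valid] = img[yi[valid], xi[valid]]
    return out


def forward_project(img, angles):
    """Parallel-beam projections: one row of line sums per angle."""
    return np.stack([rotate_nn(img, a).sum(axis=0) for a in angles])


def reconstruct(sinogram, angles):
    """Filtered back projection with a ramp filter."""
    n = sinogram.shape[1]
    ramp = np.abs(np.fft.fftfreq(n))
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1))
    recon = np.zeros((n, n))
    for row, a in zip(filtered, angles):
        # Smear the filtered projection across the image, then rotate back.
        recon += rotate_nn(np.tile(row, (n, 1)), -a)
    return recon * np.pi / len(angles)


def degrade_in_projection_space(img, angles, degrade):
    """Forward project, degrade the raw data, and reconstruct."""
    sinogram = forward_project(img, angles)
    return reconstruct(degrade(sinogram), angles)


img = np.zeros((32, 32))
img[12:20, 10:22] = 1.0  # simple rectangular phantom
angles = np.linspace(0.0, np.pi, 48, endpoint=False)
rng = np.random.default_rng(0)

clean = degrade_in_projection_space(img, angles, lambda s: s)
input_image = degrade_in_projection_space(
    img, angles, lambda s: s + rng.normal(0.0, 0.5, s.shape)
)
```

Because the noise is added before the ramp filter and back projection, it acquires the spatial correlation typical of reconstructed images, which is the effect the paragraph above describes.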

[0018] Alternatively, when image data acquired via an improved variant of the medical imaging modality is available in the projection space, the step of forward projection into the projection space can be omitted, and the data can be processed directly in the projection space. For example, this method may be advantageous if a high-resolution CT scanner is available. In this case, raw CT data with high resolution can be acquired directly, and low-resolution / low-exposure data can be simulated in the projection space. Therefore, instead of receiving a ground truth image, image data in the projection space can be received, and degrading image quality can include: degrading the image quality of the image data in the projection space; and reconstructing the degraded image data into the image space.

[0019] According to an embodiment, in the projection space, image quality is degraded by downsampling the projected image data to a lower image resolution, particularly an image resolution achievable through a medical imaging modality. For example, in the case of a CT system, the lower spatial resolution of the CT system may correspond to a larger detector pixel size.
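Downsampling in the projection space can be pictured as combining neighbouring detector pixels into larger ones. The sketch below is an assumption-laden illustration: the function name `bin_detector_pixels` is hypothetical, the projection data is assumed to be a 2D array of shape (views, detector pixels), and the averaging is done directly on the stored values, whereas a physically faithful simulation would combine detected intensities rather than line integrals.

```python
import numpy as np


def bin_detector_pixels(sinogram: np.ndarray, factor: int) -> np.ndarray:
    """Average `factor` neighbouring detector pixels into one larger pixel."""
    n_views, n_det = sinogram.shape
    n_keep = n_det - n_det % factor  # drop pixels that do not fill a bin
    binned = sinogram[:, :n_keep].reshape(n_views, n_keep // factor, factor)
    return binned.mean(axis=2)


# Toy projection data: 3 views, 8 detector pixels.
sino = np.arange(24, dtype=float).reshape(3, 8)
coarse = bin_detector_pixels(sino, 2)
```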

[0020] According to an embodiment, in the projection space, image quality is degraded by performing a low-dose simulation that introduces noise corresponding to the medical imaging modality. Medical imaging modalities may have their own characteristic types of noise that distinguish them from other types of noise. For example, the noise may have a characteristic distribution in the frequency space. The noise power at different frequencies may differ for different modalities. Therefore, applying the corresponding noise typical of the imaging modality may be advantageous in order to create realistic data. The noise may even be more realistic when applied in the projection space (i.e., the noise corresponding to the original data), rather than attempting to apply noise in the reconstructed image.
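A common way to approximate a low-dose acquisition is to convert line integrals to expected photon counts, draw Poisson-distributed counts, and convert back. The sketch below uses this simplified model as an assumption. The source does not specify the noise model, and electronic noise, detector response, and beam hardening are ignored here. The name `simulate_low_dose` and the parameter `photons_per_ray` are hypothetical.

```python
import numpy as np


def simulate_low_dose(line_integrals, photons_per_ray, seed=0):
    """Add quantum noise to projection data for a given photon budget.

    line_integrals  : attenuation line integrals (dimensionless)
    photons_per_ray : unattenuated photon count per detector pixel;
                      lower values correspond to a lower dose
    """
    rng = np.random.default_rng(seed)
    # Beer-Lambert law: expected counts after attenuation.
    expected = photons_per_ray * np.exp(-line_integrals)
    counts = rng.poisson(expected).astype(float)
    counts = np.maximum(counts, 1.0)  # avoid log(0) for starved rays
    return -np.log(counts / photons_per_ray)


projections = np.full((4, 1000), 2.0)  # constant attenuation, toy data
high_dose = simulate_low_dose(projections, photons_per_ray=1e6)
low_dose = simulate_low_dose(projections, photons_per_ray=1e3)
```

In this model the noise level depends on the attenuation along each ray, which is one reason noise injected in the projection space can look more realistic after reconstruction than noise added to the reconstructed image.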

[0021] According to an embodiment, the ground truth image is obtained at least in part from image data taken from natural video, wherein the temporal dimension of the natural video is transformed into a third spatial dimension to obtain three-dimensional image data. Advantageously, natural video can provide high image resolution and low noise levels, and is therefore well-suited for this purpose. The temporal dimension of the video can serve as the third dimension of the three-dimensional image data (i.e., the three-dimensional image volume). Therefore, using natural video in this way can advantageously allow synthetic training data to be generated even for three-dimensional volumes with relatively little effort. Before the natural video is applied, it can be converted into a specific color format, such as grayscale, corresponding to the typical color format of the medical imaging modality.
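Treating the frame index of a video as a third spatial axis amounts to stacking grayscale frames into a volume. The following sketch assumes the video has already been decoded into an array of RGB frames of shape (frames, height, width, 3); video decoding itself is outside its scope. The function name is hypothetical, and the ITU-R BT.601 luma weights are one common choice for grayscale conversion, not something the source specifies.

```python
import numpy as np


def video_to_volume(frames_rgb) -> np.ndarray:
    """Convert RGB video frames (T, H, W, 3) into a grayscale volume (T, H, W).

    The time axis T is simply reinterpreted as the third spatial axis.
    """
    frames = np.asarray(frames_rgb, dtype=np.float64)
    luma_weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601
    return frames @ luma_weights


# Toy "video": 5 random frames of 4 x 6 pixels.
frames = np.random.default_rng(0).random((5, 4, 6, 3))
volume = video_to_volume(frames)
```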

[0022] According to an embodiment, the ground truth image is generated by receiving a medical image obtained through the medical imaging modality or another medical imaging modality, and replacing at least a portion of the medical image with at least one image portion taken from another image source (particularly from at least one natural image). Preferably, several or all portions may be replaced. Thus, the ground truth image is partially synthesized and partially based on real medical images. In particular, the ground truth image may preserve the basic structure of the medical image, but replace the content of the structure with texture obtained from another image source (particularly from one or more natural images). Replacing portions of the medical image allows for a larger amount of training data to be provided while still maintaining the realistic anatomical structure in the image. Thus, a large training dataset can be generated from a small amount of available data. It has been found that using natural images can be particularly advantageous because they typically have more diverse fine structures and textures. Therefore, a rich variety of data can be generated. Training with these partially replaced images can benefit from the diverse structures of natural images, and may even result in a more robust trained algorithm.

[0023] According to one embodiment, replacing at least a portion of a medical image includes: segmenting at least one anatomical structure, particularly at least one organ, in the medical image in order to create at least one segmentation mask; and replacing the image data of the medical image within the at least one segmentation mask with at least one image portion taken from another image source, particularly from at least one natural image.

[0024] The segmentation step can be performed according to methods known in the prior art. In particular, the method can be based on the application of a trained convolutional network. An example of a segmentation method that can be applied here, either directly or in a modified form, is TotalSegmentator, published by Jakob Wasserthal et al. in "TotalSegmentator: robust segmentation of 104 anatomical structures in CT images" (arXiv:2208.05868 [eess.IV], 2022). Thus, it is possible to segment almost all organs in the human body. This approach is likely most advantageous when multiple segments are used (preferably as many segments as possible). Using more segments and thus replacing more parts of the image can result in more diverse training data and thus better training results. However, this approach may already be advantageous when only a few segments or even just one segment is used. The segmentation step can specifically produce an image set including medical images and segmentation masks. Replacing the image data can include cropping the image data within the segmentation mask and subsequently pasting an image from another source (especially a natural image) into the segmentation mask. Advantageously, a coarse estimate of the large-scale structure (i.e., represented by at least one segmentation mask) can be obtained from actual scans of the medical imaging modality, and fine structures from high-quality natural images can then be filled into these structures. Therefore, this embodiment can allow rich and high-quality structures (e.g., structures from natural images) to be obtained without violating the usual arrangement of organs within the human body (since the segmentation mask ensures that the general structure of real medical images is preserved). This could be crucial for simulating different modality-specific artifacts and training machine learning algorithms to address modality-specific image formation problems.
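The crop-and-paste step can be sketched for several segments at once. The example assumes the segmentation has already been produced by some external tool and is available as an integer label map with 0 as background; it does not call TotalSegmentator or any other segmentation library. The function names and the random-crop strategy are hypothetical, and each natural image is assumed to be at least as large as the medical image.

```python
import numpy as np


def random_crop(image, shape, rng):
    """Cut a random window of the given shape out of a larger image."""
    h, w = shape
    top = rng.integers(0, image.shape[0] - h + 1)
    left = rng.integers(0, image.shape[1] - w + 1)
    return image[top:top + h, left:left + w]


def replace_segments(medical, labels, natural_images, seed=0):
    """Fill every labelled segment with texture from a random natural image.

    labels         : integer label map, 0 = background (left unchanged)
    natural_images : list of 2D grayscale arrays, each >= medical.shape
    """
    rng = np.random.default_rng(seed)
    out = medical.astype(np.float64).copy()
    for label in np.unique(labels):
        if label == 0:
            continue
        source = natural_images[rng.integers(len(natural_images))]
        patch = random_crop(source, medical.shape, rng)
        mask = labels == label
        out[mask] = patch[mask]  # paste texture only inside this segment
    return out


rng = np.random.default_rng(1)
medical = np.full((16, 16), -1.0)          # placeholder medical image
labels = np.zeros((16, 16), dtype=int)
labels[2:7, 2:7] = 1                       # hypothetical organ 1
labels[9:14, 8:15] = 2                     # hypothetical organ 2
natural_images = [rng.random((32, 32)), rng.random((24, 40))]

synthetic = replace_segments(medical, labels, natural_images)
```

Drawing a fresh source image and crop for each segment means that the same medical image can yield many different ground truth images simply by changing the seed.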

[0025] According to an embodiment, regions located outside the at least one segmentation mask are clustered together into at least one cluster region, wherein pixels or voxels within the same cluster region are assigned a constant pixel value or voxel value, said constant pixel value or voxel value optionally being the average of the original pixels or voxels within the cluster region. In many cases, segmenting every organ in a medical image may be impractical. Therefore, for the remainder of the medical image, clustering according to this embodiment can be applied. This makes it possible to smooth out, or fill in, unsegmented regions. For example, pixels or voxels in a cluster can be assigned the average value within the cluster region. This may result in a piecewise constant approximation of unsegmented regions. Advantageously, the assignment of pixel values or voxel values can further allow for labeling of the corresponding regions. Clustering and applying new pixel / voxel values has the advantage of reducing the association with the original patient from whom the image was taken. On the one hand, this can produce a more diverse training dataset. On the other hand, it can better protect the privacy of the original patient.
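A piecewise constant approximation of the unsegmented area can be sketched as follows. The source does not state which clustering method is used, so this example makes a deliberate simplification: pixels outside the mask are grouped by intensity into quantile bins, and each group is replaced by its mean. A spatial or k-means clustering could be substituted. The function name and the `n_clusters` parameter are hypothetical.

```python
import numpy as np


def flatten_unsegmented(image, segmented, n_clusters=3):
    """Replace pixels outside `segmented` by the mean of their intensity cluster.

    segmented : boolean array, True where a segmentation mask applies
    """
    out = image.astype(np.float64).copy()
    outside = ~segmented
    values = out[outside]
    # Quantile edges split the intensities into n_clusters groups
    # (a simple stand-in for a real clustering algorithm).
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_clusters + 1)[1:-1])
    cluster = np.digitize(values, edges)
    flat = values.copy()
    for k in range(n_clusters):
        members = cluster == k
        if members.any():
            flat[members] = values[members].mean()
    out[outside] = flat
    return out


rng = np.random.default_rng(0)
image = rng.random((10, 10))
segmented = np.zeros((10, 10), dtype=bool)
segmented[3:7, 3:7] = True

result = flatten_unsegmented(image, segmented, n_clusters=3)
```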

[0026] According to an embodiment, the dynamic range of at least one image portion of at least one natural image is adjusted based on the dynamic range of the medical image data within a segmentation mask. This embodiment can also be applied when the image portion is taken from any other source (i.e., not from a natural image). The dynamic range of the original portion of a medical image may vary depending on the corresponding medical image and the corresponding segmented portion, such as an organ. For example, some body parts may appear bright, while others may appear quite dark. Furthermore, typical variations in brightness and / or contrast within different segments may differ. On the other hand, random natural images may have different dynamic ranges. Therefore, by adjusting the dynamic range of the image portion taken from a natural image, the result may be more realistic, resulting in a synthesized image with a dynamic range comparable to a real medical image. This may lead to better training results when training a machine learning algorithm using the corresponding training data. For example, in a first sub-step, the dynamic range of the medical image data within a segmentation mask can be calculated based on the original imaging data values. In subsequent sub-steps, the dynamic range of the image portion can be adjusted. For example, this adjustment can be provided by matching a histogram of the dynamic range and / or by rescaling the dynamic range according to the quantiles of the dynamic range distribution. Additionally or alternatively, image portions can be further adjusted. For example, image portions can be flipped, rotated, and / or deformed. This can provide a wider variety of training data.
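The quantile-based rescaling mentioned above can be sketched as a linear mapping that aligns a lower and an upper quantile of the natural image portion with those of the original data inside the mask. The function name, the choice of the 1 % and 99 % quantiles, and the toy intensity values are assumptions for illustration. Histogram matching would be an alternative that also aligns the shape of the distribution.

```python
import numpy as np


def match_dynamic_range(patch, reference, q_low=0.01, q_high=0.99):
    """Linearly rescale `patch` so its quantiles match those of `reference`.

    patch     : values of the natural image portion
    reference : original medical image values inside the segmentation mask
    """
    p_lo, p_hi = np.quantile(patch, [q_low, q_high])
    r_lo, r_hi = np.quantile(reference, [q_low, q_high])
    if p_hi == p_lo:
        # Flat patch: map everything to the centre of the reference range.
        return np.full(np.shape(patch), (r_lo + r_hi) / 2.0)
    scaled = (np.asarray(patch, dtype=np.float64) - p_lo) / (p_hi - p_lo)
    return r_lo + scaled * (r_hi - r_lo)


rng = np.random.default_rng(0)
natural_patch = rng.random((32, 32)) * 255.0   # 8-bit style natural texture
organ_values = rng.normal(40.0, 10.0, 500)     # hypothetical organ intensities

adjusted = match_dynamic_range(natural_patch, organ_values)
```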

[0027] According to an embodiment, at least one property of the segmentation mask is modified within a predefined boundary before replacing image data with the image portion. Modifying the property of the segmentation mask can advantageously allow for a more diverse dataset. For example, in this way, a limited set of imaging data can be expanded into a larger and richer imaging dataset. Predefined boundaries can be defined such that the modification remains within realistic properties. In particular, the shape, position, orientation, and / or size of at least one segmentation mask can be modified. The position can be changed by displacing the segmentation mask within the medical image. For example, the maximum permissible displacement can be constrained relative to the segmentation size or by an absolute value. The orientation of the segmentation mask can be changed by rotating the segmentation mask. The rotation can be limited by a maximum permissible rotation, for example, a specific value given in degrees. The size of the segmentation mask can be changed by adjusting the mask size. The size adjustment can be limited by a maximum permissible increase or decrease in size, for example, relative to the original size of the segmentation mask. For example, the shape of the segmentation mask can be changed by distorting the segmentation mask. Changes to one, several, or all of the shape, position, orientation, and size can be applied. Other properties can also be altered; for example, the dynamic range within the segmentation mask can be adjusted, for instance, by adjusting the percentile of the original dynamic range within the segmentation mask. This change can be performed randomly within predefined boundaries. Which changes are applied (i.e., changes to shape, position, orientation, and / or size) can be selected randomly. The number of changes applied can be selected randomly.
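As one example of modifying a mask property within a predefined boundary, the sketch below displaces a 2D mask by a random offset limited to a fraction of the mask's own extent. Rotation, resizing, and distortion would follow the same pattern with their own limits. The function names and the `max_fraction` parameter are hypothetical.

```python
import numpy as np


def shift_mask(mask, dy, dx):
    """Shift a boolean mask by (dy, dx) pixels, filling vacated pixels with False."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    src = mask[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    out[top:top + src.shape[0], left:left + src.shape[1]] = src
    return out


def randomly_shift_mask(mask, max_fraction=0.2, seed=0):
    """Displace the mask by at most `max_fraction` of its height and width."""
    if not mask.any():
        return mask.copy(), (0, 0)
    rng = np.random.default_rng(seed)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    max_dy = int(max_fraction * height)
    max_dx = int(max_fraction * width)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_mask(mask, dy, dx), (dy, dx)


mask = np.zeros((20, 20), dtype=bool)
mask[5:15, 6:14] = True  # segment of height 10 and width 8

moved, (dy, dx) = randomly_shift_mask(mask, max_fraction=0.2, seed=3)
```

Tying the limit to the segment size keeps small structures from being moved disproportionately far; an absolute limit in pixels or millimetres could be used instead.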

[0028] According to another aspect of the invention, a computer-implemented method is provided for providing synthetic training images, particularly ground truth images, for training a machine learning algorithm used to perform image processing on images acquired through a medical imaging modality. The method includes the following steps: receiving a medical image acquired through the medical imaging modality or another medical imaging modality; and replacing at least a portion of the medical image with at least one image portion taken from at least one image from another image source (particularly from at least one natural image) in order to generate a synthetic training image.

[0029] Synthetic training images can then be provided. Furthermore, corresponding input training images can be generated by degrading the synthetic training images as described herein. Natural images may be particularly advantageous as replacement image portions because they typically possess rich texture and potential high resolution, along with relatively low noise levels. However, other sources for at least one image portion are generally also conceivable.

[0030] According to an embodiment, replacing at least a portion of the medical image includes: segmenting at least one anatomical structure, particularly at least one organ, in the medical image in order to create at least one segmentation mask; and replacing the image data of the medical image within the at least one segmentation mask with at least one image portion from at least one image source (particularly at least one natural image).

[0031] According to an embodiment, regions located outside at least one segmentation mask are clustered together into at least one clustering region, and pixels or voxels within the same clustering region are assigned a constant pixel value or voxel value, which is optionally the average value of the original pixels or voxels within the clustering region.
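A piecewise constant background of this kind can be sketched as follows. The disclosure does not name a clustering method; the intensity-based k-means used here, the function name `piecewise_constant_background`, and its parameters are assumptions made purely for illustration.

```python
import numpy as np

def piecewise_constant_background(image, mask, n_clusters=2, n_iter=20):
    """Replace pixels outside `mask` by the mean of their intensity cluster."""
    out = image.astype(float).copy()
    outside = ~mask
    values = out[outside]  # copy of the background intensities
    # Initialise cluster centres at evenly spaced quantiles of the background.
    centres = np.quantile(values, (np.arange(n_clusters) + 0.5) / n_clusters)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centres[k] = values[labels == k].mean()
    # Final assignment, then one constant (the cluster mean) per cluster region.
    labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
    for k in range(n_clusters):
        member = labels == k
        if member.any():
            values[member] = values[member].mean()
    out[outside] = values
    return out
```

Clustering on intensity alone ignores spatial connectivity; a spatially aware method could be substituted without changing the averaging step.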

[0032] According to an embodiment, the dynamic range of at least one image portion is adjusted based on the dynamic range of the medical image data within the segmentation mask.
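One possible way to perform such an adjustment is a linear rescaling between quantiles, sketched below. The function name `match_dynamic_range` and the default quantiles of 1 % and 99 % are illustrative assumptions; histogram matching would be an alternative.

```python
import numpy as np

def match_dynamic_range(patch, medical, mask, q_low=0.01, q_high=0.99):
    """Linearly rescale `patch` so that its lower and upper quantiles match
    those of the medical image values inside `mask`."""
    target_lo, target_hi = np.quantile(medical[mask], [q_low, q_high])
    src_lo, src_hi = np.quantile(patch, [q_low, q_high])
    # Guard against a constant patch (zero source range).
    scale = (target_hi - target_lo) / max(src_hi - src_lo, 1e-12)
    return (patch - src_lo) * scale + target_lo
```

With `q_low=0.0` and `q_high=1.0` this reduces to a plain min-max rescaling; interior quantiles make the mapping less sensitive to outlier pixels.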

[0033] According to an embodiment, before replacing image data with image portions, at least one property of the segmentation mask is changed within a predefined boundary, wherein, in particular, the shape, position, orientation, and / or size of at least one segmentation mask is changed.

[0034] According to another aspect of the invention, a method is provided for generating training weights to be used in a software module including a machine learning algorithm. The training weights are generated by training the machine learning algorithm using synthetic training data obtained according to the method for generating synthetic training data described herein. Optionally, additional training data, for example based on real medical image data, can be used to train the machine learning algorithm. The machine learning algorithm can be trained to correct cone-beam artifacts in CT image data. The machine learning algorithm can be trained to perform deblurring of image data, either in image space using reconstructed image data or in projection space (particularly using raw image data obtained by an imaging device). Training data in projection space can be generated by forward projecting image data from image space to projection space. The machine learning algorithm can be trained to perform upsampling of image data to achieve higher image resolution, again either in image space using reconstructed image data or in projection space (particularly using raw image data obtained by an imaging device). The machine learning algorithm can be trained to perform motion correction of image data. Motion in the training data can be generated by manipulating segmentation masks (e.g., shifting, rotating, and / or distorting masks) or by using dynamic content from natural videos.

[0035] According to another aspect of the invention, training weights generated by the method for generating training weights as described herein are provided. Therefore, training weights can be the result of training a machine learning algorithm using training data.

[0036] According to another aspect of the invention, a non-transient storage medium is provided on which training weights generated by the method for generating training weights described herein are stored. For example, this storage medium and / or any other storage medium mentioned herein may be an optical storage medium, a magnetic storage medium, a magnetoelectric storage medium, a magneto-optical storage medium, or a solid-state medium.

[0037] According to another aspect of the present invention, a software module having a machine learning algorithm including training weights as described herein is provided.

[0038] According to another aspect of the invention, a non-transient storage medium is provided, on which a software module having a machine learning algorithm is stored, the machine learning algorithm including training weights generated by the method for generating training weights described herein.

[0039] According to another aspect of the invention, a use of a software module as described herein and / or training weights as described herein and / or a non-transient storage medium on which the training weights are stored, as described herein, is provided for image processing of medical image data obtained through a medical imaging modality.

[0040] Advantageously, the quality of the trained machine learning algorithm in the software module can be improved because a larger amount of training data can be provided by using synthetic training data. Furthermore, it has been found that using natural images, as described herein, as the basis for training data results in better robustness of the machine learning algorithm, even when a considerable amount of training data is applied. This can be attributed to the more diverse structural details of natural images. Therefore, the provision of synthetic training data can still influence the trained algorithm even during its application.

[0041] According to another aspect of the present invention, an image processing apparatus is provided, the image processing apparatus including software modules as described herein and / or a non-transient storage medium on which the software modules are stored, as described herein. The image processing apparatus may, for example, be a control unit of a personal computer, tablet computer, smartphone, server, or medical imaging system.

[0042] According to an embodiment, the image processing device is configured to enhance the image quality of a medical image region, which includes medical image data acquired through a medical imaging modality. The image processing device may be specifically configured to perform the following steps: (a) receiving image data from within a medical image region; (b) applying a denoising algorithm to the image data of the medical image region to generate denoised image data (5); (c) applying a deblurring algorithm to the denoised image data (5) to produce enhanced image data; and (d) providing the enhanced image data.

[0043] In the context of this invention, the term "medical image region" should be understood broadly. This term generally describes a region of a medical image. The region may extend across the entire medical image or only across a portion of it. The medical image region can be a two-dimensional or three-dimensional region. Correspondingly, the medical image can be a two-dimensional image or a three-dimensional image, which can also be described as a (three-dimensional) image volume. Image data can be received, for example, from an external storage device or server, from an imaging modality (e.g., directly after acquisition), from within the image processing device itself performing the method, and / or from another source. The denoising algorithm can be configured to output image data with reduced noise based on the input image data. "Image data with reduced noise" can also include noise-free image data. Preferably, the denoising algorithm can be a trained machine learning algorithm, particularly a trained neural network. In principle, any kind of denoising algorithm can be applied. For example, the denoising algorithm can be configured to directly determine the denoised image. Alternatively, the denoising algorithm can be configured to determine the noise in the input image; in this alternative, the noise can be subtracted from the input image to obtain the denoised image. In the context of this invention, the term "deblurring algorithm" should likewise be understood broadly. The term generally refers to any algorithm that enhances image data, particularly one that enables users to identify more information, or to identify information more easily, within a medical image region. Deblurring algorithms can be configured to provide image data with more identifiable detail and / or in a less blurred state. It has been found that in many cases, image data includes information that is not visible to the human eye. However, because this information is contained in the image data, it can be made identifiable via deblurring algorithms. Image enhancement (e.g., enhancing anatomical structures within an image region) can enable users (e.g., radiologists) to provide faster and more reliable diagnoses for a given pathology. For example, a reconstruction on a standard-sized grid (e.g., 512×512 in the axial plane) may not be fine enough for users to fully utilize the image information. Therefore, providing enhanced images, such as images with more detailed anatomical structures, allows users to provide better analysis and diagnosis. For example, image data enhancement can be performed such that the enhanced image data simulates a medical image region reconstructed on a finer grid. It has been found that the likelihood of identifying and enhancing this information via deblurring algorithms can critically depend on prior noise reduction of the image via denoising algorithms, which leads to significantly better results. Advantageously, the denoising algorithm and / or deblurring algorithm can be trained using training data as described herein.

[0044] According to an embodiment, the image processing device is configured to provide a user with medical image data acquired through a medical imaging modality by performing the following steps: providing an image reconstruction of the medical image data to the user on a screen, together with a user interface having at least one input device that allows the user to select a region of interest within the image reconstruction; receiving a selection of a region of interest and generating image data within that region of interest based on the selection; enhancing the image quality of the image data within the region of interest to generate enhanced image data (optionally according to the embodiment for enhancing the image quality of a medical image region described herein); and providing an image reconstruction of the medical image data to the user on the screen or on another screen, wherein the image reconstruction of the region of interest is based on the enhanced image data.
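The region-of-interest workflow can be sketched as follows. The 3x3 mean filter and the unsharp mask are simple stand-ins chosen for illustration; the disclosure prefers trained machine learning algorithms for denoising and deblurring. All function names and the `(top, left, height, width)` ROI convention are assumptions.

```python
import numpy as np

def box_blur(img):
    """3x3 mean filter with edge replication (a stand-in denoiser)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def unsharp(img, amount=1.0):
    """Unsharp masking (a stand-in deblurring step)."""
    return img + amount * (img - box_blur(img))

def enhance_roi(image, roi):
    """Denoise first, then deblur, only inside roi = (top, left, height, width)."""
    top, left, height, width = roi
    out = image.astype(float).copy()
    patch = out[top:top + height, left:left + width]
    # Only the selected region is processed; the rest of the image is untouched.
    out[top:top + height, left:left + width] = unsharp(box_blur(patch))
    return out
```

Because only the selected region is processed, the cost of enhancement scales with the size of the region of interest rather than with the full image or volume.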

[0045] Advantageously, according to this embodiment, data processing time and associated hardware costs can be reduced. It has been recognized that for some clinical applications, high spatial resolution is not necessarily required in all segments of the reconstructed image / volume. In particular, not all clinical applications require a high-resolution representation of some or all of the data (compared to what imaging modalities such as computed tomography make possible) to achieve a sufficient level of diagnostic confidence for a particular disease. Therefore, for high-resolution imaging modalities, such as high-resolution CT systems, reconstructing, transmitting, and storing such a large amount of image data is often redundant. Instead, specific image structures within a region of interest can be selectively enhanced. In this way, data representations can be achieved that are potentially advantageous in terms of the required data storage and transmission. Therefore, the reconstruction, transmission, and / or storage footprint of the corresponding data can be reduced.

[0046] Features and advantages of different embodiments can be combined. Features and advantages of one aspect of the invention (e.g., a method for generating synthetic training data) can be applied to other aspects (e.g., a method for generating training weights, training weights, software modules, image processing apparatus, uses, and / or non-transient storage media), and vice versa.

Description of the Drawings

[0047] The invention will now be described with reference to the accompanying drawings, in which:
Figure 1 shows a flowchart of a computer-implemented method for generating synthetic training data for training a machine learning algorithm of a software module, according to an embodiment of the present invention, the software module being used for image processing of images acquired through a medical imaging modality;
Figure 2 shows a schematic diagram of a computer-implemented method for generating synthetic training data for training a machine learning algorithm of a software module, according to another embodiment of the present invention, the software module being used for image processing of images acquired through a medical imaging modality;
Figure 3 shows a flowchart of a computer-implemented method for generating synthetic training data for training a machine learning algorithm of a software module, according to another embodiment of the present invention, the software module being used for image processing of images acquired through a medical imaging modality;
Figure 4 shows examples of images at different processing stages of a method according to embodiments of the present invention;
Figure 5 shows a schematic diagram of a method for generating training weights to be used in a software module that includes a machine learning algorithm;
Figure 6 shows, at the top, a noisy simulated clinical scan and, at the bottom, a denoised version after applying a trained denoising model, which was trained using training data generated according to the method of the present invention; and
Figure 7 shows, at the top, another noisy simulated clinical scan and, at the bottom, a denoised version after applying a trained denoising model, which was trained using training data generated according to the method of the present invention.

[0048] List of reference numerals in the drawings:
1 Machine learning algorithm
3 Natural image
4 Portion of natural image
5 Ground truth image
6 Input training image
10 Segmentation mask
51 Projected image data
61 Projected and degraded image data
110-130 Method steps
210-230 Method steps
310-330 Method steps

Detailed Description

[0049] In all the accompanying drawings, the same or corresponding features / elements of various embodiments are indicated by the same reference numerals.

[0050] Figure 1 shows a flowchart of a computer-implemented method, according to an embodiment of the invention, for generating synthetic training data for training a machine learning algorithm 1 of a software module for image processing of images acquired via a medical imaging modality. In a first step 110, a ground truth image 5 is received. The ground truth image 5 is obtained at least in part from image data acquired via a modified variant of the medical imaging modality or via a technique other than the medical imaging modality. Preferably, the ground truth image 5 can be a high-resolution, noise-free image, optionally a 3D image volume. In particular, the ground truth image 5 can be taken from natural image data. In an advantageous optional variant, the ground truth image 5 is obtained at least in part from image data taken from natural video. In this variant, the temporal dimension of the natural video is converted to a third spatial dimension to obtain three-dimensional image data. Thus, the temporal domain of the video can be used to generate the third spatial dimension of the image data. In another step 120, the image quality of the ground truth image 5 is artificially degraded to obtain an input training image 6. Degradation may include, for example, one or more of adding noise to the image data, adding image artifacts to the image data, and reducing the resolution of the image data. In another step 130, the input training image 6 is provided together with the ground truth image 5 as a training pair of the training data.
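Steps 110 to 130, including the natural-video variant, can be sketched as follows. This is a minimal sketch under stated assumptions: the frames are equally sized 2D grayscale arrays, resolution is reduced by block averaging, and the noise is additive Gaussian. The function names and parameters are hypothetical, and the disclosure does not prescribe these particular degradations.

```python
import numpy as np

def video_to_volume(frames):
    """Stack 2D frames so that the time axis becomes the third spatial axis."""
    return np.stack(frames, axis=0).astype(float)

def degrade(volume, factor, noise_sigma, rng):
    """Reduce in-plane resolution by block averaging, then add Gaussian noise."""
    d, h, w = volume.shape
    h2, w2 = h - h % factor, w - w % factor  # crop to a multiple of `factor`
    blocks = volume[:, :h2, :w2].reshape(d, h2 // factor, factor, w2 // factor, factor)
    low_res = blocks.mean(axis=(2, 4))
    return low_res + rng.normal(0.0, noise_sigma, size=low_res.shape)

def make_training_pair(frames, factor=2, noise_sigma=0.05, rng=None):
    """Return (input training image, ground truth image) as one training pair."""
    rng = np.random.default_rng() if rng is None else rng
    ground_truth = video_to_volume(frames)
    return degrade(ground_truth, factor, noise_sigma, rng), ground_truth
```

The pair is returned with the degraded input first and the ground truth second, mirroring the roles of input training image 6 and ground truth image 5.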

[0051] Figure 2 shows a schematic diagram of a computer-implemented method, according to another embodiment of the invention, for generating synthetic training data for training a machine learning algorithm 1 of a software module, the software module being used for image processing of images acquired through a medical imaging modality. In a first step 210, a ground truth image 5 is received. In this example, the ground truth image 5 is a natural image 3, in this case an optical image of a tiger's head. In general, images from another source can be used instead, preferably high-resolution images. Optionally, the ground truth image can be obtained at least partially from image data taken from a natural video, such that the temporal dimension of the natural video is converted to a third spatial dimension to obtain three-dimensional image data, i.e., a three-dimensional image volume. In another step 221, the image data (here, the ground truth image 5) is forward-projected into the projection space of an artificially designed imaging modality, thereby generating projected image data 51. The artificially designed imaging modality can, for example, be a computed tomography detector with an artificially high resolution. In another step 220, the image quality of the projected image data 51 is degraded in the projection space. This is done by downsampling the projected image data 51 to the lower resolution of an imaging modality. Furthermore, a low-dose simulation is applied to introduce noise into the image data. As a result, noisy, low-resolution projected image data 61 is obtained. This embodiment can also be applied with resolution reduction only or with noise only. In another step 222, the projected and degraded image data 61 is reconstructed back into image space to obtain the input training image 6. Compared to the ground truth image 5, the input training image 6 is more blurred and noisier. In another step 230, the input training image 6, together with the ground truth image 5, is provided as a training pair of the training data.
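The degradation in projection space (step 220) can be sketched as follows. The sketch assumes that forward projection (step 221) has already produced line integrals of shape `(n_angles, n_detectors)` and that reconstruction (step 222) is handled elsewhere. Detector binning by averaging line integrals and a Poisson counting model with `photons` incident photons per detector element are common simplifications chosen here for illustration, not details taken from the disclosure.

```python
import numpy as np

def degrade_projections(projections, bin_factor, photons, rng):
    """Degrade line-integral data of shape (n_angles, n_detectors).

    Detector binning lowers the resolution; Poisson counting noise
    emulates a low-dose acquisition.
    """
    n_angles, n_det = projections.shape
    n_det -= n_det % bin_factor  # crop to a multiple of the bin factor
    # Average neighbouring detector elements onto a coarser detector.
    binned = projections[:, :n_det].reshape(n_angles, -1, bin_factor).mean(axis=2)
    # Beer-Lambert law: expected photon counts behind the object.
    expected = photons * np.exp(-binned)
    counts = np.maximum(rng.poisson(expected), 1)  # avoid log(0)
    # Convert the noisy counts back to line integrals.
    return -np.log(counts / photons)
```

Lowering `photons` increases the noise level, which corresponds to simulating a lower dose; setting `bin_factor=1` applies noise only.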

[0052] Figure 3 shows a flowchart of a computer-implemented method, according to another embodiment of the present invention, for generating synthetic training data for training a machine learning algorithm 1 of a software module for image processing of images acquired through a medical imaging modality. In a first step 311, a medical image acquired through the medical imaging modality or another medical imaging modality is received. In another step 312, natural image data is received, i.e., one or more natural images 3 or one or more portions of one or more natural images 3. The natural image 3 may be augmented and / or rescaled. Augmentation may include, for example, flipping, rotating, and / or deforming. In another step 313, the anatomical structures of the medical image (e.g., including one or more organs) are segmented to create a segmentation mask 10. For example, a segmentation method using a convolutional neural network (CNN) known in the art may be applied to generate the segmentation mask 10. In another step 314, the dynamic range of the one or more natural images 3 or portions of natural images 3 is adjusted based on the dynamic range of the medical image data within the segmentation mask 10. For example, the dynamic range within the segmentation mask 10 can be calculated from the original imaging values, and the dynamic range of the natural image 3 or portion of the natural image 3 can then be adjusted based on this calculation, for instance by matching histograms or by rescaling according to quantiles. In optional step 315, at least one property of the corresponding segmentation mask 10 can be changed within predefined boundaries. For example, the shape, position, orientation, and / or size of the segmentation mask 10 can be changed. In another step 316, the image data of the medical image within the segmentation mask 10 is replaced with the natural image 3 or a portion of the natural image 3. Preferably, the selection of the natural image 3, the adjustment of the dynamic range, and the replacement of the image data are applied separately for each segmentation mask 10. In another optional step 317, regions located outside the segmentation mask 10 can be clustered together into at least one cluster region, and pixels or voxels within the same cluster region can be assigned a constant pixel or voxel value. This can be an option, for example, if segmenting each organ is not feasible. For example, the constant pixel or voxel value can be the average of the original pixels or voxels within the cluster region. In particular, if several cluster regions are used, this may result in a piecewise constant approximation of the non-segmented regions. The image obtained by replacing a portion of the medical image with the natural image 3 or a portion of the natural image 3 can then be used as a ground truth image 5 for training data. It has been found that the resulting image is a high-quality ground truth image 5 consistent with real images obtained from the corresponding imaging modality. Therefore, in another step 318, a ground truth image 5 obtained partially from image data acquired using a technique other than the medical imaging modality is provided. In another step 320, the image quality of the ground truth image 5 is degraded to obtain the input training image 6. Finally, in another step 330, the input training image 6 is provided together with the ground truth image 5 as a training pair of the training data.
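The per-mask replacement of step 316 can be sketched as follows. The sketch assumes a 2D medical image, an integer label image in which 0 denotes background and 1..K denote K segmentation masks, and a natural image at least as large as the medical image. The function name `replace_masked_regions`, the random-crop selection, and the simple min-max rescaling (standing in for step 314) are illustrative assumptions.

```python
import numpy as np

def replace_masked_regions(medical, labels, natural, rng):
    """Fill each labelled region of `medical` with a random crop of `natural`.

    `labels` holds 0 for background and 1..K for K segmentation masks.
    """
    out = medical.astype(float).copy()
    h, w = medical.shape
    for label in np.unique(labels):
        if label == 0:
            continue
        mask = labels == label
        # Independent random crop per mask, same size as the medical image.
        top = int(rng.integers(0, natural.shape[0] - h + 1))
        left = int(rng.integers(0, natural.shape[1] - w + 1))
        crop = natural[top:top + h, left:left + w].astype(float)
        # Min-max rescale the crop to the intensity range inside this mask.
        lo, hi = medical[mask].min(), medical[mask].max()
        span = max(crop.max() - crop.min(), 1e-12)
        crop = (crop - crop.min()) / span * (hi - lo) + lo
        out[mask] = crop[mask]
    return out
```

Pixels outside all masks are left unchanged here; a clustering step such as optional step 317 could be applied to them afterwards.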

[0053] Figure 4 shows examples of images at different processing stages of a method according to an embodiment of the invention. The first image 401 is a computed tomography image showing several anatomical parts of the human body (particularly organs). In the second image 402, several segmentation masks 10 are applied to label different anatomical parts of the body. The segmentation masks 10 are determined using prior art CNN-based segmentation methods. In the next image 403, voxels within the segmentation masks 10 are cropped out; these voxels are depicted here as black areas. Optionally, to generate even more diverse data, one, several, or all of the individual segmentation masks 10 may be (randomly) shifted, rotated, or resized at this stage. In this way, a limited CT dataset can be expanded into a larger and richer CT dataset. In the next image 404, voxels not within the segmentation masks 10 are clustered into two distinct cluster regions, each depicted in its own color (here, light gray and dark gray). Within each cluster, the average value of the original voxels within the cluster is assigned to the voxels. In the next image 405, the segmented regions are filled with portions of the natural image 3 that have been pasted into the segmentation masks 10. Before the natural image data is pasted into the CT scan, its dynamic range is adjusted based on the actual CT values within the corresponding segmentation mask 10. The result is a synthetic medical image with different organ structures but a realistic overall structure and dynamic range. The new image is therefore suitable as part of the training dataset for training machine learning algorithm 1. The resulting image is a high-quality ground truth that is consistent with CT scans. By simulating the degradations that occur in different CT scans, well-suited degraded-clean image pairs can be obtained for learning-based methods (e.g., CNNs). For example, noise, motion artifacts, etc., can be added to the ground truth image 5.

[0054] Figure 5 shows a schematic diagram of a method for generating training weights to be used in a software module including machine learning algorithm 1. The training weights are generated by training machine learning algorithm 1 using synthetic training data, which includes a training pair of a ground truth image 5 and an input training image 6. The synthetic training data is generated based on image data of portion 4 of a natural image 3. Natural image 3 shows a tiger's head, and portion 4 of natural image 3 shows a part of the tiger's head. The ground truth image 5 thus corresponds to portion 4 of natural image 3. The input training image 6, on the other hand, is a degraded version of the ground truth image 5 and is therefore blurry compared to the ground truth image 5. Machine learning algorithm 1 can be trained using this training pair and further equivalent training pairs.
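How training pairs yield training weights can be illustrated with a deliberately small model. In the sketch below, a single 3x3 linear filter fitted by least squares stands in for the machine learning algorithm; the disclosure itself contemplates algorithms such as convolutional neural networks. The function names and the `(input, ground truth)` pair ordering are assumptions.

```python
import numpy as np

def patches_3x3(img):
    """Return an (N, 9) matrix holding the 3x3 neighbourhood of every interior pixel."""
    h, w = img.shape
    cols = [img[i:i + h - 2, j:j + w - 2].ravel() for i in range(3) for j in range(3)]
    return np.stack(cols, axis=1)

def fit_weights(pairs):
    """Least-squares fit of one 3x3 filter mapping input images to ground truth.

    `pairs` is a list of (input training image, ground truth image) tuples.
    """
    a = np.vstack([patches_3x3(inp) for inp, _ in pairs])
    b = np.concatenate([gt[1:-1, 1:-1].ravel() for _, gt in pairs])
    weights, *_ = np.linalg.lstsq(a, b, rcond=None)
    return weights.reshape(3, 3)

def apply_weights(weights, img):
    """Apply the fitted filter to the interior pixels of `img`."""
    h, w = img.shape
    return (patches_3x3(img) @ weights.ravel()).reshape(h - 2, w - 2)
```

The returned 3x3 array plays the role of the training weights: it can be stored and later applied to new images without access to the training data.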

[0055] The synthetic training data generated using the method according to embodiments of the present invention was tested to verify the usefulness of the method. For this test example, a denoising model was trained on seven synthetic image cases generated by the method described with reference to Figure 3. All training data were based on a single medical image case, from which seven different ground truth images and corresponding training data were generated. The resulting model was subsequently applied to simulated clinical patient cases including real-world noise. The results show the noisy input image and the denoised image obtained using the trained model. Figure 6 and Figure 7 show two different examples of the experimental results. In each figure, a noisy simulated clinical scan image is shown at the top, and the denoised image after applying the trained denoising model is shown at the bottom. In both cases, the reduction of noise in the images (resulting in enhanced medical images) is clearly visible. Both example images demonstrate the feasibility and value of the proposed synthetic data generation method for image denoising use cases. As outlined herein, the method can also be used to train models for correcting other problems or artifacts in medical image data, such as computed tomography data.

[0056] The above discussion is intended only to illustrate the method and system and should not be construed as limiting the appended claims to any particular embodiment or group of embodiments. Therefore, the specification and drawings should be considered illustrative and not intended to limit the scope of the appended claims. By studying the drawings, the disclosure, and the appended claims, those skilled in the art will be able to understand and implement other variations of the disclosed embodiments in practicing the claimed invention. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit can perform the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. Computer programs can be stored / distributed on suitable media, such as optical storage media or solid-state media supplied with or as part of other hardware, but can also be distributed in other forms, such as via the Internet or other wired or wireless telecommunications systems. Any reference numerals in the claims should not be construed as limiting the scope.

Claims

1. A computer-implemented method for generating synthetic training data for training a machine learning algorithm (1) of a software module, wherein the software module is configured to perform image processing on images acquired through a medical imaging modality, the method comprising the following steps: (a) receiving a ground truth image (5) generated by receiving a medical image obtained by the medical imaging modality or another medical imaging modality and replacing at least a portion of the medical image with at least one image portion taken from at least one natural image (3); (b) artificially degrading the image quality of the ground truth image (5) to obtain an input training image (6); and (c) providing the input training image (6) together with the ground truth image (5) as a training pair of the training data.

2. The method according to claim 1, wherein degrading the image quality includes adding noise to the image data, adding image artifacts to the image data, and / or reducing the resolution of the image data.

3. The method according to claim 1 or 2, wherein degrading the image quality includes: forward projecting the image data into the projection space of an artificially designed imaging modality; degrading the image quality of the projected image data (51) in the projection space; and reconstructing the projected and degraded image data (61) back into image space.

4. The method according to claim 3, wherein, in the projection space, the image quality is degraded by downsampling the projected image data (51) to a lower image resolution, which is in particular the image resolution available through the medical imaging modality.

5. The method according to claim 3 or 4, wherein, in the projection space, the image quality is degraded by performing a low-dose simulation that introduces noise corresponding to the medical imaging modality.

6. The method according to any one of the preceding claims, wherein the ground truth image (5) is obtained at least in part from image data taken from natural videos, wherein the temporal dimension of the natural videos is converted into a third spatial dimension in order to obtain three-dimensional image data.

7. The method according to claim 1, wherein replacing at least a portion of the medical image includes: segmenting at least one anatomical structure, particularly at least one organ, in the medical image in order to create at least one segmentation mask (10); and replacing the image data of the medical image within the at least one segmentation mask (10) with at least one image portion (4) of the at least one natural image (3).

8. The method according to claim 7, wherein regions located outside the at least one segmentation mask (10) are clustered together into at least one cluster region, wherein pixels or voxels within the same cluster region are assigned a constant pixel value or voxel value, the constant pixel value or voxel value optionally being the average value of the original pixels or voxels within the cluster region.

9. The method according to claim 7 or 8, wherein the dynamic range of the at least one image portion (4) of the at least one natural image (3) is adjusted based on the dynamic range of the medical image data within the segmentation mask (10).

10. The method according to any one of claims 7 to 9, wherein, before the image data is replaced with the image portion, at least one property of the segmentation mask (10) is changed within a predefined boundary, wherein, in particular, the shape, position, orientation and / or size of the at least one segmentation mask (10) is changed.

11. A method for generating training weights to be used in a software module including a machine learning algorithm (1), wherein the training weights are generated by training the machine learning algorithm (1) using synthetic training data obtained according to the method of any one of the preceding claims.

12. The training weights generated by the method according to claim 11.

13. A software module having a machine learning algorithm (1), the machine learning algorithm comprising the training weights according to claim 12.

14. A use of the software module according to claim 13 and / or the training weights according to claim 12 for performing image processing on medical image data obtained through a medical imaging modality.

15. An image processing apparatus comprising the software module according to claim 13.