Contrast enhancement using machine learning

A machine learning system enhances image contrast and reduces artifacts using generative adversarial networks, addressing the need to minimize contrast agent use in imaging while maintaining diagnostic quality and cost-effectiveness.

JP7876541B2 · Active · Publication Date: 2026-06-19 · KONINKLIJKE PHILIPS NV

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
KONINKLIJKE PHILIPS NV
Filing Date
2022-02-14
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing imaging technologies rely heavily on contrast agents, which pose health risks and are costly, necessitating a reduction in their use without compromising image quality.

Method used

A machine learning-based system that trains a model to enhance image contrast and reduce artifacts using two classes of training images (high-contrast and low-contrast), employing generative adversarial networks to adjust model parameters and improve image quality without requiring paired training data.

Benefits of technology

Reduces contrast agent use while maintaining diagnostic quality, enhancing image contrast and reducing artifacts, thus lowering costs and patient discomfort.

✦ Generated by Eureka AI based on patent content.

Abstract

A system for training a target machine learning model for image correction, and related methods. The system comprises a framework of two machine learning models (G1, G2) of the generative type, one such model (G1) being part of the target machine learning model. The training is based on a training dataset including at least two types of training images, high image quality (IQ) images and low-IQ images, the training input images being of the high-IQ type. The first generator network (G1) processes a training input image of the high-IQ type to generate a training output image (I) with reduced IQ. The target machine learning model (TM) further generates a second training output image based on the training output image (I) and the training input image of the high-IQ type. The second generator network (G2) computes, from the second training output image, an estimate of the training input image of the high-IQ type. A training controller (TC) adjusts the parameters of the machine learning model framework based on the deviation between said estimate and said training input image of the high-IQ type.

Description

[Technical Field]
【0001】 The present invention relates to a training system for training a machine learning model for image correction, a system for image correction, a method for training a machine learning model, a method for image correction, an imaging component, a computer program element, and a computer-readable medium.
[Background technology]
【0002】 Contrast agents are used in an increasing number of diagnostic and interventional clinical procedures. Contrast agents ("CM", or "contrast media") are administered intravascularly to patients, for example, during diagnostic imaging examinations in CT (computed tomography), angiography, fluoroscopy, MRI (magnetic resonance imaging), and many other imaging modalities, or to improve tissue visualization in certain therapeutic procedures such as PTCA (percutaneous transluminal coronary angioplasty).
【0003】 In addition, contrast agents may be administered orally and / or via other natural orifices, as indicated for the imaging procedure. Tens of millions of radiographic examinations using contrast agents (e.g., iodine-based) are performed annually. For example, it is estimated that in 2010, 44% of all CT scans in the United States, 35% in India, 74% in Japan, and 10-15% in China used iodine contrast agents.
【0004】 In contrast-enhanced CT, the use of non-ionic iodine as a contrast agent may trigger acute allergic reactions. These reactions can range from mild symptoms such as hives and itching to more serious events such as cardiopulmonary arrest.
【0005】 One of the most serious side effects of iodine-based agents is contrast-induced nephropathy. Nash et al., in "Hospital-acquired renal insufficiency," Am. J. Kidney Dis (the official journal of the National Kidney Foundation), Vol. 39, pp. 930-936 (2002), reported that, while not all such cases are necessarily linked to CM administration, contrast-induced nephropathy is the third most common cause of acute renal failure in hospitalized patients.
In any case, serum creatinine is usually monitored prior to CM administration, before contrast-enhanced CT data are acquired.
【0006】 Therefore, from the perspective of patient health with regard to side effects, reducing the amount of contrast agent is beneficial. However, despite these recognized health risks, contrast-enhanced imaging (such as in CT) is one of the most useful routine procedures in modern diagnostic imaging. A non-ionic iodine contrast agent or other contrast agent is administered before image data acquisition. Following such administration, contrast-enhanced CT data acquisition is initiated manually or triggered by a bolus tracker.
[Summary of the Invention]
[Problems that the invention aims to solve]
【0007】 The resulting enhanced tomographic slices significantly improve diagnostic quality compared to unenhanced (native) data. In unenhanced CT data, solid organs have similar densities within the range of 40-60 HU, making it difficult to differentiate abnormalities. With contrast-enhanced CT, the density of different tissues can increase to over 100 HU. Such improvements in image contrast can be observed, for example, in the detection of tumors and lesions. Contrast enhancement is also necessary for other diagnostic indications, such as CT angiography or CT perfusion.
【0008】 In addition to the aforementioned clinical benefits of reducing contrast agent use, there are also significant economic benefits in terms of cost. CM costs can be high in certain regions (US$120 / 100ml in Japan compared to US$20 in North America). Furthermore, while the non-ionic and hypoosmolar contrast agents commonly used today are better tolerated by patients in terms of specific effects, they are also more expensive.
【0009】 Because the amount of contrast agent administered to patients is a concern, a range of technical enablers has been developed to reduce contrast agent use, including automated injectors, high-rate saline chasers, faster rotation times, wider coverage areas, bolus tracking software, low kVp protocols, and spectral / conventional enhancement algorithms. However, some of these enablers are expensive, complex, error-prone, or difficult to use.
【0010】 Therefore, alternative systems and methods are needed to address some of the shortcomings mentioned above. In particular, it is necessary to reduce the amount of contrast agent used for imaging.
[Means for solving the problem]
【0011】 The object of the present invention is solved by the subject matter of the independent claims, and further embodiments are incorporated into the dependent claims. It should be noted that the embodiments of the present invention described below apply equally to systems for image correction, methods for training machine learning models, image correction methods, imaging components, computer program elements, and computer-readable media.
【0012】 According to a first aspect of the present invention, there is provided a training system for training a target machine learning model for image correction of medical images, the training system comprising the following:
an input interface for receiving a training input image drawn from a training dataset containing at least two types of training images, high image quality (IQ) images and low-IQ images, wherein the training input image is of the high-IQ type; and
a machine learning model framework comprising a first generator network and a second generator network, wherein the target machine learning model comprises the first generator network of the framework.
Here, the first generator network is configured to process the high-IQ type training input image in order to generate a training output image with reduced IQ.
The target machine learning ("ML") model is configured to generate a second training output image based on the training output image and the high-IQ type training input image, and the second generator network is operable to estimate the high-IQ type training input image from the second training output image. The training system further comprises a training controller operable to adjust at least one parameter of the machine learning model framework based on the deviation between the estimate of the high-IQ type training input image and the high-IQ type training input image itself.
【0013】 In one embodiment, the machine learning model framework includes a subnetwork of the generative adversarial type, which includes the first generator network and a discriminator network. The discriminator network attempts to discriminate between a training input of the low-IQ type drawn from the set and the training output image, thereby producing a discrimination result, and the training controller is operable to further adjust the parameters of the machine learning model based on the discrimination result.
【0014】 In an embodiment, the target machine learning model is configured to generate the second training output image by combining the training output image and the training input image of the high-IQ type. The combining is performed as a linear combination of the training output image and the training input image of the high-IQ type. Specifically, an optionally scaled difference between the training input image of the high-IQ type and the training output image (with reduced IQ) is added to the training output image with reduced IQ.
【0015】 In an embodiment, the system is configured to process the training input image together with context data describing any one or more of: i) at least one image acquisition parameter, ii) at least one reconstruction parameter, and iii) patient data.
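The linear combination of paragraph 【0014】 can be illustrated with a minimal NumPy sketch. The function name `combine`, the scale factor `alpha`, and the toy pixel values are assumptions introduced here for illustration; the source only states that an optionally scaled difference is added to the reduced-IQ output.

```python
import numpy as np

def combine(high_iq: np.ndarray, reduced_iq: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Second training output: reduced-IQ image plus a scaled difference image.

    `alpha` is a hypothetical name for the optional scale factor.
    """
    # Difference image: the contribution removed by the first generator
    # (for contrast, roughly the contrast agent contribution).
    difference = high_iq - reduced_iq
    return reduced_iq + alpha * difference

# Toy 2x2 "images" with illustrative intensity values only.
high = np.array([[100.0, 60.0], [50.0, 140.0]])
reduced = np.array([[70.0, 60.0], [50.0, 90.0]])

same = combine(high, reduced, alpha=1.0)     # alpha = 1 reproduces the high-IQ input
boosted = combine(high, reduced, alpha=2.0)  # alpha = 2 doubles the removed contribution
```

With `alpha` greater than 1, only pixels where the two images differ are amplified; pixels that the first generator left untouched keep their original values.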
【0016】 In an embodiment, the architecture of the first generator network and / or the second generator network is of the convolutional type.
【0017】 In an embodiment, the architecture of the first generator network and / or the second generator network is of the multi-scale type.
【0018】 In an embodiment, IQ includes any one of i) contrast (level), ii) artifact (level), and iii) noise (level).
【0019】 In an embodiment, a high-IQ image is an image recorded by an imaging device while a contrast agent is present, and a low-IQ image is an image recorded by the same or another imaging device while a smaller amount of the same or another contrast agent is present.
【0020】 Thus, in embodiments, the two image classes, high IQ and low IQ, relate to classes of images having high contrast and low contrast, or images having low artifact contribution and high artifact contribution, or images having low noise contribution and high noise contribution, respectively. The two classes may also be defined with respect to other image quality metrics. Image samples from the high-IQ class have a higher IQ than samples from the low-IQ class. IQ can be local or global. The average IQ metric of image samples from the high-IQ class is higher than that of image samples from the low-IQ class. With respect to contrast, the main example of IQ assumed herein, in a high-contrast image ("contrast-enhanced image") the amount / concentration of contrast agent present in the field of view ("FOV") of the imaging device when the image was acquired is higher than the amount / concentration of contrast agent present in the FOV when the low-contrast image ("non-contrast-enhanced image") was acquired.
【0021】 In an embodiment, the high-IQ / low-IQ images in the set are X-ray images or MRI images.
【0022】 In an embodiment, the high-IQ / low-IQ images are computed tomography (CT) images.
【0023】 In another embodiment, a storage medium is provided which stores a target network trained by the training system of any one of the embodiments described above.
【0024】 In another embodiment, an imaging component is provided, comprising:
a target network according to any one of the above embodiments, operable to process an input image received from an image storage device or supplied by an imaging device in order to provide a processed image; and
a display device for displaying the processed image.
【0025】 In another embodiment, a method for training a target machine learning model is provided, the method comprising the steps of:
receiving a training input image drawn from a training dataset which includes at least two types of training images, high image quality (IQ) images and low-IQ images, wherein the training input image is of the high-IQ type;
processing, by a first generator network, the high-IQ type training input image in order to generate a training output image with reduced IQ, wherein the first generator network is part of a machine learning model framework that further includes a second generator network, and the target machine learning model includes the first generator network;
processing, by the target machine learning model, the training output image and the high-IQ type training input image in order to generate a second training output image;
estimating, by the second generator network, the high-IQ type training input image from the second training output image; and
adjusting, by a training controller, at least one parameter of the machine learning model framework based on the deviation between the estimate of the high-IQ type training input image and the high-IQ type training input image.
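The data flow of the training method in paragraph 【0025】 can be sketched with toy stand-ins for the two generators. Everything below is an illustrative assumption and not the patented implementation: the generators are one-parameter linear maps rather than neural networks, the names `g1`, `g2`, `target_model`, `cycle_deviation`, and `BASELINE` are hypothetical, the deviation is taken as a mean absolute difference, and a small grid search stands in for the training controller (which in practice would more likely use gradient-based optimization).

```python
import numpy as np

BASELINE = 50.0  # assumed background intensity for the toy example

def g1(x, theta1):
    # Stand-in for the first generator: shrinks deviations from the
    # baseline, which mimics an image with reduced contrast.
    return BASELINE + theta1 * (x - BASELINE)

def target_model(x, theta1, alpha):
    # Target model: first generator followed by the linear combination
    # (reduced-IQ output plus scaled difference to the high-IQ input).
    reduced = g1(x, theta1)
    return reduced + alpha * (x - reduced)

def g2(y, theta2):
    # Stand-in for the second generator: estimates the high-IQ input
    # from the second training output.
    return BASELINE + theta2 * (y - BASELINE)

def cycle_deviation(x, theta1, theta2, alpha):
    # Deviation between the estimate and the original high-IQ input.
    estimate = g2(target_model(x, theta1, alpha), theta2)
    return float(np.mean(np.abs(estimate - x)))

x = np.array([[100.0, 60.0], [50.0, 140.0]])  # toy high-IQ training input
candidates = [0.5, 2.0 / 3.0, 1.0]            # candidate values for theta2

# "Training controller" stand-in: pick the parameter with the smallest deviation.
deviations = [cycle_deviation(x, 0.5, t, 2.0) for t in candidates]
best_theta2 = candidates[int(np.argmin(deviations))]
```

In this toy setting the second output scales deviations from the baseline by 1.5, so the candidate `theta2 = 2/3` inverts it and drives the deviation to (numerically) zero. No paired low-IQ image of the same patient is needed to compute this deviation.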
【0026】 In another embodiment, a method for image processing is provided, comprising the step of processing, based on a target network according to any one of the above embodiments, an input image received from an image storage device or supplied by an imaging device in order to provide a processed image.
【0027】 In one embodiment, the machine learning model framework includes a subnetwork of the generative adversarial type comprising the first generator network and a discriminator network as components. In such an embodiment, the method further comprises the step of discriminating, by the discriminator, between a training input of the low-IQ type drawn from the set and the training output image, thereby producing a discrimination result, and the adjusting of the parameters of the machine learning model framework is further based on the discrimination result.
【0028】 The machine learning framework referred to above is generally a computing system comprising multiple ML models or sub-ML systems that are at least partially interconnected with respect to their respective inputs and outputs. In embodiments, the first and / or second generators and / or the discriminator are of the artificial neural network type.
【0029】 In another embodiment, a computer program element is provided which, when executed by at least one processing unit, is adapted to cause the processing unit to carry out the above method.
【0030】 In another embodiment, a computer-readable medium is provided on which the above-mentioned program element is stored.
【0031】 Therefore, according to the proposed training method, a smaller amount of contrast agent is still sufficient to obtain images with sufficient and useful contrast, thus saving costs and reducing patient discomfort.
A significant reduction in contrast agent (CM) use can be achieved with the same clinical / radiological effects as before (when large amounts of CM were used), without changing conventional CT scanning parameters and without compromising the clinical acceptability of the CT examination, such as good vascular and organ enhancement. The proposed system and method compensate for the reduction in contrast agent volume by correcting the examination images.
【0032】 In addition, the proposed method and system can be used as a general-purpose image correction tool, not limited to contrast agent reduction protocols.
【0033】 In particular, the proposed method and the target model trained by the proposed system are configured to enhance only, or primarily, the contrast agent contribution in a given image, thereby producing an image with a more "typical" appearance, i.e., one that does not look artificial or unnatural. Image portions that do not represent contrast agent are reproduced largely unchanged.
【0034】 In addition, the methods and systems can routinely improve and facilitate diagnosis by enhancing the conspicuity of lesions and abnormalities.
【0035】 Furthermore, if a contrast injection study fails, the method and system may allow for volume reduction by rescuing such studies and eliminating the need for repeated scanning.
【0036】 In addition to the clinical benefits mentioned above, there are also significant economic benefits from reducing the use of contrast agents. Direct costs can be high in certain regions (for example, $120 / 100ml in Japan, as mentioned above), and indirect costs such as extended hospital stays, treatment, and repeated scans can be very high.
【0037】 This method has the attractive advantage of being based on an unsupervised learning technique. This eliminates the need to prepare large sets of reliably paired training data, which is either impossible or extremely difficult, tedious, time-consuming, and therefore costly.
In particular, pairing of non-contrast and contrast-enhanced images is not required in the proposed training system. This is particularly beneficial for CM-based imaging, because in such protocols a given patient cannot normally be scanned twice, with two different amounts of contrast agent and without any patient motion between the scans.
【0038】 In particular, regarding IQ related to contrast, in one embodiment a training system for training a target machine learning model for image contrast correction of medical images comprises the following:
an input interface for receiving a training input image drawn from a training dataset containing at least two types of training images, high-contrast and low-contrast images, wherein the training input image is a high-contrast image;
an artificial neural network model framework including a network of the generative adversarial type having a first generator network and a second generator network, wherein the target machine learning model includes the first generator network of the framework. Here, the first generator network is configured to process the high-contrast training input image in order to generate a training output image with reduced contrast. The discriminator attempts to discriminate between a low-contrast training input drawn from the set and the training output image, to produce a discrimination result. The target machine learning model is configured to generate a second output image based on the training output image and the high-contrast training input image, and the second generator network is operable to compute an estimate of the high-contrast training input image from the second output image; and
a training controller operable to adjust the parameters of the artificial neural network model framework based at least on i) the discrimination result, and ii) the deviation between the estimate of the high-contrast training input image and the high-contrast training input image.
A training system having these features is envisioned herein.
【0039】 Generally, a "machine learning component or module" is a computerized component that implements a machine learning ("ML") algorithm. The algorithm is configured to perform a task. An ML algorithm is based on an ML model, whose parameters are tuned by the ML algorithm during a training phase based on training data. In an ML algorithm, task performance generally improves measurably over time as more (new and varied) training data is used for training. Performance may be measured by objective tests in which test data is fed to the trained model and the output is evaluated. Performance may be defined by requiring that a specific error rate be achieved for given test data. See T. M. Mitchell, "Machine Learning", page 2, section 1.1, McGraw-Hill, 1997. The tasks of primary interest in this specification are automatically increasing image contrast, removing / reducing image artifacts / noise, or improving other IQ (image quality) metrics.
【0040】 Embodiments of the present invention will now be described with reference to the attached drawings, which are not drawn to scale.
[Brief description of the drawings]
【0041】
[Figure 1] This is a block diagram of an imaging component, including an imaging device and an image processing system.
[Figure 2] This is a flowchart illustrating the operating principle of a training system for training machine learning components that can be used to implement an IQ enhancer.
[Figure 3] This is a block diagram of a training system that includes two generator networks and a discriminator network.
[Figure 4] This figure shows one embodiment of the training system shown in Figure 3.
[Figure 5] This is a schematic block diagram of a trained machine learning model as provided by the training system of any one of Figures 2 through 4.
[Figure 6] This is a diagram showing an artificial neural network model of the convolutional type.
[Figure 7] This is a flowchart illustrating a computer-based method for IQ correction.
[Figure 8] This is a flowchart illustrating a computer-based method for training machine learning models.
[Modes for carrying out the invention]
【0042】 Referring to Figure 1, a computerized imaging component AR is shown. In embodiments, the component AR comprises an imaging apparatus IA and a system SIE for image correction, which includes an image correction processor IP. Briefly, the image correction processor IP is used to correct the image quality ("IQ") of an image obtained by the imaging apparatus IA. The image to be corrected is supplied directly by the imaging apparatus IA or retrieved from an image storage device IRP. Image correction includes increasing image contrast or reducing or removing image artifacts and / or noise. Image contrast can be measured and quantified, for example, by the contrast-to-noise ratio (CNR). A higher CNR is preferred. In particular, the contrast provided by CM can be increased. In some embodiments, the image correction processor IP is implemented by a machine learning model TM. A training system TS is provided that can train multiple such ML models, including the (target) model TM, based on training data TD. The model TM thus trained can then be used by the image correction processor IP to compute the corrected image. Before describing the training system TS in more detail, the components of the imaging component will first be described.
【0043】 The imaging apparatus IA (which may be referred to herein simply as the "imaging device") is preferably intended for medical purposes and is operable to acquire one or more images of a patient PAT. More broadly, the imaging device includes a signal source SS for generating an interrogating signal XB. The interrogating signal XB interacts with tissue in the patient PAT and is thereby modulated. The modulated signal is then detected by a detection unit DT.
The detected signal, such as intensity values, forms detector raw data or a projection image. While the projection image may be of interest in itself, it is sometimes further processed by a reconstruction module RECON to generate a reconstructed image X.
【0044】 The imaging apparatus IA envisioned herein is configured for structural or functional imaging. Various imaging modalities are envisioned herein, such as transmission imaging, emission imaging, or other imaging such as ultrasound (US) imaging. For example, in transmission imaging such as X-ray-based imaging, the signal source SS is an X-ray tube, and the interrogating signal is an X-ray beam XB generated by the X-ray tube SS. In this embodiment, the modulated X-ray beam impinges on the X-ray sensitive pixels of the detection unit DT. The X-ray sensitive detector DT records the impinging radiation as a distribution of intensity values. The recorded intensity values form a projection image π. The X-ray projection image π may be useful on its own, such as in radiography, but in CT imaging it is then converted into a cross-sectional image by the reconstruction module RECON. Specifically, the reconstruction module RECON applies a reconstruction algorithm, such as filtered back projection or another algorithm, to the projection images. The cross-sectional image forms a 2D image in 3D space. In CT, multiple such cross-sectional images are reconstructed from multiple different sets of projection images to obtain a 3D image volume.
【0045】 In MRI, the detection unit is formed from coils capable of picking up radio frequency signals that represent the projection image, and a cross-sectional image is reconstructed from these radio frequency signals by an MRI reconstruction algorithm.
【0046】 X-ray images in CT or radiography represent the structural details of a patient's anatomy.
In emission imaging such as PET or SPECT, the signal source SS is present inside the patient in the form of a previously administered radioactive tracer substance. Nuclear events caused by the tracer substance are then recorded as projection images by PET / SPECT detectors DT positioned around the patient. PET / SPECT reconstruction algorithms are then applied to obtain reconstructed PET / SPECT images that represent functional details of processes within the patient, such as metabolic processes.
【0047】 The imaging device IA is controlled by the operator from the operator console CS. The operator may set multiple imaging parameters Ip. Imaging parameters include acquisition parameters and / or reconstruction parameters, where applicable. Acquisition parameters refer to settings of the imaging device that control the imaging operation, such as X-ray tube settings (amperage or voltage). Reconstruction parameters, on the other hand, relate to the operation of the reconstruction algorithm. Reconstruction parameters are not necessarily set on the operator console CS, but may be set at a later stage on a workstation or other computer. In X-ray imaging, acquisition parameters include one or more of the following: scan type, body part, and X-ray source (tube) settings such as mA, mAs, kVp, rotation time, collimation setting, and pitch. The "scan type" can be helical or axial, and / or specifies the region of interest (ROI) to be imaged, such as the chest, lungs, pulmonary embolism, or heart. "Pitch" is a parameter in multislice spiral CT and is defined as the ratio of table increment to detector collimation.
【0048】 When a reconstruction operation using the reconstruction module RECON is required, as in CT, MR, or emission imaging (PET / SPECT), reconstruction parameters must be specified to adjust the reconstruction algorithm.
Appropriate reconstruction parameters envisioned for CT reconstruction include one or more of the following: reconstruction filter, reconstruction algorithm (e.g., FBP, iDose, or IMR), slice thickness, slice increment, image size (in pixels or voxels, m×n), and field of view.
【0049】 Acquisition parameters specify how the imaging device should operate to acquire the image signals that will subsequently be converted into image values. Reconstruction parameters describe how image values are converted from one domain (e.g., the projection domain) to image values in a different domain, such as the image domain. Therefore, both acquisition parameters and / or reconstruction parameters influence the image values or their distribution within the image.
【0050】 Unless otherwise specified, no distinction is made below between reconstructed images and projection images; both are simply referred to as input images or "images" X processed by the image correction processor IP. In other words, the image correction processor IP is configured to operate on images from the projection domain or on reconstructed images in the image domain.
【0051】 An input image X, generated by the device IA or retrieved from the storage device IRP, is received by the image processor at its interface IN', for example, via wired or wireless communication means. The image X is received directly from the imaging device or retrieved from the image repository IRP where the image is stored after acquisition.
【0052】 Some imaging protocols require the administration of a contrast agent / drug to the patient to enhance image contrast, i.e., to obtain a higher CNR, for organs or tissues that are not sufficiently radio-opaque on their own. Contrast agents / drugs are used in CT and other X-ray-based imaging modalities, but are also used in MRI, for example.
Before or during imaging, a certain amount of liquid contrast agent (such as an iodine-based agent) is administered to the patient and allowed to accumulate in the ROI at a sufficient concentration, after which imaging is initiated or continued to obtain a "contrast-enhanced" high-contrast image of or with respect to that ROI.
【0053】 In most imaging modalities, acquired or processed images are subject to image artifacts for technical, physical, or other reasons. Image artifacts are image structures that do not represent actual structural or functional properties of the imaged object; that is, the representation in the image is somewhat distorted or otherwise unsatisfactory. In X-ray imaging, such image artifacts are present in projection or reconstructed images. Image artifacts include one or more of the following: beam hardening artifacts such as streaks, cupping artifacts, ring artifacts, partial volume artifacts, incomplete projections, photon starvation, patient motion, and helical and cone beam artifacts. Other imaging modalities, such as MRI, introduce other artifacts such as chemical shift, wraparound, Gibbs artifacts, metal artifacts, RF-based artifacts, magnetic field inhomogeneity, etc.
【0054】 Noise-related IQ is often influenced, for example, by the energy of the X-rays used or by imperfections in the detector hardware. The detective quantum efficiency (DQE) is used to characterize the amount of noise introduced by the detector hardware. Some image processing algorithms, such as reconstruction algorithms, also introduce noise themselves.
【0055】 The image processor IP can generate corrected images in which artifacts or noise are reduced, or contrast is increased. For example, an input image X that may contain artifacts, received at the input port IN' of the image processor IP, is processed into an output image X', which is output at the output port OUT'. The image processor IP operates to reduce or completely remove artifacts.
In addition, or instead, a low-contrast image X acquired with little to no contrast agent is processed by the image correction processor IP into a higher-contrast image X'.
【0056】 The output image X' is mapped to appropriate grayscale values or a color palette and then visualized on a display device DD by a visualization module (not shown).
【0057】 The image correction processor IP thus assists the diagnostician in properly interpreting the input image X by providing the corrected image X'. The image processor IP runs on a general-purpose computing unit PU. The computing unit PU is communicatively coupled to the imaging device or the image repository IRP. In other embodiments, the image processor IP is integrated into the imaging device IA. The image processor IP is implemented in hardware, software, or a combination thereof. Hardware embodiments include appropriately programmed circuits such as a microprocessor or microcontroller, an FPGA, or other general-purpose circuit elements. Hardcoded circuits such as ASICs or systems-on-chip are also envisioned herein. Dedicated processors such as GPUs (graphics processing units) may also be used.
【0058】 To describe the nature of the images being processed more precisely, an image can be conceptualized as a 1D, 2D, or 3D, or even higher-dimensional (e.g., a time series of images) data structure containing multiple numbers. Depending on the contrast-conferring mechanism underlying the imaging, these numbers represent intensity values of the physical quantity or phenomenon being imaged. The numbers are referred to herein as image values. The data structures referred to above include one or more n-dimensional matrices or "tensors" (n>3). Appropriate values for n are 1, 2, 3, 4, or n>4. The image can be grayscale or color. Color images are encoded using the RGB scheme or another suitable encoding scheme.
【0059】 Image values are indexed by rows and columns i, j to represent a two-dimensional spatial structure.
Three-dimensional image values are indexed by rows and columns i, j and a depth component k.
【0060】 In embodiments, the image is represented as a 2D structure having rows and columns i, j, but the image may in fact form a cross-section or subspace of a 3D volume, such as a slice image of a CT 3D volume. Similarly, an MRI volume also includes 2D slice images that are cross-sections of the 3D volume. 2D or 3D images acquired over time are represented as 3D or 4D image data, respectively, with the third or fourth component representing the acquisition time.
【0061】 Even where an image represents a 2D structure, a higher-dimensional representation is sometimes used, as for color images encoded in RGB. An RGB image is represented as a 3D structure with two spatial dimensions i, j corresponding to structures or functions within the patient, while the third component holds the red, green, and blue image values, respectively, for any given image location i, j. In other words, a 2D color image is represented as a 3D volume formed by stacking three separate 2D images, which represent the red, green, and blue image values, respectively, at each image location.
【0062】 Accordingly, a spatially 3D color image acquired over time is represented as a 7-dimensional tensor, i.e., three spatial dimensions, three dimensions of color values, and one dimension of time. A grayscale image is represented without an additional depth component.
【0063】 In relation to image data, as used herein, "size" refers to the number of data value (number) entries in a matrix or tensor. "Dimension" refers to the number of spatial components. For example, a matrix (e.g., an image) with 50x50 pixels has a larger size than a 40x40 or 50x40 image, although all three have the same dimension. A 3-channel image of 50x50x3 has a larger dimension (3D) than a 50x50 2D image. The 3D image also has a larger size because it has more data points.
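The distinction between "size" and "dimension" drawn in paragraph 【0063】 can be illustrated with a minimal sketch. The helper names `size` and `dimension` and the dictionary of example shapes are hypothetical and chosen only for illustration; an image is reduced here to its shape tuple.

```python
from math import prod

def size(shape):
    """Number of image-value entries in a matrix or tensor of the given shape."""
    return prod(shape)

def dimension(shape):
    """Number of components (axes) of the matrix or tensor."""
    return len(shape)

# Example shapes taken from the text: grayscale 2D images and a 3-channel image.
shapes = {
    "gray_50x50": (50, 50),
    "gray_50x40": (50, 40),
    "gray_40x40": (40, 40),
    "rgb_50x50x3": (50, 50, 3),
}

for name, shape in shapes.items():
    print(f"{name}: size={size(shape)}, dimension={dimension(shape)}")
```

Under these definitions, the three grayscale images share the same dimension but differ in size, while the 3-channel image is larger in both respects.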
【0064】 Referring more closely to the computerized image processor IP, it includes, in embodiments, a pre-trained machine learning component or module MLC. The pre-trained machine learning component MLC includes a pre-trained machine learning model TM. The machine learning model TM, such as an artificial neural network, has been previously trained by a computerized training system TS, which is described in more detail below. The training is based on a set of training data TD (a "corpus").
【0065】 The training data TD includes, in particular, historical images. In embodiments, the training data includes historical X-ray, MRI, UV, PET / SPECT, or other images collected from previous patients. The training data corpus TD includes two categories of data, referred to herein as high-contrast and low-contrast images, as one example of high-IQ and low-IQ images. Pairing of low-IQ and high-IQ images, such as low-contrast / high-contrast image pairs, is not required herein for the operation of the training system TS. Although this is the case primarily envisaged herein, high and low contrast in an image is not necessarily caused by different contrast agent concentrations present within the imaging device's field of view (FOV) when the image was acquired. Low / high contrast images may instead be generated "synthetically", using spectral or dual imaging techniques. Specifically, low / high contrast images can be generated by using low-keV images obtainable in a spectral scan, which also provides a conventional CT image, so that the two together define the two classes of low / high contrast images.
【0066】 In addition, or alternatively, an iodine map may be obtained in a spectral scan and, together with the conventional CT image, provide instances of high-contrast and low-contrast images, respectively. In this case, the high-contrast image can be obtained by combining the iodine map with the low-contrast image by addition or multiplication.
Alternatively, the iodine map is combined with the low-contrast image by first multiplying the map by a coefficient (a control parameter) and then adding the scaled iodine map to the low-contrast image. As a further alternative, a contrast-enhanced image member can be formed by registering a high-contrast image of any modality with a low-contrast image of the same or any other imaging modality.
【0067】 Where image correction relates to artifact reduction, the training set TD accordingly includes one class of images with artifacts and another class of images without artifacts (or with less severe artifacts). Classification into the classes is performed by human experts or by an auxiliary machine learning system. Similarly, in embodiments relating to image correction in the form of noise reduction, the two classes of high-IQ and low-IQ images consist of low-noise and high-noise images, respectively.
【0068】 Generally, images from the training set TD are referred to herein as "training images", in two categories, high and low contrast, which are the categories primarily referred to herein, although other types of IQ are explicitly envisaged in the following as well.
【0069】 A machine learning model TM can generally operate in two modes: training mode and deployment mode. Training mode takes place before deployment mode. Training may be a one-time operation or may be repeated. More specifically, a machine learning model TM includes an architecture and a number of parameters arranged according to the architecture. These parameters are initialized. During training mode, the training system TS adjusts the parameters based on the input training images being processed by the model TM to be trained. Once the parameters have been sufficiently fitted in one or more iterations, the resulting pre-trained model TM is used in deployment mode in the machine learning component MLC of the image processor IP, as mentioned above.
In deployment mode, the pre-trained machine learning model TM processes a received input image to reduce or remove image artifacts / noise or to increase contrast, as mentioned above. Importantly, the input image X in deployment is generally not part of the training images, so the machine learning model TM has never "seen" this input image before. After proper training, the machine learning model TM can transform a (lower-contrast) input image X into a higher-contrast version X' of that image, or correspondingly for other IQ metrics.
【0070】 Referring to the block diagram in Figure 2, the operation of the proposed computerized training system TS will now be described in more detail.
【0071】 The training system TS resides in one or more computer-readable memories MEM. The computerized training system TS is executed by a processing unit PU, such as a general-purpose CPU circuit or, preferably, a dedicated circuit such as one or more GPUs (graphics processing units). The components of the training system TS are implemented on a single computing device or on a group of computing devices, such as servers in a distributed or "cloud" architecture, connected via an appropriate communication network.
【0072】 Machine learning involves tuning the parameters of a machine learning model TM. The machine learning, or training, is performed by the training system TS. The model is capable of processing inputs and producing outputs. The model parameters are tuned / updated by the training controller TC of the training system TS based on the set of training data TD. The parameter tuning or update operation is formulated as an optimization problem with respect to one or more objective functions E. Parameters are updated to improve the objective function. The objective function is implemented by the training controller TC.
The objective function is a cost function, and the parameters are updated to reduce the cost measured by the cost function.
【0073】 Referring to the operational commutative diagram shown in Figure 2, the operating principle of the proposed training system TS will now be explained in more detail. The diagram shows the processing path involved in training the target ML model TM. As seen in Figure 2, the processing path forms a closed feedback loop, or cycle, whose consistency is checked by the cost function E = CC(). The cost function CC() is part of the system of cost functions E_j used by the training system. The cost function CC is referred to herein as the cycle consistency checker function CC(x) (or simply the "checker function"). The initial input x processed by the checker function CC(x) is an image sample drawn from the training dataset TD. As mentioned, the training dataset, in embodiments, includes images of two classes or categories, high-contrast images and low-contrast images, such as contrast-enhanced images and non-contrast-enhanced images. These are also referred to herein simply as contrast images and non-contrast images. Now, assuming that the sample drawn originates from the contrast-enhanced image class, such an image is denoted herein by the symbol "x=Ī" (with the "bar" symbol), while members of the non-contrast image class are denoted herein by the symbol "x=I" (without the "bar" symbol).
【0074】 The training system TS first applies a first transformation (1) to the contrast-enhanced input image, which converts it into an estimate of a low-contrast image I. Next, transformation (2) converts this low-contrast estimate into a contrast-boosted, or contrast-improved, estimate. To close the cycle, a third transformation (3) is then applied to this contrast-boosted estimate, which reduces the contrast again and yields a further estimate.
Ideally, this last estimate should be essentially equal, within the applicable error limits, to the initial input image. The cycle checker function CC(x) is implemented by the training controller TC as a cost function that encourages the training system TS to select model parameters such that the output image is roughly similar or equal to the input image. This cycle is indicated by the round arrow in the center of Figure 2. The proposed cycle-consistency-based learning has been found to yield good and robust results. The transformations (1) to (3) mentioned above are implemented in part as machine learning models whose parameters are tuned under the control of the cost function. These machine learning models include the target model TM under consideration. Preferably, neural-network-type models are used to implement at least some of the transformations described above. In particular, transformations (1) and (3) are based on machine learning models, while transformation stage (2) may be defined analytically or may also be defined as a machine learning model. The target model TM preferably implements the first transformation (1) and / or the second transformation (2).
【0075】 The cycle consistency checker function CC(x) is formulated in terms of a contrast map M that measures the amount of contrast in the images generated along the transformation path (1) to (3) of Figure 2 described above. The feedback loop or cycle described above constitutes an approximate commutative diagram: the identity function is approximated by the concatenation of the three transformations (1) to (3). The first transformation, which reduces contrast even though the aim is to train the target model TM to improve contrast, has been shown to make the training more resilient and robust.
In addition, it has been found that when the commutative configuration of Figure 2 is used in combination with a cycle consistency checker function CC(x), preferably one driven by a contrast map, the contrast change can be accounted for locally. Specifically, the contrast correction can be focused substantially on those areas in the recorded image that represent the contrast agent. More specifically, areas representing the contrast agent are contrast-corrected more strongly by the trained model TM than areas that do not represent the contrast agent. Still more specifically, only the areas representing contrast agent are contrast-corrected, while other areas are not contrast-corrected and therefore remain substantially unchanged.
【0076】 Referring further to Figure 2, instead of images from the two categories of high and low contrast, the same principle can be applied to other image quality ("IQ") metrics, such as high / low noise and high / low artifact contribution. In this case, transformation (1) acts to reduce the IQ, i.e., to increase artifacts or noise.
【0077】 The commutative diagram in Figure 2 therefore represents processing in repeated cycles of IQ reduction and IQ increase when images from the high-IQ class, such as contrast-enhanced images, are processed. When input images are drawn from the low-contrast class, the training system favors parameters such that the low-contrast training input images pass through transformations (1) to (3) preferably without change. The processing in this diagram is thus asymmetric with respect to the image categories.
【0078】 The target model TM to be trained for contrast correction is preferably incorporated into the training system TS together with additional interconnected machine learning models and, optionally, analytical models, as will now be described in more detail with reference to the block diagram of Figure 3. Specifically, Figure 3 shows one embodiment of the principle described above with reference to Figure 2.
Figure 3 shows a number of components, including machine learning models and, optionally, analytical transformation stages (not necessarily implemented by ML). The components have some of their inputs and outputs interconnected and are thereby incorporated into the training system formed by these interconnections. As mentioned, some of the models are neural-network-type ML models, but other machine learning models that are not necessarily of the neural network type are also envisaged herein.
【0079】 Specifically, in the embodiment, the training system TS includes two generative ML models, referred to herein as generators G1 and G2. In addition, there is a discriminator DS and an analytical conversion path CB. The training controller TC implements the cost function E = CC mentioned above, or a system of cost functions E_j that includes the consistency checker function CC() described above and, optionally, one or more regularization terms R, which will be discussed in more detail below. The cost function E, E_j processes the outputs generated by the models, and the updater UT of the controller TC adjusts the models' current parameters to improve the objective function, for example by reducing the cost. The adjustment proceeds iteratively.
【0080】 During operation, the training input images are drawn from the set of training data TD, which includes a class of contrast-enhanced (high-contrast) images C and a class of non-contrast (low-contrast) images NC, as mentioned above with reference to Figure 2. In particular, the images of classes NC and C do not need to be paired. The training images from classes NC and C are applied to the training system TS one by one or together and are processed by several of the mentioned components to generate a training output.
The training output is evaluated by the cost function E, E_j, and, based on the cost returned by the cost function E, which includes the checker CC(), the parameters of the training system are adjusted until a stopping condition is met. In the following, reference is made simply to the "cost function E", on the understanding that this may refer to the system of cost functions E_j and that it includes, in particular, the checker function CC() and, in some cases, other terms such as one or more regularization terms R.
【0081】 The target model TM in the schematic diagram of Figure 3 includes the first generator G1 and the analytical transformation path CB, as will be described in more detail below. In embodiments, the target model TM thus includes the two models G1, CB, which are trained together with the remaining components G2, DS. Once training is complete, the target model TM can be used as an independent machine learning module to correct (increase) the contrast in images processed during deployment. These deployment images are not part of the training dataset and, in particular, have not previously been "seen" by the system. Once training is complete, the remaining machine learning components of the training system, in particular the discriminator DS and the other generator G2, may be discarded as they are no longer needed. Training may be a one-time event or may be repeated as needed when new training data becomes available. In the latter case, where training is envisaged in multiple phases, the remaining components DS, G2 should be retained for future training runs.
【0082】 The two generative models G1 and G2 are to be distinguished from the discriminator model DS. Generative models generate samples from an unknown sample distribution, while discriminators distinguish between two categories or classes.
In a preferred embodiment envisaged herein, the discriminator DS and the first generator G1 together form an adversarially coupled subsystem of the (global) training system TS, which is described in more detail below. In an embodiment, this ML training subsystem forms a generative adversarial network ("GAN"), a type of ML network described in the 2014 paper "Generative Adversarial Networks" by Ian Goodfellow and co-authors, published as a preprint under arXiv:1406.2661v1.
【0083】 Some or each of the component models DS, G1, and G2 described above have their own associated cost functions, and these cost functions, together with any optional regularization terms, form the system of cost functions E_j. Some of these cost functions include the cycle consistency checker CC(). All or some of the cost functions are combined, for example additively, to form a common cost function E. In line with the general principle of Figure 2, the cost functions are preferably interdependent, so that a change in the cost of one constituent cost function affects the costs returned by some or all of the other constituent cost functions. This interdependence arises because the cost function of one constituent refers not only to the parameters of its own model but also to some or all of the parameters of some or all of the other models.
【0084】 Parameter tuning by the updater UT proceeds during training over one or more iterations. Tuning may proceed all at once, to improve the common cost function E, or in turns, alternately improving each or some of the constituent cost functions E_j, one (or more) at a time. In any case, the initial or current set of parameters of the machine learning models described above is adjusted to improve, i.e., reduce, the overall cost.
The overall cost is the sum of all partial costs incurred by the various constituent cost functions and regularization terms (if any), regardless of whether the training proceeds separately, alternately, or simultaneously.
【0085】 Referring more closely to the GAN-type subsystem (G1, DS) of the training system TS, the first generator G1 and the discriminator DS have respective cost functions that are adversarially coupled. Specifically, assuming that the parameters of the other cost functions are held constant, the cost functions of the first generator G1 and the discriminator DS implement a zero-sum game in the game-theoretic sense. The sum of the two cost functions is considered to be constant. A reduction in cost for one model comes at the expense of the other, whose associated cost increases as a result.
【0086】 More specifically, in the GAN subsystem, the first generator G1 is referred to herein as an IQ reducer, such as a contrast reducer CR. In embodiments, it operates as follows. The reducer CR takes as input an initial input image x drawn from the training dataset TD. As described above with reference to Figure 2, this input image is either contrast-enhanced C or non-contrast NC. If the input image is of the contrast-enhanced / high-contrast type C, it includes an image region representing the contrast agent. The reducer CR, G1 takes this image and acts on it to reduce the contrast, thereby attempting to replicate a sample from the non-contrast class NC. The discriminator DS analyzes the output of the reducer CR, i.e., the contrast-reduced version I, and attempts to correctly classify this image I as belonging to either the contrast-enhanced class C or the non-contrast class NC.
Thus, the two cost functions of the first generator G1 / reducer CR and the discriminator DS are interdependent and are configured so that the discriminator attempts to maximize the probability of classifying correctly, while the generator attempts to "deceive" the discriminator by producing an output that leads to a misclassification by the discriminator. How exactly the cost functions of the GAN can be formulated in an embodiment is explained further below in equations (1, 6).
【0087】 The contrast-reduced output I of the GAN network is then fed to a synthesizer component, the synthetic contrast enhancer or "booster" CB, which is preferably configured as an analytical processing path using algebraic operations and whose parameters are not tuned during machine learning. Alternatively, this component CB is also implemented as a machine learning component whose parameters are tuned during learning. The contrast booster CB uses the contrast-reduced estimate I to calculate an estimate of a contrast-boosted, or contrast-corrected, image. The contrast in this estimate is, at least locally, higher than in the initial input image. The high-contrast estimated image is supplied to the second generator G2 for processing. In this connection, the second generator G2 acts as a restorer and is therefore referred to as the "restorer RS" in the remainder of this specification. More specifically, the contrast-corrected estimate is processed by the restorer RS in an attempt to reconstruct the original input image. That is, the restorer RS operates on the contrast-boosted image to generate an image that is similar or equal to the initial contrast-enhanced image that was processed by the GAN. The restorer is thus configured to undo the contrast enhancement brought about by the operation of the contrast booster CB.
The cost function E of the training system computes the cost based, in particular, on both i) the classification result generated by the discriminator DS and ii) the initial input image together with the image output by the restorer RS. A lower partial cost is awarded i) if the discriminator correctly classifies the output of the reducer CR, and ii) the smaller the deviation between the output of the restorer RS and the initial input image. Partial cost ii) is measured by the consistency checker function CC().
【0088】 At the same time, if the initial input image x drawn during training is of the low-contrast type, the GAN network ideally leaves this low-contrast image I essentially unchanged. In particular, the training system as a whole should act as an identity operator for samples from the non-contrast set NC, ideally making no changes, or only slight changes, to the input image as it propagates through the training system TS.
【0089】 In other words, the training system TS essentially approximates the identity operator by allowing non-contrast samples to propagate unchanged through the system. For contrast-enhanced input images, the training system as a whole also approximates the identity operator, but only with respect to the initial input and the final output. In between, the contrast-enhanced input image undergoes changes as it propagates through the system TS in the described cycle: it is contrast-reduced by the reducer CR, then contrast-enhanced by the booster CB, and then contrast-reduced once more by the restorer RS. These two asymmetric processing modes for contrast-enhanced and non-contrast images, together with the consistency checker function CC as the cost function described above, enable robust learning and locally focused contrast enhancement.
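The cycle of reduction, boosting, and restoration, and the partial cost ii) awarded for a small deviation between the restorer output and the initial input, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the stand-in functions `toy_reducer`, `toy_booster`, `toy_restorer`, and `lazy_restorer` are fixed rules rather than trained models, the mean absolute deviation is only one possible deviation measure, the constants `ALPHA` and `BACKGROUND` are arbitrary, and the contrast map M is not modeled.

```python
def mean_abs_deviation(a, b):
    """Mean absolute pixel-wise deviation between two equally long pixel lists."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_cost(x, reducer, booster, restorer):
    """One pass through the cycle: reduce, boost, restore, then compare with x."""
    reduced = reducer(x)
    boosted = booster(x, reduced)
    restored = restorer(boosted)
    return mean_abs_deviation(restored, x)

ALPHA = 0.5          # booster coefficient (assumed value)
BACKGROUND = 150.0   # toy level above which a pixel counts as contrast agent

def toy_reducer(x):
    # Stand-in for CR: clamp bright (contrast agent) pixels to the background level.
    return [min(p, BACKGROUND) for p in x]

def toy_booster(x, reduced):
    # Stand-in for CB: add the scaled contrast (x - reduced) onto x.
    return [p + ALPHA * (p - r) for p, r in zip(x, reduced)]

def toy_restorer(y):
    # Stand-in for RS: exact inverse of toy_booster for this particular toy_reducer.
    return [(p + ALPHA * BACKGROUND) / (1 + ALPHA) if p > BACKGROUND else p for p in y]

def lazy_restorer(y):
    # A poor restorer that leaves the boosted image untouched.
    return list(y)

contrast_image = [100.0, 300.0, 100.0, 320.0]      # two bright "vessel" pixels
non_contrast_image = [100.0, 120.0, 100.0, 110.0]  # no pixel above BACKGROUND

print(cycle_cost(contrast_image, toy_reducer, toy_booster, toy_restorer))
print(cycle_cost(contrast_image, toy_reducer, toy_booster, lazy_restorer))
print(cycle_cost(non_contrast_image, toy_reducer, toy_booster, lazy_restorer))
```

In this toy setting the cost vanishes when the restorer undoes the boost, is positive when it does not, and vanishes for the non-contrast image regardless of the restorer, because that image passes through all stages unchanged.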
Contrast is enhanced only in the contrast-enhanced portion of the input image, while the remaining non-contrast portion of the image is left essentially unchanged (within a given error limit). Such locally focused, or selective, contrast correction produces more natural-looking contrast-enhanced images than other contrast correction methods, which tend to produce images that look artificial because contrast is corrected everywhere in the image. Such artificial-looking contrast-enhanced images are not widely accepted in clinical practice.
【0090】 It will be understood that the training system TS performs an optimization operation with respect to the cost function. However, a dual formulation of this optimization in terms of a utility function is also envisaged herein, as are embodiments that involve both maximization of a utility function and minimization of a cost function. In the utility formulation, the objective is not to reduce the cost as described herein but to increase the overall utility. Formulations of the objective function in terms of either cost or utility are envisaged as alternatives or in combination, notwithstanding the following description, which focuses on the optimization of a cost function.
【0091】 Reference is now made to Figure 4, which shows further details of the GAN-based embodiment of the training system TS described above with reference to Figure 3.
【0092】 Training input images x from the two categories of the training dataset, contrast C and non-contrast NC, are selected from the set TD by a random or deterministic switch SW1.
【0093】 A second, optional switch SW2, which can likewise be deterministic or random, switches the output of the contrast reducer CR through to the discriminator DS for discrimination into the labels genuine or fake, l_g, l_f.
【0094】 The analytical transformation path of the contrast booster CB is implemented as an algebraic operation, as shown in the figure.
More specifically, the algebraic operation involves forming a linear combination of the original input image and the contrast-reduced version I generated by the contrast reducer CR. The contrast, which essentially stems from the contrast agent, is measured by taking the point-wise difference between the initial contrast-enhanced image and the contrast-reduced version I. This contrast measurement is then linearly scaled by a coefficient α, and the scaled contrast is added to the initial contrast-enhanced image, thereby constructing the contrast-boosted estimate. The contrast-boosted estimate is then processed by the restorer RS in an attempt to reconstruct the original high-contrast input image that was input to the contrast reducer CR. If the original input image is drawn from the non-contrast class NC, the network TS leaves the image essentially intact as it passes through the various network stages. In either case, the cost function is configured to encourage the optimization to converge toward model parameters that yield an approximate copy of the initial input image x as the output at the final stage, the restorer RS, regardless of whether x was drawn from the contrast-enhanced or the non-contrast category (as shown in more detail below).
【0095】 The various processing paths on the left side of Figure 4 are shown with dashed and solid lines for clarity. The dashed lines represent the process flow in the GAN subsystem. The solid lines in the left portion of Figure 4 represent the process flow between the GAN and the booster CB.
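The algebraic operation of the contrast booster CB described in paragraph 【0094】, i.e., the input image plus α times the point-wise difference between the input image and its contrast-reduced version, can be sketched as follows. The function name `boost_contrast`, the toy pixel values, and the chosen value of α are assumptions made for illustration; the contrast-reduced image is written down by hand here rather than produced by a trained reducer CR.

```python
def boost_contrast(image, reduced, alpha):
    """Return image + alpha * (image - reduced), computed pixel by pixel."""
    return [
        [x + alpha * (x - r) for x, r in zip(image_row, reduced_row)]
        for image_row, reduced_row in zip(image, reduced)
    ]

# Toy 2x3 contrast-enhanced image: the middle column stands in for a vessel
# filled with contrast agent, the outer columns for background tissue.
enhanced = [[100.0, 300.0, 100.0],
            [100.0, 320.0, 100.0]]

# Hypothetical output of the contrast reducer CR for that image: the vessel is
# pulled toward the background, while the background itself is unchanged.
reduced = [[100.0, 150.0, 100.0],
           [100.0, 160.0, 100.0]]

boosted = boost_contrast(enhanced, reduced, alpha=0.5)
for row in boosted:
    print(row)
```

Because the difference between the image and its contrast-reduced version is zero wherever the reducer left a pixel unchanged, only the vessel pixels are amplified in this example, which reflects the locally focused behavior described in the text. For an image that the reducer leaves entirely unchanged, the booster returns the image itself.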
【0096】 The explanations so far have focused mainly on the training mode and the operation of the training system during the training phase, in which training images are drawn from the training dataset and processed to fit the parameters of the various machine learning models DS, G1, and G2 that constitute the training system.
【0097】 Reference is now made to Figure 5, which shows a schematic block diagram of the target model TM, including the first generator G1, CR and the analytical contrast booster stage CB, once fully trained. Once training is complete, the remaining machine learning models, the discriminator DS and the second generator G2 = RS, may be discarded. Specifically, once the training phase is complete, the current parameters of the contrast reducer CR, in the given architecture, and the instructions for the contrast booster CB are copied into memory. The trained contrast reducer CR and the contrast booster CB together form the target model TM of the machine learning module MLC of the contrast enhancer SIE. At this point, the machine learning module MLC can be used to perform the intended image contrast correction operation during deployment or testing.
【0098】 During deployment or testing, the input image X (not from the training dataset on which the models CR, DS, and RS were trained) is received by the now-trained contrast reducer (indicated herein by the tilde symbol "~"), optionally together with the context data CXD described in detail below. The input image X, and possibly the context data CXD, are then propagated through the trained contrast reducer to emerge as an output (not shown). This output is then processed, together with the initial input image, by the contrast booster to generate the contrast-corrected or otherwise IQ-corrected output image.
To construct this IQ-corrected output image, the contrast booster uses a linear combination. The linear combination includes a booster parameter β, as described above.
【0099】 The contrast booster parameter β (≧0) is either a fixed meta-parameter or can be adjusted by the user via an appropriate user interface UI, such as a touchscreen, a keyboard, or any other user interface for providing input. The IQ-corrected image, such as the contrast-corrected image, is, if necessary, further processed, stored, or displayed on the display device DD by a visualization component (not shown).
【0100】 Reference is now made to Figure 6, which shows an artificial neural network ("NN") type architecture that may be used for the generator G1. The following description focuses on the first generator G1 = CR, but the same network setup can also be used for the second generator G2. Generally, the envisaged NN-type architecture is a multiscale convolutional network with an optional processing path for (generally non-image) context data CXD. The context data processing facility is not required for the second generator G2 but can still be implemented if necessary.
【0101】 Referring now more closely to the proposed model CR and Figure 6, this model is configured to convert a training input image x, if it is of the contrast-enhanced class, into a training output image CR(x) with reduced contrast. Otherwise, CR(x) is equal or approximately equal to x. In that case, CR thus acts as a (quasi-)identity operator.
【0102】 Preferably, the dimension and size of the training input image are preserved. That is, the dimension, and preferably the size (such as the number of rows and columns), of the training input image corresponds or is equal to the dimension and size of the training output image I' supplied at the output OUT.
【0103】 However, during processing, the training input image x is first transformed into a lower-dimensional representation χ by the encoder component ENC of the reducer G1, CR.
In addition, its size is also reduced. This lower-dimensional and/or smaller central intermediate representation χ is a reduced version of the training input image x. The representation is "intermediate" in the sense that it does not form the final output of the model CR. It is "central" in the sense that it is formed within the target model CR during processing. The central representation χ is then upsampled by the decoder component DEC, which is arranged in series with the encoder ENC, to increase the dimension and/or size of the central representation so that the dimension and/or size of the training output image CR(x) finally matches that of the training input image x. Thus, this architecture is a type of autoencoder.
【0104】 The lower-dimensional and/or smaller-sized representation χ is also referred to herein as a "code". Thus, the encoder ENC "encodes" the received training input image x, while the decoder DEC reconstructs or "decodes" from its code χ a training output image CR(x) having the same dimension and/or size as the original training input image x. There are also embodiments in which the encoder ENC generates a code that is larger in size and/or higher in dimension than the input image x in order to facilitate over-complete representations. In this alternative embodiment, the encoder ENC generates a code χ of higher dimension and/or size, but with higher sparsity. In particular, regularization mechanisms are used to promote over-complete and sparse codes, as described by Freiman, Moti, Ravindra Manjeshwar, and Liran Goshen in "Unsupervised abnormality detection through mixed structure regularization (MSR) in deep sparse autoencoders," Medical Physics, vol. 46(5), pp. 2223-2231 (2019).
【0105】 In a preferred embodiment, if a neural network setup is used as envisaged herein, the network CR is implemented as an autoencoder neural network as described above.
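The encode/decode principle described above can be illustrated with a minimal sketch. It is not the patented network: the names `encode` and `decode` are hypothetical, 2x2 average pooling stands in for the encoder ENC, and nearest-neighbour upsampling stands in for the decoder DEC. It only shows that the code χ is smaller than the input while the decoded output regains the input size.

```python
from typing import List

Image = List[List[float]]  # 2-D image as rows of pixel values


def encode(x: Image) -> Image:
    """Toy stand-in for ENC: 2x2 average pooling halves rows and columns.

    Assumes an even number of rows and columns.
    """
    rows, cols = len(x), len(x[0])
    return [
        [
            (x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def decode(code: Image) -> Image:
    """Toy stand-in for DEC: nearest-neighbour upsampling doubles rows and columns."""
    out: Image = []
    for row in code:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


x = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 input
code = encode(x)   # 2x2 "code"
y = decode(code)   # 4x4 output, same size as x

print(len(code), len(code[0]))  # size of the code
print(len(y), len(y[0]))        # size of the decoded output
```

In this toy version the decoded output has the size of the input but not its values, since averaging discards detail; a trained autoencoder learns its encoder and decoder parameters instead of using fixed pooling.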
【0106】 The training data TD includes not only image data but also non-image context data CXD associated with each training image. The context data CXD includes imaging parameters, i.e., acquisition and/or reconstruction parameters, as applicable. If the context data is also supplied to the training system TS during training, it is optimized together with the image data. As explained more fully below, the non-image context data CXD may need to be reshaped into suitable "pseudo-images" by the reshaper module RSH, and these pseudo-images are then processed together with the respective training images with which they are associated.
【0107】 When a neural network setup is envisaged for the contrast reducer CR, a deep learning network is preferred. In embodiments, such a deep learning network includes a neural network having one or more (preferably at least two) hidden layers between the respective input layer IN and output layer OL, as will be further explained with reference to Figure 6.
【0108】 The block diagram of Figure 6 provides more details about the model CR in question, which is presented herein as an autoencoder in a convolutional neural network (CNN) setup with multiscale processing.
【0109】 The training input data x is processed in a processing path referred to herein as the imaging strand SI. Optionally, if the training also includes non-image context training data CXD, this data is processed in a separate processing strand SP that includes a reshaper RSH. The context data CXD is processed by the reshaper RSH to form a pseudo-image that is fused with the respective training image data.
【0110】 However, before describing the neural network architecture of the model CR in Figure 6 in more detail, some artificial neural network concepts are first introduced to facilitate the further explanation.
【0111】 Broadly speaking, the NN structure of the model CR in question includes multiple nodes that are at least partially interconnected and arranged in different layers L.
The layers L are arranged in one or more sequences. Each node is an entry that can assume a value and/or generate an output based on input it receives from one or more nodes of a preceding layer L.
【0112】 Each node is associated with a specific function, which can be a simple scalar value (node weight) but can also be a more complex linear or nonlinear function. A "connection" between nodes in two different layers means that the node in the subsequent layer can receive input from the node in the preceding layer. If no connection is defined between two nodes, the output of one node cannot be received as input by the other. A node produces its output by applying its function to its input. This can be implemented as a multiplication of the received input by the node's scalar value (its weight). A node in a subsequent layer may receive two or more inputs from different nodes. These inputs are combined by a combining function g to produce a combined value, to which the receiving node applies its own function f to produce the node's output. For example, g maps the inputs received from the preceding nodes to a sum of products (e.g., a dot product), and the node's function f is then applied to this sum of products.
【0113】 Each connection has its own weight ("connection weight"). The weight is used to weight the output that travels along that connection. The combining function uses the connection weights to combine all inputs received by a given node, for example as a dot product. Connections between layers are either fixed or change during processing. Some layers are fully connected, while others are not. Two layers are fully connected if each node in the subsequent layer is connected to all nodes in the preceding layer. In partially connected layers, such as convolutional layers, not all nodes in the subsequent layer are connected to all nodes in the preceding layer.
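As a minimal sketch of the node computation described above, the following combines the inputs of a node by a weighted sum of products (the function g) and then applies the node function f. The names `combine`, `node_output`, and `fully_connected`, and the example weights, are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence


def combine(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Function g: sum of products of inputs and connection weights."""
    return sum(w * v for w, v in zip(weights, inputs))


def node_output(
    inputs: Sequence[float],
    weights: Sequence[float],
    f: Callable[[float], float],
) -> float:
    """Apply the node function f to the combined value g(inputs)."""
    return f(combine(inputs, weights))


def relu(z: float) -> float:
    return max(z, 0.0)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fully_connected(
    inputs: Sequence[float],
    weight_rows: Sequence[Sequence[float]],
    f: Callable[[float], float],
) -> List[float]:
    """Fully connected layer: every output node sees every input node."""
    return [node_output(inputs, row, f) for row in weight_rows]


layer_in = [1.0, 2.0, 3.0]
fired = node_output(layer_in, [0.5, 1.0, 0.5], relu)     # positive sum passes through
silent = node_output(layer_in, [0.5, -1.0, 0.25], relu)  # negative sum is cut to zero
layer_out = fully_connected(layer_in, [[0.5, 1.0, 0.5], [0.5, -1.0, 0.25]], relu)
```

A convolutional layer differs from `fully_connected` only in that each output node is connected to a small group of input nodes rather than to all of them.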
【0114】 The outputs of all nodes in a given layer are referred to herein as "layer outputs," and the inputs received from preceding layers are referred to herein as "layer inputs."
【0115】 Each layer is represented as a 2-dimensional, 3-dimensional, or higher-dimensional matrix. When the dimension is 3 or greater, the matrix is generally referred to as a tensor. Nodes are implemented as entries within those matrices or tensors. Each layer has a size (rows i and columns j), a depth k (which may be greater than 1), and possibly additional dimensions. Alternatively, the size, depth, and one or more additional dimensions are realized by data structures other than matrices or tensors.
【0116】 The target model CR in the neural network structure includes one or more initial input layers IL and one or more final output layers OL. The initial input layers IL are where the initial training image I and, optionally, the context data CXD are received by populating the nodes, while the final output layers OL present the final result for further processing.
【0117】 The layers L are arranged in a deep architecture, as shown in Figure 6. This architecture includes one or more hidden layers between the output layer OL and the input layer IL. The number of hidden layers determines the depth of the network M. For example, a single hidden layer, as envisaged in some embodiments, means a depth of "1". Preferably, however, at least two hidden layers are used. Preferably, there is an even number of hidden layers, such as four, six, or more. Inputs received by a hidden layer from a preceding layer are referred to as intermediate inputs, while outputs generated by a hidden layer and passed to a subsequent hidden layer are referred to as intermediate outputs.
【0118】 Generally, in the NN architecture of the model CR, training input data IM, such as training input images, is applied as an initial input at the one or more input layers IL. The data is then sequentially transformed as it passes through successive layers.
At each layer, an intermediate output is generated that then serves as an intermediate input to the next layer, and so on, with each layer acting on the received intermediate input by applying an operation to it, until the final result emerges at the one or more output layers OL. The NN architecture envisaged herein is preferably of the feedforward type, in which data propagates forward from layer to layer, as described above. Alternatively, a recurrent architecture, in which one or more layers are revisited during processing, is also envisaged herein.
【0119】 Different types of layers L have different functions and therefore apply different operations. The different types of layers are discussed first below, some or all of which are envisaged herein in different combinations and sub-combinations.
【0120】 The NN layer types envisaged herein include one or more, or all, of the following, as shown in Figure 3: a fully connected layer FC, a convolutional layer C, a deconvolutional layer D, an upsampling layer ("↑"), a downsampling layer ("↓"), and an activation layer R. The layer types are grouped into units to form various operating units for each scale level s. For example, a deconvolutional or convolutional layer is combined with an upsampling or downsampling layer and an activation layer. For instance, the notation "C,↓,R" in Figure 3 indicates a group of layers arranged consecutively as a convolutional layer C, followed by a downsampling layer ↓, followed by an activation layer R. While grouping layers into such units of two, three, or more layers may offer implementation advantages when the layers are expressed as matrices/tensors and matrix/tensor multiplications, such grouping is not necessarily envisaged in all embodiments.
【0121】 Turning now to the activation layer R in more detail, this layer is implemented using any nonlinear function, including a logistic-based sigmoid function, arctan, softmax, or the rectifier function (x⁺ = max(x, 0)).
A layer that implements the rectifier function is called a rectified linear unit (ReLU). The activation layer R implements a nonlinear function that is applied to the value in each node to introduce nonlinearity, allowing the MLC to capture nonlinear patterns. The size of the (intermediate) input layer is maintained by the rectification layer R. The layer R also acts as a "significance filter" that removes or reduces an input if it falls below a threshold. The input from a node may be completely suppressed and therefore not forwarded to the next layer at all, even though a connection exists. In this case, the node is said not to "fire," and this event is recorded by forwarding a zero to the next layer. The proportion of nodes that do not fire in a given configuration is expressed as the sparsity of the MLC.
【0122】 In addition to or instead of ReLUs, dropout layers are used in some embodiments. These layers also introduce nonlinearity by randomly canceling connections between the nodes of two layers.
【0123】 Turning next to the fully connected layer FC, the layers envisaged herein within the MLC are not necessarily fully connected, but in embodiments the network MLC includes two or more fully connected layers. The fully connected layer is shown as FC in Figure 6 above. The fully connected layer FC represents a more conventional, non-convolutional NN component, in which each node is associated with its activation function and a weight. The activation functions of the nodes are organized into a separate operating layer. As described above in the general introduction, each node in one layer within the FC receives inputs from all nodes in the previous layer, each weighted according to the respective weight of the connection. This total input is then summed and evaluated by the activation function layer. The activation function layer is applied to the weighted input of each node to produce a node output that is passed to the nodes in the subsequent layer.
The advantage of using FC layers is that they can model more complex nonlinear patterns.
【0124】 The convolutional layer C and the deconvolutional layer D (and/or the upsampling and downsampling layers) are examples of layers that are not fully connected. More specifically, these layers are not connected to all nodes of the preceding layer. In addition, the connections change while the (intermediate) input from the preceding layer is processed.
【0125】 In the convolutional layer C, each intermediate output node is obtained by convolving the convolutional layer with a subgroup of nodes from the preceding layer, and thus involves only a subset of all possible connections. The connections are then redefined to select a new group of nodes for the next intermediate output node, and this continues until the entire (intermediate) input layer has been processed.
【0126】 The convolutional layer C is preferably structured as a matrix of odd size, such as 3x3 or 5x5, with a central position. The convolutional/deconvolutional layer also has a depth corresponding to the depth of the intermediate input to which it is to be applied.
【0127】 The size of a convolutional/deconvolutional layer is generally smaller than the size of the intermediate input on which it acts. Just as in conventional convolution, a convolutional layer can be thought of conceptually as sliding over its (intermediate) input layer and being applied selectively to different groups of nodes to produce filtered nodes as (intermediate) outputs. The convolution operation itself involves forming the sum of products of the nodes of the convolutional layer and the nodes of the currently covered group within its intermediate input layer. The filtered node corresponds to the central position of the odd-size matrix of the convolutional layer.
【0128】 Shifting to a new group of nodes during processing by the convolutional layer can be understood conceptually as sliding the convolutional layer C over the (intermediate) input with a stride of n (where n is a natural number) to produce the (intermediate) output for each group of nodes. The stride is a CNN design parameter indicating the extent of each shift. For example, a stride of n = 1 means that a new group is obtained by effectively shifting the layer C by one node when redefining the connections for the next group of nodes, whose values are convolved to obtain the next (intermediate) output node. A stride of n = 2 means that one column or row of nodes is skipped, and correspondingly for n > 2. As an alternative to the sliding-window approach described above, the input layer to be processed by the convolutional layer is divided into multiple parts (tiles), each of which is then convolved separately with the convolutional layer.
【0129】 Zero-padding is used when the convolutional layer extends beyond the outermost nodes of the intermediate input layer on which it acts. Convolutional/deconvolutional layers are applied sequentially as described, or in parallel across the entire intermediate input layer.
【0130】 The deconvolutional layer D is essentially the inverse operation of the convolution performed by the convolutional layer C. While the convolutional layer initially maps from pixels to progressively higher-level features, the deconvolution operation maps the features back to pixels. Functionally, the deconvolution can be formulated in terms of the convolution operations used in the convolutional layer described above, the results of which are then summed. See, for example, Section 2 of M. D. Zeiler et al., "Adaptive Deconvolutional Networks for Mid and High Level Feature Learning," 2011 International Conference on Computer Vision, Barcelona, Spain. The deconvolutional layer D can also be represented as a matrix with an appropriate depth.
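The sliding-window convolution with a shift of n nodes per step (the stride) and zero-padding, as described above, can be sketched as follows. This is an illustrative single-channel version with a hypothetical function name `conv2d`; as is common in CNN practice, the kernel is applied without flipping, and the square, odd-size kernel is an assumption taken from the preferred embodiment.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix, stride: int = 1, pad: int = 0) -> Matrix:
    """Slide a square kernel over a zero-padded image with the given stride.

    Each output node is the sum of products of the kernel entries and the
    group of input nodes currently covered by the kernel.
    """
    k = len(kernel)
    rows, cols = len(image), len(image[0])

    # Zero-padding: surround the image with `pad` rows/columns of zeros.
    padded = [[0.0] * (cols + 2 * pad) for _ in range(rows + 2 * pad)]
    for r in range(rows):
        for c in range(cols):
            padded[r + pad][c + pad] = image[r][c]

    out: Matrix = []
    for r in range(0, rows + 2 * pad - k + 1, stride):
        out_row = []
        for c in range(0, cols + 2 * pad - k + 1, stride):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * padded[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


ones_image = [[1.0] * 4 for _ in range(4)]
box_kernel = [[1.0] * 3 for _ in range(3)]

same_size = conv2d(ones_image, box_kernel, stride=1, pad=1)  # 4x4 output
halved = conv2d(ones_image, box_kernel, stride=2, pad=1)     # 2x2 output
```

With a stride of 1 and one ring of zero-padding the output keeps the input size, while a stride of 2 halves it, which is how convolution and downsampling can be combined in one layer.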
【0131】 As such, convolutional and deconvolutional layers generally retain the size of their (intermediate) inputs.
【0132】 Downsampling layers are structured similarly to the convolutional/deconvolutional layers C and D, but they act differently on their input data. Downsampling/upsampling layers (also referred to herein simply as "downsamplers" or "upsamplers") are of odd or even size. A downsampling layer combines a group of nodes in the preceding layer to produce a single node in the subsequent output layer, thus reducing the spatial size (fewer rows and/or columns) of the (intermediate) input layer. This can be done by forming an average, or by selecting the maximum/minimum value or some other specified value from the group of nodes covered. The size of the group corresponds to the size of the downsampling layer.
【0133】 An upsampler acts quasi-inversely to a downsampler, generating a larger set of nodes at its output, preferably by interpolating between input nodes. Convolution/deconvolution and upsampling/downsampling functions may be combined within the same layer. For example, convolution and downsampling are achieved together as a convolution with a stride greater than 1, such as 2 or more. Similarly, deconvolution is combined with an upsampling function.
【0134】 Further layer types include batch normalization layers B. These layers perform normalization of gradients in the previous layer. This normalization prevents or reduces gradient saturation, in which the magnitude of the gradient decreases too rapidly during the iterations. Such batch normalization layers are described by Sergey Ioffe et al. in "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," available online at arXiv:1502.03167v3 [cs.LG], 2 March 2015.
【0135】 With reference to Figure 6, the processing of image data in the imaging strand SI of the model TM in question is now described in more detail.
Generally, in the imaging strand SI, the number of upsampling layers in the decoder stage DEC corresponds to the number of downsampling layers in the encoder stage ENC. In one embodiment, the encoder ENC of SI contains three downsampling components, and the decoder DEC contains three upsampling components. The same applies to the convolutional/deconvolutional layers C and D. In Figure 6, the layers L are grouped into units. This grouping is conceptual rather than implementational. The logic of the grouping in Figure 6 is that each unit causes either a downsampling or an upsampling.
【0136】 Generally, in some embodiments, the last layer L (excluding ReLUs) of the encoder ENC is a downsampler ↓ to arrive at the lower-dimensional representation χ, followed by at least one upsampler ↑ in the decoder section DEC. The last, downstream layer of the encoder ENC provides the central representation χ in the form of a two- or higher-dimensional matrix with entries. The more zeros or negligible entries there are, the sparser the code χ, which is preferable. More generally, there is at least one downsampler in the encoder section ENC and at least one upsampler in the decoder section DEC. This "bottleneck" structure allows the encoding/decoding principle to be implemented in autoencoder-based embodiments of the model CR. Alternatively, over-complete variants of the autoencoder architecture are also envisaged herein, in which spatial up/downsampling is not necessarily present and sparsity-based regularization is used instead, as further described below. In an alternative embodiment, no such bottleneck exists, and the encoder ENC performs one or more upsampling operations so that the code χ forms an over-complete representation having a higher spatial dimension than the training input image I and the training output image I', and the decoder then performs a corresponding number of downsampling operations.
This over-complete autoencoder embodiment is preferably used with sparsity enforcement by a regularization function, as described in more detail below.
【0137】 In addition, there is at least one convolutional layer C in the encoder section ENC and at least one deconvolutional layer D in the decoder section DEC. The order of convolution and downsampling in the encoder ENC is preferably convolution first, followed by downsampling, but this is not necessarily the case in all embodiments. Similarly, in the decoder section DEC, deconvolution comes first, followed by upsampling, but this order is reversed in some embodiments.
【0138】 Admittedly, the processing sections ENC and DEC are conceptually inverse to each other, but their composition generally does not yield the identity, and thus ENC⁻¹ ≠ DEC.
【0139】 In an embodiment, the encoder ENC of SI includes three layer units. Each unit includes, in sequence, a convolution operator, a further convolution, a downsampling, and a rectification R. In alternative embodiments, more or fewer such units are used instead.
【0140】 In an embodiment, in the encoder section ENC, the convolutional layers C have a size of n x n x d (or larger), and each is preferably followed by a rectified linear unit (ReLU). The spatial dimension n can be any suitable number, such as 3, 5, etc., while the depth d is a function of the color channels (if any) and of the number of channels required by the reshaper strand SP, as described in more detail below in relation to reshaping the context parameters CXD. Two or more convolutional layers C are used to change the scale. The last convolution of each scale is performed with a stride of 2 for downsampling/downscaling. In the embodiment of FIG. 3, three pairs of upsampling and downsampling operations exist within DEC and ENC, resulting in three feature scales, but in alternative embodiments more or fewer scales are also envisaged.
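To make the three-scale structure concrete, the following sketch tracks only the spatial size through the encoder and decoder. It assumes, for illustration, that each encoder scale halves the size exactly (a stride-2 convolution with suitable padding) and that each decoder scale doubles it; the function names are hypothetical.

```python
from typing import List


def encoder_sizes(size: int, num_scales: int = 3) -> List[int]:
    """Spatial size after each encoder scale, assuming each scale halves it."""
    sizes = [size]
    for _ in range(num_scales):
        if size % 2:
            raise ValueError("size must be divisible by 2 at every scale")
        size //= 2
        sizes.append(size)
    return sizes


def decoder_sizes(size: int, num_scales: int = 3) -> List[int]:
    """Spatial size after each decoder scale, assuming each scale doubles it."""
    sizes = [size]
    for _ in range(num_scales):
        size *= 2
        sizes.append(size)
    return sizes


down = encoder_sizes(256)     # input size 256, three downsampling steps
up = decoder_sizes(down[-1])  # three upsampling steps back to the input size
print(down, up)
```

Using the same number of scales in both sections is what lets the output regain the size of the input.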
【0141】 The decoder section DEC of the strand SI mirrors the structure of the encoder section ENC and correspondingly contains the same number of units, but in each unit, each occurrence of a convolution is replaced by a deconvolutional layer, and each occurrence of a downsampling layer is replaced by an upsampling layer. The last, downstream layer in the decoder DEC preferably ensures that the final output also allows zero nodes; otherwise, the output would be restricted to values greater than zero because of the preceding activation layer R. This last layer is implemented as an additional convolutional layer C to provide the final decoded result, as shown in the figure, but this is optional. Preferably, this last convolution has a size of 1x1x1.
【0142】 In an embodiment, 3x3x3 (or larger) deconvolutional layers D are used in the decoder DEC, each preferably followed by a rectified linear unit (ReLU). Two or more deconvolutional layers D are used to revert the scale changes made in the encoder ENC. The last deconvolution of each scale is performed with a stride of 2 for upsampling/upscaling. The number of scales in the decoder section DEC is typically equal to the number of scales in the encoder section ENC.
【0143】 Preferably, but not necessarily in all embodiments, the typical number of convolutional filters in each layer at scale s is (2^3)^s c, where s = 1, 2, 3, ... is the scale (the input scale is equal to 1) and c is a network control parameter. It was found that the best performance is usually obtained in an over-complete setting where c > 1.
【0144】 Other combinations or configurations of functional layers within the two sections DEC and ENC of SI are also contemplated herein, but preferably the number of upsampling and downsampling layers and/or of convolutional and deconvolutional layers in the two sections DEC and ENC is the same.
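As a small worked example of the filter-count rule, the sketch below reads the expression in paragraph 【0143】 as (2^3)^s · c, with scale s = 1, 2, 3, ... and control parameter c. That reading of the formula is an assumption, and the function name `filters_per_layer` is hypothetical.

```python
def filters_per_layer(scale: int, c: float = 1.0) -> int:
    """Number of convolutional filters at a given scale, read as (2**3)**scale * c."""
    if scale < 1:
        raise ValueError("scale starts at 1 (the input scale)")
    return round((2 ** 3) ** scale * c)


# Filter counts for three scales with c = 1, and for an over-complete c = 1.5.
baseline = [filters_per_layer(s) for s in (1, 2, 3)]
over_complete = [filters_per_layer(s, c=1.5) for s in (1, 2, 3)]
print(baseline, over_complete)
```

Under this reading the filter count grows by a factor of eight per scale, and c > 1 scales every layer up by the same factor.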
【0145】 Optionally, some or each of the units include one or more batch normalizers B positioned between a convolutional layer C or deconvolutional layer D and an activation layer R.
【0146】 Turning now in more detail to the optional reshaper RSH and to the processing of the non-image context data CXD in the context data processing strand SP, this is where, for example, imaging parameters, acquisition parameters, or patient descriptive data (age, sex, patient history, etc.) are processed. In embodiments, the reshaper strand SP includes three hidden layers. Specifically, a fully connected layer FC, followed by a rectification layer R, followed by a second fully connected layer FC, are envisaged in sequence. Alternatively, a single fully connected layer may be used, or four or more hidden layers, in particular three or more FCs, may be used.
【0147】 Functionally, the imaging parameter strand SP is configured to act as a reshaper for the set of imaging parameters. In other words, at the output OL of the strand SP, a "pseudo-image" is formed based on the context data CXD received at its input layer IL. The imaging parameters are reformatted into a matrix/tensor with a size equal to that of the input image IM. As indicated by the arrow f_L (or "feed line"), this pseudo-image is then fused with the initial image data IM received at the input layer IL of the strand SI.
【0148】 A convolutional layer has a certain depth and can act on multidimensional layers. The depth of the convolution corresponds to the number of channels in the image to be processed. For example, a convolutional layer applied to a color image has a depth of 3 to serve each of the three color channels, RGB. The convolution operation in each of the three layers may differ from channel to channel.
【0149】 In an embodiment, it is proposed herein that the reshaped imaging parameters (i.e., the pseudo-image) are included in the training input image I as an additional channel, and that the convolutional/deconvolutional layers within the SI section have an additional depth configured to process the pseudo-image formed from the reshaped imaging parameters. In this sense, the imaging parameters and the actual image data in I are fused into a multidimensional composite image, in which the imaging parameters are included with the actual image data as an additional dimension or layer. Thus, the composite image can be processed by multi-channel convolutional/deconvolutional layers whose depth corresponds to the actual dimension of the image data I plus one (or more) dimensions for the imaging parameters I_p.
【0150】 In an embodiment, the reshaper RSH reshapes the output of the last fully connected layer FC into a representation of one or more matrices, volumes, or tensors, referred to herein as "imaging parameter volumes" or "pseudo-images." The size of each imaging parameter volume is equal to the size of the input image I. The imaging parameter volumes are fed into the imaging strand SI as additional channels. The number of additional channels corresponds to the number of imaging parameter volumes generated by the reshaper strand SP. The imaging strand SI is fed by populating a separate imaging parameter volume with each output value of the last fully connected layer FC, so that in each case the entire volume is filled with the same output value. Other embodiments of fusing image data with non-image data CXD are also conceivable. The use of the multi-channel technique described above in the context of a CNN is only one embodiment. Similar applications of the above fusion technique in more general NN or non-NN contexts are also envisaged herein.
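The reshaping and fusion described above can be sketched as follows: a context vector passes through a fully connected layer, a rectifier, and a second fully connected layer; each resulting value fills one constant "pseudo-image" of the same size as the input image; and the pseudo-images are appended to the image as extra channels. All names, weights, and example values are hypothetical, and 2-D channels are used instead of volumes for brevity.

```python
from typing import List

Vector = List[float]
Channel = List[List[float]]  # one 2-D channel


def fc(v: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer: weighted sum of all inputs plus bias per output node."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    return [max(a, 0.0) for a in v]


def reshape_context(
    context: Vector,
    w1: List[Vector], b1: Vector,
    w2: List[Vector], b2: Vector,
    rows: int, cols: int,
) -> List[Channel]:
    """FC -> ReLU -> FC, then fill one constant pseudo-image per output value."""
    outputs = fc(relu(fc(context, w1, b1)), w2, b2)
    return [[[value] * cols for _ in range(rows)] for value in outputs]


def fuse(image_channels: List[Channel], pseudo_images: List[Channel]) -> List[Channel]:
    """Append the pseudo-images to the image as additional channels."""
    return image_channels + pseudo_images


image = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]  # one channel, 2 rows x 3 columns
context = [4.0, 0.25]                          # two made-up imaging parameters
identity = [[1.0, 0.0], [0.0, 1.0]]
scaling = [[0.5, 0.0], [0.0, 2.0]]

pseudo = reshape_context(context, identity, [0.0, 0.0], scaling, [0.0, 0.0], rows=2, cols=3)
composite = fuse(image, pseudo)  # 1 image channel + 2 parameter channels
```

A convolutional layer acting on `composite` would then need a depth of three, one for the image channel and one for each pseudo-image.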
【0151】 The imaging parameters, as applied to the input layer of the reshaper strand SP, are represented in one or more dimensions. Preferably, the imaging parameters are encoded in a suitable numerical format or are mapped accordingly from categories to numerical values. The imaging parameters are provided as one or more vectors or matrices at the input layer IL of the strand SP of the network CR. The dimension of the vector or matrix, or indeed of a tensor in general, is equal to that of the image data IM.
【0152】 An architecture similar to that shown in Figure 6 may also be used for the second generator G2.
【0153】 As mentioned above, NNs are not a requirement in this specification. For example, DS can be implemented as any other classifier setup, such as a support vector machine (SVM), k-nearest neighbors, a decision tree, or a random forest. The two generators may be implemented as different NNs, or one as an NN model and the other as a non-NN model. Non-NN type ML models envisaged in this specification for either or both of the generators G1 and G2 include variational autoencoders, Boltzmann machines, Bayesian networks, etc.
【0154】 Reference is now made to Figure 7, which shows a flowchart of a computer-implemented method for image correction. This method is based on the machine learning model G1 = CR, which has been trained by one of the training systems TS described above on a training dataset TD containing at least two categories of images. A GAN setup is used, but this is not required in this specification.
【0155】 In step S710, a new image (not part of the dataset TD) is supplied to the pre-trained machine learning model. The new image X is acquired, for example, by an X-ray imaging device. Preferably, the image X is acquired while a contrast agent (previously administered to the patient) is present within the field of view (FOV) of the X-ray imaging device.
【0156】 The machine learning models CR and CB process the supplied image X and, in step S720, perform IQ correction on the image.
For example, a contrast-corrected image is computed. The contrast-corrected image is output and is then displayed on a display device, stored, or processed in any other way.
【0157】 In particular, the contrast in the corrected image is higher than the contrast in the input image X. In an embodiment, the pre-trained machine learning model includes two parts: a machine learning stage and an analysis stage. These parts operate in sequence. The machine learning stage, such as the pre-trained contrast reducer CR = G1, is followed by the analysis stage, such as the contrast booster CB described above. First, the contrast reducer CR = G1 reduces the contrast of the input image X to generate a contrast-reduced image. Based on the initial image and the contrast-reduced image, the contrast component already present in the input image is estimated. The contrast thus estimated is then amplified ("enhanced"), linearly or non-linearly, by the contrast booster CB. In a linearly acting embodiment, the contrast booster CB is configured to include an amplification coefficient β that is applied multiplicatively to the contrast estimate, point by point. The contrast thus amplified is then added to the original input image X to obtain the contrast-corrected image. The amplification or enhancement coefficient β is fixed or is adjustable by the user.
【0158】 As a result of the operation of the machine learning model, the contrast correction in the contrast-corrected image is local rather than global. Specifically, the contrast correction is more pronounced in areas of the image that already represent contrast agent. Therefore, the correction operation of the machine learning model in step S720 increases the contrast only in one or more of those areas that represent contrast.
Other areas of the image are left unchanged (within an applicable margin), or, if contrast is nevertheless corrected in such areas, this is done to a lesser degree than in the areas that represent contrast within the input image X.
【0159】 The locally focused contrast correction described above stems, at least in part, from the way the machine learning part CR was trained, which includes the use of a cycle consistency checker function as part of the objective function that controls the training. Similar locally acting corrections can be obtained for other IQ metrics, such as artifacts or noise.
【0160】 If X is an input image acquired when no contrast agent is present within the FOV and/or when its amount/concentration is below a threshold, then the output image is essentially a copy of the input image X. The processing is therefore asymmetric with respect to the IQ of the input image.
【0161】 Reference is now made to Figure 8, which shows a flowchart of a computer-implemented method for training the target machine learning model CR = G1 in the training system TS framework described above. The framework includes multiple ML models. The following steps of the method include computing training output images using the models. Based on the training output images, the objective function is improved in an optimization step by adjusting the current (or initial) parameter set in one or more iterations. The ML models mentioned below are assumed to have an initial or current parameter set. If a model is of the neural network type, the parameters include, for example, the weights of the network nodes in the various layers and/or the filter elements in the case of a convolutional network (CNN), as described above with reference to Figure 6.
【0162】 In step S810, initial training images are drawn from the training dataset. The set includes at least two classes of images: contrast-enhanced and non-contrast images, as described above.
【0163】 Assuming first that a contrast-enhanced image is received/extracted in step S810, it is processed by the first machine learning model in step S820. The first machine learning model outputs the first training output image I. Generally, during training, the first machine learning model is configured to act on the initial training input image to generate a first training output image I having reduced contrast compared to that input image. 【0164】 In embodiments, the first machine learning ("ML") model is of a generative type. Preferably, but optionally, it is part of a GAN network, which in turn forms part of the (global) training system framework TS described above, which includes multiple ML models. In addition to the first ML model, the GAN network includes another machine learning model, preferably a discriminator type model DS. 【0165】 In the optional step S830, the discriminator model DS processes the first training output image I and calculates a classification result representing an attempt to classify the output image as either high-contrast or low-contrast. In other words, the classification step S830 discriminates whether the training output image was (artificially) generated by the first ML model or whether it was drawn from the low-contrast category of the initial training dataset. 【0166】 In step S840, the contrast of the contrast-reduced training output image I is increased again to generate the second training output image. The contrast increase/contrast enhancement operation in step S840 is based in particular on the contrast-reduced image and the initial high-contrast image as extracted in step S810. The contrast increase/contrast enhancement operation S840 uses a scaling factor. In particular, the initial input image and the first output image are linearly combined using one or more such scaling factors. A weighted sum is used. 
More specifically, in embodiments, in step S840, the "base contrast" is estimated as the difference between the initial input image and the first output image. The base contrast is then enhanced by multiplying it by the scaling factor, and the scaled result is then added to the initial input image. Ideally, the "base contrast" represents the intrinsic contrast present/encoded in a given image. Apart from noise, the base contrast is the portion of image information (as encoded in the image) that is caused by the amount/concentration of contrast agent present in the FOV of the imager IA when that image was acquired. 【0167】 The image contrast-corrected in that manner is then subjected to a further contrast reduction using a third ML model in step S850, in an attempt to replicate the initial contrast-enhanced image extracted in step S810. This third ML model is preferably also of the generative type, and consequently, the ML framework of the training system for the proposed training method includes two generative ML models. 【0168】 In step S860, the parameters of the two or three machine learning models mentioned above are adjusted to improve the objective function (or cost function). The value of the objective function depends on the training output images obtained above and, if applicable, on the current classification result. Here again, the discriminator model DS is optional. 【0169】 Steps S810 to S860 described above are repeated in one or more iterations for one or more initial training images extracted in step S810, in order to obtain new training output images and, optionally, new classification results. The cost function depends on the newly obtained training output images and, optionally, classification results, and may further depend on the preceding training output images and classification results. Specifically, when the objective function is a cost function, the accumulated cost over all, or one or more, of the previously processed training input images is taken into account. 
Training input images can be extracted and processed one by one or in subsets from the initial training dataset. A subset may be referred to as a batch. 【0170】 Training is completed after a certain number of iterations and after a number of such initial training input images have been processed. Training is completed when the stop condition is met. The stop condition is a user-issued signal, a preset number of iterations, all or a certain percentage of the training images in the set have been processed, or sufficient convergence is detected in the ML parameters. Upon completion of training, in follow-up step S870, the first ML model G1, CR becomes available as part of the trained model of the contrast correction system, which is then used during deployment. The trained ML model for image correction further includes a post-analysis processing stage, which includes fixed or user-adjustable contrast booster parameters, to be used to carry out step S840 during deployment. 【0171】 In step S860, the parameters of the aforementioned ML components for the classification operation S830 and the two contrast reductions S820 and S850 are adjusted for multiple initial training input images in one or more iterations so that they can operate as intended. A gradient-based method, such as the backpropagation technique of an NN-type ML model, is used, and the training controller TC evaluates the gradient of the objective function. As previously stated, it is assumed in this specification that the objective function is a cost function and that improvement of the objective function is a reduction in cost. However, this does not limit what is described herein, as this operating principle is readily applicable to utility functions as well, if the improvement is either an increase or maximization of utility rather than a reduction or minimization of cost. 
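The forward pass of steps S820 to S850, together with the deviation that step S860 seeks to reduce, can be sketched as follows. This is a toy illustration under stated assumptions: images are flat lists of floats, `toy_g1` and `make_toy_g2` are hypothetical hand-built stand-ins for fully trained generators, `BASELINE` is an invented threshold, and the mean absolute deviation is only one possible choice of deviation measure.

```python
from typing import Callable, List

Image = List[float]  # toy "image": one intensity value per pixel (assumption)
BASELINE = 50.0      # hypothetical intensity above which a pixel carries contrast agent


def cycle_deviation(x: Image,
                    g1: Callable[[Image], Image],
                    g2: Callable[[Image], Image],
                    alpha: float) -> float:
    """Run S820 -> S840 -> S850 on x; return the mean absolute deviation from x."""
    reduced = g1(x)                                                    # S820
    boosted = [xi + alpha * (xi - ri) for xi, ri in zip(x, reduced)]   # S840
    restored = g2(boosted)                                             # S850
    return sum(abs(r - xi) for r, xi in zip(restored, x)) / len(x)


def toy_g1(x: Image) -> Image:
    """Idealised first generator: removes everything above the baseline."""
    return [min(v, BASELINE) for v in x]


def make_toy_g2(alpha: float) -> Callable[[Image], Image]:
    """Idealised second generator: undoes a boost performed with factor alpha."""
    def g2(y: Image) -> Image:
        return [v if v <= BASELINE else BASELINE + (v - BASELINE) / (1.0 + alpha)
                for v in y]
    return g2


x = [40.0, 50.0, 110.0, 200.0]
print(cycle_deviation(x, toy_g1, make_toy_g2(2.0), 2.0))  # 0.0
print(cycle_deviation(x, toy_g1, lambda y: list(y), 2.0))  # 105.0
```

With the idealised pair the restored image equals the input, so the deviation vanishes; replacing the second generator by an identity map (an "untrained" restorer) leaves a large deviation that a training controller would try to reduce by adjusting parameters.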
【0172】 The cost returned by the cost function is based on i) the classification result optionally calculated in step S830, and ii) the deviation between the initial low/high-contrast image extracted in step S810 and the second training output image generated in the restoration step S850. The smaller the deviation, the lower the cost incurred. With respect to the optional discriminator, the cost function is configured to assign a lower cost if the classification result is accurate (as described in more detail below). At the same time, and adversarially to the cost assigned in i) above for accurate classification results, the cost function (or another cost function in the system of cost functions) assigns a lower cost if the first ML model generates a contrast-reduced first output image that leads the discriminator to an inaccurate classification result. These two adversarial objectives have been shown to converge towards a game-theoretic equilibrium over the iterations. 【0173】 Once sufficient convergence is achieved in the parameters of the ML models involved, the system TS behaves asymmetrically with respect to training images from the two categories, contrast-enhanced and non-contrast-enhanced. Specifically, if a low-contrast initial training input image is drawn in step S810, it is essentially preserved as it passes through the processing steps S810-S850 described above. In particular, the second training output image output in step S850 is almost a copy of the initial input image. This differs for high-contrast images, which undergo two contrast reductions with a contrast enhancement in between. Nevertheless, the second training output image output in step S850 is still expected to be an approximate copy of the initial input image. Thus, the training system preserves low-contrast input images and restores high-contrast input images (regardless of the contrast changes during the cycle). 
This asymmetric behavior of the training system TS is enforced, in particular, by the cycle consistency checker cost function CC() mentioned above. 【0174】 In particular, this up-down cycle with respect to the contrast-corrected input image allows the system TS to better learn the true contrast difference and thus generate a machine learning model aimed at correcting only the image portion that represents the presence of the contrast agent. 【0175】 The cost function (or system of cost functions) described above is implemented by a training controller TC, which will be described in more detail later. The training controller TC is implemented in software, hardware, or partially in both. 【0176】 As mentioned above, the training system TS uses a system of cost functions. Specifically, machine learning, i.e., tuning of ML parameters, is controlled by the training controller TC based on the cost function. The cost function is a scalar function that maps the parameters to be learned to numbers, i.e., "costs". Broadly speaking, the controller TC tunes some or all of the parameters of the ML model so that the cost is reduced. This operation is referred to herein as "improvement of the cost function". If optimization is formulated with respect to the utility function as the objective function, this is improved by increasing utility as a function of tuning the ML parameters. 【0177】 Some or each cost function itself contains sub-cost function components. These sub-cost function components are combined, for example, additively to produce their respective (higher) cost functions. Thus, the sub-cost functions are understood as terms of the (higher) cost functions. However, each sub-cost function is a cost function in itself, and for the purposes of this work, this is equivalent to formulating a single optimization (minimization) with respect to a single higher cost function, or a system of optimizations that includes each (partial) optimization of each sub-cost function. 
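The idea of additively combining sub-cost functions into one scalar cost, which a controller then reduces by tuning parameters, can be sketched as follows. Everything here is hypothetical and chosen for brevity: a single scalar parameter `theta`, two invented sub-cost terms, fixed weights, and a finite-difference gradient in place of backpropagation.

```python
from typing import Callable, Sequence


def total_cost(theta: float,
               terms: Sequence[Callable[[float], float]],
               weights: Sequence[float]) -> float:
    """Additively combine weighted sub-cost terms into one higher-level cost."""
    return sum(w * term(theta) for w, term in zip(weights, terms))


def improve(theta: float, cost: Callable[[float], float],
            step: float = 0.1, eps: float = 1e-6) -> float:
    """One gradient-descent update using a central finite-difference gradient."""
    grad = (cost(theta + eps) - cost(theta - eps)) / (2.0 * eps)
    return theta - step * grad


# Two hypothetical sub-costs of one scalar parameter (not taken from the source).
TERMS = [lambda t: (t - 3.0) ** 2, lambda t: abs(t)]
WEIGHTS = [1.0, 0.5]


def cost(theta: float) -> float:
    return total_cost(theta, TERMS, WEIGHTS)


theta_start = 0.5
theta = theta_start
for _ in range(50):          # a preset number of iterations as the stop condition
    theta = improve(theta, cost)

print(cost(theta_start), cost(theta))  # the cost after tuning is lower than before
```

Minimizing the weighted sum in one go, as here, is one of the two equivalent formulations mentioned above; the other would treat each sub-cost as its own optimization within a system of optimizations.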
【0178】 In the following, any reference to a “cost function” is to one that is part of a system of cost functions used together to enforce the aforementioned behavior of the training system with respect to the inputs and outputs generated by the ML model framework within the training system. Therefore, a reference herein to a “cost function” does not exclude other cost functions being used in addition in the training system. Generally, a cost function uses a suitable distance/deviation measure L(·) to measure the relationship (e.g., similarity) between input and output images. As used herein, “input” and “output” refer not only to the initial input (image) and final output (image) of the training system TS, but also to the various intermediate input/output images generated as the data passes through the various stages of the training system TS as described above with reference to Figures 2 to 8. Some terms of the cost function are referred to herein as regularizers, optionally envisaged herein to enforce certain desirable characteristics of the parameters or of the results obtainable from such parameters. 【0179】 The cost functions can be distinguished into those operating (primarily) with respect to the contrast reducer (first generator G1), the optional discriminator DS, and the restorer RS (second generator G2). However, it should be understood that some or all of the cost functions used herein are interdependent and are optimized together, either jointly or in alternation, as discussed below. The cost functions are formulated not only with respect to the output/input images but also with respect to the parameters of the (at least) three ML models. Thus, the cost function of one ML model also refers to the parameters of the other (at least two) ML models, thereby causing the interdependence mentioned. 
The cost function system E of the training system TS, as implemented by the training controller TC, is denoted herein as $E = (E_{G1}, E_{DS}, E_{G2})$. In embodiments, the proposed optimization includes both minimization and maximization. 【0180】 Turning first to the cost function $E_{G1}$ of the first generator G1, this optionally includes a GAN term representing the adversarial relationship between the first generator and the discriminator. In addition or instead, there is a consistency checker function CC(), which is a (further) cost function. When these two cost functions for the first generator are combined into a single (higher) cost function, this cost function can be formally expressed as the following minimization task: [Equation (1)] 【0181】 In (1), the minimization is over the parameters of the first generator G1, and the sum in (1) (and also in the following equations) extends over the training input images of the training set TD. In equation (1) and below, the $\lambda_i$ are optional control parameters that can be used to adjust the contribution of each term to the optimization. These control parameters are generally meta-parameters that are not adjusted in the optimization and are therefore not learned. Some or all of the $\lambda_i$ may be unity. 【0182】 Turning first in more detail to the cycle consistency checker cost function CC(), it operates on both contrast-enhanced and non-contrast-enhanced images, enforcing the aforementioned behavior of the training system, namely to act globally as an identity operator on images from both classes. 【0183】 The cycle consistency checker cost function is expressed as follows: [Equations (2a) and (2b)] 【0184】 M is the contrast agent contribution measure, and x is an image from either class C, NC. "α" is the contrast scaling/enhancement coefficient mentioned above, which is preferably a positive number > 1. 
The scaling coefficient can be adjusted by the user as mentioned, but is preferably kept fixed during training. During deployment, the scaling coefficient may be changed to a value β other than the value α used for training, so that the user can adjust the strength of the contrast enhancement (e.g., via the user interface UI). L(·) is the similarity measure mentioned above, such as the L1 norm or squared L2 norm cost function, or a sum or weighted sum of both. 【0185】 In minimizing (2a), small deviations between the output of the restorer and the initial input x incur a lower cost. In this way, the training system's parameters are tuned in the optimization to approximate identity operator behavior for images from both classes. 【0186】 Therefore, the cost function $E_{G1}$ includes a regularizer or cost function term CC that measures, for an image from either the contrast-enhanced or the non-contrast-enhanced class, the deviation between the restored image and the initial input image x. Smaller deviations result in lower costs. 【0187】 The cost function $E_{G1}$ in (1) may include additional, optional cost (sub)functions or regularization terms (summands). Some such additional terms are described in equations (3)-(5) below and represent prior knowledge and expectations of the contrast map M. The regularizers (3)-(5), alone or in combination, further improve the above-described locally focused IQ correction such as contrast enhancement. 【0188】 For example, a regularization term for the map M is configured to promote solutions with low noise and/or roughness, as expressed by: [Equation (3)] 【0189】 where $(\cdot)_+$ is a rectifying function (defined as the positive part of its argument), $N_{map}$ is a noise map calculated from a local or per-pixel standard deviation or variance map, which can be generated using techniques as described, for example, in the applicant's U.S. Patent No. 8,938,110 or U.S. Patent No. 
1,028,282, and R(·) is a roughness penalty or regularization term (e.g., total variation, a Huber loss function, or any other suitable measure). Term (3), or any similar term, promotes piecewise smooth solutions for the contrast agent map (such as an iodine map), which represents the contribution of the contrast agent in the image as described above in equation (2b). 【0190】 Another optional regularization term that can be included in (1) as an additional term is as follows: [Equation (4)] 【0191】 Since adding contrast to an image can only increase intensity/HU (which holds apart from image artifacts), this term promotes positive (≧0) contrast agent maps. Negative contributions in the map are unrealistic, and term (4) or a similar term regularizes/encourages the calculation of positive maps during optimization. 【0192】 A further optional regularization term that can be included in (1) as an additional term is: $\lambda_5 L(M(I_i))$ (5) 【0193】 Since the non-contrast input image I is assumed to have been acquired without contrast agent present in the FOV, this term promotes solutions with a low contrast map for the non-contrast input image I. The cost function $E_{G1}$ may include all terms (3) to (5), any single one of them, or any (partial) combination of terms (3) to (5). 【0194】 Turning in more detail to the optional GAN setup, it requires two cost functions to formulate the adversarial relationship between G1 and DS as a min-max optimization. 【0195】 One of those, namely the term for the first generator G1, appears in (1). It is formulated such that the first generator G1 incurs a higher cost if it fails to produce an output that leads the discriminator DS to a false classification result. Specifically, the minimization task for the cost function of the first generator G1 is to minimize the logarithm of the inverse probability predicted by the discriminator for a fake image. 
Thus, the generator tends to produce samples that have a low probability of being fake images. 【0196】 The other GAN cost function, $E_{DS}$, i.e., the cost function of the discriminator DS, is configured as follows: [Equation (6)] 【0197】 Equation (6) is formulated as a binary classification in terms of cross-entropy. This cost function is configured to incur a higher cost when the discriminator fails to accurately classify the image I as output by the generator. Expressed as a utility function rather than a cost function, the discriminator objective is to maximize the average of the logarithm of the probability for real images and the logarithm of the inverse probability for fake images. 【0198】 GAN setups other than (1) and (6), such as least-squares GAN or Wasserstein GAN cost functions, are also envisaged herein in alternative embodiments. A GAN setup is preferred herein for its robustness but is not essential. If no GAN setup is used, cost function (6) is not required, nor is the adversarial term of cost function (1). In this case, the first generator G1 can be any generative network that can be suitably trained to generate IQ-reduced samples from the training dataset with its (at least two) image categories C, NC. 【0199】 Turning to the cost function $E_{G2}$ of the restorer network RS, here the second generator G2, this can be configured in terms of the cycle consistency regularizer CC() as follows: [Equation (7)] 【0200】 This cost function, similar to (1), promotes identity operator behavior of the restorer G2 when it acts on images from both classes C and NC. 
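The two adversarial terms described in prose above can be sketched as follows. This is a reading of the description (binary cross-entropy for the discriminator, logarithm of the complementary probability for the generator) rather than a transcription of equations (1) and (6); the function names and the convention that the discriminator outputs the probability of an image being a genuine low-contrast training sample are assumptions.

```python
import math
from typing import Sequence


def discriminator_cost(p_real: Sequence[float], p_fake: Sequence[float]) -> float:
    """Binary cross-entropy over genuine and generated samples.

    p_real / p_fake hold the discriminator's predicted probability (strictly
    between 0 and 1) that each genuine / generated image is a genuine sample.
    """
    real_term = sum(math.log(p) for p in p_real) / len(p_real)
    fake_term = sum(math.log(1.0 - p) for p in p_fake) / len(p_fake)
    return -(real_term + fake_term)


def generator_adversarial_cost(p_fake: Sequence[float]) -> float:
    """Mean of log(1 - D(G1(x))): lower when the discriminator is fooled."""
    return sum(math.log(1.0 - p) for p in p_fake) / len(p_fake)


# A discriminator that separates the two groups well incurs a low cost ...
sharp = discriminator_cost(p_real=[0.9, 0.8], p_fake=[0.1, 0.2])
# ... whereas one that cannot tell them apart incurs a higher cost.
confused = discriminator_cost(p_real=[0.5, 0.5], p_fake=[0.5, 0.5])
print(sharp < confused)  # True

# The generator's cost moves the other way: fooling the discriminator is cheaper.
print(generator_adversarial_cost([0.9, 0.8]) < generator_adversarial_cost([0.1, 0.2]))  # True
```

The opposing directions of the two costs are what makes the setup adversarial; alternative formulations such as least-squares or Wasserstein losses would replace the logarithms with other functions of the discriminator output.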
【0201】 The optimization of the objective function (system) $E = (E_{G1}, E_{DS}, E_{G2})$ using equations (1), (6), and (7) is performed by an alternating optimization procedure. In this procedure, the optimization alternates between steps for each of the two or three cost functions, i.e., the cost functions of the two generator networks G1 and G2 (and, optionally, of the discriminator DS, if present), with each step treating the parameters of the other models as constants. Some or each step of the alternating optimization procedure may be gradient-based, as previously described. Possible methods include stochastic gradient descent; see, for example, H. Robbins et al., "A stochastic approximation method," published in Ann. Math. Stat. 22, pp. 400-407 (1951), or the "ADAM" algorithm by D. P. Kingma et al., "Adam: A method for stochastic optimization," published as an arXiv pre-print at arXiv:1412.6980 (2014). Other optimization algorithms, such as conjugate gradient methods, Nelder-Mead methods, or other non-gradient-based methods, are also envisaged herein in embodiments. 【0202】 Although the learning task described above is formulated as an optimization problem controlled by the cost function E, "optimization" as used herein does not necessarily mean that a global optimum is sought. For most applications, a local optimum is sufficient. Even when the proposed algorithm converges to a local (or global) optimum, it is not necessarily required herein that this optimum actually be attained. For most applications, an approximation within a given error bound is sufficient. If there is no significant change in the ML parameters, the iteration may be terminated early. 【0203】 In addition, training can be further improved or enhanced by using mixed structure regularization or automixing, as described, for example, by M. 
Freiman et al., "Unsupervised abnormality detection through mixed structure regularization (MSR) in deep sparse autoencoders," published in Medical Physics, Vol. 46(5), pp. 2223-2231 (2019). Specifically, non-contrast and contrast-enhanced images are mixed in this manner. 【0204】 In addition to generating contrast-enhanced images, the proposed method may also provide additional useful results, including virtual non-contrast (VNC) images. For example, the output of the network can be used to calculate a contrast agent map. 【0205】 The training system TS and contrast enhancer SIE (Figure 5) have been described above with reference to image contrast correction, but the principles described are equally applicable to other types of image quality (IQ) correction, including artifact and noise reduction. The above can therefore be readily extended to other types of IQ, where the two categories of images in the training set TD are then high-IQ images and low-IQ images, for example, images with low artifact contribution or low noise as high-IQ samples on the one hand, and images with high artifact contribution or high noise as low-IQ samples on the other hand. For example, for artifact-related IQ, the first generator G1 operates to increase the artifact contribution in this embodiment, while for noise-related IQ, the first generator G1 increases noise. In such embodiments for IQ other than contrast, for noise-related IQ the measure function M of the cycle consistency checker CC() is configured to measure local noise, for example via the standard deviation over a pixel neighborhood. Other noise filters may be used. For IQ relating to the presence of artifacts, the measure function M is constructed as a spatial or frequency domain filter with a high response tuned to artifact image patterns of particular interest, such as cupping artifacts, rings, or streaks. 
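For the noise-related embodiment, a measure function based on the standard deviation over a pixel neighborhood can be sketched as follows. The function name, the square window with a `radius` parameter, and the clipping of the window at the image borders are assumptions made for this illustration; the text only states that a local standard deviation may be used.

```python
import statistics
from typing import List

Image2D = List[List[float]]  # toy image as a list of rows (assumption)


def local_std_map(image: Image2D, radius: int = 1) -> Image2D:
    """Population standard deviation of each pixel's square neighbourhood.

    The window has side 2 * radius + 1 and is clipped at the image borders.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out[r][c] = statistics.pstdev(window)
    return out


flat = [[5.0] * 3 for _ in range(3)]    # noise-free patch
checker = [[0.0, 10.0], [10.0, 0.0]]    # strongly varying patch

print(local_std_map(flat))     # all zeros
print(local_std_map(checker))  # [[5.0, 5.0], [5.0, 5.0]]
```

A map of this kind is low in smooth regions and high where intensities fluctuate, so it could play the role of the measure M in a noise-oriented variant of the cycle consistency term; an artifact-oriented variant would substitute a filter tuned to the artifact pattern of interest.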
【0206】 As used herein, the modifiers “high” and “low” indicate that a given IQ is better in a given high-IQ image sample than in a given low-IQ image sample. 【0207】 One or more features disclosed herein are comprised of or implemented as / by circuits encoded in a computer-readable medium, and / or in combination thereof. Circuits include discrete and / or integrated circuits, application-specific integrated circuits (ASICs), systems on a chip (SOC), and combinations thereof, machines, computer systems, processors and memories, and computer programs. 【0208】 In another exemplary embodiment of the present invention, a computer program or computer program element is provided, characterized in that it is adapted to perform a step of a method according to one of the prior embodiments on a suitable system. 【0209】 Therefore, the computer program elements may be stored on a computer unit, which may also be part of one embodiment of the present invention. This computing unit is adapted to perform or induce the performance of the steps of the method described above. Furthermore, the computing unit is adapted to operate the components of the apparatus described above. The computing unit may be adapted to operate automatically and / or to execute user instructions. The computer program is loaded into the working memory of the data processor. Thus, the data processor is equipped to perform the method of the present invention. 【0210】 This exemplary embodiment of the present invention covers both computer programs that use the present invention from the outset and computer programs that, through updates, make existing programs use the present invention. 【0211】 Furthermore, computer program elements may be capable of providing all the steps necessary to satisfy the procedures of exemplary embodiments of the methods described above. 
【0212】 According to a further exemplary embodiment of the present invention, a computer-readable medium such as a CD-ROM is presented, which stores a computer program element, the computer program element being as described in the preceding section. 【0213】 Computer programs may be stored and/or distributed on suitable media (in particular, though not necessarily, non-transitory media) such as optical storage media or solid-state media supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. 【0214】 However, computer programs may also be presented over a network such as the World Wide Web, and can be downloaded from such a network into the working memory of a data processor. According to a further exemplary embodiment of the present invention, a medium is provided that makes a computer program element available for download, the computer program element being configured to carry out a method according to one of the aforementioned embodiments of the present invention. 【0215】 It should be noted that embodiments of the present invention are described with reference to different subject matters. In particular, some embodiments are described with reference to method-type claims, while others are described with reference to device-type claims. However, those skilled in the art will gather from the above and the following description that, unless otherwise noted, in addition to any combination of features belonging to one type of subject matter, any combination of features relating to different subject matters is also considered to be disclosed by this application. Moreover, all features can be combined to provide synergistic effects that are more than the simple sum of the features. 【0216】 Although the present invention has been illustrated and described in detail in the drawings and the foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. 
The present invention is not limited to the disclosed embodiments. Other variations of the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the dependent claims. 【0217】 In the claims, the word “comprising” does not exclude other elements or steps, and the singular does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. Any reference signs in the claims shall not be construed as limiting their scope.

Claims

[Claim 1] A training system for training a target machine learning model for image correction of medical images, the training system comprising: an input interface for receiving a training input image drawn from a training data set including at least two types of images, high image quality (IQ) images and low IQ images, wherein the training input image is of the high-IQ type; a machine learning model framework comprising a first generator network and a second generator network, wherein the target machine learning model comprises the first generator network, the first generator network processes the high-IQ type training input image to generate a training output image with reduced image quality, the target machine learning model further generates, based on the training output image and the high-IQ type training input image, a second training output image with higher image quality than the high-IQ type training input image, and the second generator network is operable to estimate, from the second training output image, an estimate of the high-IQ type training input image; and a training controller operable to adjust at least one parameter of the machine learning model framework based on the deviation between the estimate of the high-IQ type training input image and the high-IQ type training input image.
[Claim 2] The training system according to claim 1, wherein the machine learning model framework includes a subnetwork of the generative adversarial type, which includes the first generator network and a discriminator network, wherein the discriminator attempts to discriminate between a low-IQ type training input image drawn from the set and the training output image, and generates a discrimination result, and wherein the training controller is operable to adjust the parameters of the machine learning model framework based on the discrimination result.
[Claim 3] The training system according to claim 1 or 2, configured to process the training input image together with context data describing one or more of the following: i) at least one image acquisition parameter, ii) at least one reconstruction parameter, and iii) patient data.
[Claim 4] The training system according to any one of claims 1 to 3, wherein the architecture of the first generator network and/or the second generator network is of the convolutional type.
[Claim 5] The training system according to any one of claims 1 to 4, wherein the architecture of the first generator network and/or the second generator network is of the multiscale type.
[Claim 6] The training system according to any one of claims 1 to 5, wherein the image quality includes one of i) contrast level, ii) artifact level, and iii) noise level.
[Claim 7] The training system according to any one of claims 1 to 6, wherein the high-IQ image is an image recorded by an imaging device while a contrast agent is present, and the low-IQ image is an image recorded by the or an imaging device while a smaller amount of the, or of a, contrast agent is present.
[Claim 8] The training system according to any one of claims 1 to 7, wherein the high-IQ/low-IQ images in the set are X-ray images or MRI images.
[Claim 9] The training system according to claim 8, wherein the high-IQ/low-IQ images are computed tomography images.
[Claim 10] An imaging arrangement comprising: the first generator network according to claim 1, operable to process an input image received from an image storage device or supplied by an imaging device in order to provide a processed image; and a display device for displaying the processed image.
[Claim 11] A method for training a target machine learning model, the method comprising: receiving a training input image drawn from a training data set including at least two types of images, high image quality (IQ) images and low IQ images, wherein the training input image is of the high-IQ type; processing, by a generative network, the high-IQ type training input image to generate a training output image with reduced image quality, wherein the generative network is part of a machine learning model framework that further includes a second generator network, and the target machine learning model includes the first generator network; further processing, by the target machine learning model, the training output image and the high-IQ type training input image to generate a second output image with higher image quality than the high-IQ type training input image; estimating, by the second generator network, an estimate of the high-IQ type training input image from the second output image; and adjusting, by a training controller, at least one parameter of the machine learning model framework based on the deviation between the estimate of the high-IQ type training input image and the high-IQ type training input image.
[Claim 12] A method of image processing comprising the step of processing, by the first generator network according to claim 1, an input image received from an image storage device or supplied by an imaging device in order to provide a processed image.
[Claim 13] A computer program which, when executed by at least one processing unit, is adapted to cause the processing unit to carry out the method according to claim 11 or 12.
[Claim 14] A computer-readable medium on which the computer program according to claim 13 is stored.

Citation Information

Patent Citations

  • Transformation of digital pathology images

    WO2019154987A1

  • Image enhancement using generative adversarial networks

    WO2019209820A1

  • Information processing method and information processing system

    WO2020179200A1