Method and related device for partial volume correction of PET images based on deep image priors
A partial volume correction model for PET images is constructed using an unsupervised framework based on deep image priors. By introducing a back-projection fidelity term and MRI conditional constraints, combined with population pre-training and individual fine-tuning, the method addresses the low resolution of PET images and the dependence on paired training data, thereby improving image quality and anatomical consistency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SOUTHERN MEDICAL UNIVERSITY
- Filing Date
- 2026-02-09
- Publication Date
- 2026-06-12
AI Technical Summary
Existing PET images have low spatial resolution and significant partial volume effects, leading to biases in quantitative assessments. Traditional methods are computationally complex or have stringent hardware requirements, while deep learning methods rely on large amounts of paired data and suffer from slow convergence and insufficient detail preservation.
A partial volume correction model is constructed using an unsupervised framework based on deep image priors. A back-projection fidelity term, denoising regularization, and MRI conditional constraints are introduced. Combined with pre-training on population information and fine-tuning training for individuals, this improves the training efficiency and detail preservation capabilities of the model.
The method significantly improves the structural similarity and noise suppression of PET images, enhances the performance of the partial volume correction model, avoids dependence on large-scale paired data, and improves the image quality and anatomical consistency of the correction results.
Figure CN122199343A_ABST
Description
Technical Field
[0001] This application relates to the field of image processing technology, and in particular to a method and related equipment for partial volume correction of PET images based on deep image priors.
Background Technology
[0002] Positron emission tomography (PET) is an advanced functional imaging technique that can quantitatively describe the distribution of radiotracers in vivo. However, limited by hardware and software performance, PET images have low spatial resolution and significant partial volume effect (PVE), leading to biases in quantitative assessment. To address this issue, partial volume correction (PVC) methods can be divided into reconstruction-based and post-reconstruction methods. Reconstruction-based methods embed the PVC correction model into the PET image reconstruction process, combining anatomical information to directly cancel the effect during the radioactive signal-to-image conversion stage. However, this method has high computational complexity, stringent hardware requirements, and is highly sensitive to the accuracy of the anatomical model. Post-reconstruction methods typically combine high-resolution anatomical images (such as MRI images) for correction, but traditional methods rely on accurate image segmentation and registration and are sensitive to noise.
[0003] In recent years, deep learning methods have made significant progress in the field of partial volume correction (PVC), but most rely on supervised training with large amounts of paired data, which are difficult to acquire in clinical applications. Deep image priors, as an unsupervised method, offer a new approach to the data shortage problem, but in PVC tasks they still suffer from slow convergence, insufficient detail preservation, and susceptibility to local optima.
[0004] In summary, the technical problems in the related art remain to be solved.
Summary of the Invention
[0005] The embodiments of this application aim to address, at least to some extent, one of the technical problems in the related art. Therefore, the main objective of the embodiments of this application is to propose a method and related equipment for partial volume correction of PET images based on deep image priors, which can significantly improve image structural similarity and noise suppression while also enhancing model training efficiency and detail preservation, thereby improving the performance of the partial volume correction model.
[0006] To achieve the above objectives, one aspect of this application proposes a method for partial volume correction of PET images based on deep image priors, the method comprising the following steps:
Acquiring first target image data of a group of patients, wherein the first target image data includes a first target PET image and a corresponding first target MR image;
Constructing a first partial volume correction model based on deep image priors, and constructing a partial volume correction function that includes a back-projection fidelity term, denoising regularization, kernel constraints, and MRI conditional constraints, wherein the first partial volume correction model includes an image generation network and a kernel generation network;
Pre-training the first partial volume correction model with population information, based on the first target image data and in conjunction with the partial volume correction function, to obtain a trained second partial volume correction model, and freezing the encoder parameters of the image generation network in the second partial volume correction model;
Acquiring second target image data of the target patient, wherein the second target image data includes a second target PET image and a corresponding second target MR image;
Constructing a third partial volume correction model with the same structure as the first partial volume correction model, and loading the encoder parameters frozen in the second partial volume correction model into the third partial volume correction model;
After the encoder parameters are loaded into the third partial volume correction model, performing individual fine-tuning training on the third partial volume correction model, based on the second target image data and in conjunction with the partial volume correction function, to obtain a trained fourth partial volume correction model, and using the output of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient.
[0007] In some embodiments, acquiring the first target image data of the group of patients includes:
Acquiring first raw image data of the group of patients using PET and MRI equipment, wherein the first raw image data includes a first raw PET image and a corresponding first raw MR image;
Performing format conversion on the first raw image data to obtain first formatted image data;
Performing image size unification on the first formatted image data to obtain first size-standardized image data;
Performing a first illegal value replacement on the first size-standardized image data to obtain first compliant image data;
Normalizing the first compliant image data to obtain first normalized image data;
Performing a second illegal value replacement on the first normalized image data to obtain the first target image data.
[0008] In some embodiments, the step of pre-training the first partial volume correction model with population information based on the first target image data and in conjunction with the partial volume correction function to obtain a trained second partial volume correction model, and freezing the encoder parameters of the image generation network in the second partial volume correction model, includes:
Inputting the first target MR image into the image generation network in the first partial volume correction model to generate a first predicted partial volume correction image;
Inputting an initial Gaussian kernel into the kernel generation network in the first partial volume correction model to generate a first predicted system point spread function;
Using the first target PET image corresponding to the first target MR image as the model pre-training target;
Calculating, based on the first predicted partial volume correction image and the model pre-training target and in combination with the partial volume correction function, a first pre-training loss corresponding to the image generation network;
Calculating, based on the first predicted system point spread function and the partial volume correction function, a second pre-training loss corresponding to the kernel generation network;
Determining the total population information pre-training loss based on the first pre-training loss and the second pre-training loss;
Performing gradient backpropagation on the first partial volume correction model based on the total population information pre-training loss to iteratively update the pre-training parameters of the first partial volume correction model until the training requirements of the population information pre-training are met, thereby obtaining the trained second partial volume correction model, and freezing the encoder parameters of the image generation network in the second partial volume correction model.
[0009] In some embodiments, the image generation network includes an encoder and a decoder, and the skip connection between the encoder and the decoder is implemented using a channel Transformer module. The step of inputting the first target MR image into the image generation network in the first partial volume correction model to generate a first predicted partial volume corrected image includes: The first target MR image is input into the image generation network in the first partial volume correction model. The encoder in the image generation network performs layer-by-layer downsampling on the first target MR image to extract encoder feature maps at different scales. The encoder feature maps at different scales include several layers of shallow feature maps and one layer of deep feature map. The shallow feature maps of several layers are embedded into corresponding feature tokens through the skip connections, and the feature tokens are concatenated along the channel dimension to obtain a feature token set. The feature token set is fused using a channel attention cross-fusion module to obtain token fusion features. The token fusion feature is reconstructed using the decoder to obtain several reconstructed feature maps. The deep feature map is upsampled by the decoder to obtain the decoder deep feature map; By using a channel-based cross-attention module, channel responses in each of the reconstructed feature maps that are not related to partial volume correction are suppressed based on the deep feature map of the decoder, so as to achieve feature fusion of the deep feature map of the decoder and each of the reconstructed feature maps, and obtain a fused feature map. The fused feature map is input into the output layer of the image generation network to generate the first predicted volume-corrected image.
[0010] In some embodiments, the step of individually fine-tuning the third partial volume correction model based on the second target image data and in conjunction with the partial volume correction function to obtain a trained fourth partial volume correction model, and using the output of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient, includes:
Inputting the second target MR image into the image generation network in the third partial volume correction model to generate a second predicted partial volume correction image;
Inputting the initial Gaussian kernel into the kernel generation network in the third partial volume correction model to generate a second predicted system point spread function;
Using the second target PET image corresponding to the second target MR image as the model fine-tuning training target;
Calculating, based on the second predicted partial volume correction image and the model fine-tuning training target and in combination with the partial volume correction function, a first fine-tuning training loss corresponding to the decoder in the image generation network;
Calculating, based on the second predicted system point spread function and the partial volume correction function, a second fine-tuning training loss corresponding to the kernel generation network;
Determining the total individual fine-tuning training loss based on the first fine-tuning training loss and the second fine-tuning training loss;
Performing gradient backpropagation on the decoder and the kernel generation network based on the total individual fine-tuning training loss to iteratively update the fine-tuning training parameters of the decoder and the kernel generation network until the training requirements of the individual fine-tuning training are met, thereby obtaining the trained fourth partial volume correction model;
Using the output of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient.
[0011] In some embodiments, the method further includes: adopting an early stopping strategy for model training, in which whether to terminate the population information pre-training or the individual fine-tuning training is determined based on an early stopping index and a preset early stopping threshold. The early stopping index is a quantitative image quality metric, which includes peak signal-to-noise ratio and structural similarity.
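As a rough illustration of such an early stopping check, the following minimal sketch computes the two named metrics and compares them with thresholds. Several things here are assumptions not stated in the text: the function names, the stopping rule (stop once both metrics reach their thresholds), the choice of reference image, and the use of a simplified single-window SSIM instead of the usual sliding-window version.

```python
import math


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized flat lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over the whole image as a single window."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # standard stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return numerator / denominator


def should_stop(psnr_value, ssim_value, psnr_threshold, ssim_threshold):
    """Assumed rule: stop when both metrics reach their preset thresholds."""
    return psnr_value >= psnr_threshold and ssim_value >= ssim_threshold
```

In a training loop, `should_stop` would be evaluated every few iterations on the current network output; the threshold values themselves are left to the implementer.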
[0012] To achieve the above objectives, another aspect of this application proposes a PET image partial volume correction device based on deep image priors, the device comprising the following modules:
A first target image data acquisition module, used to acquire first target image data of a group of patients, wherein the first target image data includes a first target PET image and a corresponding first target MR image;
A model and loss function construction module, used to construct a first partial volume correction model based on deep image priors, and to construct a partial volume correction function that includes a back-projection fidelity term, denoising regularization, kernel constraints, and MRI conditional constraints, wherein the first partial volume correction model includes an image generation network and a kernel generation network;
A population information pre-training module, used to pre-train the first partial volume correction model based on the first target image data and in combination with the partial volume correction function to obtain a trained second partial volume correction model, and to freeze the encoder parameters of the image generation network in the second partial volume correction model;
A second target image data acquisition module, used to acquire second target image data of the target patient, wherein the second target image data includes a second target PET image and a corresponding second target MR image;
A target patient model construction module, used to construct a third partial volume correction model with the same structure as the first partial volume correction model, and to load the encoder parameters frozen in the second partial volume correction model into the third partial volume correction model;
An individual fine-tuning training module, used to perform individual fine-tuning training on the third partial volume correction model, based on the second target image data and in conjunction with the partial volume correction function, after the encoder parameters are loaded into the third partial volume correction model, to obtain a trained fourth partial volume correction model, and to use the output of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient.
[0013] To achieve the above objectives, another aspect of this application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the above-described method.
[0014] To achieve the above objectives, another aspect of the embodiments of this application proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method.
[0015] To achieve the above objectives, another aspect of this application provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.
[0016] The embodiments of this application include at least the following beneficial effects: This application provides a method and related equipment for partial volume correction of PET images based on deep image priors. The scheme acquires first target image data of a group of patients, wherein the first target image data includes a first target PET image and a corresponding first target MR image; constructs a first partial volume correction model based on deep image priors, and constructs a partial volume correction function including a back-projection fidelity term, denoising regularization, kernel constraints, and MRI conditional constraints, wherein the first partial volume correction model includes an image generation network and a kernel generation network; pre-trains the first partial volume correction model with population information, based on the first target image data and in combination with the partial volume correction function, to obtain a trained second partial volume correction model, and freezes the encoder parameters of the image generation network in the second partial volume correction model; acquires second target image data of the target patient, wherein the second target image data includes a second target PET image and a corresponding second target MR image; constructs a third partial volume correction model with the same structure as the first partial volume correction model, and loads the frozen encoder parameters from the second partial volume correction model into the third partial volume correction model; and, after the encoder parameters are loaded, individually fine-tunes the third partial volume correction model based on the second target image data and in combination with the partial volume correction function to obtain a trained fourth partial volume correction model, the output of whose image generation network is used as the target partial volume correction image corresponding to the target patient.
The embodiments of this application construct the partial volume correction model using an unsupervised deep image prior framework, which avoids reliance on large-scale paired data. Introducing a back-projection fidelity term improves adaptability to ill-conditioned inverse problems. Combining denoising regularization with MRI (magnetic resonance imaging) conditional guidance enhances the image quality and anatomical consistency of the correction results. Kernel constraints adjust the weight distribution of the generated PSF (system point spread function), so that the PSF required by the back-projection fidelity term is solved for rather than designed by hand, avoiding human design errors. Together, the back-projection fidelity term, denoising regularization, and MRI conditional constraints significantly improve image structural similarity and noise suppression. The population information pre-training and individual fine-tuning training strategy improves the training efficiency and detail preservation of the model, thereby enhancing the performance of the partial volume correction model.
Brief Description of the Drawings
[0017] Figure 1 is a flowchart of the steps of the PET image partial volume correction method based on deep image priors provided in the embodiments of this application;
Figure 2 is a schematic diagram of the structure of the image generation network provided in the embodiments of this application;
Figure 3 is a schematic diagram of the kernel generation network provided in an embodiment of this application;
Figure 4 is a schematic diagram of the model training strategy provided in the embodiments of this application;
Figure 5 is a schematic flowchart of the PET image partial volume correction method based on deep image priors provided in the embodiments of this application;
Figure 6 is a schematic diagram of the experimental comparison on the first dataset provided in an embodiment of this application;
Figure 7 is a schematic diagram of the experimental comparison on the second dataset provided in the embodiments of this application;
Figure 8 is a schematic diagram of the structure of the PET image partial volume correction device based on deep image priors provided in the embodiments of this application;
Figure 9 is a schematic diagram of the hardware structure of the electronic device provided in the embodiments of this application.
Detailed Description
[0018] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit it. In the following description, when referring to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with those of this application; they are merely examples of apparatuses and methods consistent with some aspects of the embodiments of this application as detailed in the appended claims.
[0019] It is understood that the terms “first,” “second,” etc., may be used in this application to describe various concepts, but unless otherwise stated, these concepts are not limited by these terms. These terms are only used to distinguish one concept from another. For example, without departing from the scope of the embodiments of this application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word “if” as used herein may be interpreted as “when,” “upon,” or “in response to a determination.”
[0020] As used in this application, "at least one" includes one, two or more; "multiple" includes two or more; "each" refers to every one of the corresponding multiple items; and "any" refers to any one of the multiple items.
[0021] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.
[0022] Before providing a detailed description of the embodiments of this application, some of the nouns and terms involved in the embodiments of this application will be explained first. The nouns and terms involved in the embodiments of this application are subject to the following interpretations.
[0023] (1) PET image, namely positron emission tomography (PET) image. Positron emission tomography (PET) technology is currently the only new imaging technology that can display biomolecular metabolism, receptors and neural activity in living organisms. It can detect diseases at the molecular level and has been widely used in the early diagnosis and efficacy evaluation of various diseases.
[0024] (2) MR images. Magnetic Resonance Imaging (MRI) is a type of imaging technology used for medical examinations that utilizes the nuclear magnetic resonance phenomenon. It can provide high-quality anatomical images (i.e., MR images) for clinical use.
[0025] (3) Partial volume effect (PVE) and partial volume correction (PVC): The value of each pixel on a PET image represents the average activity value of the corresponding unit tissue. The phenomenon that it cannot accurately reflect the activity value of the unit tissue itself is called partial volume effect. Partial volume correction refers to the process of reducing this effect through algorithms to improve image resolution and quantitative accuracy.
[0026] (4) System matrix: Due to factors such as PVE, PET imaging can only acquire degraded PET images. In this application, the PET system matrix is used to represent various factors that cause image degradation. In this application, the matrix is modeled as a Gaussian-like function PSF (Point Spread Function) in the image domain.
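As a hedged illustration of this modeling choice, the degradation can be written as a convolution in the image domain. The symbols below (observed PET image y, underlying activity image x, system matrix H, PSF kernel h, noise n, kernel width σ) are notation assumed here for illustration, and the additive noise term is likewise an assumption; the text only states that the system matrix is modeled as a Gaussian-like PSF.

```latex
y = Hx + n \;\approx\; h * x + n,
\qquad
h(\mathbf{r}) \propto \exp\!\left(-\frac{\lVert \mathbf{r} \rVert^{2}}{2\sigma^{2}}\right)
```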
[0027] (5) Point spread function (PSF): The response distribution of a PET imaging system to an ideal point source (an infinitesimally small radioactive point), which describes the intensity distribution of the blurry spot formed by the point source in the reconstructed image.
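The following minimal one-dimensional sketch shows what such a PSF does to a point source, which is also a simple picture of the partial volume effect: the peak value drops and activity spills into neighboring pixels, while the total is preserved. The kernel size, the FWHM value, and the function names are illustrative assumptions, not values from this application.

```python
import math


def gaussian_psf_1d(fwhm, size):
    """Normalized 1D Gaussian kernel; fwhm is in pixels and size should be odd."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # weights sum to 1


def convolve_same(signal, kernel):
    """Zero-padded convolution that returns an output as long as the input."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + half - k  # index of the input sample paired with kernel[k]
            if 0 <= j < len(signal):
                acc += signal[j] * w
        out.append(acc)
    return out


# An ideal point source: all activity in a single pixel.
point_source = [0.0] * 21
point_source[10] = 1.0

psf = gaussian_psf_1d(fwhm=4.0, size=9)
blurred = convolve_same(point_source, psf)
```

Here `blurred` is simply a copy of the PSF centered on pixel 10, which is the defining property of a point spread function.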
[0028] (6) Deep Image Prior (DIP) is an unsupervised deep learning framework that uses a neural network structure as an implicit prior to recover high-quality images from a single degraded image without the need for paired training data.
[0029] (7) Loss function (loss), a mathematical function in deep learning that measures the difference between the model output and the optimization objective.
[0030] (8) Back Projection (BP): In this application, a fidelity term used in place of the traditional least-squares fidelity term in the DIP framework, which is better suited to ill-conditioned inverse problems.
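For orientation, one form of back-projection fidelity term found in the inverse-problems literature is contrasted below with the least-squares term. This passage does not give the exact formula used in this application, so the following is a sketch under assumed notation: y is the observed PET image, x the corrected image, H the blurring operator induced by the PSF, and H† its pseudoinverse.

```latex
\ell_{\mathrm{LS}}(x) = \tfrac{1}{2}\,\lVert y - Hx \rVert_2^{2},
\qquad
\ell_{\mathrm{BP}}(x) = \tfrac{1}{2}\,\lVert H^{\dagger}\,(y - Hx) \rVert_2^{2}
```

In this form the Hessian of the BP term is H†H, an orthogonal projector whose nonzero eigenvalues all equal 1, whereas the Hessian of the least-squares term, HᵀH, inherits the squared singular values of H. That difference is one way to see why a BP term can behave better when H is ill-conditioned.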
[0031] (9) Regularization by Denoising (RED) is a regularization method based on a denoiser, used to suppress the influence of noise on image restoration.
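A minimal sketch of the RED penalty in its commonly cited form, ρ(x) = ½·xᵀ(x − f(x)), where f is a denoiser. The three-tap averaging filter below is only a stand-in chosen to keep the example self-contained; this passage does not say which denoiser the application uses, and the function names are hypothetical.

```python
def box_denoiser(x):
    """Stand-in denoiser: 3-tap moving average with edge replication."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


def red_penalty(x, denoiser=box_denoiser):
    """RED regularizer: half the inner product of x with its denoising residual."""
    fx = denoiser(x)
    return 0.5 * sum(a * (a - b) for a, b in zip(x, fx))


smooth = [1.0] * 8                                  # the denoiser leaves this unchanged
noisy = [1.0, 1.4, 0.7, 1.2, 0.8, 1.3, 0.6, 1.1]    # oscillating values

penalty_smooth = red_penalty(smooth)
penalty_noisy = red_penalty(noisy)
```

The penalty vanishes for a signal the denoiser leaves unchanged and grows as the signal departs from its denoised version, which is how the term discourages noisy reconstructions.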
[0032] (10) Neural Blind Deconvolution (NBD) is a method that automatically estimates PSF through neural networks.
[0033] (11) UCTransNet, a U-shaped network based on channel attention Transformer, used for multi-scale feature fusion.
[0034] (12) PyTorch is a deep learning technology framework based on the Python programming language, which contains a large number of related library functions.
[0035] As an example, positron emission tomography (PET) is an advanced functional imaging technique capable of quantitatively describing the distribution of radiotracers in vivo. However, limited by hardware and software performance, PET images have low spatial resolution and significant partial volume effect (PVE), leading to biases in quantitative assessment. To address this issue, partial volume correction (PVC) methods can be divided into reconstruction-based and post-reconstruction methods. Reconstruction-based methods embed the PVC correction model into the PET image reconstruction process, combining anatomical information to directly cancel the effect during the radioactive signal-to-image conversion stage. However, this method has high computational complexity, stringent hardware requirements, and is highly sensitive to the accuracy of the anatomical model. Post-reconstruction methods typically combine high-resolution anatomical images (such as MR images) for correction, but traditional methods rely on accurate image segmentation and registration and are sensitive to noise.
[0036] In recent years, deep learning methods have made significant progress in the field of PVC (partial volume correction), but most rely on supervised training with large amounts of paired data, which are difficult to acquire in clinical applications. Deep image priors, as an unsupervised method, offer a new approach to the data shortage problem, but in PVC tasks they still suffer from slow convergence, insufficient detail preservation, and susceptibility to local optima. For example, current unsupervised PVC methods based on deep image priors use the neural network structure as a prior to recover high-quality images from single degraded PET images without the need for paired data. However, these methods often use least-squares fidelity terms during optimization, adapt poorly to ill-posed inverse problems, and are susceptible to noise, potentially resulting in artifacts or loss of detail in the corrected image.
[0037] Therefore, traditional deep learning methods require a large amount of paired data with samples and labels, while obtaining suitable datasets in PET clinical practice is quite difficult. Traditional DIP methods converge slowly in PVC tasks, are prone to getting trapped in local optima, and are trained on single images, making it difficult to utilize common features in group data. The use of least-squares fidelity terms is sensitive to noise and performs poorly in high-dimensional ill-conditioned problems. The lack of multimodal information guidance means that the correction results need improvement in terms of anatomical structural consistency.
[0038] In view of this, this application provides a method and related equipment for partial volume correction of PET images based on deep image priors. The scheme avoids dependence on large-scale paired data by constructing a partial volume correction model using an unsupervised deep image prior framework; it improves adaptability to ill-conditioned inverse problems by introducing a back-projection fidelity term; it enhances the image quality and anatomical consistency of the correction results by combining denoising regularization and MRI (magnetic resonance imaging) conditional guidance; it uses kernel constraints to adjust the weight distribution of the generated PSF (system point spread function), so that the PSF required by the back-projection fidelity term is solved for rather than designed by hand, avoiding human design errors; it significantly improves image structural similarity and noise suppression by introducing the back-projection fidelity term, denoising regularization, and MRI conditional constraints; and it improves the training efficiency and detail preservation of the model by using a population information pre-training and individual fine-tuning training strategy, thereby improving the performance of the partial volume correction model.
[0039] The PET image partial volume correction method based on deep image priors provided in this application relates to the field of image processing technology. This method can be applied to a terminal, a server, or software running on either a terminal or a server. In some embodiments, the terminal can be a smartphone, tablet, laptop, desktop computer, smart speaker, smartwatch, or in-vehicle terminal, but is not limited to these. The server can be configured as an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The server can also be a node server in a blockchain network. The software can be an application implementing the PET image partial volume correction method based on deep image priors, but is not limited to the above forms.
[0040] This application can be used in a wide variety of general-purpose or special-purpose computer system environments or configurations. Examples include: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.
[0041] It should be noted that in each specific implementation of this application, when it is necessary to collect and process sensitive medical data from hospitals, patients, or other relevant sensitive data of the target being predicted, permission or consent from the user or relevant institution will be obtained first. Moreover, the collection, use, and processing of this data will comply with relevant laws, regulations, and standards.
[0042] Referring to Figure 1, Figure 1 is an optional flowchart of the PET image partial volume correction method based on depth image priors provided in an embodiment of this application. The method shown in Figure 1 may include, but is not limited to, steps S101 to S106.
[0043] Step S101: Obtain first target image data of the group of patients, wherein the first target image data includes a first target PET image and a corresponding first target MR image. In some embodiments, step S101 may include: acquiring first raw image data of the group of patients using a PET device and an MRI device, wherein the first raw image data includes a first raw PET image and a corresponding first raw MR image; performing format conversion processing on the first raw image data to obtain first formatted image data; performing image size unification processing on the first formatted image data to obtain first size-standardized image data; performing first illegal value replacement processing on the first size-standardized image data to obtain first compliant image data; performing normalization processing on the first compliant image data to obtain first normalized image data; and performing second illegal value replacement processing on the first normalized image data to obtain the first target image data.
[0044] Here, the group of patients (patient population) refers to a collection of multiple clinical subjects used for population-information pre-training of the model, not the single target to be analyzed. The PET images and corresponding MR images of the group of patients are used as image training data for the subsequent population-information pre-training (referred to as group pre-training).
[0045] The first raw PET image refers to the native PET image of the group of patients directly acquired by the PET device, reflecting the metabolic distribution of the radiotracer in the patients' bodies. Because device models and scanning parameters differ, the raw PET images have inconsistent formats, sizes, and numerical ranges and are unsuitable for direct input into the model. Similarly, the first raw MR image refers to the native MR image of the group of patients directly acquired by the MRI device, reflecting the anatomical structures of the patients' tissues and organs. It is paired with the first raw PET image, that is, acquired from the same patient in the same scanning period, and it suffers from the same device-related non-standardization of format and size. The raw image data therefore need to be preprocessed. The preprocessing may include, but is not limited to, format conversion, image size unification, illegal value replacement, and normalization, and ultimately yields the first target image data. The first target image data includes the first target PET image and the corresponding first target MR image. The first target PET image is a standard PET image obtained by preprocessing the raw PET images of the group of patients; it preserves the metabolic characteristics of the raw PET images and meets the model's input requirements for PET data. The first target MR image is a standard MR image obtained by preprocessing the raw MR images of the group of patients; it preserves the anatomical structural features of the raw MR images and is precisely matched with the first target PET image of the same group to provide structural constraints for the model.
[0046] In a specific implementation, PET and MR images of multiple other patients are first acquired with PET and MRI devices for group pre-training and converted into a processable format, such as the '.nii' format. The converted PET and MR images are then read using the PyTorch library and resized to a specified size (256×256 in the example of this application, by scaling or cropping as needed) to fit the network model. Next, illegal values (for example, positive or negative infinity and values that the program cannot recognize, such as NaN) are replaced with values that meet actual needs, such as 0, for example using torch.nan_to_num. The images are then normalized, and finally any illegal values generated during normalization are replaced, yielding the image training data for the subsequent population-information pre-training.
[0047] Normalization refers to computing the minimum pixel value $x_{\min}$ and the maximum pixel value $x_{\max}$ of an image $x$ and then computing the normalized image $x_{\mathrm{norm}}$ with the min-max normalization formula: $x_{\mathrm{norm}} = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$.
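As a non-limiting illustration, the illegal value replacement and normalization described above can be sketched as follows. The sketch uses NumPy's `np.nan_to_num` as an analogue of `torch.nan_to_num`; the function names `replace_illegal`, `min_max_normalize`, and `preprocess`, the fill value 0, and the sample array are assumptions made for illustration only. Format conversion and resizing are omitted.

```python
import numpy as np

def replace_illegal(img, fill=0.0):
    """Replace NaN and +/-infinity with a fill value (NumPy analogue of torch.nan_to_num)."""
    return np.nan_to_num(img, nan=fill, posinf=fill, neginf=fill)

def min_max_normalize(img):
    """Min-max normalization; a constant image yields 0/0 = NaN, handled by the caller."""
    lo, hi = img.min(), img.max()
    with np.errstate(invalid="ignore", divide="ignore"):
        return (img - lo) / (hi - lo)

def preprocess(img):
    clean = replace_illegal(np.asarray(img, dtype=np.float64))  # first illegal value replacement
    norm = min_max_normalize(clean)                             # normalization
    return replace_illegal(norm)                                # second illegal value replacement

raw = np.array([[1.0, np.nan], [np.inf, 5.0]])  # hypothetical 2x2 image with illegal values
out = preprocess(raw)
```

The second replacement matters when an image is constant: the denominator of the normalization is then zero, and the resulting NaN values are replaced with 0.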
[0048] Step S102: Construct a first partial volume correction model based on depth image priors, and construct a partial volume correction function that includes a back-projection fidelity term, denoising regularization, kernel constraints, and MRI conditional constraints, wherein the first partial volume correction model includes an image generation network and a kernel generation network. Here, the depth image prior (deep image prior, DIP) is an unsupervised deep learning framework that uses the structure of a neural network as an implicit prior to recover a high-quality image from a single degraded image without paired training data. Based on this framework, this application builds the basic architecture of the partial volume correction model (PVC model), eliminating the dependence on large amounts of paired clinical data.
[0049] The first partial volume correction model is a basic model framework built around depth image priors for the other patients (non-target patients). It has the core network structure of a partial volume correction model but has not yet learned the correction rules; it is trained subsequently by optimizing the partial volume correction function. The first partial volume correction model is used for the subsequent model pre-training, which yields the encoder parameters of the pre-training stage for use in the later individual fine-tuning training.
[0050] In this embodiment, the partial volume correction model includes an image generation network and a kernel generation network. During training, the image generation network predicts the partial-volume-corrected image, while the kernel generation network estimates the point spread function (PSF) of the system. The outputs of the two networks are then convolved to obtain the simulated degraded image.
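As a non-limiting illustration, the convolution of the two network outputs can be sketched as follows. The point-source image stands in for the output of the image generation network and a normalized Gaussian stands in for the output of the kernel generation network; the names `gaussian_kernel` and `blur`, the value of `sigma`, and the zero padding at the borders are assumptions of this sketch.

```python
import numpy as np

def gaussian_kernel(size=9, sigma=1.5):
    """Normalized 2D Gaussian used here as a stand-in for the predicted PSF."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(image, psf):
    """Same-size 2D convolution of image with psf, using zero padding."""
    kh, kw = psf.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    flipped = psf[::-1, ::-1]  # convolution flips the kernel
    out = np.zeros_like(image, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

corrected = np.zeros((32, 32))
corrected[16, 16] = 1.0              # point source standing in for the corrected image
psf = gaussian_kernel()
degraded = blur(corrected, psf)      # simulated degraded image
```

For a point source, the simulated degraded image reproduces the PSF itself around the source position, which is a convenient sanity check for the forward model.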
[0051] Referring to Figure 2, Figure 2 is a schematic diagram of the structure of the image generation network provided in an embodiment of this application. As shown in Figure 2, the image generation network used in this application to predict the corrected image adopts an encoder-decoder structure and incorporates a channel attention mechanism to address multi-scale feature fusion. The specific network architecture, as shown in Figure 2, is as follows: the encoder consists of four convolutional blocks, each containing a convolutional layer, batch normalization, and ReLU activation, with a stride of 2 for downsampling; the decoder consists of four corresponding upsampling blocks that use bilinear interpolation to restore resolution; the skip connections use the CTrans module (i.e., the channel Transformer module) instead of the traditional concatenation, achieving multi-scale feature fusion through channel cross-attention; and the output layer uses a 1×1 convolution to output a single-channel corrected image. In other words, in this embodiment the image generation network is a UCTransNet, in which the channel Transformer module (CTrans) replaces the skip connections of the traditional U-Net.
[0052] As shown in Figure 2, the workflow of the image generation network is as follows. First, the input image passes through the convolution module of the encoder to obtain the layer-1 feature map, which is then downsampled four times to obtain the multi-scale feature maps of layers 2 to 5. Then, the layer-5 upsampling module of the decoder (green in the figure), which has no corresponding skip connection, simply upsamples the layer-5 feature map output by the encoder to obtain the deep feature map. Next, the upsampling modules of layers 4 to 1 in the decoder sequentially perform concatenation, convolution, and other operations on the reconstructed feature maps output by the skip connections and the deep feature map, progressively restoring the feature map to the original size. Finally, the output of the layer-1 upsampling module is passed through the convolution module (pink in the figure), which applies convolution and an activation function to produce the final output of the network.
[0053] In the skip connections, the embedding module (blue in the figure) first embeds the feature maps of layers 1 to 4 into tokens and concatenates them along the channel dimension. The Channel Attention Cross-fusion Transformer (CCT) performs the multi-scale feature fusion: the concatenated tokens are fed through four cascaded modules, each consisting of layer normalization, multi-head channel cross-attention, and a residual multilayer perceptron, to obtain the output tokens. The reconstruction module (yellow in the figure) upsamples and convolves the output tokens to restore reconstructed feature maps matching the dimensions of layers 1 to 4. For each layer, the Channel-wise Cross Attention (CCA) module applies global average pooling, a linear transformation, and a sigmoid operation to the reconstructed feature map of that layer and to the deep upsampled feature map (the output of the deeper upsampling module) to obtain weights. These weights are multiplied channel-wise with the reconstructed feature map to suppress channel responses that are unrelated to partial volume correction (PVC), thereby fusing the encoder and decoder features and enhancing cross-modal semantic consistency. By integrating CCT and CCA, the CTrans module achieves adaptive fusion of encoder and decoder features, improving the model's ability to capture complex structures and subtle lesions while keeping the number of parameters and the computational complexity lower than those of inter-block attention mechanisms.
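As a non-limiting illustration, the channel-wise gating performed by the CCA module (global average pooling, linear transformation, sigmoid, channel-wise multiplication) can be sketched as follows. The function name `channel_cross_attention`, the random linear weights `w_recon` and `w_deep`, the summation of the two linear projections, and the assumption that both feature maps have the same number of channels are choices made for this sketch and are not specified in the text.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_cross_attention(recon, deep, w_recon, w_deep):
    """recon, deep: feature maps of shape (C, H, W); w_recon, w_deep: (C, C) linear weights."""
    pooled_recon = recon.mean(axis=(1, 2))   # global average pooling -> (C,)
    pooled_deep = deep.mean(axis=(1, 2))     # global average pooling -> (C,)
    # Linear transformation of both pooled vectors, then sigmoid -> one weight per channel.
    gate = sigmoid(w_recon @ pooled_recon + w_deep @ pooled_deep)
    # Channel-wise multiplication rescales each channel of the reconstructed map.
    return gate[:, None, None] * recon, gate

rng = np.random.default_rng(0)
recon = rng.normal(size=(4, 8, 8))   # hypothetical reconstructed feature map from the skip connection
deep = rng.normal(size=(4, 8, 8))    # hypothetical deep upsampled feature map from the decoder
w_recon = rng.normal(size=(4, 4))
w_deep = rng.normal(size=(4, 4))
gated, gate = channel_cross_attention(recon, deep, w_recon, w_deep)
```

Because every gate value lies between 0 and 1, channels whose pooled statistics disagree with the decoder features are attenuated rather than removed outright.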
[0054] Referring to Figure 3, Figure 3 is a schematic diagram of the kernel generation network provided in an embodiment of this application. As shown in Figure 3, the kernel generation network is used to generate the PSF of the system and consists of four hidden layers. Its input is an initialized Gaussian kernel; its structure consists of four fully connected layers, each followed by ReLU or ReLU6 and Dropout; and its output is a 9×9 PSF kernel matrix whose weights are constrained to sum to 1 and are concentrated at the center. The number of neurons in each fully connected layer is set to 5 times the size of the input kernel to enhance the expressive power of the model.
[0055] In model training (including group pre-training and individual fine-tuning training), the image predicted by the image generation network and the PSF predicted by the kernel generation network are convolved, the loss function (i.e., the partial volume correction function) is evaluated on the result, and the gradient is backpropagated through the model. Training is iterated so that the result progressively approximates the original image while image details are preserved.
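As a non-limiting illustration, a forward pass of such a kernel generation network can be sketched as follows. Several details are assumptions of this sketch rather than statements of the embodiment: the hidden width is taken as 5 × 81 = 405 (5 times the number of elements of a 9×9 kernel), a separate output projection follows the four hidden layers, the sum-to-1 constraint is enforced with a softmax, ReLU6 is used throughout, and Dropout is omitted as in inference mode. The weights are random, so the output is only structurally meaningful.

```python
import numpy as np

def gaussian_kernel(size=9, sigma=1.5):
    """Initialized Gaussian kernel used as the network input."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def init_kernel_net(rng, kernel_size=9, width_factor=5, hidden_layers=4):
    """Random weights for an MLP: input -> 4 hidden layers -> output (hypothetical layout)."""
    n = kernel_size * kernel_size
    sizes = [n] + [width_factor * n] * hidden_layers + [n]
    return [(rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_out, fan_in)), np.zeros(fan_out))
            for fan_in, fan_out in zip(sizes[:-1], sizes[1:])]

def kernel_net_forward(params, init_kernel):
    h = init_kernel.ravel()
    for weight, bias in params[:-1]:
        h = np.clip(weight @ h + bias, 0.0, 6.0)   # fully connected layer followed by ReLU6
    weight, bias = params[-1]
    logits = weight @ h + bias
    exp = np.exp(logits - logits.max())            # numerically stable softmax
    return (exp / exp.sum()).reshape(init_kernel.shape)

rng = np.random.default_rng(0)
params = init_kernel_net(rng)
psf = kernel_net_forward(params, gaussian_kernel())
```

A softmax output is one simple way to obtain a non-negative kernel that sums to 1; concentrating the weights at the center is left to the kernel constraints in the loss function.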
[0056] In this embodiment, the loss function (i.e., the partial volume correction function) includes a back-projection fidelity term, denoising regularization, kernel constraints, and MRI conditional constraints. The back-projection fidelity term improves adaptability to ill-conditioned inverse problems; the combination of denoising regularization and MRI conditional guidance enhances image quality and anatomical consistency; the kernel constraints adjust the weight distribution of the generated PSF; and a neural blind deconvolution method is used to solve for the PSF required by the back-projection term. The specific content of the loss function is as follows. (1) Whereas traditional DIP uses a least-squares term, this application uses a back-projection fidelity term (BP term) to better adapt to ill-conditioned inverse problems. The loss of the back-projection fidelity term, $\mathcal{L}_{BP}$, is calculated as follows: $\mathcal{L}_{BP} = \left\| A^{\dagger}\left(A f_{\theta}(z) - y\right) \right\|_2^2$, where $A$ represents the system matrix; $f_{\theta}(z)$ represents the output image to which the neural network with parameters $\theta$ maps the input $z$; $y$ represents the observed image that needs to be corrected; and $A^{\dagger} = A^{T}\left(A A^{T}\right)^{-1}$ represents the pseudo-inverse of the system matrix $A$, the superscript $T$ indicating transpose.
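As a non-limiting illustration, the difference between a least-squares term and a back-projection fidelity term can be shown numerically. The sketch assumes the common form of the BP term, in which the residual between the degraded network output and the observed image is mapped back through the pseudo-inverse of the system matrix before its squared norm is taken. The toy 2×4 averaging matrix `A` and the vectors `x_true` and `x_hat` are hypothetical; in the method the system matrix would correspond to blurring with the estimated PSF.

```python
import numpy as np

def ls_loss(A, x_hat, y):
    """Least-squares fidelity: squared norm of the residual in measurement space."""
    return float(np.sum((A @ x_hat - y) ** 2))

def bp_loss(A, x_hat, y):
    """Back-projection fidelity: squared norm of the residual mapped through the pseudo-inverse."""
    residual = A @ x_hat - y
    return float(np.sum((np.linalg.pinv(A) @ residual) ** 2))

# Toy system matrix: averages neighbouring pairs of a 4-pixel signal (hypothetical).
A = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
x_true = np.array([1.0, 3.0, 2.0, 2.0])
y = A @ x_true                          # observed (degraded) signal
x_hat = np.array([2.0, 2.0, 1.0, 1.0])  # candidate network output

ls_value = ls_loss(A, x_hat, y)
bp_value = bp_loss(A, x_hat, y)
```

For a full-row-rank system matrix, `np.linalg.pinv(A)` coincides with $A^{T}(AA^{T})^{-1}$, so the back-projection term rescales the residual by the inverse of the system response before it is penalized.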
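[0057] (2) To suppress noise, denoising regularization (RED term) is introduced. The loss of the RED term, $\mathcal{L}_{RED}$, is calculated as follows: $\mathcal{L}_{RED} = \frac{\lambda}{2}\, x^{T}\left(x - D(x)\right)$, subject to $x = f_{\theta}(z)$, where $D(\cdot)$ represents a non-local means denoiser; $\lambda$ represents a weight hyperparameter; $x$ represents the candidate image used for regularization, initialized as the PET image to be processed; and the superscript $T$ indicates transpose. The constraint is eliminated using the augmented Lagrangian: $\mathcal{L}_{RED}^{aug} = \frac{\lambda}{2}\, x^{T}\left(x - D(x)\right) + \frac{\mu}{2}\left\| x - f_{\theta}(z) \right\|_2^2 - \mu\, u^{T}\left(x - f_{\theta}(z)\right)$, where $u$ represents the Lagrange multiplier vector, initialized to a zero vector, and $\mu$ represents a weight hyperparameter. The neural network parameters $\theta$ are updated by the Adam optimizer, while $x$ and $u$ are updated using the alternating direction method of multipliers (ADMM). The specific calculation formulas are as follows: $x^{(k+1)} = \dfrac{\lambda D\left(x^{(k)}\right) + \mu\left(f_{\theta}(z) + u^{(k)}\right)}{\lambda + \mu}$; $u^{(k+1)} = u^{(k)} - x^{(k+1)} + f_{\theta}(z)$, where the superscript $(k)$ indicates the iteration round.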
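As a non-limiting illustration, one round of the alternating updates of the candidate image and the Lagrange multiplier can be sketched as follows. The sketch assumes the fixed-point form of the RED update commonly used with deep image priors: the candidate image becomes a weighted average of its denoised version and the network output shifted by the multiplier, and the multiplier accumulates the remaining mismatch. A 3×3 mean filter is swapped in for the non-local means denoiser to keep the example short, and `net_out` is a random array standing in for the network output; the names `box_denoiser` and `admm_step` are hypothetical.

```python
import numpy as np

def box_denoiser(img):
    """3x3 mean filter with edge padding; a simple stand-in for a non-local means denoiser."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def admm_step(x, u, net_out, lam, mu, denoiser=box_denoiser):
    """One update of the candidate image x and the multiplier u for a fixed network output."""
    x_new = (lam * denoiser(x) + mu * (net_out + u)) / (lam + mu)
    u_new = u - x_new + net_out
    return x_new, u_new

rng = np.random.default_rng(0)
net_out = rng.random((16, 16))   # stand-in for the output of the image generation network
x = net_out.copy()               # candidate image (here initialized from the stand-in)
u = np.zeros_like(x)             # Lagrange multiplier initialized to zero
for _ in range(50):
    x, u = admm_step(x, u, net_out, lam=0.5, mu=0.5)
```

In actual training the network output changes between rounds because the network parameters are updated by the optimizer; the output is held fixed here only to isolate the two closed-form updates.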
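[0058] (3) To enhance anatomical consistency, an MRI statistical constraint in the form of multi-scale SSIM (Structural Similarity Index Measure) is introduced. The loss of the MRI conditional constraint, $\mathcal{L}_{MRI}$, is calculated as follows: $\mathcal{L}_{MRI} = 1 - l_{1}\cdot\prod_{n=1}^{N} cs_{n}$; $l = \dfrac{2\mu_{x}\mu_{m} + C_{1}}{\mu_{x}^{2} + \mu_{m}^{2} + C_{1}}$, $cs = \dfrac{2\sigma_{x}\sigma_{m} + C_{2}}{\sigma_{x}^{2} + \sigma_{m}^{2} + C_{2}}\cdot\dfrac{\sigma_{xm} + C_{3}}{\sigma_{x}\sigma_{m} + C_{3}}$, where $\mu_{x}$ and $\sigma_{x}$ represent the mean and standard deviation of the output image, respectively; $\mu_{m}$ and $\sigma_{m}$ represent the mean and standard deviation of the MRI image, respectively; $\sigma_{xm}$ denotes their covariance, as in the standard SSIM definition; $C_{1}$, $C_{2}$, and $C_{3}$ all represent constants used to stabilize the calculation; $\prod$ indicates multiplication; $\prod_{n=1}^{N} cs_{n}$ indicates the product of the results over the $N$ images obtained by subsampling, which is multiplied by the brightness (luminance) component $l_{1}$ of the output image (subscript 1) to constrain the output toward the MRI structure; and $cs_{n}$ represents the product of the contrast component and the structural component of the $n$-th layer.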
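As a non-limiting illustration, a multi-scale SSIM loss between an output image and an MRI image can be sketched as follows. The sketch is deliberately simplified and rests on several assumptions: statistics are computed globally over each image rather than in sliding windows, the contrast and structure components are merged into the usual single term (which corresponds to choosing the third constant as half of the second), subsampling is done by 2×2 average pooling, three scales are used, and the loss is taken as one minus the similarity. The names `ssim_components`, `downsample2`, and `ms_ssim_loss` and the constants are hypothetical.

```python
import numpy as np

def ssim_components(a, b, c1=1e-4, c2=9e-4):
    """Luminance term and combined contrast-structure term from global image statistics."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    lum = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)
    cs = (2 * cov + c2) / (var_a + var_b + c2)
    return lum, cs

def downsample2(img):
    """Subsample by 2x2 average pooling (odd trailing rows or columns are dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ms_ssim_loss(output, mri, levels=3):
    lum1, _ = ssim_components(output, mri)   # luminance component at the first scale
    product = 1.0
    a, b = output, mri
    for _ in range(levels):
        _, cs = ssim_components(a, b)        # contrast-structure component at this scale
        product *= cs
        a, b = downsample2(a), downsample2(b)
    return 1.0 - lum1 * product

rng = np.random.default_rng(0)
mri = rng.random((32, 32))                   # hypothetical normalized MRI slice
loss_same = ms_ssim_loss(mri, mri)
loss_inverted = ms_ssim_loss(1.0 - mri, mri)
```

The loss vanishes when the output reproduces the structure of the MRI image exactly and grows as the two images become structurally dissimilar.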
[0059] (4) To constrain PSF estimation, kernel constraints consisting of a center loss and a non-negativity constraint are introduced. The kernel constraint loss is calculated from the center position of the PSF (its center row and center column), the weight value of the PSF at each position (row, column), and the numbers of rows and columns of the kernel.
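As a non-limiting illustration, one possible kernel constraint combining a center loss with a non-negativity penalty can be sketched as follows. The formula of the embodiment is not reproduced in this text, so the sketch is an assumption: the center loss is taken as the squared distance between the center of mass of the kernel and its geometric center, and the non-negativity penalty as the sum of squared negative weights. The name `kernel_constraint_loss` is hypothetical.

```python
import numpy as np

def kernel_constraint_loss(psf):
    rows, cols = psf.shape
    center_i, center_j = (rows - 1) / 2.0, (cols - 1) / 2.0   # geometric center of the kernel
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    mass = psf.sum()
    com_i = (ii * psf).sum() / mass                            # center of mass, row coordinate
    com_j = (jj * psf).sum() / mass                            # center of mass, column coordinate
    center_loss = (com_i - center_i) ** 2 + (com_j - center_j) ** 2
    nonneg_loss = np.sum(np.minimum(psf, 0.0) ** 2)            # penalizes negative weights only
    return float(center_loss + nonneg_loss)

centered = np.zeros((9, 9))
centered[4, 4] = 1.0     # all weight at the center
shifted = np.zeros((9, 9))
shifted[4, 6] = 1.0      # all weight two columns off-center
loss_centered = kernel_constraint_loss(centered)
loss_shifted = kernel_constraint_loss(shifted)
```

A kernel whose weight drifts away from the center is penalized in proportion to the squared shift, which discourages the blind estimation from trading a spatial shift of the image against a shift of the PSF.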
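[0060] The total loss function is calculated as follows: $\mathcal{L}_{total} = w_{1}\mathcal{L}_{BP} + w_{2}\mathcal{L}_{RED} + w_{3}\mathcal{L}_{MRI} + w_{4}\mathcal{L}_{K}$, where $\mathcal{L}_{BP}$, $\mathcal{L}_{RED}$, $\mathcal{L}_{MRI}$, and $\mathcal{L}_{K}$ denote the losses of the back-projection fidelity term, the denoising regularization, the MRI conditional constraint, and the kernel constraints, respectively, and $w_{1}$ to $w_{4}$ denote the weights with which the losses are fused.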
[0061] Referring to Figure 4, Figure 4 is a schematic diagram of the model training strategy provided in an embodiment of this application. In Figure 4, Image Generation Model represents the image generation network, Gxparams freeze encoder represents the frozen encoder parameters, New Image Generation Model represents the new image generation network, Kernel Generation Model represents the kernel generation network, Loss represents the loss function, and New Kernel Generation Model represents the new kernel generation network. The operator symbol in the figure indicates that the feature map output by the image generation network is multiplied element-wise by the partial volume correction kernel. As shown in Figure 4, in the group pre-training phase, two network models are initialized and trained on the MRI and PET images of other patients with the total loss function. After training, the encoder parameters of the image generation network are saved so that the general features learned and the latent structural information are retained, which initially improves the partial volume correction effect. In the formal training phase (i.e., the individual fine-tuning training phase), two new network models are initialized. The encoder parameters of the image generation network are inherited from the parameters saved in the group pre-training phase and are frozen. The target patient's own MRI and PET images are used as the conditional input and the optimization target, respectively. The entire kernel generation network, together with the decoder and skip connection parts of the image generation network, is trained with the total loss function to further optimize the model and generate the PVC image of the target patient.
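As a non-limiting illustration, the inheritance and freezing of encoder parameters between the two phases can be sketched as follows. Plain dictionaries stand in for network parameters, and the parameter names, the function names `init_image_net` and `sgd_step`, the constant gradients, and the plain gradient step are all hypothetical. In a PyTorch implementation the same idea would typically be expressed by loading a saved state dictionary and disabling gradients for the encoder parameters.

```python
import copy
import random

def init_image_net(seed):
    """Freshly initialized toy parameters: two encoder blocks, one skip module, one decoder block."""
    rng = random.Random(seed)
    names = ("encoder.block1", "encoder.block2", "ctrans.skip", "decoder.up1")
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(3)] for name in names}

def sgd_step(params, grads, frozen, lr=0.1):
    """Gradient step that skips every parameter listed in `frozen`."""
    for name, values in params.items():
        if name in frozen:
            continue
        params[name] = [v - lr * g for v, g in zip(values, grads[name])]

# Group pre-training phase: all parameters are trainable.
gx1 = init_image_net(seed=0)
grads = {name: [1.0, 1.0, 1.0] for name in gx1}   # placeholder gradients
sgd_step(gx1, grads, frozen=set())
encoder_state = {n: copy.deepcopy(v) for n, v in gx1.items() if n.startswith("encoder.")}

# Individual fine-tuning phase: new network, inherited and frozen encoder.
gx2 = init_image_net(seed=1)
gx2.update(copy.deepcopy(encoder_state))
frozen = set(encoder_state)
before = copy.deepcopy(gx2)
sgd_step(gx2, grads, frozen)
```

After the fine-tuning step the encoder entries of `gx2` still equal the pre-trained values, while the skip connection and decoder entries have moved away from their fresh initialization.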
[0062] Step S103: Based on the first target image data and in combination with the partial volume correction function, perform population-information pre-training on the first partial volume correction model to obtain a trained second partial volume correction model, and freeze the encoder parameters of the image generation network in the second partial volume correction model. In some embodiments, step S103 may include: inputting the first target MR image into the image generation network in the first partial volume correction model to generate a first predicted partial volume correction image; inputting an initialized Gaussian kernel into the kernel generation network in the first partial volume correction model to generate a first predicted system point spread function; using the first target PET image corresponding to the first target MR image as the model pre-training target, and calculating a first pre-training loss corresponding to the image generation network based on the first predicted partial volume correction image and the model pre-training target, in combination with the partial volume correction function; calculating a second pre-training loss corresponding to the kernel generation network based on the first predicted system point spread function, in combination with the partial volume correction function; determining a total population-information pre-training loss based on the first pre-training loss and the second pre-training loss; and performing gradient backpropagation on the first partial volume correction model based on the total population-information pre-training loss to iteratively update the model pre-training parameters of the first partial volume correction model until the training requirements of the population-information pre-training are met, obtaining the trained second partial volume correction model, and freezing the encoder parameters of the image generation network in the second partial volume correction model.
[0063] In some embodiments, the step of inputting the first target MR image into the image generation network in the first partial volume correction model to generate the first predicted partial volume correction image may include: inputting the first target MR image into the image generation network in the first partial volume correction model; performing layer-by-layer downsampling on the first target MR image through the encoder in the image generation network to extract encoder feature maps at different scales, wherein the encoder feature maps at different scales include several layers of shallow feature maps and one layer of deep feature map; embedding the several layers of shallow feature maps into corresponding feature tokens through the skip connections, and concatenating the feature tokens along the channel dimension to obtain a token set; performing multi-scale feature fusion on the token set through the channel attention cross-fusion module to obtain token fusion features; reconstructing the token fusion features through the decoder to obtain several reconstructed feature maps; upsampling the deep feature map through the decoder to obtain a decoder deep feature map; suppressing, through the channel-wise cross-attention module and based on the decoder deep feature map, channel responses in each reconstructed feature map that are unrelated to partial volume correction, so as to fuse the decoder deep feature map with each reconstructed feature map and obtain a fused feature map; and inputting the fused feature map into the output layer of the image generation network to generate the first predicted partial volume correction image.
[0064] Optionally, the image generation network includes an encoder and a decoder, with skip connections between the encoder and decoder implemented using a channel Transformer module. The network architecture and workflow are the same as described above with reference to Figure 2 and will not be described in detail here.
[0065] Likewise, the processing performed in the skip connections by the CCT and CCA modules is the same as described above and will not be described in detail here.
[0066] In a specific implementation, the population-information pre-training proceeds as follows, with the PET-MRI images of the other patients used as training data one pair after another. First, the image generation network (UCTransNet) and the kernel generation network (fully connected network) are constructed and initialized; the candidate image used for regularization (denoted x) is initialized as the degraded PET image, and u is initialized as a zero vector of the same size (256×256). Then, the initialized Gaussian kernel is input into the kernel generation network to obtain the predicted system point spread function, the MR image is input into the image generation network to obtain the predicted partial-volume-corrected image, and the PET image corresponding to the MR image is used as the model pre-training target. Next, from the predicted partial-volume-corrected image output by the image generation network and the predicted system point spread function output by the kernel generation network, the corresponding losses are calculated using the terms of the total loss function constructed in step S102 (the BP fidelity term, RED regularization, MRI constraint, and kernel constraint). The losses are weighted and fused to obtain the total loss of the pre-training stage, which is used for gradient backpropagation through the network model. After that, the network parameters, x, and u are updated. Once the population-information pre-training has been carried out with the PET-MRI image pairs of all the other patients as input, the encoder parameters of the image generation network are saved.
[0067] Step S104: Acquire second target image data of the target patient, wherein the second target image data includes a second target PET image and a corresponding second target MR image. The target patient is the subject who requires partial volume correction of PET images, as distinct from the group of patients used for model pre-training in step S101. The individual fine-tuning training phase belongs to the application service phase.
[0068] In a specific implementation, PET and MRI image data of the same target patient are first acquired with PET and MRI devices for individual fine-tuning training and converted into a processable format, such as the '.nii' format. The converted PET and MR images are then read using the PyTorch library and resized to a specified size (256×256 in the example of this application, by scaling or cropping as needed) to fit the network model. Next, illegal values (for example, positive or negative infinity and values that the program cannot recognize, such as NaN) are replaced with values that meet actual needs, such as 0, for example using torch.nan_to_num. The images are then normalized, and finally any illegal values generated during normalization are replaced, yielding the image training data for the subsequent individual fine-tuning training.
[0069] It is understood that the data preprocessing in the individual fine-tuning training phase is consistent with the preprocessing of the PET and MRI image data of the other patients in step S101, and it will not be described in detail here in the embodiments of this application.
[0070] Step S105: Construct a third partial volume correction model with the same structure as the first partial volume correction model, and load the encoder parameters frozen in the second partial volume correction model into the third partial volume correction model. The third partial volume correction model has the same structure as the first partial volume correction model. It is a basic model framework built around depth image priors for the target patient (a patient other than those used in the pre-training phase). It has the core network structure of a partial volume correction model but has not yet learned the correction rules; it is trained subsequently by optimizing the partial volume correction function, starting from the encoder parameters frozen in the second partial volume correction model. The third partial volume correction model is used for the subsequent individual fine-tuning training, which yields the trained target partial volume correction model (i.e., the fourth partial volume correction model) and the target partial volume correction image.
[0071] Step S106: After the encoder parameters are loaded into the third partial volume correction model, the third partial volume correction model is individually fine-tuned based on the second target image data and combined with the partial volume correction function to obtain the trained fourth partial volume correction model. The output of the image generation network in the fourth partial volume correction model is used as the target partial volume correction image corresponding to the target patient.
[0072] In some embodiments, the step of individually fine-tuning the third partial volume correction model based on the second target image data and in combination with the partial volume correction function to obtain the trained fourth partial volume correction model, and using the output of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient, may include: inputting the second target MR image into the image generation network in the third partial volume correction model to generate a second predicted partial volume correction image; inputting an initialized Gaussian kernel into the kernel generation network in the third partial volume correction model to generate a second predicted system point spread function; using the second target PET image corresponding to the second target MR image as the model fine-tuning training target, and calculating a first fine-tuning training loss corresponding to the decoder in the image generation network based on the second predicted partial volume correction image and the model fine-tuning training target, in combination with the partial volume correction function; calculating a second fine-tuning training loss corresponding to the kernel generation network based on the second predicted system point spread function, in combination with the partial volume correction function; determining a total individual fine-tuning training loss based on the first fine-tuning training loss and the second fine-tuning training loss; performing gradient backpropagation on the decoder and the kernel generation network based on the total individual fine-tuning training loss to iteratively update the model fine-tuning training parameters of the decoder and the kernel generation network until the training requirements of the individual fine-tuning training are met, obtaining the trained fourth partial volume correction model; and using the output of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient.
[0073] In a specific implementation, the individual fine-tuning proceeds as follows: the MR image of the target patient is used as the conditional input of the image generation network, the initialized Gaussian kernel is used as the input of the kernel generation network, and the low-resolution PET image corresponding to the MR image of the target patient is used as the training target. First, a new image generation network and a new kernel generation network are constructed and initialized. The image generation network loads the encoder weights saved during pre-training, and these encoder parameters are fixed. The candidate image used for regularization (denoted x) is initialized as the degraded PET image, and u is initialized as a zero vector of the same size (256×256). Then, the initialized Gaussian kernel is input into the kernel generation network to obtain the predicted system point spread function, the MR image is input into the image generation network to obtain the predicted partial-volume-corrected image, and the corresponding PET image is used as the model fine-tuning training target. Next, from the predicted partial-volume-corrected image output by the image generation network and the predicted system point spread function output by the kernel generation network, the corresponding losses are calculated using the loss function constructed in step S102 (including the BP fidelity term, RED regularization, MRI constraint, and kernel constraint). The losses are weighted and fused to obtain the total loss of the fine-tuning stage, which is used for gradient backpropagation through the kernel generation network and the decoder of the image generation network. Afterwards, the model is updated. After all training rounds have been completed, the final output of the image generation network is taken as the corrected PET image.
[0074] It should be noted that model initialization refers to assigning initial values to the parameters of the neural network, and neural network training refers to the updating of these parameters. In the model pre-training stage, assuming the image generation network is represented as gx1 and the kernel generation network as gk1, the image generation network gx1 and the kernel generation network gk1 are first initialized, and then group pre-training is performed on them. After group pre-training, the encoder parameters of the image generation network are saved for the subsequent individual fine-tuning training. In the individual fine-tuning training stage, assuming the image generation network is represented as gx2 and the kernel generation network as gk2, the encoder parameters derived from the image generation network gx1 are first assigned to the image generation network gx2 used to predict the target patient. In specific implementations, either a Python function can be called to re-initialize the parameters of the pre-trained image generation network gx1 and kernel generation network gk1 to obtain the image generation network gx2 and kernel generation network gk2, or the image generation network gx2 and kernel generation network gk2 can be obtained directly by declaring new variables such as 'gx2=uctransnet()'. This application embodiment does not limit this approach. The key point is that, apart from the encoder parameters carried over from pre-training, the pre-trained parameters, which retain the data distributions of other patients and may lead to overfitting, are discarded so that the network relearns the data distribution of the target patient.
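The hand-over from gx1 to gx2 described above can be illustrated with a framework-agnostic sketch in which plain NumPy arrays stand in for network weights. The function names, the parameter names ("encoder.w", "decoder.w"), and the tiny 4×4 shapes are hypothetical; a real implementation would use the deep learning framework's own mechanisms for loading and freezing weights.

```python
import numpy as np


def init_params(seed):
    """Hypothetical stand-in for constructing a network with fresh parameters."""
    rng = np.random.default_rng(seed)
    return {
        "encoder.w": rng.normal(size=(4, 4)),
        "decoder.w": rng.normal(size=(4, 4)),
    }


def load_encoder(target, source):
    """Copy only the encoder entries into target and return the frozen names."""
    frozen = set()
    for name, value in source.items():
        if name.startswith("encoder."):
            target[name] = value.copy()
            frozen.add(name)
    return frozen


def sgd_step(params, grads, frozen, lr=0.1):
    """Apply one gradient step to every parameter that is not frozen."""
    for name in params:
        if name not in frozen:
            params[name] -= lr * grads[name]


gx1 = init_params(seed=0)        # pretend this network has been pre-trained
gx2 = init_params(seed=1)        # new network for the target patient
frozen = load_encoder(gx2, gx1)  # encoder carried over, decoder stays fresh

decoder_before = gx2["decoder.w"].copy()
grads = {name: np.ones_like(value) for name, value in gx2.items()}
sgd_step(gx2, grads, frozen)     # only the decoder changes
```

After the step, the encoder of gx2 still equals the encoder of gx1, while the decoder of gx2 has moved away from its fresh initialization.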
[0075] In some embodiments, the method may further include: employing an early stopping strategy for model training, and determining whether to terminate group information pre-training or individual fine-tuning training based on an early stopping judgment index and a preset early stopping training threshold; wherein the early stopping judgment index is a quantitative evaluation index of image quality, and the quantitative evaluation index of image quality includes peak signal-to-noise ratio and structural similarity.
[0076] In practical implementation, an early stopping strategy is used to terminate training in advance and prevent overfitting. The early stopping judgment index, a quantitative evaluation index of image quality, consists of the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM); for both, higher values indicate better image quality. Both theory and experiments show that, in practical applications, PSNR and SSIM typically either first increase and then decrease before stabilizing, or gradually increase and then stabilize. Therefore, PSNR and SSIM are calculated at fixed intervals, and after the set number of training epochs the output corresponding to the best calculated values is taken as the final output.
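One possible way to keep the best output while monitoring PSNR and SSIM is sketched below. The class name, the `patience` parameter (used here as one reading of the preset early stopping threshold), and the rule of comparing PSNR first and SSIM second are assumptions made for illustration, not details given in this application.

```python
class BestOutputTracker:
    """Remember the output with the best (PSNR, SSIM) pair seen so far."""

    def __init__(self, patience=2):
        self.patience = patience  # evaluations without improvement before stopping
        self.best_score = None
        self.best_output = None
        self.stale = 0

    def update(self, psnr, ssim, output):
        """Record one evaluation and return True when training should stop."""
        score = (psnr, ssim)  # assumed ordering: PSNR first, SSIM as tie-breaker
        if self.best_score is None or score > self.best_score:
            self.best_score = score
            self.best_output = output
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


# Hypothetical metric values that rise and then fall, as described above.
history = [(20.0, 0.70), (24.0, 0.80), (23.5, 0.79), (23.0, 0.78), (22.0, 0.75)]
tracker = BestOutputTracker(patience=2)
stopped_at = None
for step, (psnr_value, ssim_value) in enumerate(history):
    if tracker.update(psnr_value, ssim_value, output=f"image@{step}"):
        stopped_at = step
        break
```

With these values, training stops once two consecutive evaluations fail to improve on the best pair, and the output stored for that best pair is the one returned.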
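[0077] For two images x and y of size m×n, the peak signal-to-noise ratio (PSNR) is calculated using the following formula:
$$\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$
where MAX represents the maximum value of the image pixels and MSE represents the mean square error of the images, obtained by averaging the squared difference at each pixel:
$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[x(i,j) - y(i,j)\right]^2$$
where m and n represent the numbers of rows and columns of the two images x and y, and i and j represent the pixel coordinates. The formula for calculating structural similarity (SSIM) is as follows:
$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where μ represents the image mean and σ represents the image standard deviation (σ_xy being the covariance of x and y); C_1 and C_2 are constants that maintain numerical stability; and x and y are the two images being compared.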
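The two metrics can be sketched in NumPy as follows. This is a simplified illustration: SSIM is computed once over the whole image rather than over local windows, and the constants k1 = 0.01 and k2 = 0.03 and the default maximum value of 1.0 are conventional choices assumed here, not values stated in this application.

```python
import numpy as np


def psnr(x, y, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


def global_ssim(x, y, max_value=1.0, k1=0.01, k2=0.03):
    """Structural similarity computed from whole-image statistics."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * max_value) ** 2  # stabilizing constants
    c2 = (k2 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = np.linspace(0.0, 1.0, 16).reshape(4, 4)
shifted = np.clip(reference + 0.05, 0.0, 1.0)
print(psnr(reference, shifted), global_ssim(reference, shifted))
```

Library implementations typically average SSIM over sliding local windows, so their values will generally differ from this whole-image variant.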
[0078] Steps S101 to S106 as illustrated in the embodiments of this application, by constructing a partial volume correction model using an unsupervised depth image prior framework, can avoid dependence on large-scale paired data; the introduction of a backprojection fidelity term can improve adaptability to ill-conditioned inverse problems; the combination of denoising regularization and MRI (magnetic resonance imaging) condition guidance can enhance the image quality and anatomical consistency of the correction results; the use of kernel constraints to adjust the weight distribution of the generated PSF (system point spread function) can solve for the PSF required for the backprojection fidelity term and avoid manual design errors; by introducing a backprojection fidelity term, denoising regularization, and MRI condition constraints, the image structural similarity and noise suppression ability can be significantly improved; by adopting a group information pre-training and individual fine-tuning training strategy, the training efficiency and detail preservation ability of the model can be improved, thereby improving the performance of the partial volume correction model.
[0079] To explain in detail the principles of the technical solution of this application, the overall process of this application will be described below with reference to some specific embodiments. It is easy to understand that the following is an explanation of the technical principles of this application and should not be regarded as a limitation of this application.
[0080] Please refer to Figure 5, which is a schematic flowchart of a PET image partial volume correction method based on depth image prior provided in an embodiment of this application. As shown in Figure 5, the specific implementation process of the PET image partial volume correction method based on depth image prior is as follows: The first step is data acquisition: PET and MRI image data of multiple other patients are acquired using PET and MRI equipment for group pre-training. At the same time, PET and MRI image data of the same target patient are acquired using PET and MRI equipment for individual fine-tuning training. All PET and MRI image data are then converted into a processable format, such as the '.nii' format.
[0081] The second step is data preprocessing: First, the PyTorch library is used to read the acquired PET and MR images that have been converted to an appropriate format, and the PET and MR images are resized to a specified size (256×256 is used as an example in this application; scaling or cropping may be used as needed) to fit the network model. Then, illegal values (such as overly large or overly small values and values that the program cannot recognize) are replaced with values that meet actual needs, such as 0, for example by using torch.nan_to_num. The images are then normalized, and finally any illegal values that may be generated during normalization are replaced, yielding the image training data for the subsequent group information pre-training and the image training data for the subsequent individual fine-tuning training.
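The preprocessing order described above (size unification, illegal value replacement, normalization, second replacement) can be sketched as follows. NumPy's `nan_to_num` is used here as a stand-in for `torch.nan_to_num`; the center crop or zero pad, the min-max normalization, and the fill value 0 are assumptions made for illustration, and file reading is omitted.

```python
import numpy as np


def center_crop_or_pad(image, size):
    """Return a size x size array holding the centered crop or zero-padded image."""
    out = np.zeros((size, size), dtype=image.dtype)
    h, w = image.shape
    ch, cw = min(h, size), min(w, size)
    src_top, src_left = (h - ch) // 2, (w - cw) // 2
    dst_top, dst_left = (size - ch) // 2, (size - cw) // 2
    out[dst_top:dst_top + ch, dst_left:dst_left + cw] = \
        image[src_top:src_top + ch, src_left:src_left + cw]
    return out


def preprocess(image, size=256):
    """Unify size, replace invalid values, normalize to [0, 1], replace again."""
    image = center_crop_or_pad(np.asarray(image, dtype=np.float64), size)
    # First replacement: NaN and infinities become 0.
    image = np.nan_to_num(image, nan=0.0, posinf=0.0, neginf=0.0)
    lo, hi = image.min(), image.max()
    # Min-max normalization; a constant image gives 0/0, which is NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        image = (image - lo) / (hi - lo)
    # Second replacement: clean up anything produced by the normalization.
    return np.nan_to_num(image, nan=0.0, posinf=0.0, neginf=0.0)


raw = np.arange(12.0).reshape(3, 4)
raw[0, 0] = np.nan
raw[1, 1] = np.inf
clean = preprocess(raw, size=8)
```

The constant-image case shows why the second replacement is needed: normalization divides by zero there, and the resulting NaN values are mapped back to 0.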
[0082] The third step is pre-training with group information: the preprocessed PET-MRI images of other patients are used as training data and are trained on in sequence. First, the image generation network (UCTransNet) and the kernel generation network (a fully connected network) are constructed and initialized. The image is initialized as the degraded PET image, and u is initialized as an all-zero array of the same size (256×256). Then, the initialized Gaussian kernel is input into the kernel generation network to obtain the prediction system point spread function, the MR image is input into the image generation network to obtain the predicted partial volume-corrected image, and the PET image corresponding to the MR image is used as the model pre-training target. Next, the predicted partial volume-corrected image output by the image generation network and the prediction system point spread function output by the kernel generation network are used to calculate the corresponding losses using the total loss function constructed in step S102 (including the BP fidelity term, RED regularization, MRI constraint, and kernel constraint). The losses are then weighted and fused to obtain the total loss for the pre-training stage, which is used for gradient backpropagation of the network model. After that, the image and u are updated. After the group information pre-training has been performed using the PET-MRI image pairs of all other patients as input, the encoder parameters of the image generation network are saved.
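The initialized Gaussian kernel that serves as the input of the kernel generation network could be built as in the sketch below. The kernel size of 15 and the standard deviation of 2.0 are hypothetical values; this application does not state them in this passage.

```python
import numpy as np


def gaussian_kernel(size=15, sigma=2.0):
    """Return a normalized 2D isotropic Gaussian kernel of shape (size, size)."""
    axis = np.arange(size) - (size - 1) / 2.0  # coordinates centered on zero
    xx, yy = np.meshgrid(axis, axis)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()  # weights sum to 1, like a point spread function


kernel = gaussian_kernel()
# A fully connected network would typically take the flattened kernel as input.
flat_input = kernel.reshape(-1)
```

Normalizing the weights to sum to 1 keeps total image intensity unchanged when the kernel is used as a blur.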
[0083] The fourth step is individual fine-tuning training: The preprocessed MR images of the target patient are used as the conditional input to the image generation network, the initialized Gaussian kernel is used as the input to the kernel generation network, and the low-resolution PET images corresponding to the target patient's MR images are used as the training targets. First, a new image generation network and a new kernel generation network are constructed and initialized. The image generation network loads the encoder weights saved during pre-training, and these encoder parameters are kept fixed. The image is initialized as the degraded PET image, and u is initialized as an all-zero array of the same size (256×256). Then, the initialized Gaussian kernel is input into the kernel generation network to obtain the prediction system point spread function, the MR image is input into the image generation network to obtain the predicted partial volume-corrected image, and the corresponding PET image is used as the model fine-tuning training target. Next, the predicted partial volume-corrected image output by the image generation network and the prediction system point spread function output by the kernel generation network are used to calculate the corresponding losses using the loss function constructed in step S102 (including the BP fidelity term, RED regularization, MRI constraint, and kernel constraint). The losses are then weighted and fused to obtain the total loss for the fine-tuning stage, which is used for gradient backpropagation of the decoder in the image generation network and of the kernel generation network. Afterwards, the model is updated... After completing all training rounds, the final output of the image generation network is taken as the corrected PET image.
[0084] In this application embodiment, an unsupervised partial volume correction method for PET images based on improved depth image prior and group learning is provided. By introducing back projection fidelity terms, denoising regularization, MRI-guided multimodal networks, blind neural deconvolution, and group pre-training strategies, the correction speed, image quality, and structural consistency can be improved, and the dependence on large amounts of paired data can be resolved.
[0085] It should be noted that this embodiment is only a brief illustrative description of the overall process of the PET image partial volume correction method based on depth image prior. The detailed description of each step can be referred to the relevant content in the foregoing embodiment, and will not be repeated here. It is understood that this application does not impose any limitations on this.
[0086] It is worth mentioning that the PET image partial volume correction method based on depth image prior provided in this application has also been compared with related technologies on publicly available datasets, verifying that the method provided in this application has superior performance. The specific details are as follows: First, please refer to Figure 6, which is a schematic diagram of the experimental comparison on the first dataset provided in the embodiments of this application. As shown in Figure 6, the method (PRBDIP-Pro) provided in this application embodiment, denoted "Proposed" in Figure 6 and in Tables 1 and 2 below, was verified on the PET-SORTEO dataset and compared with traditional PVC methods, DIP, and NBD. The image quality metrics of the method (Proposed) and of DIP and NBD on the PET-SORTEO dataset are shown in Table 1 below. As shown in Figure 6 and Table 1, the method (Proposed) provided in this application embodiment significantly outperforms the DIP and NBD methods in terms of PSNR (Peak Signal-to-Noise Ratio) and MSE (Mean Squared Error), achieves the best SSIM (Structural Similarity Index Measure) and RC (Recovery Coefficient) values, and outperforms the other methods in CNR (Contrast-to-Noise Ratio).
[0087] Table 1: Image quality metrics on the PET-SORTEO dataset
[0088] Secondly, please refer to Figure 7, which is a schematic diagram of the experimental comparison on the second dataset provided in the embodiments of this application. As shown in Figure 7, the method (Proposed) provided in this application embodiment was validated on the ADNI dataset in the same way as on the PET-SORTEO dataset. The results of the method (Proposed) on the ADNI dataset are shown in Figure 7, and the quantitative results are shown in Table 2 below. As shown in Figure 7 and Table 2, the method (Proposed) provided in this application embodiment achieves the best performance in terms of PSNR, MSE, CNR, SSIM and RC.
[0089] Table 2: Image Quality Metrics on the ADNI Dataset
[0090] In summary, this application proposes an unsupervised PET image partial volume correction framework that requires no paired training data. The training loss function employs the following: introducing a backprojection fidelity term to improve adaptability to ill-conditioned inverse problems; combining denoising regularization and MRI-guided conditions to enhance image quality and anatomical consistency; and using kernel constraints to adjust the generated PSF weight distribution, combined with a blind neural deconvolution method to solve the PSF required for the backprojection term. A group pre-training + individual fine-tuning strategy is used to improve training efficiency and detail preservation. A UCTransNet combined with an MRI-related loss function is used for multimodal feature fusion, achieving adaptive and accurate fusion of PET-MRI cross-modal multi-scale features, effectively mining the value of dual-modal features, and improving the model's ability to capture subtle lesions and weak signal regions.
[0091] The key technical point of the PET image partial volume correction method based on depth image prior provided in this application is as follows: (1) Design a loss function that includes back-projection fidelity term, denoising regularization, kernel constraint and MRI condition constraint; (2) A training strategy that combines group pre-training with individual fine-tuning is used to improve the performance of unsupervised PVC models; (3) An integrated neural blind deconvolution PSF estimation network is used to automatically obtain the system point spread function; (4) Image generation and kernel generation dual network architecture based on UCTransNet and fully connected network.
[0092] The PET image partial volume correction method based on depth image prior provided in this application has the following advantages compared with related technologies: (1) No paired data required: An unsupervised depth image prior framework is adopted, which avoids the dependence on large-scale paired data; (2) High correction quality: Through the back projection fidelity term, denoising regularization and MRI condition constraints, the image structural similarity and noise suppression ability are significantly improved; (3) High training efficiency: The group pre-training strategy accelerates convergence and reduces individual training time; (4) Excellent anatomical consistency: MRI is used as a conditional input, and a loss function is used to enhance the consistency between the correction results and the anatomical structure; (5) Automatic PSF estimation: The PSF is automatically learned through blind deconvolution and kernel constraints to avoid manual design errors.
[0093] It should be noted that the partial volume correction method for PET images based on depth image priors provided in this application is applicable to the partial volume effect correction of positron emission tomography (PET) images, and can improve the accuracy of quantitative and qualitative image assessment. It is worth mentioning that the corrected images obtained by the partial volume correction method for PET images based on depth image priors provided in this application are mainly used as auxiliary references for medical practitioners in clinical judgment and to provide technical support for clinical decision-making; they do not constitute a final disease diagnosis conclusion. The final judgment on disease diagnosis, condition assessment, and treatment plan formulation shall be made by qualified doctors or medical personnel based on clinical symptoms, signs, other examination and test results, and clinical experience, depending on the actual application scenario.
[0094] Please refer to Figure 8. This application also provides a PET image partial volume correction device 800 based on depth image prior, which can implement the above method. The device includes the following modules: a first target image data acquisition module 801, used to acquire first target image data of a group of patients, wherein the first target image data includes a first target PET image and a corresponding first target MR image; a model and loss function construction module 802, used to construct a first partial volume correction model based on depth image priors, and to construct a partial volume correction function that includes back projection fidelity terms, denoising regularization, kernel constraints, and MRI conditional constraints, wherein the first partial volume correction model includes an image generation network and a kernel generation network; a group information pre-training module 803, used to pre-train the first partial volume correction model with group information based on the first target image data and in combination with the partial volume correction function to obtain the trained second partial volume correction model, and to freeze the encoder parameters of the image generation network in the second partial volume correction model; a second target image data acquisition module 804, used to acquire second target image data of the target patient, wherein the second target image data includes a second target PET image and a corresponding second target MR image; a target patient model construction module 805, used to construct a third partial volume correction model with the same structure as the first partial volume correction model, and to load the encoder parameters frozen in the second partial volume correction model into the third partial volume correction model; and an individual fine-tuning training module 806, used to perform individual fine-tuning training on the third partial volume correction model based on the second target image data and in conjunction with the partial volume correction function after the encoder parameters are loaded into the third partial volume correction model, to obtain a trained fourth partial volume correction model, and to use the output result of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient.
[0095] It is understood that the content of the above method embodiments is applicable to the present device embodiments. The specific functions implemented by the present device embodiments are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.
[0096] This application also provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the above-described method. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.
[0097] It is understood that the content of the above method embodiments is applicable to this device embodiment. The specific functions implemented by this device embodiment are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.
[0098] Please refer to Figure 9, which illustrates the hardware structure of an electronic device according to another embodiment. The electronic device includes a processor 901, a memory 902, an input/output interface 903, a communication interface 904, and a bus 905. The processor 901 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application. The memory 902 can be implemented as a read-only memory (ROM), static storage device, dynamic storage device, or random access memory (RAM). The memory 902 can store the operating system and other application programs. When the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 902 and is called by the processor 901 to execute the methods described in the embodiments of this application. The input/output interface 903 is used to implement information input and output. The communication interface 904 is used to enable communication and interaction between this device and other devices; communication can be achieved through wired means (such as USB, Ethernet cable, etc.) or wireless means (such as mobile network, Wi-Fi, Bluetooth, etc.). The bus 905 transmits information between the various components of the device (e.g., the processor 901, memory 902, input/output interface 903, and communication interface 904). The processor 901, memory 902, input/output interface 903, and communication interface 904 are connected to each other within the device via the bus 905.
[0099] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method.
[0100] It is understood that the content of the above method embodiments is applicable to this storage medium embodiment. The specific functions implemented in this storage medium embodiment are the same as those in the above method embodiments, and the beneficial effects achieved are also the same as those achieved in the above method embodiments.
[0101] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.
[0102] It is understood that the content of the above method embodiments is applicable to the embodiments of this program product. The specific functions implemented by the embodiments of this program product are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.
[0103] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0104] The partial volume correction method and related equipment for PET images based on depth image priors provided in this application avoid dependence on large-scale paired data by constructing a partial volume correction model using an unsupervised depth image prior framework. The introduction of a backprojection fidelity term improves adaptability to ill-conditioned inverse problems. Combining denoising regularization and MRI (magnetic resonance imaging) conditional guidance enhances the image quality and anatomical consistency of the correction results. Kernel constraints are used to adjust the weight distribution of the generated PSF (system point spread function), solving for the PSF required for the backprojection fidelity term and avoiding manual design errors. The introduction of backprojection fidelity, denoising regularization, and MRI conditional constraints significantly improves image structural similarity and noise suppression capabilities. The adoption of population information pre-training and individual fine-tuning training strategies improves the model's training efficiency and detail preservation capabilities, thereby enhancing the performance of the partial volume correction model.
[0105] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.
[0106] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, and may include more or fewer steps than shown, or combine certain steps, or different steps.
[0107] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0108] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.
[0109] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0110] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And / or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
[0111] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units described above is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.
[0112] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0113] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0114] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0115] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.
Claims
1. A method for partial volume correction of PET images based on depth image priors, characterized in that the method includes the following steps: acquiring first target image data of a group of patients, wherein the first target image data includes a first target PET image and a corresponding first target MR image; constructing a first partial volume correction model based on depth image priors, and constructing a partial volume correction function that includes backprojection fidelity terms, denoising regularization, kernel constraints, and MRI conditional constraints, wherein the first partial volume correction model includes an image generation network and a kernel generation network; pre-training the first partial volume correction model with group information based on the first target image data and in conjunction with the partial volume correction function to obtain a trained second partial volume correction model, and freezing the encoder parameters of the image generation network in the second partial volume correction model; acquiring second target image data of a target patient, wherein the second target image data includes a second target PET image and a corresponding second target MR image; constructing a third partial volume correction model with the same structure as the first partial volume correction model, and loading the encoder parameters frozen in the second partial volume correction model into the third partial volume correction model; and, after the encoder parameters are loaded into the third partial volume correction model, individually fine-tuning the third partial volume correction model based on the second target image data and in conjunction with the partial volume correction function to obtain a trained fourth partial volume correction model, and using the output of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient.
2. The method according to claim 1, characterized in that acquiring the first target image data of the group of patients includes: acquiring first raw image data of the group of patients using PET and MRI equipment, wherein the first raw image data includes a first raw PET image and a corresponding first raw MR image; performing format conversion on the first raw image data to obtain first formatted image data; performing image size unification on the first formatted image data to obtain first size-standardized image data; performing a first invalid-value replacement on the first size-standardized image data to obtain first compliant image data; normalizing the first compliant image data to obtain first normalized image data; and performing a second invalid-value replacement on the first normalized image data to obtain the first target image data.
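The preprocessing chain recited in claim 2 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the format conversion step is omitted, a 1-D list stands in for an image, and the function names, the zero fill value, the center crop/pad strategy, and the min-max normalization are all assumptions chosen only to show why an invalid-value replacement appears both before and after normalization.

```python
import math

def replace_invalid(values, fill=0.0):
    """Replace NaN and +/-inf entries with a fill value (assumed to be 0)."""
    return [v if math.isfinite(v) else fill for v in values]

def unify_size(values, target_len, pad=0.0):
    """Center-crop or pad a 1-D signal to target_len (stand-in for image size unification)."""
    n = len(values)
    if n >= target_len:
        start = (n - target_len) // 2
        return values[start:start + target_len]
    left = (target_len - n) // 2
    return [pad] * left + values + [pad] * (target_len - n - left)

def min_max_normalize(values):
    """Scale to [0, 1]; a constant input yields NaN, mimicking 0/0 in array code."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span != 0 else float("nan") for v in values]

def preprocess(values, target_len):
    """Size unification -> first replacement -> normalization -> second replacement."""
    sized = unify_size(values, target_len)
    compliant = replace_invalid(sized)
    normalized = min_max_normalize(compliant)
    return replace_invalid(normalized)

if __name__ == "__main__":
    print(preprocess([float("nan"), 2.0, 4.0, float("inf")], 6))
```

In this sketch the second replacement matters for degenerate inputs: a constant image would otherwise leave NaN values after normalization.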
3. The method according to claim 1, characterized in that the step of pre-training the first partial volume correction model with population information, based on the first target image data and in conjunction with the partial volume correction function, to obtain a trained second partial volume correction model, and freezing the encoder parameters of the image generation network in the second partial volume correction model, includes: inputting the first target MR image into the image generation network in the first partial volume correction model to generate a first predicted partial volume correction image; inputting an initial Gaussian kernel into the kernel generation network in the first partial volume correction model to generate a first predicted system point spread function; using the first target PET image corresponding to the first target MR image as the model pre-training target; calculating, based on the first predicted partial volume correction image and the model pre-training target and in conjunction with the partial volume correction function, a first pre-training loss corresponding to the image generation network; calculating, based on the first predicted system point spread function and in conjunction with the partial volume correction function, a second pre-training loss corresponding to the kernel generation network; determining a total population information pre-training loss based on the first pre-training loss and the second pre-training loss; and performing gradient backpropagation on the first partial volume correction model based on the total population information pre-training loss, so as to iteratively update the model pre-training parameters of the first partial volume correction model until the training requirements of the population information pre-training are met, thereby obtaining the trained second partial volume correction model, and freezing the encoder parameters of the image generation network in the second partial volume correction model.
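How the two pre-training losses of claim 3 might combine can be sketched in Python under explicit assumptions. The claim does not give the formulas, so everything below is a stand-in: a plain least-squares data term replaces the back-projection fidelity term, 1-D total variation replaces the denoising regularization, the kernel constraint is taken to be "sums to one and is non-negative", the MRI conditional constraint is omitted, and the weights `w_tv` and `w_kernel` are hypothetical.

```python
def convolve_same(signal, kernel):
    """1-D 'same' convolution with zero padding; kernel length is assumed odd."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < len(signal):
                acc += k * signal[idx]
        out.append(acc)
    return out

def fidelity_loss(pred_image, psf, pet):
    """Mean squared error between the re-blurred prediction and the measured PET."""
    blurred = convolve_same(pred_image, psf)
    return sum((b - p) ** 2 for b, p in zip(blurred, pet)) / len(pet)

def total_variation(image):
    """Mean absolute difference of neighbours (assumed denoising regularizer)."""
    return sum(abs(b - a) for a, b in zip(image, image[1:])) / len(image)

def kernel_penalty(psf):
    """Penalize kernels that do not sum to 1 or that contain negative weights."""
    sum_term = (sum(psf) - 1.0) ** 2
    neg_term = sum(min(k, 0.0) ** 2 for k in psf)
    return sum_term + neg_term

def pretraining_loss(pred_image, psf, pet, w_tv=0.1, w_kernel=1.0):
    """Total loss = image-network loss + kernel-network loss."""
    first_loss = fidelity_loss(pred_image, psf, pet) + w_tv * total_variation(pred_image)
    second_loss = w_kernel * kernel_penalty(psf)
    return first_loss + second_loss

if __name__ == "__main__":
    sharp = [0.0, 0.0, 1.0, 0.0, 0.0]
    psf = [0.25, 0.5, 0.25]
    pet = convolve_same(sharp, psf)  # simulated blurred measurement
    print(pretraining_loss(sharp, psf, pet))
```

The point of the sketch is only the structure: the predicted image is compared with the PET target after being blurred by the predicted point spread function, and the kernel receives its own penalty.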
4. The method according to claim 3, characterized in that the image generation network includes an encoder and a decoder, and the skip connections between the encoder and the decoder are implemented using a channel Transformer module; the step of inputting the first target MR image into the image generation network in the first partial volume correction model to generate a first predicted partial volume correction image includes: inputting the first target MR image into the image generation network in the first partial volume correction model; performing, by the encoder in the image generation network, layer-by-layer downsampling on the first target MR image to extract encoder feature maps at different scales, wherein the encoder feature maps at different scales include several layers of shallow feature maps and one layer of deep feature map; embedding the several layers of shallow feature maps into corresponding feature tokens through the skip connections, and concatenating the feature tokens along the channel dimension to obtain a feature token set; fusing the feature token set using a channel attention cross-fusion module to obtain token fusion features; reconstructing the token fusion features using the decoder to obtain several reconstructed feature maps; upsampling the deep feature map using the decoder to obtain a decoder deep feature map; using a channel-based cross-attention module to suppress, based on the decoder deep feature map, the channel responses in each reconstructed feature map that are unrelated to partial volume correction, so as to fuse the decoder deep feature map with each reconstructed feature map and obtain a fused feature map; and inputting the fused feature map into the output layer of the image generation network to generate the first predicted partial volume correction image.
5. The method according to claim 1, characterized in that the step of performing individual fine-tuning training on the third partial volume correction model based on the second target image data and in conjunction with the partial volume correction function, to obtain a trained fourth partial volume correction model, and using the output of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient, includes: inputting the second target MR image into the image generation network in the third partial volume correction model to generate a second predicted partial volume correction image; inputting an initial Gaussian kernel into the kernel generation network in the third partial volume correction model to generate a second predicted system point spread function; using the second target PET image corresponding to the second target MR image as the model fine-tuning training target; calculating, based on the second predicted partial volume correction image and the model fine-tuning training target and in conjunction with the partial volume correction function, a first fine-tuning training loss corresponding to the decoder in the image generation network; calculating, based on the second predicted system point spread function and in conjunction with the partial volume correction function, a second fine-tuning training loss corresponding to the kernel generation network; determining a total individual fine-tuning training loss based on the first fine-tuning training loss and the second fine-tuning training loss; performing gradient backpropagation on the decoder and the kernel generation network based on the total individual fine-tuning training loss, so as to iteratively update the model fine-tuning training parameters of the decoder and the kernel generation network until the training requirements of the individual fine-tuning training are met, thereby obtaining the trained fourth partial volume correction model; and using the output of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient.
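The parameter handling described in claim 5 (encoder loaded and kept fixed, decoder and kernel generation network updated) can be sketched as follows. This is a toy illustration, not the claimed training procedure: each parameter is a single float, the name prefixes `encoder.`, `decoder.`, and `kernel_net.` are hypothetical, and a plain gradient-descent step stands in for whatever optimizer is actually used.

```python
def load_frozen_encoder(pretrained, fresh):
    """Copy encoder.* entries from the pre-trained model into a freshly built model."""
    merged = dict(fresh)
    for name, value in pretrained.items():
        if name.startswith("encoder."):
            merged[name] = value
    return merged

def trainable_names(params):
    """During fine-tuning only decoder and kernel-network parameters are updated."""
    return [n for n in params if n.startswith(("decoder.", "kernel_net."))]

def sgd_step(params, grads, lr=0.1):
    """One gradient step on trainable parameters; encoder entries stay untouched."""
    updated = dict(params)
    for name in trainable_names(params):
        updated[name] = params[name] - lr * grads.get(name, 0.0)
    return updated

if __name__ == "__main__":
    second_model = {"encoder.w": 2.0, "decoder.w": 5.0, "kernel_net.w": 7.0}
    fresh_model = {"encoder.w": 0.0, "decoder.w": 1.0, "kernel_net.w": 1.0}
    third_model = load_frozen_encoder(second_model, fresh_model)
    grads = {name: 1.0 for name in third_model}
    fourth_model = sgd_step(third_model, grads, lr=0.5)
    print(fourth_model)
```

Even though a gradient is supplied for `encoder.w`, the step ignores it, which mirrors the claim's restriction of backpropagation updates to the decoder and the kernel generation network.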
6. The method according to claim 1, characterized in that the method further includes: adopting an early stopping strategy for model training, in which whether to terminate the population information pre-training or the individual fine-tuning training is determined based on an early stopping judgment index and a preset early stopping training threshold, wherein the early stopping judgment index is a quantitative image quality evaluation index that includes peak signal-to-noise ratio and structural similarity.
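A threshold-based early stopping check in the spirit of claim 6 might look like the Python sketch below. Several points are assumptions rather than claim content: the claim does not say which image the indices are computed against, so a hypothetical `reference` argument is used; the threshold values are placeholders; intensities are assumed to lie in [0, 1]; and the structural similarity here is a simplified single-window version, whereas the standard SSIM averages over local windows.

```python
import math

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(data_range ** 2 / mse)

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole signal (simplification of windowed SSIM)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return numerator / denominator

def should_stop(reference, estimate, psnr_threshold=30.0, ssim_threshold=0.95):
    """Stop training once both quality indices reach their preset thresholds."""
    return (psnr(reference, estimate) >= psnr_threshold
            and global_ssim(reference, estimate) >= ssim_threshold)

if __name__ == "__main__":
    ref = [0.0, 0.5, 1.0, 0.5]
    print(should_stop(ref, ref))
    print(should_stop(ref, [0.5, 0.0, 0.5, 1.0]))
```

Requiring both indices to pass is one possible reading; a design that stops on either index, or on a lack of improvement over several iterations, would be equally compatible with the wording of the claim.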
7. A PET image partial volume correction device based on depth image priors, characterized in that the device includes the following modules: a first target image data acquisition module, used to acquire first target image data of a group of patients, wherein the first target image data includes a first target PET image and a corresponding first target MR image; a model and loss function construction module, used to construct a first partial volume correction model based on depth image priors, and to construct a partial volume correction function that includes back-projection fidelity terms, denoising regularization, kernel constraints, and MRI conditional constraints, wherein the first partial volume correction model includes an image generation network and a kernel generation network; a population information pre-training module, used to pre-train the first partial volume correction model with population information, based on the first target image data and in conjunction with the partial volume correction function, to obtain a trained second partial volume correction model, and to freeze the encoder parameters of the image generation network in the second partial volume correction model; a second target image data acquisition module, used to acquire second target image data of a target patient, wherein the second target image data includes a second target PET image and a corresponding second target MR image; a target patient model construction module, used to construct a third partial volume correction model with the same structure as the first partial volume correction model, and to load the frozen encoder parameters of the second partial volume correction model into the third partial volume correction model; and an individual fine-tuning training module, used to perform, after the encoder parameters are loaded into the third partial volume correction model, individual fine-tuning training on the third partial volume correction model based on the second target image data and in conjunction with the partial volume correction function, to obtain a trained fourth partial volume correction model, and to use the output of the image generation network in the fourth partial volume correction model as the target partial volume correction image corresponding to the target patient.
8. An electronic device, characterized in that the electronic device includes a memory and a processor, the memory stores a computer program, and the processor executes the computer program to implement the method according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1 to 6 is implemented.
10. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1 to 6 is implemented.