A method, system and apparatus for registration of cross-phase CT images
By introducing anatomical constraints into cross-phase CT image registration, extracting anatomical prior information using a pre-trained organ segmentation model, and training the registration network model to learn the deformation field, the problem of anatomical structure distortion in existing methods is solved, achieving high-precision and high-reliability cross-phase CT image registration.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NAT UNIV OF DEFENSE TECH
- Filing Date
- 2026-03-23
- Publication Date
- 2026-06-19
AI Technical Summary
Existing cross-phase CT image registration methods struggle to ensure that the generated deformation field conforms to the physiological laws of human anatomy when dealing with drastic grayscale changes and complex physiological deformations caused by contrast agents. This can lead to organ structural distortion or boundary errors, affecting the reliability of accurate clinical analysis and diagnosis.
By acquiring multi-phase CT images of the same patient, a pre-trained organ segmentation model is used to extract the target organ segmentation mask as anatomical prior information, construct anatomical constraints, and train the registration network model to learn the deformation field, ensuring that the generated deformation field conforms to the anatomical and physiological laws of the human body.
It significantly improves the accuracy and structural preservation of cross-phase CT image registration, enhances the reliability of clinical applications, avoids organ structure distortion and boundary errors, and achieves end-to-end automated processing.
Figure CN122244115A_ABST
Description
Technical Field
[0001] This application relates to the field of medical image processing technology, and in particular to a registration method, system and device for cross-phase CT images.
Background Technology
[0002] Multiphase enhanced CT (computed tomography) imaging can reveal the dynamic enhancement information of tissues by acquiring images at different times before and after contrast agent injection (such as plain scan and enhanced scan). However, achieving accurate alignment (i.e. registration) between images of different phases faces two inherent challenges: first, contrast agent perfusion causes drastic and nonlinear changes in the CT value (grayscale) of the same tissue between different phases; second, physiological activities cause complex non-rigid deformation of organs and tissues.
[0003] Deep learning-based registration methods offer a novel approach to this task by learning deformation mappings between images. However, existing methods still have significant limitations when dealing with cross-phase CT registration: their optimization objectives are usually limited to the statistical similarity of image pixel intensity and lack explicit constraints on the plausibility of the anatomical structures themselves. Therefore, when handling dramatic grayscale changes and complex deformations, the network may generate spatial transformations that violate anatomical common sense, leading to distorted organ structures or incorrect boundaries after registration. This severely restricts its reliability for direct application in precise clinical analysis and diagnosis.
Summary of the Invention
[0004] In view of this, this application provides a registration method, system and device for cross-phase CT images. The cross-phase CT image registration method, system and device provided by this application can ensure that the generated deformation field conforms to the physiological laws of human anatomy while dealing with drastic grayscale changes and complex physiological deformations caused by contrast agents, thereby significantly improving the accuracy, structural preservation and reliability of cross-phase CT image registration in clinical applications.
[0005] This application provides a registration method for cross-phase CT images, including:
acquiring multi-phase CT images of the same patient, the multi-phase CT images including a plain scan image and at least one contrast-enhanced image;
using a pre-trained organ segmentation model to extract the corresponding target organ segmentation masks from the plain scan image and the at least one contrast-enhanced image as anatomical prior information;
using the plain scan image as the fixed image and the at least one contrast-enhanced image as the floating image, constructing anatomical constraint terms from the anatomical prior information, and training a registration network model to learn the deformation field;
using the trained registration network model to register new CT image sequences, and outputting the deformation field and the registration result that align the contrast-enhanced image to the anatomical space of the plain scan image.
[0006] Optionally, the registration network model includes a shared encoder and at least one independent decoder. The step of using the plain scan image as the fixed image and the at least one contrast-enhanced image as the floating image, and constructing anatomical constraint terms from the anatomical prior information to train the registration network model to learn the deformation field, includes: each contrast-enhanced image and the plain scan image are paired to form an image pair, which is input into the shared encoder to obtain the corresponding encoded features; the encoded features are input into the corresponding independent decoder to generate the deformation field from the contrast-enhanced image to the plain scan image; the deformation field is used to spatially transform the contrast-enhanced image and the corresponding anatomical prior information to obtain the transformation result; and, based on the transformation result and the anatomical prior information, a loss function including the anatomical constraint terms is calculated to optimize the registration network model.
[0007] Optionally, the at least one contrast enhancement phase image includes an arterial phase image and a venous phase image, and the at least one independent decoder includes a first decoder and a second decoder; The step of forming image pairs between each contrast-enhanced image and the plain scan image, inputting them into the shared encoder to obtain corresponding encoded features, and then inputting the encoded features into the corresponding independent decoder to generate a deformation field from the contrast-enhanced image to the plain scan image includes: The arterial phase image and the plain scan image are combined to form a first image pair, which is then input into the shared encoder to obtain a first encoded feature. The first encoded feature is then input into the first decoder to generate a first deformation field from the arterial phase image to the plain scan image. The venous phase image and the plain scan image are combined to form a second image pair, which is then input into the shared encoder to obtain a second encoded feature. The second encoded feature is then input into the second decoder to generate a second deformation field from the venous phase image to the plain scan image.
[0008] Optionally, before using the plain scan image as a fixed image, at least one of the contrast-enhanced images as floating images, and constructing anatomical constraint terms using the anatomical prior information to train the registration network model to learn the deformation field, the method further includes: The 3D volume data of the plain scan image and the contrast-enhanced image are decomposed into 2D slices along at least two preset anatomical directions; By using multi-directional expert path networks, feature extraction and dimensionality standardization are performed on 2D slices of corresponding anatomical directions to obtain standardized features.
[0009] Optionally, the step of decomposing the 3D volume data of the plain scan image and the contrast-enhanced image into 2D slices along at least two preset anatomical directions includes: extracting 2D slices from the 3D volume data of the plain scan image and the contrast-enhanced image along three orthogonal anatomical directions: transverse, coronal, and sagittal.
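The three-direction decomposition described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the volume is assumed to be indexed as (z, y, x), with z running head to foot, y front to back, and x left to right, and the function name `decompose_volume` is hypothetical rather than taken from this application.

```python
import numpy as np

def decompose_volume(volume):
    """Split a 3D volume indexed (z, y, x) into 2D slices along three orthogonal axes.

    The axis-to-direction mapping is an assumption of this sketch.
    """
    return {
        "transverse": [volume[k, :, :] for k in range(volume.shape[0])],
        "coronal": [volume[:, j, :] for j in range(volume.shape[1])],
        "sagittal": [volume[:, :, i] for i in range(volume.shape[2])],
    }

# Tiny synthetic volume standing in for a CT scan.
vol = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
slices = decompose_volume(vol)

for direction, stack in slices.items():
    print(direction, len(stack), stack[0].shape)
```

Each direction yields a stack of 2D slices whose count equals the volume size along that axis, which is what a per-direction feature extractor would consume.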
[0010] Optionally, the anatomical constraints include at least two of the following: organ overlap loss, boundary consistency loss, shape preservation loss, and region smoothness loss.
[0011] Optionally, the loss function further includes an image similarity term, which includes at least two of the following: mutual information loss, local normalized cross-correlation loss, gradient similarity loss, modality-independent neighborhood descriptor loss, and normalized gradient field loss.
[0012] Optionally, during the training of the registration network model, an adaptive weight adjustment strategy is adopted to dynamically adjust the weights of at least one of the loss terms. The adaptive weight adjustment is based on at least one of the following factors: the current stage of the training process, the enhancement intensity information of the contrast agent in the contrast enhancement period image, and the statistics of each loss term in historical training batches.
[0013] This application also provides a registration system for cross-phase CT images, including: The acquisition module is used to acquire multi-phase CT images of the same patient, the multi-phase CT images including plain scan images and at least one contrast-enhanced scan image; The segmentation module is used to extract the corresponding target organ segmentation mask from the plain scan image and the at least one contrast enhancement image using a pre-trained organ segmentation model, as anatomical prior information. The training module is used to train the registration network model to learn the deformation field by using the plain scan image as a fixed image and at least one of the contrast enhancement images as floating images, and by constructing anatomical constraint terms using the anatomical prior information. The registration module is used to register new CT image sequences using a trained registration network model, and outputs the deformation field and registration results that align the contrast-enhanced images to the anatomical space of the plain scan images.
[0014] This application also provides an electronic device, including: a processor, a memory, and a communication bus; the communication bus is used to realize the connection and communication between the processor and the memory; the processor is configured to execute a registration processing program for cross-phase CT images stored in the memory to implement the steps of the cross-phase CT image registration method as described in any of the above claims.
[0015] Compared with existing technologies, this application provides a registration method, system, and device for cross-phase CT images. Multi-phase CT images of the same patient are acquired, including a plain scan image and at least one contrast-enhanced image. A pre-trained organ segmentation model extracts the corresponding target organ segmentation masks from the plain scan image and the at least one contrast-enhanced image as anatomical prior information. The plain scan image is used as the fixed image, and the at least one contrast-enhanced image is used as the floating image. Anatomical constraint terms are constructed from the anatomical prior information, and a registration network model is trained to learn the deformation field. The trained registration network model is then used to register new CT image sequences, outputting the deformation field and the registration result that align the contrast-enhanced image to the anatomical space of the plain scan image. By turning the organ-level anatomical prior information output by the pre-trained organ segmentation model into explicit anatomical constraint terms and incorporating them into the training objective of the registration network model, this method overcomes the anatomical structure distortion caused by excessive reliance on pixel intensity similarity in existing methods. It ensures that the generated deformation field conforms to the physiological laws of human anatomy while handling drastic grayscale changes and complex physiological deformations caused by contrast agents, thereby significantly improving the accuracy, structural preservation, and reliability of cross-phase CT image registration in clinical applications.
Attached Figure Description
[0016] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0017] Figure 1 is a flowchart of a registration method for cross-phase CT images provided in an embodiment of this application;
Figure 2 is a diagram of the training-phase architecture of the registration network model provided in an embodiment of this application;
Figure 3 is a flowchart of the inference-stage framework of the registration network model provided in an embodiment of this application;
Figure 4 is a schematic diagram of the structure of a registration system for cross-phase CT images provided in an embodiment of this application;
Figure 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0018] To enable those skilled in the art to better understand the technical solutions in this application, the technical solutions in the embodiments of this application will be clearly and completely described below. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0019] It should be noted that when a component is referred to as "fixed to" or "set on" another component, it can be directly on or indirectly set on the other component; when a component is referred to as "connected to" another component, it can be directly connected to or indirectly connected to the other component.
[0020] It should be understood that the terms "length", "width", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this application.
[0021] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "a plurality of" or "several" means two or more, unless otherwise explicitly specified.
[0022] It should be noted that the structures, proportions, sizes, etc., shown in the accompanying drawings of this specification are only for the purpose of assisting those skilled in the art in understanding and reading the content disclosed in the specification, and are not intended to limit the conditions under which this application can be implemented. Therefore, they have no substantial technical significance. Any modifications to the structure, changes in the proportions, or adjustments to the size should still fall within the scope of the technical content disclosed in this application, provided that they do not affect the effects and purposes that this application can produce.
[0023] As shown in Figure 1, this application provides a registration method for cross-phase CT images, including: S11. Acquire multi-phase CT images of the same patient, including a plain scan image and at least one contrast-enhanced image. In this embodiment, multi-phase CT images are typically sourced from a hospital's picture archiving and communication system (PACS), stored in standard DICOM format or converted to research formats such as NIfTI. Each patient's images include at least one plain scan phase and at least one contrast-enhanced phase, such as the arterial and/or venous phase. These images are acquired in a single examination at specific time points before and after contrast agent injection, using the same CT scanner or multiple compatible CT scanners, so that they share a physical basis for registration. The acquired images are three-dimensional (3D) volumetric data, with spatial dimensions (e.g., 512 × 512 × number of slices) and voxel spacing determined by the original scanning protocol.
[0024] As a preferred implementation, after acquiring the multi-phase CT images of the same patient, the method further includes preprocessing the acquired multi-phase CT images to obtain preprocessed multi-phase CT images. The preprocessing includes: performing background removal and noise suppression on the acquired multi-phase CT images to obtain processed images; performing Z-axis depth standardization on the processed images to obtain depth-standardized images; and performing intensity normalization on the depth-standardized images to obtain normalized images. Specifically, the 5th percentile threshold method can first be used to remove background air regions, and morphological opening and closing operations can be performed to suppress noise. Then, the Z-axis depth of the original CT volume is unified to 256 slices: if the depth is insufficient, zero filling is used; if it exceeds the limit, a block strategy is adopted. Z-score standardization is then performed on each volume. The specific formula is as follows:
$$I_{\text{norm}} = \frac{I - \mu}{\sigma + \epsilon}$$
where $I$ is the raw CT volume data (or raw image intensity); $I_{\text{norm}}$ is the normalized CT volume data (or normalized image); $\mu$ is the average intensity of all effective voxels in the CT volume of that phase; $\sigma$ is the standard deviation of the intensity of all effective voxels in the CT volume of that phase; and $\epsilon$ is a smoothing constant, a very small positive number used to prevent the denominator from being zero and to ensure numerical stability.
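The depth standardization and Z-score steps described above might look roughly like the following NumPy sketch. Several details are assumptions made only for illustration: the block strategy is taken to mean non-overlapping blocks of 256 slices with the last block zero-padded, the smoothing constant defaults to `1e-8` (the text does not give a value here), the morphological noise suppression is omitted, and the function names are hypothetical.

```python
import numpy as np

def standardize_depth(volume, target_depth=256):
    """Zero-pad along Z when too shallow; otherwise split into fixed-depth blocks.

    Non-overlapping blocks are an assumption of this sketch.
    """
    blocks = []
    for start in range(0, volume.shape[0], target_depth):
        block = volume[start:start + target_depth]
        pad = target_depth - block.shape[0]
        blocks.append(np.pad(block, ((0, pad), (0, 0), (0, 0)), mode="constant"))
    return blocks

def zscore_normalize(volume, valid_mask, eps=1e-8):
    """Normalize with the mean and standard deviation of the valid voxels only.

    eps is an assumed value for the smoothing constant.
    """
    mu = volume[valid_mask].mean()
    sigma = volume[valid_mask].std()
    return (volume - mu) / (sigma + eps)

rng = np.random.default_rng(0)
ct = rng.normal(loc=40.0, scale=200.0, size=(300, 8, 8))  # synthetic stand-in volume

# 5th-percentile threshold as a simple background (air) mask.
valid = ct > np.percentile(ct, 5)
normalized = zscore_normalize(ct, valid)
blocks = standardize_depth(normalized, target_depth=256)

print(len(blocks), blocks[0].shape)
```

Computing the statistics over valid voxels only keeps large air regions from dominating the mean and standard deviation.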
[0025] S12. Using a pre-trained organ segmentation model, extract the corresponding target organ segmentation mask from plain scan images and at least one contrast enhancement image as anatomical prior information. In this embodiment, the pre-trained organ segmentation model can be a deep convolutional neural network (e.g., based on U-Net, nnU-Net, etc.), which has been trained on a large, high-quality abdominal CT dataset and can accurately segment multiple key abdominal organs such as the liver, spleen, and both kidneys. This model is independently applied to plain scan images and each contrast-enhanced image to generate corresponding pixel-level ternary or multi-value segmentation masks for each 3D image. These masks exist at the same spatial resolution as the original image, where the value of each voxel identifies its organ category or background. The core function of this step is to transform the anatomical structural information implicit in the image grayscale, which is highly relevant to the registration task, into explicit and quantifiable spatial prior knowledge (i.e., anatomical prior information). This prior information does not depend on the image grayscale itself, thus providing stable and reliable anatomical structural guidance for subsequent cross-phase registration, which is key to overcoming the grayscale non-correspondence problem.
[0026] S13. Using plain scan images as fixed images and at least one contrast enhancement image as a floating image, and using anatomical prior information to construct anatomical constraint terms, the registration network model is trained to learn the deformation field. In this embodiment, the plain scan image is set as a fixed target space (reference space), while the contrast enhancement image is a floating image that needs to be transformed and aligned. As a preferred implementation, the registration network model can adopt an architecture combining a shared feature encoder and multiple independent deformation field decoders (e.g., a dual-branch structure). Its working principle is as follows: the network uses the anatomical prior information (segmentation mask) extracted in the aforementioned steps as additional supervision signals, and is trained by constructing a comprehensive loss function. This loss function typically includes an anatomical constraint term based on the anatomical prior information and an image similarity term.
[0027] Specifically, the anatomical constraint terms may include, but are not limited to, a loss that measures organ overlap and a loss that constrains boundary consistency; the image similarity terms, which are used to handle grayscale differences, may include, but are not limited to, mutual information loss and normalized cross-correlation loss.
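To make the role of the two similarity measures concrete, here is a small NumPy sketch of a histogram-based mutual information estimate and a global normalized cross-correlation. It is a simplified, non-differentiable illustration rather than the loss implementation of this application; the bin count and function names are assumptions. A training loss would typically be the negative of each measure.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Histogram-based mutual information estimate (in nats)."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def ncc(a, b, eps=1e-8):
    """Global normalized cross-correlation in [-1, 1]."""
    a0 = a - a.mean()
    b0 = b - b.mean()
    return float((a0 * b0).sum() / (np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum()) + eps))

rng = np.random.default_rng(0)
fixed = rng.random((64, 64))
# Same structure, but with a nonlinear, inverted intensity mapping
# (a crude stand-in for contrast-induced grayscale change).
aligned = (1.0 - fixed) ** 2
shuffled = rng.permutation(aligned.ravel()).reshape(64, 64)

mi_aligned = mutual_information(fixed, aligned)
mi_shuffled = mutual_information(fixed, shuffled)
ncc_aligned = ncc(fixed, aligned)
print(mi_aligned > mi_shuffled, ncc_aligned)
```

In this toy case mutual information still separates the aligned pair from the scrambled one even though the intensity relationship is inverted and nonlinear, which is why such measures are used when grayscale values do not correspond directly.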
[0028] During training, the network optimizes the comprehensive loss function, learning not only to minimize the appearance differences between images, but also to maximize the consistency between organ masks after registration. This dual optimization mechanism enables the network to directly capture and match stable anatomical structure correspondences between images through the drastic grayscale changes caused by contrast agents, thereby learning a more accurate deformation field that conforms to physiological laws. This deformation field is a 3D vector field of the same size as the image, where each vector indicates the displacement of the corresponding voxel from the floating image space to the fixed image space.
[0029] The training process of the registration network model can be understood with reference to Figure 2.
[0030] S14. Using the trained registration network model, register the new CT image sequence and output the deformation field and registration result of aligning the contrast enhancement phase image to the anatomical space of the plain scan image.
[0031] In this embodiment, after the model training is completed, it can be applied to new patient CT data. During the inference stage, only the plain scan and contrast-enhanced scan images of the new patient and the organ masks obtained by the same segmentation model need to be input. The trained network can then propagate forward and quickly (usually within a few seconds) predict the deformation field from the contrast-enhanced scan to the plain scan. Subsequently, the deformation field is applied to perform spatial transformation (resampling) on the contrast-enhanced scan image to generate a registered image that is precisely aligned with the plain scan image in anatomical space.
[0032] The technical advantages of this method are mainly reflected in the following aspects: First, by introducing and integrating anatomical constraints, the registration accuracy and robustness under strong grayscale differences are significantly improved, avoiding serious misregistration caused by the failure of similarity measurement in traditional methods; Second, the introduction of anatomical constraints effectively ensures the integrity of important organ structures and the physiological rationality of deformation after registration, preventing non-physical deformations such as organ tearing and excessive compression, and greatly improving the credibility and direct usability of the results in clinical diagnosis; Finally, this method achieves end-to-end automated processing, ensuring high accuracy and high reliability while possessing the efficiency required for clinical deployment.
[0033] During the inference phase, the registration process for new CT image sequences can be understood with reference to Figure 3. As shown in Figure 3, the system supports multiple registration modes and includes a complete process from multi-directional slice inference to 3D deformation field reconstruction.
[0034] Compared with existing technologies, this application provides a registration method, system, and device for cross-phase CT images. Multi-phase CT images of the same patient are acquired, including a plain scan image and at least one contrast-enhanced image. A pre-trained organ segmentation model extracts the corresponding target organ segmentation masks from the plain scan image and the at least one contrast-enhanced image as anatomical prior information. The plain scan image is used as the fixed image, and the at least one contrast-enhanced image is used as the floating image. Anatomical constraint terms are constructed from the anatomical prior information, and a registration network model is trained to learn the deformation field. The trained registration network model is then used to register new CT image sequences, outputting the deformation field and the registration result that align the contrast-enhanced image to the anatomical space of the plain scan image. By turning the organ-level anatomical prior information output by the pre-trained organ segmentation model into explicit anatomical constraint terms and incorporating them into the training objective of the registration network model, this method overcomes the anatomical structure distortion caused by excessive reliance on pixel intensity similarity in existing methods. It ensures that the generated deformation field conforms to the physiological laws of human anatomy while handling drastic grayscale changes and complex physiological deformations caused by contrast agents, thereby significantly improving the accuracy, structural preservation, and reliability of cross-phase CT image registration in clinical applications.
[0035] In one implementation, the registration network model in this application includes a shared encoder and at least one independent decoder. Step S13 includes: S131. Form an image pair from each contrast-enhanced image and the plain scan image, input the pair into the shared encoder to obtain the corresponding encoded features, and input the encoded features into the corresponding independent decoder to generate the deformation field from the contrast-enhanced image to the plain scan image. In this embodiment, the shared encoder consists of a series of convolutional layers and downsampling layers, and its parameters are shared by all input image pairs during training. It is designed to extract deep features from image pairs of the same anatomical site at different phases. These features are insensitive to contrast agent grayscale changes and focus on characterizing the anatomical structure itself; that is, they are phase-invariant anatomical encoded features. Each contrast-enhanced image and the plain scan image are concatenated along the channel dimension to form a dual-channel input. The independent decoder receives the encoded features output by the shared encoder and gradually restores the spatial resolution through upsampling layers and skip connections (for example, fusing intermediate encoder features at the corresponding scale). Finally, it outputs a two-channel 2D displacement field (or a three-channel 3D displacement field for 3D networks) through a convolutional layer. This displacement field defines the displacement vector of each pixel (or voxel) from the contrast-enhanced image space to the plain scan image space.
[0036] The advantage of adopting the "shared encoder + independent decoder" architecture is that the shared encoder forces the network to learn a general anatomical representation across phases, while the independent decoder allows the network to perform refined and differentiated deformation field prediction for specific grayscale distributions and deformation patterns in different enhancement phases such as the arterial phase and venous phase, thereby achieving a balance between general feature learning and specific task adaptation.
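The parameter-sharing pattern behind the "shared encoder + independent decoder" design can be illustrated with a toy NumPy sketch. It is deliberately not the network of this application: a single 1×1 channel-mixing layer stands in for each convolution, downsampling, and upsampling stack, the weights are random and untrained, and the class names `SharedEncoder` and `PhaseDecoder` are hypothetical.

```python
import numpy as np

class SharedEncoder:
    """Toy stand-in for the shared encoder: one 1x1 channel-mixing layer."""

    def __init__(self, rng, in_channels=2, features=8):
        self.weight = rng.normal(scale=0.1, size=(features, in_channels))

    def __call__(self, pair):
        # pair has shape (2, H, W): floating and fixed slices stacked on channels.
        return np.tanh(np.einsum("fc,chw->fhw", self.weight, pair))

class PhaseDecoder:
    """Toy stand-in for one independent decoder: maps features to a (dy, dx) field."""

    def __init__(self, rng, features=8):
        self.weight = rng.normal(scale=0.1, size=(2, features))

    def __call__(self, features):
        return np.einsum("of,fhw->ohw", self.weight, features)

rng = np.random.default_rng(0)
encoder = SharedEncoder(rng)                      # one set of weights for all pairs
decoders = {"arterial": PhaseDecoder(rng),        # separate weights per phase
            "venous": PhaseDecoder(rng)}

plain = rng.random((16, 16))
phases = {"arterial": rng.random((16, 16)), "venous": rng.random((16, 16))}

fields = {}
for name, enhanced in phases.items():
    pair = np.stack([enhanced, plain], axis=0)    # dual-channel input
    fields[name] = decoders[name](encoder(pair))  # two-channel 2D displacement field

print({name: field.shape for name, field in fields.items()})
```

Every image pair passes through the same encoder weights, while each phase owns its decoder weights, which mirrors the split between general anatomical features and phase-specific deformation prediction.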
[0037] S132. Perform spatial transformation on the contrast enhancement image and the corresponding anatomical prior information based on the deformation field to obtain the transformation result. Based on the transformation result and the anatomical prior information, calculate the loss function including the anatomical constraint term to optimize the registration network model.
[0038] In this embodiment, the spatial transformation is implemented through a differentiable spatial transformer module. Based on the deformation field predicted by the network, this module first calculates, for each target location, the (potentially non-integer) sampling coordinates in the source image, i.e., the floating image (contrast-enhanced image). It then obtains the pixel value at those coordinates through bilinear interpolation, thereby generating a transformed image spatially aligned with the fixed image (plain scan image). Crucially, to calculate the anatomical constraint loss, the exact same deformation field and spatial transformation operation are also applied to the corresponding organ segmentation mask to generate a registered mask, ensuring mathematical consistency between image deformation and anatomical structure deformation.
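The resampling step can be sketched for the 2D case in plain NumPy. This is a minimal illustration, not the module used in this application: it assumes the displacement field is stored as (dy, dx) per output pixel, clamps sampling coordinates at the image border, and uses the hypothetical function name `warp_bilinear`. A trainable version would rely on a framework with automatic differentiation.

```python
import numpy as np

def warp_bilinear(image, displacement):
    """Resample a 2D image at (y + dy, x + dx) using bilinear interpolation.

    image: (H, W); displacement: (2, H, W) holding (dy, dx) for each output pixel.
    Border handling by clamping is an assumption of this sketch.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sy = np.clip(ys + displacement[0], 0, h - 1)   # sampling rows (may be fractional)
    sx = np.clip(xs + displacement[1], 0, w - 1)   # sampling columns
    y0 = np.floor(sy).astype(int)
    x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = sy - y0
    wx = sx - x0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

# A horizontal intensity ramp and a square "organ" mask as toy inputs.
ramp = np.tile(np.arange(8, dtype=float), (8, 1))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

field = np.zeros((2, 8, 8))
field[1] = 0.5                      # sample half a pixel to the right everywhere

warped_image = warp_bilinear(ramp, field)
warped_mask = warp_bilinear(mask, field)   # same field applied to the mask
```

Applying one field to both the image and its mask is what keeps the warped anatomy consistent with the warped intensities.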
[0039] The loss function drives network learning. It is a weighted sum of multiple objectives, primarily image similarity terms and anatomical constraint terms. Image similarity terms (e.g., mutual information, local normalized cross-correlation) are responsible for establishing meaningful pixel-level correspondences even under dramatic grayscale changes. Anatomical constraint terms are calculated directly from the transformed mask and the mask of the fixed plain scan image, turning prior anatomical knowledge into numerical optimization objectives. Common anatomical constraint terms may include:
Organ overlap loss (e.g., Dice loss): maximizes the overlap between the registered mask and the target mask;
Boundary consistency loss: constrains the alignment of organ contours after registration;
Shape preservation loss: limits the centroid shift and volume change of an organ to within a reasonable range;
Region smoothness loss: enforces smoothness of the deformation field within an organ.
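Two of the anatomical constraint terms listed above, together with the weighted sum, can be sketched in NumPy as follows. The formulas are common textbook forms chosen for illustration (soft Dice, and mean squared finite differences of the field inside the organ); the weights, the placeholder similarity value, and the function names are assumptions, not values from this application.

```python
import numpy as np

def dice_loss(warped_mask, fixed_mask, eps=1e-6):
    """1 - Dice overlap between the registered mask and the fixed mask."""
    intersection = (warped_mask * fixed_mask).sum()
    return 1.0 - (2.0 * intersection + eps) / (warped_mask.sum() + fixed_mask.sum() + eps)

def region_smoothness_loss(displacement, organ_mask):
    """Mean squared finite-difference gradient of a (2, H, W) field inside the organ."""
    dy = np.diff(displacement, axis=1)[:, :, :-1]   # (2, H-1, W-1)
    dx = np.diff(displacement, axis=2)[:, :-1, :]   # (2, H-1, W-1)
    grad_sq = (dy ** 2 + dx ** 2).sum(axis=0)
    inside = organ_mask[:-1, :-1] > 0
    return float(grad_sq[inside].mean()) if inside.any() else 0.0

def total_loss(similarity, dice, smoothness, w_sim=1.0, w_dice=1.0, w_smooth=0.1):
    """Weighted sum of the terms; the weights are hypothetical."""
    return w_sim * similarity + w_dice * dice + w_smooth * smoothness

fixed_mask = np.zeros((16, 16))
fixed_mask[4:12, 4:12] = 1.0
warped_mask = np.zeros((16, 16))
warped_mask[4:12, 6:14] = 1.0          # registered mask still off by two columns

field = np.zeros((2, 16, 16))          # a perfectly smooth (constant) field
d = dice_loss(warped_mask, fixed_mask)
s = region_smoothness_loss(field, fixed_mask)
loss = total_loss(similarity=0.3, dice=d, smoothness=s)  # 0.3 is a placeholder value
print(round(d, 4), s, round(loss, 4))
```

The residual two-column misalignment shows up directly in the Dice term, independently of any intensity change between phases.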
[0040] To balance the importance of different loss terms at different training stages, this embodiment can also introduce an adaptive weight adjustment strategy. For example, in the early stages of training, image similarity loss can be given a higher weight to stabilize convergence, and then the weight of anatomical constraint terms can be gradually increased to refine structural alignment; or the weight of organ-related constraint terms can be dynamically adjusted according to the enhancement degree of contrast agent in a specific organ. The network optimizes this comprehensive loss through the backpropagation algorithm, and finally learns a high-precision deformation field that can align image grayscale while maintaining anatomical accuracy.
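One possible shape for such a schedule is sketched below in plain Python. The linear ramp, the warm-up fraction, the way the enhancement degree scales the anatomical weight, and the function name `loss_weights` are all assumptions chosen for illustration; the text does not prescribe a specific rule.

```python
def loss_weights(epoch, total_epochs, enhancement=0.0,
                 warmup_fraction=0.3, max_anatomy_weight=1.0):
    """Return (similarity_weight, anatomy_weight) for a given epoch.

    enhancement is a hypothetical score in [0, 1] describing how strongly
    the contrast agent changes the organ's intensity.
    """
    warmup = max(1, int(total_epochs * warmup_fraction))
    ramp = min(1.0, epoch / warmup)                 # 0 at the start, 1 after warm-up
    similarity = 1.0 - 0.5 * ramp                   # favoured early, reduced later
    anatomy = max_anatomy_weight * ramp * (1.0 + enhancement)
    return similarity, anatomy

for epoch in (0, 15, 30, 60):
    print(epoch, loss_weights(epoch, total_epochs=100))
```

With these assumed settings the similarity term dominates at the start, and the anatomical term reaches full weight once the warm-up period ends; a strongly enhanced organ receives a proportionally larger anatomical weight.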
[0041] In one implementation, in this embodiment of the application, the at least one contrast-enhanced image includes an arterial phase image and a venous phase image, and the at least one independent decoder includes a first decoder and a second decoder. Pairing each contrast-enhanced image with the plain scan image to form an image pair, inputting the pair into the shared encoder to obtain the corresponding encoded features, and inputting the encoded features into the corresponding independent decoder to generate a deformation field from the contrast-enhanced image to the plain scan image includes: S1311. Form a first image pair by combining the arterial phase image and the plain scan image, input the pair into the shared encoder to obtain a first encoded feature, and input the first encoded feature into the first decoder to generate a first deformation field from the arterial phase image to the plain scan image. In this embodiment, the first image pair uses the arterial phase image as the floating image and the plain scan image as the fixed image. The shared encoder receives the image pair as input, and its convolutional layers extract progressively more abstract features. Because the encoder parameters are shared and the plain scan image appears as the fixed reference in all image pairs, the encoder is forced to learn to separate the specific enhancement pattern caused by arterial phase contrast agent perfusion from the arterial phase image, and to extract the feature representation that is common to the plain scan image and reflects the anatomical structure itself. This representation is the first encoded feature.
[0042] The first decoder is a separate network branch specifically optimized for the arterial phase to plain scan registration task. It receives the first encoded feature and, through a series of upsampling operations and optional skip connections, learns to map general anatomical features onto the spatial deformation patterns specific to the arterial phase image (e.g., patterns that mainly reflect the early filling and deformation of areas with rich arterial blood supply). Finally, it outputs the first deformation field, which describes the displacement required to map each pixel (or voxel) in the arterial phase image to the plain scan image space.
[0043] S1312. Form a second image pair by combining the venous phase image and the plain scan image, input the pair into the shared encoder to obtain a second encoded feature, and input the second encoded feature into the second decoder to generate a second deformation field from the venous phase image to the plain scan image.
[0044] In this embodiment, the second image pair uses the venous phase image as the floating image and the plain scan image as the fixed image. The shared encoder processes the second image pair with exactly the same structure and parameters as it uses for the first image pair. It therefore applies a consistent standard when extracting, from the venous phase image, a feature representation that is anatomically consistent with both the plain scan and arterial phase images, i.e., the second encoded feature. The second decoder is a separate branch from the first decoder. Its structure may be the same as that of the first decoder, but its parameters are trained independently. The second decoder specifically learns the deformation mapping from the venous phase to the plain scan phase, focusing on the grayscale distribution and organ morphological changes that venous return of the contrast agent and homogeneous tissue enhancement cause in the venous phase image (such as portal venous system enhancement and homogeneous parenchymal enhancement), which differ from those in the arterial phase. By processing the second encoded feature, the second decoder outputs the second deformation field.
[0045] The core advantage of the dual-branch design is that, based on the unified anatomical features provided by the shared encoder, the two decoders learn the phase-specific deformations of their respective phases in parallel and in a targeted manner, thereby achieving high-precision and differentiated registration of arterial and venous phase images to the plain scan reference space, and unifying all phases to the same coordinate system.
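The parameter-sharing pattern of the dual-branch design can be illustrated with a deliberately simplified toy model. In the sketch below, dense matrices stand in for the convolutional encoder and decoders, and the class name, feature size, and random initialization are all assumptions made for illustration; only the wiring (one encoder, two independently parameterized decoders) reflects the text.

```python
import numpy as np

class DualBranchRegistration:
    """Toy model: one shared encoder, one decoder per contrast phase."""

    def __init__(self, n_pixels, n_features=16, seed=0):
        rng = np.random.default_rng(seed)
        # Shared encoder weights, used for every (floating, fixed) pair.
        self.encoder = rng.normal(scale=0.1, size=(n_features, 2 * n_pixels))
        # Independent decoder weights, one set per phase.
        self.decoders = {
            "arterial": rng.normal(scale=0.1, size=(2 * n_pixels, n_features)),
            "venous": rng.normal(scale=0.1, size=(2 * n_pixels, n_features)),
        }

    def encode(self, moving, fixed):
        pair = np.concatenate([moving.ravel(), fixed.ravel()])
        return np.tanh(self.encoder @ pair)

    def forward(self, moving, fixed, phase):
        features = self.encode(moving, fixed)      # shared path
        flow = self.decoders[phase] @ features     # phase-specific path
        return flow.reshape((2,) + moving.shape)   # (2, H, W) displacement

model = DualBranchRegistration(n_pixels=16)
plain = np.ones((4, 4))
enhanced = np.arange(16.0).reshape(4, 4) / 16.0
flow_arterial = model.forward(enhanced, plain, "arterial")
flow_venous = model.forward(enhanced, plain, "venous")
```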
[0046] As one implementation method, in this embodiment of the application, before using the plain scan image as the fixed image, at least one contrast-enhanced image as the floating image, and constructing anatomical constraint terms using anatomical prior information to train the registration network model to learn the deformation field, the method further includes: S21. Decompose the 3D volume data of the plain scan image and the contrast enhancement image into 2D slices along at least two preset anatomical directions. In this embodiment, at least two preset anatomical directions typically include the transverse (axial) plane, the coronal plane, and the sagittal plane. Decomposing the 3D volume data into 2D slices along these orthogonal directions is a strategy to reduce the dimensionality of the high-dimensional 3D registration problem into multiple parallel 2D registration subproblems. Its working principle lies in the anisotropy of human anatomical structures in different directions. For example, the cross-sectional outline of an organ can be clearly observed in the transverse plane, while the longitudinal extension and adjacent relationships of organs are easier to observe in the coronal and sagittal planes. Through multi-directional decomposition, the network can learn the consistent correspondence of anatomical structures from different geometric perspectives, thereby capturing three-dimensional deformation more comprehensively. In specific operation, all 2D slices are extracted from the original 3D volume slice by slice along the three coordinate axes. For example, for a volume of size [H, W, D], D slices of size [H, W] can be obtained along the transverse plane (Z-axis); H slices of size [W, D] can be obtained along the coronal plane (X-axis); and W slices of size [H, D] can be obtained along the sagittal plane (Y-axis).
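The slice extraction described above can be sketched in a few lines of NumPy. The function name `decompose_volume` and the dictionary keys are illustrative; the axis-to-plane mapping follows the [H, W, D] example in the text.

```python
import numpy as np

def decompose_volume(volume):
    """Split an (H, W, D) volume into 2D slices along each of its three axes."""
    h, w, d = volume.shape
    return {
        "transverse": [volume[:, :, z] for z in range(d)],  # D slices of (H, W)
        "coronal": [volume[x, :, :] for x in range(h)],     # H slices of (W, D)
        "sagittal": [volume[:, y, :] for y in range(w)],    # W slices of (H, D)
    }

volume = np.arange(4 * 5 * 6).reshape(4, 5, 6)
slices = decompose_volume(volume)
```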
[0047] The advantage of this step is that it reduces the cost of each forward pass from O(N³) for an N×N×N volume to O(N²) for an N×N slice. Although about 3N slices are processed in total, the memory and computation required per pass stay within reach, making it feasible to train deep networks on large-scale 3D medical images. At the same time, it enhances the robustness and accuracy of registration through multi-directional information complementarity.
[0048] S22. Through a multi-directional expert path network, feature extraction and dimensionality standardization are performed on 2D slices of corresponding anatomical directions to obtain standardized features.
[0049] In this embodiment, the orientation-specific expert path network is a set of parallel, lightweight convolutional sub-networks. Each sub-network is dedicated to processing 2D slices from a specific anatomical orientation. The design principle is that since 2D slices extracted from different orientations have different original sizes and aspect ratios (e.g., coronal and sagittal slices are typically rectangular), directly inputting them into the subsequent shared encoder would cause geometric inconsistencies. Therefore, each expert path network is designed with two main functions: first, to extract the primary orientation-sensitive features of the slice in that orientation; and second, to unify the feature maps of slices from all orientations to the same standard spatial size through specific downsampling or spatial transformation operations. In specific implementation, for the X / Z direction path, two 3×3 convolution layers can be used with strides of 1 and 2 respectively to downsample and output a 256×256 feature map; for the Y direction path, an asymmetric stride (2,1) convolution is used, downsampling is only performed in the height direction while the width direction remains unchanged, and the output is adjusted to 256×256. All path outputs are unified into 64-channel feature maps. By keeping the output feature maps of all expert paths consistent in spatial dimension and number of channels, they can be spliced or used as a unified format input for subsequent shared encoders.
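The size bookkeeping behind the expert paths can be checked with the standard convolution output-size formula. The sketch below assumes 3×3 kernels with padding 1 and assumes input slice sizes of 512×512 and 512×256; the text specifies only the strides and the 256×256 output, so these inputs are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def path_output_shape(height, width, strides):
    """Apply a sequence of (stride_h, stride_w) convolutions to a slice size."""
    for stride_h, stride_w in strides:
        height = conv_out(height, stride=stride_h)
        width = conv_out(width, stride=stride_w)
    return height, width

# X / Z path: two 3x3 convolutions with strides 1 and 2.
xz_shape = path_output_shape(512, 512, [(1, 1), (2, 2)])
# Y path: one 3x3 convolution with asymmetric stride (2, 1).
y_shape = path_output_shape(512, 256, [(2, 1)])
```

Under these assumed inputs, both paths end at the same spatial size, which is the property the shared encoder relies on.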
[0050] The core effect of this step is that, while preserving the anatomical specificity of each direction, it achieves the early geometric standardization of the data, providing a regular feature representation rich in complementary directional information for the subsequent shared encoder, thereby improving the network's capacity and efficiency in 3D deformation modeling.
[0051] As one implementation method, in this embodiment of the application, the training of the registration network model to learn the deformation field is performed on a 2D slice basis.
[0052] As one implementation method, in this embodiment of the application, decomposing the 3D volume data of the plain scan image and the contrast-enhanced image into 2D slices along at least two preset anatomical directions includes: extracting 2D slices from the 3D volume data of the plain scan image and the contrast-enhanced image along three orthogonal anatomical directions: transverse, coronal, and sagittal.
[0053] In this embodiment, decomposition along the three orthogonal anatomical directions (transverse, coronal, and sagittal) is chosen on the basis of human anatomy and medical imaging diagnostic practice. Its technical principles and advantages are as follows:
1. Complementarity of anatomical information: The three directions constitute a complete orthogonal basis for describing three-dimensional space. The transverse section (usually parallel to the examination table) is the most commonly used view in clinical image reading and clearly shows the axial cross-sectional morphology and symmetry of organs and the lateral spread of lesions. The coronal plane (slices ordered from anterior to posterior) and the sagittal plane (slices ordered from left to right) intuitively reflect the longitudinal extension of organs and their superior-inferior and anterior-posterior adjacency relationships. By extracting slices along all three directions at the same time, the network obtains comprehensive and complementary two-dimensional views of the anatomical structures, laying a foundation for subsequent learning of complex three-dimensional spatial correspondences.
2. Algorithm efficiency and feasibility: The three-dimensional (3D) registration problem is decomposed into multiple two-dimensional (2D) registration subproblems in these three directions. Directly processing a 3D volume of size N×N×N costs about O(N³) per forward pass. After decomposition, the task becomes processing about 3N 2D slices (about N slices in each direction), each at a cost of about O(N²) per pass. The total work over all slices remains on the order of 3N·N², but each pass needs only a 2D network and a fraction of the memory. This makes it possible to train deeper and more complex networks on high-resolution 3D medical images with limited computing resources, which is a key design choice for the practical application of the method.
3. Enhanced robustness of deformation modeling: Real physiological deformations (such as respiration and peristalsis) are anisotropic in three-dimensional space. Single-direction 2D processing loses the deformation component perpendicular to the plane, while direct 3D processing places extremely high demands on data volume and computation. Through three-directional decomposition and subsequent independent feature extraction and fusion, the network can model and constrain deformation from three orthogonal perspectives. When the 3D deformation field is finally reconstructed, multi-directional evidence can be integrated more robustly, reducing the error caused by the uncertainty of a single perspective and producing a smoother deformation field that is more consistent with physical laws.
[0054] In practice, the 3D volume data of the plain scan phase and of each contrast-enhanced phase are independently traversed along the three coordinate axes to extract all 2D slices, forming three independent slice sequences. These slice sequences are sent to the corresponding expert path networks for processing, which starts the multi-directional parallel feature extraction process.
[0055] In one embodiment of this application, the anatomical constraints include at least two of the following: organ overlap loss, boundary consistency loss, shape preservation loss, and region smoothness loss.
[0056] In this embodiment, the anatomical constraint term is a combination of loss terms used to quantify prior anatomical knowledge into an optimizable objective function. Its core purpose is to explicitly guide and force the network to produce deformations that conform to anatomical common sense, beyond pixel-level alignment. By introducing these constraints, the network must simultaneously satisfy the physical and geometric rationality of the anatomical structure during the learning process, thereby effectively preventing non-physiological deformations such as organ tearing, excessive compression, or topological destruction.
[0057] The specific design and function of each loss in the anatomical constraint terms are as follows: 1. Organ Overlap Loss: This loss function measures the degree of matching between the deformed organ mask and the target (plain scan) mask by calculating the area of the overlapping region. Its value is between 0 and 1; the more complete the overlap, the lower the loss value. The core function of organ overlap loss is to drive the deformation field to move the organ in the floating image to a position that overlaps with the corresponding organ in the fixed image as much as possible, which is the basis for ensuring macroscopic anatomical alignment.
[0058] 2. Boundary Consistency Loss: This loss extracts the contour of the organ mask using edge detection operators such as Sobel and calculates the difference (e.g., mean squared error) between the deformed contour and the target contour. The core function of boundary consistency loss is to refine the alignment of organ boundaries. Since organ boundaries are key to distinguishing different tissues and defining anatomical extent, this loss ensures that the registered organ contours are clear and accurate, avoiding blurred or misaligned boundaries.
[0059] 3. Shape Preservation Loss: This loss typically constrains organ deformation based on global geometric properties. Common measures include centroid shift loss and area / volume change loss. Centroid shift loss penalizes the distance between the organ's center point before and after deformation, preventing unreasonable overall translation of the organ. Area change loss constrains the expansion or contraction ratio of the organ during deformation, avoiding non-physiological drastic size changes. The core function of shape preservation loss is to maintain the organ's basic geometric properties and relative spatial position, ensuring the overall coordination of deformation.
[0060] 4. Regional Smoothness Loss: This loss is specifically calculated for the deformation field within the organ's internal mask region. By constraining the amplitude of the gradient (such as the first or second derivative) of the deformation field within the organ, it forces the deformation within the organ to be continuous and smooth, rather than abrupt or chaotic. The core function of regional smoothness loss is to ensure the deformation consistency of the organ as a continuous whole, preventing unnatural wrinkles or discontinuities from forming inside, which conforms to the mechanical properties of real biological tissues.
[0061] The overall effect is that these anatomical constraints do not work in isolation, but rather function collaboratively as a multi-faceted, multi-layered constraint system. For example, overlap loss ensures the location of organs, boundary loss refines their edges, shape loss constrains their overall shape, and smoothness loss ensures uniform internal deformation. By incorporating these terms into the overall loss function in a weighted sum manner, the network receives direct optimization signals from the anatomical structure itself during backpropagation. This allows it to learn to maintain high-precision alignment while maximizing respect for anatomical rationality in the deformation field. This fundamentally solves the problem of anatomical distortion caused by the lack of such constraints in existing methods, as pointed out in the background section.
[0062] As a specific implementation method, the specific calculation formulas and parameters for each anatomical constraint loss are explained below: 1. Organ overlap loss. The organ overlap loss takes the Dice form $L_{dice} = 1 - \frac{2\sum_{i=1}^{N} M_w(i)\,M_t(i) + \epsilon}{\sum_{i=1}^{N} M_w(i) + \sum_{i=1}^{N} M_t(i) + \epsilon}$, where $L_{dice}$ is the organ overlap loss value; $N$ is the total number of pixels (or voxels) in the image; $i$ is the pixel index, from 1 to $N$; $M_w(i)$ is the value at the $i$-th pixel of the arterial or venous phase organ mask after deformation field transformation; $M_t(i)$ is the value at the $i$-th pixel of the plain scan organ mask used as the registration target; and $\epsilon$ is the smoothing constant.
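A minimal NumPy sketch of a Dice-style organ overlap loss is given below. The function name `dice_loss` and the default value of the smoothing constant are assumptions; the structure (one minus a smoothed overlap ratio) follows the description of the loss as lying between 0 and 1 and decreasing as overlap improves.

```python
import numpy as np

def dice_loss(warped_mask, target_mask, eps=1e-5):
    """One minus the smoothed Dice coefficient of two masks."""
    intersection = np.sum(warped_mask * target_mask)
    total = np.sum(warped_mask) + np.sum(target_mask)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)

organ = np.zeros((6, 6))
organ[1:4, 1:4] = 1.0
elsewhere = np.zeros((6, 6))
elsewhere[4:, 4:] = 1.0
loss_same = dice_loss(organ, organ)          # perfect overlap
loss_disjoint = dice_loss(organ, elsewhere)  # no overlap
```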
[0063] 2. Boundary consistency loss. The specific formula for boundary consistency loss is as follows: $L_{bnd} = \frac{1}{N}\sum_{i=1}^{N}\left(B_w(i) - B_t(i)\right)^2$, where $L_{bnd}$ is the boundary consistency loss value; $N$ is the total number of pixels (or voxels) in the spatial domain of the image; $i$ is the pixel index, from 1 to $N$; $M_w(i)$ is the value at the $i$-th pixel of the deformed mask, i.e., the arterial or venous phase organ mask after deformation field transformation; $M_t(i)$ is the value at the $i$-th pixel of the target mask, i.e., the plain scan organ mask used as the registration target; $B_w(i)$ is the boundary strength value of the deformed mask $M_w$ at pixel $i$; $B_t(i)$ is the boundary strength value of the target mask $M_t$ at pixel $i$; and $\left(B_w(i) - B_t(i)\right)^2$ is the square of the difference in boundary strength between the deformed mask and the target mask at pixel $i$. Summing this squared difference over all pixels and averaging gives the boundary consistency loss in the form of a mean squared error (MSE).
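The boundary comparison can be sketched with a hand-written Sobel operator. The function names and the edge-replicating padding are assumptions of this example; the application names Sobel only as one possible edge detection operator.

```python
import numpy as np

def sobel_magnitude(mask):
    """Boundary strength of a 2D mask as the Sobel gradient magnitude."""
    p = np.pad(mask.astype(float), 1, mode="edge")
    # Horizontal and vertical 3x3 Sobel responses from shifted views.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.sqrt(gx ** 2 + gy ** 2)

def boundary_consistency_loss(warped_mask, target_mask):
    """Mean squared error between the two boundary strength maps."""
    diff = sobel_magnitude(warped_mask) - sobel_magnitude(target_mask)
    return np.mean(diff ** 2)

organ = np.zeros((8, 8))
organ[2:6, 2:6] = 1.0
shifted = np.roll(organ, 2, axis=1)
```

Here `boundary_consistency_loss(organ, organ)` is zero, while the shifted mask produces a positive penalty because the contours no longer coincide.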
[0064] 3. Shape preservation loss. The shape preservation loss is a weighted sum of the centroid shift loss and the area change loss, built up in steps 3.1 to 3.5. 3.1 Formula for calculating centroid coordinates. For the $k$-th organ, the target mask centroid coordinates are $c_t^k = (x_t^k, y_t^k)$, with $x_t^k = \frac{\sum_{(x,y)\in\Omega} x\,M_t^k(x,y)}{\sum_{(x,y)\in\Omega} M_t^k(x,y)}$ and $y_t^k = \frac{\sum_{(x,y)\in\Omega} y\,M_t^k(x,y)}{\sum_{(x,y)\in\Omega} M_t^k(x,y)}$, where $c_t^k$ is the centroid of the $k$-th organ target mask $M_t^k$; $x_t^k$ is the x-coordinate (abscissa) of that centroid; $y_t^k$ is its y-coordinate (ordinate); $\Omega_t^k$ is the set of pixel coordinates of the $k$-th organ target mask, to which the sums reduce when the mask is binary; and $\Omega$ is the entire pixel (or voxel) spatial domain of the image.
[0065] The centroid coordinates of the deformed mask are $c_w^k = (x_w^k, y_w^k)$, where $c_w^k$ is the centroid of the mask of the $k$-th organ after deformation; $x_w^k$ is the x-coordinate of the centroid of the deformed mask; and $y_w^k$ is its y-coordinate. The formulas for $x_w^k$ and $y_w^k$ adopt the same structure as those for the target mask centroid, but are calculated on the coordinate data of the deformed mask.
[0066] 3.2 Formula for centroid loss. The Euclidean distance loss between the centroid of the deformed mask and the centroid of the target mask is $L_{cen} = \left\lVert c_w^k - c_t^k \right\rVert_2 = \sqrt{(x_w^k - x_t^k)^2 + (y_w^k - y_t^k)^2}$, where $L_{cen}$ is the centroid loss; $c_t^k$ is the centroid of the $k$-th organ target mask; and $c_w^k$ is the centroid of the mask of the $k$-th organ after deformation.
[0067] 3.3 Formula for calculating organ area. The area of the target mask for the $k$-th organ is the sum of the pixel values within the target mask: $A_t^k = \sum_{(x,y)\in\Omega_t^k} M_t^k(x,y)$, where $A_t^k$ is the area of the $k$-th organ target mask $M_t^k$, and $\Omega_t^k$ is the set of pixel coordinates of the $k$-th organ target mask. The area $A_w^k$ of the deformed mask is calculated with a formula of the same structure, applied to the deformed mask.
[0068] 3.4 Area loss formula. The loss due to the relative difference between the deformed area and the target area is $L_{area} = \frac{\left|A_w^k - A_t^k\right|}{A_t^k}$, where $L_{area}$ is the area loss; $A_t^k$ is the area of the target mask; and $A_w^k$ is the area of the mask after deformation.
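[0069] 3.5 Formula for total shape preservation loss. The centroid loss and the area loss are combined, with the area loss weighted at 0.5: $L_{shape} = L_{cen} + 0.5\,L_{area}$, where $L_{shape}$ is the shape preservation loss; $L_{cen}$ is the centroid loss; and $L_{area}$ is the area loss.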
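Steps 3.1 to 3.5 can be sketched for a single organ as follows. The function names are illustrative, and the sketch assumes non-empty 2D masks (an empty target mask would divide by zero); the 0.5 area weight is the value stated in step 3.5.

```python
import numpy as np

def centroid(mask):
    """Mask-weighted mean (x, y) coordinate of a non-empty 2D mask."""
    ys, xs = np.indices(mask.shape)
    area = mask.sum()
    return np.array([(xs * mask).sum() / area, (ys * mask).sum() / area])

def shape_preservation_loss(warped_mask, target_mask, area_weight=0.5):
    """Centroid distance plus weighted relative area change."""
    centroid_loss = np.linalg.norm(centroid(warped_mask) - centroid(target_mask))
    area_warped, area_target = warped_mask.sum(), target_mask.sum()
    area_loss = abs(area_warped - area_target) / area_target
    return centroid_loss + area_weight * area_loss

target = np.zeros((12, 12))
target[2:6, 2:6] = 1.0    # 4x4 organ
shifted = np.zeros((12, 12))
shifted[2:6, 5:9] = 1.0   # same size, moved 3 pixels in x
widened = np.zeros((12, 12))
widened[2:6, 2:10] = 1.0  # twice the area, centroid moved 2 pixels in x
```

The shifted mask is penalized only through the centroid term, while the widened mask is penalized through both terms.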
[0070] 4. Regional smoothness loss. The specific formula for the regional smoothness loss is as follows: $L_{rs} = \frac{1}{N_M + \epsilon}\sum_{i} M_t(i)\,\left\lVert \nabla^2 \phi(i) \right\rVert_2^2$, where $L_{rs}$ is the regional smoothness loss value; $N_M = \sum_i M_t(i)$ is the total number of pixels within the region; $\nabla^2 \phi(i)$ is the Laplacian of the deformation field $\phi$ at pixel $i$; $\left\lVert \nabla^2 \phi(i) \right\rVert_2^2$ is the square of the L2 norm of that Laplacian vector; $M_t$ is the plain scan organ mask, which defines the spatial extent over which the loss is calculated, so that only pixel locations with a value of 1 (i.e., regions inside organs) contribute to the smoothness loss; and $\epsilon$ is the smoothing constant.
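A masked Laplacian penalty of this kind can be sketched in NumPy. The 5-point stencil, the edge-replicating padding, the `(2, H, W)` field layout, and the function names are assumptions of this example.

```python
import numpy as np

def laplacian(field):
    """5-point Laplacian of each component of a (2, H, W) field."""
    p = np.pad(field, ((0, 0), (1, 1), (1, 1)), mode="edge")
    return (p[:, :-2, 1:-1] + p[:, 2:, 1:-1]
            + p[:, 1:-1, :-2] + p[:, 1:-1, 2:] - 4.0 * field)

def regional_smoothness_loss(flow, organ_mask, eps=1e-5):
    """Mean squared Laplacian norm of the field inside the organ mask."""
    lap_sq = np.sum(laplacian(flow) ** 2, axis=0)  # squared L2 norm per pixel
    return np.sum(organ_mask * lap_sq) / (np.sum(organ_mask) + eps)

rows = np.indices((8, 8))[0].astype(float)
linear_flow = np.stack([rows, np.zeros((8, 8))])  # smooth, linearly varying
inner_mask = np.zeros((8, 8))
inner_mask[2:6, 2:6] = 1.0
bumpy_flow = linear_flow.copy()
bumpy_flow[0, 4, 4] += 2.0                        # abrupt local displacement
```

A linearly varying displacement has zero Laplacian inside the mask and is not penalized, whereas the isolated bump is.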
[0071] As one implementation method, in this embodiment of the application, the loss function further includes deformation field smoothness loss, which is used to constrain the spatial smoothness of the entire displacement field, ensuring that the deformation field is continuous in the global range and avoiding non-physical drastic deformation or folding. It is achieved by penalizing the first-order gradient (rate of change) and second-order gradient (curvature) of the deformation field.
[0072] As a specific implementation method, the specific formula for the deformation field smoothness loss is as follows: $L_{smooth} = \mathrm{mean}\left(\left\lVert \nabla \phi \right\rVert_2^2\right) + \lambda\,\mathrm{mean}\left(\left\lVert \nabla^2 \phi \right\rVert_2^2\right)$, where $L_{smooth}$ is the deformation field smoothness loss value; $\mathrm{mean}$ is the arithmetic mean function; $\phi$ is the deformation field (displacement field) predicted by the network; $\nabla \phi$ is the first-order spatial gradient of the deformation field (gradient field), whose squared L2 norm penalizes abrupt changes in displacement between adjacent voxels; $\nabla^2 \phi$ is the second-order spatial gradient of the deformation field (Laplace field), whose squared L2 norm penalizes excessive bending or curvature changes in the deformation field; and $\lambda$ is the weight coefficient of the second-order smoothness constraint, a hyperparameter used in training.
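A global first-order plus second-order penalty can be sketched with `np.gradient`. The finite-difference scheme, the `(2, H, W)` field layout, and the default weight `lam` are assumptions made for illustration.

```python
import numpy as np

def deformation_smoothness_loss(flow, lam=0.1):
    """First-order plus weighted second-order penalty on a (2, H, W) field."""
    g_r, g_c = np.gradient(flow, axis=(1, 2))       # first-order gradients
    grad_sq = np.sum(g_r ** 2 + g_c ** 2, axis=0)   # squared gradient norm
    # Laplacian as the sum of second derivatives along rows and columns.
    lap = np.gradient(g_r, axis=1) + np.gradient(g_c, axis=2)
    lap_sq = np.sum(lap ** 2, axis=0)
    return np.mean(grad_sq) + lam * np.mean(lap_sq)

rows = np.indices((8, 8))[0].astype(float)
constant_flow = np.ones((2, 8, 8))
linear_flow = np.stack([rows, np.zeros((8, 8))])
```

A constant field is not penalized at all; a linear field is penalized only by the first-order term, since its curvature is zero.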
[0073] In one embodiment of this application, the loss function further includes an image similarity term, which includes at least two of the following: mutual information loss, local normalized cross-correlation loss, gradient similarity loss, modality-independent neighborhood descriptor loss, and normalized gradient field loss.
[0074] In this embodiment, the image similarity term is a set of loss functions specifically designed to establish effective pixel correspondences between images across different time periods where grayscale distributions undergo drastic and nonlinear changes. Unlike the anatomical constraint term, which focuses on macroscopic structure, the image similarity term emphasizes measuring the underlying pattern similarity between image pairs from different mathematical perspectives, and is the fundamental driving force for the network to achieve pixel-level precise alignment.
[0075] The design principles and complementary effects of the various loss terms in the image similarity term are as follows: 1. Mutual Information Loss: Mutual information measures the strength of the statistical dependency between two images, rather than a direct correspondence of gray values. It works by calculating the entropy of the joint gray-level histogram of the two images. The core advantage of mutual information loss is that it does not assume linearity or monotonicity in the gray-level mapping relationship. Even if a bright blood vessel in the arterial phase appears as a shadow in the plain scan phase, as long as the two co-occur stably in space, mutual information can capture this association. This makes it a fundamental tool for handling the global gray-level distribution reshaping caused by contrast agents.
[0076] 2. Locally Normalized Cross-Correlation Loss: This loss calculates the normalized cross-correlation coefficient between two images within a predefined local window (e.g., 9×9). Local computation makes it robust to non-uniform brightness variations (e.g., local contrast differences). The core function of the locally normalized cross-correlation loss is to promote the alignment of local textures and patterns in the image, effectively utilizing the structural consistency information of tissue regions that are unaffected or minimally affected by contrast agents.
[0077] 3. Gradient Similarity Loss: This loss compares the similarity of the gradient fields of two images, including gradient magnitude and direction. Image gradients mainly capture edge and contour information, which is more stable relative to gray values across different phases. The core function of gradient similarity loss is to enhance the alignment of edge structures, ensuring that key anatomical landmarks such as organ boundaries and tissue interfaces accurately coincide after registration.
[0078] 4. Modality Independent Neighborhood Descriptor Loss: Modality Independent Neighborhood Descriptors are descriptors based on local image self-similarity. These descriptors are insensitive to absolute grayscale values, instead describing the relative relationship patterns between each pixel and its surrounding neighborhood. Their core advantage lies in the fact that this descriptor is an intrinsic, modality-invariant feature. Regardless of whether the region appears bright or dark in another phase, as long as their local texture patterns are similar, the descriptors are similar. This provides an extremely robust metric for establishing correspondences in regions with dramatic grayscale changes.
[0079] 5. Normalized Gradient Field Loss: Normalized gradient field loss focuses on the orientation alignment of image gradient vectors. By normalizing the gradient vectors to unit vectors, it completely ignores the gradient magnitude (i.e., edge strength) and only focuses on the edge orientation. Its core function is to ensure that even if the edge contrast is significantly changed by the contrast agent, the edge orientation and interface remain consistent after registration.
[0080] Collaborative working mechanism: These loss terms jointly construct a robust image similarity measurement system from multiple complementary dimensions: global statistical dependence (mutual information), local intensity correlation (local normalized cross-correlation), edge structure (gradient similarity), and intrinsic geometric patterns (modality-independent neighborhood descriptors and normalized gradient fields). During training, the loss terms work together, so that even in regions where gray-level correspondences are completely disrupted, the network can find the correct alignment cues through one or more of the metrics. Combined with the anatomical constraint terms, this forms a dual optimization paradigm driven by pixel appearance similarity and constrained by high-level anatomical rules. Ultimately, this allows the network to see past gray-level variations and recover the stable anatomical spatial correspondences underlying the images, thereby learning a high-precision deformation field.
[0081] As a specific implementation method, the specific calculation formulas and parameters for each image similarity loss are explained below: 1. Mutual information loss: The specific formula for mutual information loss is as follows: $L_{MI} = -\left(H(I_w) + H(I_f) - H(I_w, I_f)\right)$, where $L_{MI}$ is the mutual information loss value; $I_w$ is the contrast-enhanced phase image (arterial or venous phase image) after deformation field transformation; $I_f$ is the plain scan phase image; $H(I_w)$ is the entropy of the transformed image; $H(I_f)$ is the entropy of the plain scan phase image; and $H(I_w, I_f)$ is the joint entropy of the transformed image and the plain scan image.
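The entropy-based definition can be illustrated with a joint gray-level histogram. The sketch below is for evaluation only: a hard histogram is not differentiable, so a training implementation would need a differentiable estimate instead. The bin count and function names are assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array, ignoring empty bins."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information_loss(warped, fixed, bins=32):
    """Negative mutual information from a joint gray-level histogram."""
    joint, _, _ = np.histogram2d(warped.ravel(), fixed.ravel(), bins=bins)
    p_joint = joint / joint.sum()
    h_warped = entropy(p_joint.sum(axis=1))  # marginal entropy of warped image
    h_fixed = entropy(p_joint.sum(axis=0))   # marginal entropy of fixed image
    h_joint = entropy(p_joint.ravel())
    return -(h_warped + h_fixed - h_joint)
```

For example, an image and its gray-level inverse are statistically dependent even though bright and dark are swapped, so `mutual_information_loss(image, 1.0 - image)` is far lower than the loss between two unrelated noise images.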
[0082] 2. Locally normalized cross-correlation loss: A form of the locally normalized cross-correlation loss consistent with the symbol definitions that follow is $L_{LNCC} = -\frac{1}{N}\sum_{i=1}^{N}\frac{\left(\sum_{j\in w_i}\left(I_w(j) - \bar{I}_w(i)\right)\left(I_f(j) - \bar{I}_f(i)\right)\right)^2}{\sum_{j\in w_i}\left(I_w(j) - \bar{I}_w(i)\right)^2\,\sum_{j\in w_i}\left(I_f(j) - \bar{I}_f(i)\right)^2 + \epsilon}$, where $L_{LNCC}$ is the locally normalized cross-correlation loss value; $N$ is the total number of pixels in the image domain; $w_i$ is the local neighborhood window centered at pixel $i$; $I_w(j)$ is the gray value of the transformed image at pixel $j$; $\bar{I}_w(i)$ is the average gray value of the transformed image within the window $w_i$; $I_f(j)$ is the gray value of the plain scan phase image at pixel $j$; $\bar{I}_f(i)$ is the average gray value of the plain scan phase image within the window $w_i$; $\sum_{j\in w_i}\left(I_w(j) - \bar{I}_w(i)\right)^2$ is the sum of squared deviations of the gray values of the deformed image from their average within the window; $\sum_{j\in w_i}\left(I_f(j) - \bar{I}_f(i)\right)^2$ is the corresponding sum of squared deviations for the plain scan phase image; and $\epsilon$ is the smoothing constant.
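A windowed cross-correlation loss can be sketched with box sums. The sketch assumes the squared correlation form with a negative sign, zero padding at the borders, and an odd window size; these are common choices and not necessarily those of the embodiment.

```python
import numpy as np

def box_sum(img, win):
    """Sum over a win x win window around each pixel (zero padded, win odd)."""
    pad = win // 2
    p = np.pad(img.astype(float), pad)
    h, w = img.shape
    out = np.zeros((h, w))
    for dr in range(win):
        for dc in range(win):
            out += p[dr:dr + h, dc:dc + w]
    return out

def lncc_loss(warped, fixed, win=9, eps=1e-5):
    """Negative mean of the squared local normalized cross-correlation."""
    n = float(win * win)
    s_w, s_f = box_sum(warped, win), box_sum(fixed, win)
    cross = box_sum(warped * fixed, win) - s_w * s_f / n
    var_w = box_sum(warped * warped, win) - s_w * s_w / n
    var_f = box_sum(fixed * fixed, win) - s_f * s_f / n
    return -np.mean(cross * cross / (var_w * var_f + eps))
```

Because each window is normalized by its own variance, rescaling the brightness of one image leaves the loss essentially unchanged.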
[0083] 3. Gradient similarity loss: The gradient similarity loss $L_{grad}$ is a weighted combination of two terms. The first term applies the mean squared error function MSE to the gradient vector $\nabla I_w$ of the deformed image and the gradient vector $\nabla I_f$ of the plain scan phase image. The second term applies the arithmetic mean function mean to a comparison of the gradient vector angle $\theta_w$ of the deformed image with the gradient vector angle $\theta_f$ of the plain scan phase image. The two terms are combined using weighting coefficients.
[0084] 4. Modality-independent neighborhood descriptor loss: The relevant formulas for the modality-independent neighborhood descriptor loss are as follows. 4.1 For each pixel position $x$ in image $I$ (specifically, the deformed image $I_w$), calculate the squared difference of pixel intensity between it and the offset position $x+r$ within the local neighborhood $R$, averaged over the image domain centered on $x$: $D_I(x, r) = \frac{1}{|\Omega_x|}\sum_{p\in\Omega_x}\left(I(p) - I(p+r)\right)^2$, where $D_I(x, r)$ is the squared difference in pixel intensity of image $I$ between pixel $x$ and the offset position $x+r$; $r$ is the spatial offset within the neighborhood search radius, belonging to the set $R$; $\Omega_x$ is the image domain centered on $x$; $|\Omega_x|$ is the total number of pixels within that domain; $I(p)$ is the pixel intensity of the deformed image at pixel $p$; and $I(p+r)$ is the pixel intensity of the deformed image at the offset pixel $p+r$.
[0085] 4.2 Calculate the local pixel variance as a normalization factor: $V_I(x) = \frac{1}{|R|}\sum_{r\in R} D_I(x, r)$, where $V_I(x)$ is the local variance of the deformed image at pixel $x$; $R$ is a predefined set of local neighborhood offsets (i.e., a set of relative displacement vectors $r$); and $D_I(x, r)$ is the squared difference in pixel intensity of the deformed image between pixel $x$ and $x+r$.
[0086] 4.3 Generate modality-independent neighborhood descriptors: $\mathrm{MIND}_I(x, r) = \exp\left(-\frac{D_I(x, r)}{V_I(x)}\right)$, where $\mathrm{MIND}_I(x, r)$ is the modality-independent neighborhood descriptor value of the deformed image at pixel $x$ with respect to offset $r$; $\exp$ is the exponential function; $D_I(x, r)$ is the squared difference in pixel intensity of the deformed image between pixel $x$ and $x+r$; and $V_I(x)$ is the local variance of the image at pixel $x$.
[0087] The modality-independent neighborhood descriptor loss is defined as the L1 distance between the descriptors of the plain scan image and those of the deformed image, averaged over pixels and offsets: $L_{MIND} = \frac{1}{|\Omega|\,|R|}\sum_{x\in\Omega}\sum_{r\in R}\left|\mathrm{MIND}_{I_f}(x, r) - \mathrm{MIND}_{I_w}(x, r)\right|$, where $L_{MIND}$ is the modality-independent neighborhood descriptor loss value; $\mathrm{MIND}_{I_f}(x, r)$ is the descriptor value of the plain scan image $I_f$ at pixel $x$ with respect to offset $r$; and $\mathrm{MIND}_{I_w}(x, r)$ is the descriptor value of the deformed image $I_w$ at pixel $x$ with respect to offset $r$.
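A simplified descriptor of this kind can be sketched in NumPy. The four-neighbor offset set, the 3×3 averaging patch, the wrap-around border handling via `np.roll`, and the small constant added to the variance are all assumptions of this example rather than details from the application.

```python
import numpy as np

OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed neighbourhood set R

def patch_mean(img, radius=1):
    """Mean over a (2*radius+1)^2 patch around each pixel (wrap-around)."""
    acc = np.zeros(img.shape)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            acc += np.roll(img, (dr, dc), axis=(0, 1))
    return acc / (2 * radius + 1) ** 2

def mind(img, eps=1e-6):
    """Descriptor stack of shape (len(OFFSETS), H, W)."""
    img = img.astype(float)
    # Patch-averaged squared difference between the image and its shifted copy.
    d = np.stack([
        patch_mean((img - np.roll(img, (-dr, -dc), axis=(0, 1))) ** 2)
        for dr, dc in OFFSETS
    ])
    v = d.mean(axis=0) + eps  # local variance estimate used for normalization
    return np.exp(-d / v)

def mind_loss(warped, fixed):
    """Mean absolute (L1) difference between the two descriptor stacks."""
    return np.mean(np.abs(mind(fixed) - mind(warped)))
```

Since the descriptor depends only on squared intensity differences, inverting the gray levels of one image leaves the loss at essentially zero, while a spatial misalignment raises it.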
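[0088] 5. Normalized gradient field loss: A form of the normalized gradient field loss consistent with the symbol definitions that follow is $L_{NGF} = 1 - \mathrm{mean}\left(\left\langle n(I_w), n(I_f) \right\rangle^2\right)$ with $n(I) = \frac{\nabla I}{\sqrt{\left\lVert \nabla I \right\rVert_2^2 + \epsilon}}$, where $L_{NGF}$ is the normalized gradient field loss value; $\mathrm{mean}$ is the arithmetic mean function; $\nabla I_w$ is the gradient vector of the deformed image; $\nabla I_f$ is the gradient vector of the plain scan phase image; and $\epsilon$ is a constant.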
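An orientation-only comparison of gradients can be sketched as follows. The squared inner product (which treats opposite gradient directions as aligned), the `np.gradient` finite differences, and the value of the constant are assumptions of this example.

```python
import numpy as np

def normalized_gradient(img, eps=1e-6):
    """Gradient components scaled to (almost) unit length at each pixel."""
    g_r, g_c = np.gradient(img.astype(float))
    norm = np.sqrt(g_r ** 2 + g_c ** 2 + eps)
    return g_r / norm, g_c / norm

def ngf_loss(warped, fixed, eps=1e-6):
    """One minus the mean squared inner product of normalized gradients."""
    w_r, w_c = normalized_gradient(warped, eps)
    f_r, f_c = normalized_gradient(fixed, eps)
    return 1.0 - np.mean((w_r * f_r + w_c * f_c) ** 2)

rows, cols = np.indices((16, 16)).astype(float)
ramp = rows + 2.0 * cols
```

Here `ngf_loss(ramp, 5.0 * ramp)` and `ngf_loss(ramp, -ramp)` are both close to zero because edge strength and polarity are ignored, while `ngf_loss(rows, cols)` is 1 because the two gradient fields are perpendicular.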
[0089] As one implementation method, in this embodiment of the application, the loss function is a weighted combination of image similarity loss, anatomical constraint loss and deformation field smoothness loss.
[0090] Specifically, for the arterial phase registration branch, the loss function is defined as:
$$L_{art} = \lambda_{sim} L_{sim} + \lambda_{anat} L_{anat} + \lambda_{smooth} L_{smooth},$$
where $L_{art}$ is the arterial phase branch loss value; $L_{sim}$ is the image similarity loss, a weighted combination of at least two of the following: mutual information loss, local normalized cross-correlation loss, gradient similarity loss, modality-independent neighborhood descriptor loss, and normalized gradient field loss; $L_{anat}$ is the anatomical constraint loss, a weighted combination of organ overlap loss, boundary consistency loss, shape preservation loss, and regional smoothness loss; $L_{smooth}$ is the deformation field smoothness loss; and $\lambda_{sim}$, $\lambda_{anat}$, and $\lambda_{smooth}$ are the weight coefficients for the image similarity loss, the anatomical constraint loss, and the deformation field smoothness loss, respectively.
[0091] The loss function for the venous phase registration branch adopts the same structure as that for the arterial phase branch, but is calculated based on venous phase image data.
[0092] During training, the overall training loss is defined as the average of the arterial phase branch loss and the venous phase branch loss:
$$L_{total} = \frac{1}{2}\left( L_{art} + L_{ven} \right),$$
where $L_{total}$ is the overall training loss value; $L_{art}$ is the arterial phase branch loss value; and $L_{ven}$ is the venous phase branch loss value.
[0093] This averaging strategy achieves joint optimization of the two branches, ensuring that the network simultaneously learns high-precision registration mappings from the arterial and venous phases to the plain scan phase, and uniformly aligns all phases to the plain scan reference space.
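The branch loss and the averaging of the two branches reduce to a few lines of arithmetic. The sketch below is only illustrative: the function names and the default weight coefficients (1.0, 0.5, 0.1) are hypothetical values chosen for the example, not values stated in this application.

```python
def branch_loss(l_sim, l_anat, l_smooth, w_sim=1.0, w_anat=0.5, w_smooth=0.1):
    """Weighted sum of similarity, anatomical and smoothness losses for one branch."""
    return w_sim * l_sim + w_anat * l_anat + w_smooth * l_smooth

def total_loss(l_art, l_ven):
    """Overall training loss: mean of the arterial and venous branch losses."""
    return 0.5 * (l_art + l_ven)

# Example with made-up loss values for the two branches.
l_art = branch_loss(0.4, 0.2, 0.1)
l_ven = branch_loss(0.3, 0.3, 0.1)
overall = total_loss(l_art, l_ven)
```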
[0094] As one implementation method, in the embodiments of this application, during the training of the registration network model, an adaptive weight adjustment strategy is adopted to dynamically adjust the weight of at least one loss term. The adaptive weight adjustment is based on at least one of the following factors: the current stage of the training process, the enhancement intensity information of the contrast agent in the contrast enhancement period image, and the statistics of each loss term in historical training batches.
[0095] In this embodiment, the adaptive weight adjustment strategy is a dynamic loss-balancing mechanism. It addresses the optimization conflicts and instability that arise in multi-task learning (i.e., simultaneously optimizing image similarity and multiple anatomical constraints) because the loss terms differ in scale, convergence speed, and importance during training. By sensing the training state in real time and adjusting the weights accordingly, this strategy can improve training efficiency, stability, and the performance of the final model.
[0096] The working principles and implementation methods of each adjustment factor are as follows: 1. Adjustment based on the current stage of the training process: This strategy follows the concept of phased learning. In the early stage of training, the network parameters are randomly initialized, and the understanding of image content and anatomical structure is very weak. At this time, by assigning relatively high initial weights to image similarity loss (such as mutual information) and reducing the weights of anatomical constraint loss, the network prioritizes learning the most basic and approximate spatial correspondences between images across different periods, quickly entering a better search region and ensuring a smooth start to training. As the number of training rounds increases, the network's adaptability to grayscale changes increases, and the weights of anatomical constraint terms (such as Dice loss and boundary loss) are gradually increased linearly or according to a predetermined curve. This allows the network to gradually strengthen the fine alignment of anatomical details on the basis of the established coarse alignment, and finally achieve synergistic optimization of macroscopic and microscopic registration accuracy.
[0097] 2. Adjustment based on contrast agent enhancement intensity information in contrast-enhanced images: Different anatomical structures and different regions of the same structure exhibit significant differences in enhancement intensity during contrast-enhanced phases (e.g., significant enhancement of the renal cortex during the arterial phase, while weaker enhancement of the medulla). This biological difference implies that areas with significant enhancement show dramatic grayscale changes, making registration potentially more difficult and important. This strategy calculates the voxel-level difference map between contrast-enhanced and plain scan images and normalizes it to obtain a spatial weight map. When calculating image similarity loss, this weight map is used to spatially weight the loss. For example, for areas with significant enhancement (such as blood vessels and highly vascularized tumors), a higher weight is assigned to their similarity loss, forcing the network to invest more attention in these key and information-rich regions, achieving targeted, physiologically relevant, and refined registration.
[0098] 3. Adjustment based on historical statistics of each loss term: During training, different loss terms often have different magnitudes and descent rates. Loss terms with excessively large magnitudes or excessively rapid descents can easily dominate the gradient direction, inhibiting the optimization of other loss terms and causing the model to fall into suboptimal solutions. This strategy maintains a sliding window to record the average value of each loss term in several recent training batches. Every certain number of iterations, these historical statistics are analyzed. If a certain loss term is found to be consistently too high relative to other terms, its weight is automatically reduced; conversely, it is appropriately increased. This dynamic balancing mechanism ensures that all optimization objectives are considered in a balanced manner throughout the training process, promoting a more comprehensive and robust convergence of the model.
[0099] Synergistic Effect and Final Outcome: The adjustments in the above three dimensions are not carried out in isolation, but rather work synergistically to form a closed-loop intelligent optimization system. For example, in the early stages of training, the anatomical weight is relatively low; as training progresses, its weight gradually increases based on training progress factors; at the same time, during the process of increasing the anatomical weight, the constraints of different organs within it may be differentially weighted according to the enhancement intensity information; and throughout the entire process, historical statistical factors are always fine-tuning the proportion of various weights behind the scenes to maintain the balance of the optimization path.
[0100] The technical effect of this adaptive weight adjustment strategy is fundamental: it replaces the tedious and inefficient process of manually setting fixed weights through extensive trial and error in traditional methods, and achieves automated and rational configuration of loss function weights. This not only significantly reduces the barrier to entry and parameter tuning costs, but more importantly, it guides the network along a smoother and more efficient optimization trajectory, eventually converging to a better equilibrium point. At this point, the model can achieve high-precision pixel-level alignment while strictly adhering to the physiological laws of anatomical structures, thereby improving the robustness, generalization ability, and clinical usability of the registration method as a whole.
[0101] As a preferred and systematic implementation, this application embodiment constructs a three-layer adaptive weight system driven by the above three dimensions, and its design and working mechanism will be described in detail below.
[0102] First layer: Temporal weight adjustment based on training progress. The weights in this layer control the strength of the anatomical constraint loss. Following the curriculum learning philosophy, training relies mainly on image similarity alignment in the early stages and gradually strengthens the anatomical constraints in later stages. Specifically, the weight of the anatomical constraint loss changes with each training epoch, as shown in the following formula:
$$w_{anat}(e) = \begin{cases} w_{init} + \left( w_{warm} - w_{init} \right) \dfrac{e}{E_{warm}}, & e < E_{warm} \\ w_{warm} + \left( 0.8 - w_{warm} \right) \dfrac{e - E_{warm}}{E_{total} - E_{warm}}, & e \ge E_{warm} \end{cases}$$
where $w_{anat}(e)$ is the weight of the anatomical constraint loss (related to training progress); e is the current training epoch; $w_{init}$ is the initial anatomical weight; $w_{warm}$ is the anatomical weight at the end of the warm-up phase; $E_{warm}$ is the number of epochs in the warm-up phase (e.g., 20); and $E_{total}$ is the total number of training epochs (e.g., 200).
[0103] This function causes the anatomical constraint weight to start from an initial value of 0.1, slowly increase to 0.15 during the warm-up phase, and then linearly increase to a final value of 0.8 during the reinforcement phase.
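The schedule described above can be written as a small piecewise-linear function. This is a sketch that assumes both the warm-up phase and the reinforcement phase are linear; the function and parameter names are illustrative, and the defaults are the example values given in the text (0.1, 0.15, 0.8, 20 and 200).

```python
def anatomical_weight(epoch, w_init=0.1, w_warm=0.15, w_final=0.8,
                      warmup_epochs=20, total_epochs=200):
    """Anatomical constraint weight for a given training epoch."""
    if epoch < warmup_epochs:
        # Warm-up phase: slow rise from w_init to w_warm.
        return w_init + (w_warm - w_init) * epoch / warmup_epochs
    # Reinforcement phase: linear rise from w_warm to w_final, then held.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return w_warm + (w_final - w_warm) * min(progress, 1.0)

# Weights at the start, end of warm-up, midpoint of reinforcement, and end of training.
schedule = [anatomical_weight(e) for e in (0, 20, 110, 200)]
```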
[0104] Second layer: Spatial weight adjustment based on contrast agent enhancement intensity. The weights in this layer are adjusted in the spatial domain and applied to the image similarity loss, giving higher registration priority to regions significantly enhanced by the contrast agent. First, the contrast agent enhancement map is calculated using the following formula:
$$E(x) = \left| I_c(x) - I_p(x) \right|,$$
where $E$ is the contrast agent enhancement map; $I_c$ is the contrast-enhanced image; and $I_p$ is the plain scan image. $E$ is then globally normalized (e.g., divided by its maximum value) so that its range falls within the interval [0,1].
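[0105] Next, using the organ segmentation mask obtained from the plain scan image, the average enhancement value inside the k-th organ is calculated: $\bar{E}_k = \frac{1}{|M_k|} \sum_{x \in M_k} E(x)$, where $M_k$ is the set of pixels belonging to the k-th organ in the segmentation mask and $E(x)$ is the normalized enhancement value at pixel x.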
[0106] Then, combining the preset organ priority weights (e.g., liver = 1.2, spleen = 1.1, background = 0.5), the spatially adaptive weight of each organ is calculated from the preset priority weight of the k-th organ and the average enhancement value inside that organ.
[0107] Finally, a spatial weight map is generated: for each pixel in the image, the spatially adaptive weight of its corresponding organ category k is assigned to that pixel. This weight map is used to weight the image similarity loss pixel by pixel, so that in organ regions with significant contrast enhancement, the network obtains higher gradient signals during training, thereby forcing the network to devote more attention to these key regions to achieve more refined registration.
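The second layer can be sketched with NumPy as follows. Two points are assumptions made only for illustration and should not be read as the formulas of this application: the enhancement map is taken as the absolute intensity difference, and the organ weight is formed as priority times (1 + mean enhancement). The function name and the label encoding (integer organ labels in `labels`) are likewise hypothetical.

```python
import numpy as np

def spatial_weight_map(enhanced, plain, labels, priority):
    """Per-pixel weights built from organ priority and mean contrast enhancement."""
    e = np.abs(enhanced.astype(np.float64) - plain.astype(np.float64))
    e = e / e.max() if e.max() > 0 else e        # global normalization to [0, 1]
    weights = np.zeros_like(e)
    for k, p_k in priority.items():
        mask = labels == k                       # pixels of the k-th organ
        if mask.any():
            mean_enh = e[mask].mean()            # average enhancement inside organ k
            weights[mask] = p_k * (1.0 + mean_enh)   # ASSUMED combination rule
    return weights

# Toy example: label 1 (an organ) is enhanced, label 0 (background) is not.
plain = np.zeros((4, 4))
enhanced = np.zeros((4, 4))
enhanced[:, :2] = 100.0
labels = np.zeros((4, 4), dtype=int)
labels[:, :2] = 1
w_map = spatial_weight_map(enhanced, plain, labels, {0: 0.5, 1: 1.2})
```

The resulting map would multiply the per-pixel similarity loss before it is reduced to a scalar.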
[0108] Third layer: Loss-balancing weight adjustment based on the statistics of each loss term in historical training batches. The weights in this layer automatically balance the relative contributions of the various losses during training, preventing any one type of loss from dominating the gradient direction because of its excessive magnitude, and promoting balanced optimization of all losses.
[0109] Specifically, a historical loss queue of length L=10 is maintained, recording the average value of each loss Lk in the most recent 10 training batches, where k represents the index of the loss item.
[0110] Calculate the normalized ratio of the k-th loss, $r_k = \bar{L}_k / \sum_j \bar{L}_j$, where $\bar{L}_k$ is the historical average of the k-th loss; this ratio is the relative proportion of that loss in the total loss. Based on this ratio, the base weight of each loss (usually the initially set value or the weight from the previous stage) is adjusted inversely to generate a new balancing weight: the adjusted weight of each loss is its base weight multiplied by an adjustment factor, written as 1.0 minus a term that depends on the ratio $r_k$.
[0111] When the historical average of a certain loss is large, its ratio $r_k$ is also large, so the adjustment factor (1.0 minus a term that grows with $r_k$) decreases; the adjusted weight falls below the base weight, weakening the dominant role of that loss term in the current training phase. Conversely, if the proportion of a certain loss term is small ($r_k$ is small), the adjustment factor is close to 1, and the adjusted weight is close to or slightly higher than the base weight, which promotes the optimization of that loss term. The formula limits the adjusted weight of each loss term to a range of [0.8, 1.2] times the base weight, avoiding drastic weight oscillations and guaranteeing training stability. This dynamic balancing mechanism is executed periodically (e.g., every 10 batches), continuously and automatically adjusting the relative strength of each loss term and guiding the network towards balanced convergence on each optimization objective.
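The loss-balancing layer can be sketched with a fixed-length queue per loss term. The class name, method names and the specific adjustment factor used here, 1.0 - (r_k - 1/K) clipped to [0.8, 1.2] with K the number of loss terms, are assumptions chosen to reproduce the described behavior (large ratio lowers the weight, small ratio raises it slightly); they are not taken from this application.

```python
from collections import deque

class LossBalancer:
    """Keeps the last `window` values of each loss and rescales base weights."""

    def __init__(self, base_weights, window=10):
        self.base = dict(base_weights)
        self.history = {k: deque(maxlen=window) for k in self.base}

    def record(self, losses):
        """Store the per-batch value of every loss term."""
        for name, value in losses.items():
            self.history[name].append(float(value))

    def weights(self):
        """Return adjusted weights, each within [0.8, 1.2] times its base weight."""
        means = {k: sum(h) / len(h) for k, h in self.history.items() if h}
        total = sum(means.values())
        if not means or total <= 0:
            return dict(self.base)
        uniform = 1.0 / len(means)               # share each loss would have if balanced
        adjusted = {}
        for name, w in self.base.items():
            if name not in means:
                adjusted[name] = w
                continue
            ratio = means[name] / total          # normalized ratio r_k
            factor = 1.0 - (ratio - uniform)     # ASSUMED adjustment factor
            adjusted[name] = w * min(1.2, max(0.8, factor))
        return adjusted

# Example: the similarity loss dominates, so its weight is reduced to the lower bound.
balancer = LossBalancer({"sim": 1.0, "anat": 0.5})
for _ in range(12):
    balancer.record({"sim": 9.0, "anat": 1.0})
new_weights = balancer.weights()
```

In a training loop, `record` would be called every batch and `weights` every 10 batches, matching the periodic execution described above.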
[0112] The three weight layers work together: training progress weights control the timing and intensity of anatomical constraints, enhancement-driven weights achieve fine-grained spatial registration control, and loss balancing weights ensure the stability of the optimization process. These three mechanisms together form a closed-loop intelligent weight adjustment system, significantly improving training efficiency and final model performance.
[0113] During the inference phase, after obtaining the multi-directional 2D deformation field output by the network, it is necessary to reconstruct it into the final 3D deformation field and generate a registration image through the following steps: 1. Voxel-level displacement information collection: For each voxel coordinate (i,j,k) in the target 3D space, the displacement vector corresponding to the voxel position is extracted from the 2D deformation field corresponding to the i-th slice in the X direction, the j-th slice in the Y direction, and the k-th slice in the Z direction.
[0114] 2. Adaptive Weighted Fusion: Since the deformation fields in different directions may have uncertainties in local areas, an adaptive weighted fusion strategy based on reliability is adopted. The local gradient magnitude of the 2D deformation field in each direction at the current position is calculated. The smaller the gradient magnitude, the smoother and more reliable the deformation. The normalized directional weights w_x, w_y, w_z are calculated in this way. The final 3D displacement (dx, dy, dz) of the voxel is obtained by weighted averaging of the displacement components provided by the three directions.
[0115] 3. Post-processing and verification of deformation field: The initial 3D deformation field generated by fusion is subjected to three-dimensional Gaussian smoothing in the whole space to eliminate possible minor discontinuities between directions. The Jacobian determinant of the deformation field is calculated and verified to be positive throughout the entire domain to ensure that the deformation field maintains its topological structure and does not fold.
[0116] 4. Volume resampling: Using a post-processed 3D deformation field, the original CT volumes of the arterial and venous phases are spatially transformed by trilinear interpolation to generate registered volumes that are strictly spatially aligned with the plain scan images.
[0117] 5. Z-axis block stitching: For ultra-long volumes processed using a block strategy, steps 1-4 are executed independently for each data block. For the overlapping area of adjacent blocks in the Z-axis direction, a linear gradient weight is used to perform a weighted average of the registration results from the two blocks to ensure a continuous transition of intensity and deformation along the entire Z-axis without seam artifacts.
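Steps 2, 3 and 5 above can be illustrated with the following sketch. Several details are assumptions made for the example rather than statements of this application: the reliability weight is taken as the inverse of the local gradient magnitude, each slicing direction is assumed to supply only its two in-plane displacement components (the missing one is `None`), the displacement field is stored as an array of shape (3, X, Y, Z) in voxel units, and per-slice scalar values stand in for whole slices in the stitching function. Gaussian smoothing and trilinear resampling are omitted.

```python
import numpy as np

def fuse_voxel(candidates, grad_mags, eps=1e-6):
    """Step 2: reliability-weighted average of per-direction displacement estimates."""
    reliab = [1.0 / (g + eps) for g in grad_mags]    # smoother field -> larger weight
    fused = []
    for axis in range(3):
        num = den = 0.0
        for cand, w in zip(candidates, reliab):
            if cand[axis] is not None:               # skip directions lacking this component
                num += w * cand[axis]
                den += w
        fused.append(num / den)
    return tuple(fused)

def jacobian_determinant(disp):
    """Step 3: det(I + grad u) per voxel for a displacement field of shape (3, X, Y, Z)."""
    grads = [np.gradient(disp[c]) for c in range(3)]  # grads[c][a] = d u_c / d x_a
    jac = np.empty(disp.shape[1:] + (3, 3))
    for c in range(3):
        for a in range(3):
            jac[..., c, a] = grads[c][a] + (1.0 if c == a else 0.0)
    return np.linalg.det(jac)

def has_folding(disp):
    """True if the deformation folds anywhere (non-positive Jacobian determinant)."""
    return bool((jacobian_determinant(disp) <= 0).any())

def blend_overlap(block_a, block_b, overlap):
    """Step 5: linear-ramp blend of the slices shared by two adjacent Z blocks."""
    out = list(block_a[:-overlap]) if overlap else list(block_a)
    for i in range(overlap):
        w_b = (i + 1) / (overlap + 1)                # weight of block_b rises along Z
        a = block_a[len(block_a) - overlap + i]
        out.append((1.0 - w_b) * a + w_b * block_b[i])
    out.extend(block_b[overlap:])
    return out
```

For instance, `fuse_voxel([(None, 1.0, 2.0), (3.0, None, 4.0), (5.0, 1.0, None)], [0.1, 0.1, 0.1])` averages the two available estimates of each component with equal weight, and `blend_overlap([0, 0, 0, 0], [4, 4, 4, 4], 3)` ramps smoothly from the first block's values to the second's.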
[0118] As shown in Figure 4, the embodiments of this application also provide a registration system for cross-phase CT images, comprising: an acquisition module 41, used to acquire multi-phase CT images of the same patient, including plain scan images and at least one contrast-enhanced scan image; a segmentation module 42, used to extract the corresponding target organ segmentation mask from the plain scan image and the at least one contrast-enhanced image using a pre-trained organ segmentation model, as anatomical prior information; a training module 43, used to train the registration network model to learn the deformation field by using the plain scan image as a fixed image and the at least one contrast-enhanced image as a floating image, and by constructing anatomical constraint terms using the anatomical prior information; and a registration module 44, used to register new CT image sequences using the trained registration network model and to output the deformation field and registration results that align the contrast-enhanced images to the anatomical space of the plain scan images.
[0119] As shown in Figure 5, this application embodiment also provides an electronic device, including: a processor 51, a memory 52, and a communication bus 53. The communication bus 53 is used to realize the connection and communication between the processor 51 and the memory 52. The processor 51 is used to execute the registration processing program for cross-phase CT images stored in the memory 52, to implement the steps of any of the cross-phase CT image registration methods described above.
[0120] It should be understood that the use of terms such as "system," "device," "unit," and / or "module" in this application is merely one method of distinguishing different components, elements, parts, sections, or assemblies at different levels. However, if other terms can achieve the same purpose, they may be replaced by other expressions.
[0121] The embodiments in this specification are described in a progressive manner, with each embodiment focusing on the related aspects. For any differences between the embodiments, or for the same or similar parts between the embodiments, please refer to each other.
[0122] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A registration method for cross-phase CT images, characterized in that it includes: acquiring multi-phase CT images of the same patient, the multi-phase CT images including plain scan images and at least one contrast-enhanced scan image; using a pre-trained organ segmentation model to extract the corresponding target organ segmentation mask from the plain scan image and the at least one contrast-enhanced image as anatomical prior information; using the plain scan image as a fixed image and the at least one contrast-enhanced image as a floating image, and utilizing the anatomical prior information to construct anatomical constraint terms, training a registration network model to learn the deformation field; and using the trained registration network model to register new CT image sequences, and outputting the deformation field and registration results that align the contrast-enhanced images to the anatomical space of the plain scan images.
2. The method according to claim 1, characterized in that, The registration network model includes a shared encoder and at least one independent decoder. The process of using the plain scan image as a fixed image and at least one contrast-enhanced image as a floating image, and constructing anatomical constraints using the anatomical prior information, to train the registration network model to learn the deformation field includes: Each contrast enhancement period image and the plain scan period image are paired to form an image pair, which are then input into the shared encoder to obtain the corresponding encoded features. The encoded features are then input into the corresponding independent decoder to generate the deformation field from the contrast enhancement period image to the plain scan period image. The deformation field is used to perform spatial transformation on the contrast enhancement image and the corresponding anatomical prior information to obtain the transformation result. Based on the transformation result and the anatomical prior information, a loss function including anatomical constraint terms is calculated to optimize the registration network model.
3. The method according to claim 2, characterized in that, The at least one contrast enhancement phase image includes an arterial phase image and a venous phase image, and the at least one independent decoder includes a first decoder and a second decoder; The step of forming image pairs between each contrast-enhanced image and the plain scan image, inputting them into the shared encoder to obtain corresponding encoded features, and then inputting the encoded features into the corresponding independent decoder to generate a deformation field from the contrast-enhanced image to the plain scan image includes: The arterial phase image and the plain scan image are combined to form a first image pair, which is then input into the shared encoder to obtain a first encoded feature. The first encoded feature is then input into the first decoder to generate a first deformation field from the arterial phase image to the plain scan image. The venous phase image and the plain scan image are combined to form a second image pair, which is then input into the shared encoder to obtain a second encoded feature. The second encoded feature is then input into the second decoder to generate a second deformation field from the venous phase image to the plain scan image.
4. The method according to claim 2, characterized in that, Before using the plain scan image as a fixed image, at least one of the contrast-enhanced images as floating images, and constructing anatomical constraint terms using the anatomical prior information to train the registration network model to learn the deformation field, the method further includes: The 3D volume data of the plain scan image and the contrast-enhanced image are decomposed into 2D slices along at least two preset anatomical directions; By using multi-directional expert path networks, feature extraction and dimensionality standardization are performed on 2D slices of corresponding anatomical directions to obtain standardized features.
5. The method according to claim 4, characterized in that the step of decomposing the 3D volume data of the plain scan image and the contrast-enhanced image into 2D slices along at least two preset anatomical directions includes: extracting 2D slices from the 3D volume data of the plain scan image and the contrast-enhanced image along three orthogonal anatomical directions: transverse, coronal, and sagittal.
6. The method according to claim 2, characterized in that, The anatomical constraints include at least two of the following: organ overlap loss, boundary consistency loss, shape preservation loss, and region smoothness loss.
7. The method according to claim 2 or 6, characterized in that, The loss function further includes an image similarity term, which includes at least two of the following: mutual information loss, local normalized cross-correlation loss, gradient similarity loss, modality-independent neighborhood descriptor loss, and normalized gradient field loss.
8. The method according to claim 7, characterized in that, During the training of the registration network model, an adaptive weight adjustment strategy is adopted to dynamically adjust the weights of at least one of the loss terms. The adaptive weight adjustment is based on at least one of the following factors: the current stage of the training process, the enhancement intensity information of the contrast agent in the contrast enhancement period image, and the statistics of each loss term in historical training batches.
9. A registration system for cross-phase CT images, characterized in that it includes: an acquisition module, used to acquire multi-phase CT images of the same patient, the multi-phase CT images including plain scan images and at least one contrast-enhanced scan image; a segmentation module, used to extract the corresponding target organ segmentation mask from the plain scan image and the at least one contrast-enhanced image using a pre-trained organ segmentation model, as anatomical prior information; a training module, used to train the registration network model to learn the deformation field by using the plain scan image as a fixed image and the at least one contrast-enhanced image as a floating image, and by constructing anatomical constraint terms using the anatomical prior information; and a registration module, used to register new CT image sequences using the trained registration network model and to output the deformation field and registration results that align the contrast-enhanced images to the anatomical space of the plain scan images.
10. An electronic device, characterized in that it includes: a processor, a memory, and a communication bus; the communication bus is used to realize the connection and communication between the processor and the memory; and the processor is configured to execute a registration processing program for cross-phase CT images stored in the memory, to implement the steps of the cross-phase CT image registration method as described in any one of claims 1-8.