A physically constrained deformable medical image registration method based on force-coupled voxel interaction

By constructing a multi-scale image pyramid and an elastic propagation module, the mechanical coupling relationship between voxels is explicitly modeled, solving the problem of the lack of voxel coupling relationship in existing methods. This achieves high-precision and stable medical image registration, enhancing the anatomical consistency and interpretability of biological tissues.

CN122244111APending Publication Date: 2026-06-19TONGJI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TONGJI UNIV
Filing Date
2026-03-06
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing deformable medical image registration methods lack modeling of inter-voxel mechanical coupling and stress transfer when dealing with complex physiological or pathological changes, leading to topological destruction and local rearrangement. In particular, the registration accuracy and stability are insufficient under nonlinear and large-amplitude deformation.

Method used

A physical constraint deformable medical image registration method based on force-coupled voxel interaction is adopted. A multi-scale image pyramid is constructed through a parameter-sharing twin ResNet encoder. Combined with a flow field estimation module and a spatial transformation network, an elastic propagation module is used to simulate the elastic force propagation between voxels, and a physical constraint loss function is constructed to realize the coupling relationship and stress transfer between voxels.

Benefits of technology

It improves the accuracy and stability of medical image registration, avoids topological tearing and local discontinuities, enhances anatomical consistency and clinical interpretability, conforms to the actual elastic behavior of biological tissues, and improves the biological interpretability and accuracy of registration.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122244111A_ABST
    Figure CN122244111A_ABST
Patent Text Reader

Abstract

This invention provides a physically constrained deformable medical image registration method based on force-coupled voxel interaction. The method includes acquiring fixed and moving images; constructing multi-scale feature maps from low to high resolution using a parameter-shared Siamese ResNet encoder; predicting the initial deformation field at the lowest resolution scale and accumulating it scale-by-scale; generating a voxel-level error map and modulating high-resolution features using a three-dimensional differentiable spatial transformation network; guiding a flow field estimation module to predict the deformation field at each scale; inputting the highest resolution accumulated deformation field into a physical elasticity propagation module to calculate the strain tensor of each voxel; and constructing a physical constraint loss function based on the strain tensor for iterative optimization. This achieves deformable medical image registration that is topologically protected, anatomically consistent, highly accurate, and biomechanically sound. This invention enhances the representation capability of spatial geometry, effectively avoiding local folding and tissue tearing, while improving clinical interpretability and registration stability.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention belongs to the field of medical image processing technology, specifically relating to a physical constraint deformable medical image registration method based on force-coupled voxel interaction. Background Technology

[0002] Deformable medical image registration is a fundamental technology in medical image analysis, widely used in areas such as medical image alignment, 3D reconstruction, and image-guided therapy. Its main task is to establish spatial correspondences between different medical images through feature extraction and matching, thereby estimating the voxel-level non-rigid deformation field that aligns the source image to the target domain. Compared to rigid or affine methods, non-rigid deformable registration can capture complex nonlinear deformations caused by physiological processes (such as respiratory movements and organ displacement) and pathological changes (such as tumor growth and tissue atrophy), offering greater flexibility and expressive power.

[0003] Traditional deformable medical image registration methods are typically formulated as optimization-based variational problems, solved iteratively through a multi-scale pyramid framework. The core principle involves combining intensity-based similarity measures such as Normalized Cross-Correlation (NCC), Mutual Information (MI), and Sum of Squared Differences (SSD) with regularization terms (including smoothness constraints, curvature penalties, or differential homeomorphism constraints) to estimate the deformation field that non-rigidly maps a moving image to a stationary image. However, traditional deformable medical image registration methods often incur significant computational overhead and implementation complexity. The optimization process requires numerous interpolation operations, gradient calculations, and parameter searches, making the registration process both time-consuming and computationally expensive.

[0004] With the rapid development of deep learning, researchers have begun to explore deformable registration methods based on neural networks. Unlike traditional optimization-based methods, these neural networks learn end-to-end mappings from moving to stationary images, eliminating the need for iterative optimization in specific cases and enabling rapid inference and simplified deployment. However, most existing methods operate at the voxel level and rely on statistical regularization terms (such as smoothness and curvature constraints), lacking the biomechanical characteristics of biological tissue elasticity or neighborhood interaction properties. Under large-amplitude, highly nonlinear deformations, these constraints lead to topological tearing, often inducing folding, local discontinuities, and voxel rearrangements in local regions, typically manifesting as negative Jacobian voxels or tissue fragmentation, thereby impairing anatomical consistency and clinical interpretability.

[0005] In the prior art, CN120031926A discloses an unsupervised medical image registration method, which includes: (1) accepting a pair of medical images as input, wherein the medical images include a fixed image and a floating image; (2) downsampling and feature extraction of the images layer by layer through an encoder; (3) generating and optimizing a deformation field by applying an iterative structure and a decoder at the lowest scale; (4) applying and upsampling the deformation field through the decoder, and finally outputting the deformation field result; (5) deforming the floating image using the deformation field through a spatial transformation network, and outputting the final deformed image. However, this method usually treats image voxels as independent units, ignoring the interdependence and mechanical coupling between voxels. In the deformation process of complex biological tissues, the interaction between voxels is crucial for accurate modeling. Existing methods fail to effectively capture this, leading to topological destruction and local rearrangement in highly nonlinear deformation regions.

[0006] In summary, existing deformable medical image registration methods have not been able to effectively incorporate biomechanical features, especially in nonlinear and large-scale deformation scenarios. They lack modeling of mechanical coupling and stress transfer between voxels, thus failing to fully improve registration accuracy and stability, particularly when dealing with complex physiological or pathological changes. Summary of the Invention

[0007] The purpose of this invention is to overcome the shortcomings of the existing technology and provide a physically constrained deformable medical image registration method based on force-coupled voxel interaction.

[0008] The objective of this invention can be achieved through the following technical solutions: This invention provides a physically constrained deformable medical image registration method based on force-coupled voxel interaction, comprising the following steps: Acquire the fixed and moving images to be registered; By using a parameter-shared Siamese ResNet encoder, features are extracted from both stationary and moving images, and a multi-scale image pyramid and corresponding feature maps are constructed from low to high resolution. At the lowest resolution scale, the flow field estimation module predicts the initial deformation field based on the corresponding scale feature maps of the moving image and the fixed image, and the initial deformation field is used as the cumulative deformation field at the lowest resolution scale. Based on the cumulative deformation field at the lowest resolution scale, the moving image is deformed through a spatial transformation network, and the voxel-level error map between the deformed moving image and the fixed image is calculated. The error map is upsampled to the next high-resolution scale, the feature map of the moving image at the next high-resolution scale is modulated, the flow field estimation module is guided to predict the deformation field at this scale, and the deformation field predicted at this scale is combined and accumulated with the accumulated deformation field at the previous scale to obtain the accumulated deformation field at the current scale. Repeated error maps and cumulative deformation fields are calculated to obtain cumulative deformation fields and corresponding error maps at all scales. The cumulative deformation field at the highest resolution scale is input into the physics-based elastic propagation module, and the strain tensor of each voxel is calculated by simulating the propagation of elastic forces between voxels. A physical constraint loss function is constructed based on the strain tensor, and the cumulative deformation field at the highest resolution scale is iteratively updated until convergence is achieved with the goal of minimizing the physical constraint loss function. The converged highest-resolution accumulated deformation field is applied to the moving image to complete medical image registration.

[0009] Furthermore, the step of extracting features from both fixed and moving images using a parameter-sharing Siamese ResNet encoder to construct a multi-scale image pyramid and corresponding feature maps from low to high resolution specifically includes: Construct two ResNet encoders with identical structures and shared parameters, each receiving a fixed image. With moving images As input, the ResNet encoder is composed of multiple cascaded residual blocks, each residual block including a convolutional layer, a batch normalization layer and a nonlinear activation function layer; During the encoding process, the spatial resolution is progressively reduced through 3D convolutions with a stride greater than 1 or 3D downsampling operations, forming feature maps from the lowest resolution scale to the highest resolution scale. Let the th... The fixed image feature map and the moving image feature map at each scale are represented as follows: and ; Through a stepwise downsampling and feature extraction process, a multi-scale image pyramid is constructed, arranged from low resolution to high resolution. The lowest resolution scale corresponds to the smallest spatial size feature map, and the highest resolution scale corresponds to the original image resolution feature map. The multi-scale image pyramid includes 5 scales.

[0010] Furthermore, the flow field estimation module is constructed based on the inter-voxel force coupling interaction and structural topology perception mechanism, including a coupling alignment and structural topology perception module.

[0011] Furthermore, the flow field estimation module employs a three-stage progressive modeling strategy to achieve multi-scale feature fusion and deformation field prediction, specifically including: Fixed image feature map at the current scale With moving image feature map Perform channel splicing to obtain joint features; The joint features are subjected to two layers of three-dimensional convolution with a kernel size of 3×3×3 to obtain local interaction features. The local interaction features are then decomposed into channel branch features and spatial branch features through a convolution with a kernel size of 1×1×1. Global average pooling is performed on the spatial branch features to obtain global features, and the global features are then fused back into the channel branch features through element-wise multiplication to obtain global enhanced features. The global enhancement features are batch normalized and channel weights are adjusted. The features are decomposed into strong response features and weak response features through the Sigmoid gating mechanism. At the same time, cross-branch interaction is performed through channel dimension rearrangement to obtain adaptive decomposed features. The statistical information of average pooling, max pooling and spatial branch features is nonlinearly projected and fused to generate channel weight vectors, and the channel weight vectors are applied to adaptive decomposition features to obtain fused modulation features. Through convolutional layers with kernel sizes of 3×3×3, the fused modulation features are mapped to the deformation field of the moving image at the current scale. .

[0012] Furthermore, the spatial transformation network is a three-dimensional differentiable spatial transformation network, including a voxel-level sampling and interpolation module, used to perform three-dimensional spatial sampling and interpolation transformation on the moving image according to the deformation field to generate a deformed moving image at the corresponding scale; the spatial transformation network receives the moving image and the accumulated deformation field at the corresponding scale as input, outputs the deformed moving image, and maintains the correspondence between the deformation field and the moving image in the network to realize the image deformation operation required for multi-scale iterative registration.

[0013] Furthermore, the error map represents the absolute difference between the moving image and the fixed image at the voxel level, and the specific calculation method is as follows: Let the moving image at the current scale be the result of deformation by the spatial transformation network. Fixed image Voxel-level error plots of the two Represented as: in, Voxels in the error plot The value; Indicates a fixed image in voxels Pixel value at; This represents the moving image after being deformed by the spatial transformation network in voxels. Pixel value at; This represents the set of image voxels.

[0014] Furthermore, the step of upsampling the error map to the next high-resolution scale, modulating the feature map of the moving image at the next high-resolution scale, and guiding the flow field estimation module to predict the deformation field at that scale specifically includes: Error map of the next higher scale Upsampled to the current scale using bicubic interpolation The spatial resolution is used to obtain the upsampling error map. ;Will The input is fed into a series of 3D convolutional layers with a kernel size of 1×1×1, expanding the single-channel error map to the current scale. Corresponding moving image feature map Error characteristics are obtained using the same channel dimensions. ; Error characteristics Compared to the current scale Corresponding moving image feature map By splicing along the channel dimension, a joint feature is formed. ; The joint features are subjected to multi-layer 3D convolution with a kernel size of 3×3×3 to obtain fused features; the fused features are then subjected to softmax normalization at the voxel level to generate voxel-level attention weights. The formula is expressed as: Attention weight Compared to the current scale Corresponding moving image feature map Element-wise multiplication yields the modulated motion image feature map: in, Representing scale The corresponding modulated motion image feature map; Represents element-wise product; The modulated motion image feature map The flow field estimation module is used to predict the deformation field at that scale by inputting a fixed image feature map at the corresponding scale. .

[0015] Furthermore, the step of performing a composite transformation and accumulation of the deformation field predicted at this scale and the accumulated deformation field at the previous scale to obtain the accumulated deformation field at the current scale specifically includes: The accumulated deformation field of the previous scale is sampled to the spatial resolution of the current scale using a 3D bicubic interpolation or spatial transformation network. The deformation field predicted at the current scale is then applied to the upsampled accumulated deformation field of the previous scale using a spatial transformation network (STN) to obtain the composite transformed deformation field. This composite transformed deformation field is then added element-wise to the predicted deformation field at the current scale to obtain the accumulated deformation field at the current scale, as shown in the formula: in, For scale The corresponding deformation field; For scale The corresponding cumulative deformation field; This indicates a three-dimensional upsampling operation on the accumulated deformation field of the previous scale; Represents spatial transformation network operations; For scale The corresponding cumulative deformation field.

[0016] Furthermore, the step of inputting the accumulated deformation field at the highest resolution scale into the physics-based elastic propagation module, and calculating the strain tensor of each voxel by simulating the propagation of elastic forces between voxels, specifically includes: The cumulative deformation field at the highest resolution scale is used as the initial displacement field; By analyzing the cumulative deformation field and its error map at all scales, reliable voxels are identified as motion points, and other voxels are identified as response points, thus obtaining a set of motion points. With the set of response points ,in, Voxel representation The set of error map values ​​at five scales, i.e., the values ​​of the error map at all five scales are less than or equal to the error threshold or greater than the error threshold. is the error threshold, and is the 90th percentile of the error distribution; By iteratively updating the displacement field, the elastic deformation on the voxel mesh is simulated, and the strain tensor of each voxel is calculated using the following formula: in, voxels Displacement field at that location, The gradient of the displacement field, voxels The strain tensor; Indicates transpose; Based on the strain tensor, the corresponding stress tensor is derived using Hooke's law; the displacement of the response point is updated using the stress equilibrium equation according to the stress tensor until convergence; wherein, the moving point remains unchanged, and only the displacement of the response point is updated; Through an iterative process, a strain tensor is provided for each voxel based on the final deformation field after convergence. .

[0017] Furthermore, the physical constraint loss function is formulated as follows: in, The physical constraint loss function, The image similarity loss function; This represents the cumulative deformation field corresponding to the current highest resolution; Represents image transformation operations; , These represent moving and stationary images, respectively. The loss function is the smoothness regularization function. , These are the weighting coefficients; The global average elastic energy density is calculated using the following formula: in, Represents the set of image voxels The total number; For voxels Elastic energy density at the location; , Here, represents the Lamé constant, and represents the penalty intensity controlling for volume change and shear deformation, respectively. voxels The strain tensor at the point; The trace of the strain tensor; voxels Deformation energy density at the location; For the normal strain component of the strain tensor; The shear strain component of the strain tensor; The image similarity loss function is defined by the following formula: in, , Representing voxels The pixel values ​​of the moving and stationary images at the location; , These are the global pixel averages for the moving image and the stationary image, respectively. The smoothing regularization loss function is formulated as follows: in, The cumulative deformation field in voxels corresponding to the current highest resolution The amount of deformation at the location; Represents the deformation field In voxels The gradient at that point.

[0018] Compared with the prior art, the present invention has the following advantages: (1) Most existing deformable medical image registration methods treat image voxels as independent units, ignoring the mechanical coupling relationship between adjacent voxels. This approach makes it difficult to effectively capture the interaction and mechanical transmission between voxels when dealing with complex physiological and pathological deformations. Especially in scenarios with large deformations, this may cause topological tearing and local discontinuities, impairing anatomical consistency and clinical interpretability. To address this problem, this invention breaks through the limitations of existing methods by explicitly modeling the mechanical coupling and stress transmission between adjacent voxels in a deep learning registration framework for the first time. Through a three-stage modeling strategy (local-global modeling, adaptive feature decomposition, and multi-source statistical fusion), the coupling relationship between voxels is established, thereby enhancing the representation ability of spatial geometry. This enables the network to accurately capture the interaction between tissues under large nonlinear deformations, effectively avoiding the occurrence of local folding and negative Jacobian voxels, and improving the accuracy and stability of registration.

[0019] (2) In the prior art, although deep learning-based medical image registration methods have reduced their reliance on iterative optimization, they still do not fully consider the elastic properties of biological tissues during the registration process and lack modeling of the physical properties during tissue deformation. Most existing methods rely on statistical regularization terms (such as smoothness and curvature constraints) to control the deformation field, but they cannot simulate the real elastic behavior of tissues, leading to unreasonable local distortions and voxel rearrangements in areas of strong deformation. To solve this problem, this invention innovatively integrates the principles of continuum mechanics (Hooke's law, stress-strain relationship) into the deep learning framework and proposes a physics-based elastic propagation module to simulate the real elastic deformation process of tissues. Through a GPU-parallelized elastic propagation algorithm, seed points are adaptively selected based on the registration error to simulate the strain tensor of each voxel and output the complete stress tensor (including 3 normal stresses and 3 shear stresses). This technical feature effectively improves the physical consistency of image registration, making the deformation process more consistent with the actual behavior of biological tissues, avoiding unreasonable local deformations, and enhancing the biological interpretability of image registration.

[0020] (3) Existing medical image registration methods typically rely on traditional smoothness regularization constraints, which are often highly hypothetical in numerical terms and lack clear physical meaning. Specifically, traditional smoothness loss is usually limited to gradient or curvature penalties between images, failing to constrain the deformation field from a physical perspective, thus limiting further improvement in image registration accuracy. To address this issue, this invention proposes a differentiable physical loss term based on the strain tensor, transforming the strain energy in classical linear elasticity theory into a constraint for end-to-end training. By parameterizing the material's elastic properties using the Lamé constant, the loss function not only has a clear physical meaning but also dynamically adjusts the deformation field during training, promoting the minimization of strain energy. Compared to traditional mathematical smoothness constraints, this innovative loss term more effectively reflects the true tissue elasticity, improving the accuracy and consistency of registration.

[0021] (4) Existing pyramid methods often encounter error accumulation problems when dealing with multi-scale deformations, especially at the coarse-scale stage. The registration error map affects the deformation field estimation at the fine-scale stage, resulting in the inability to effectively correct local errors and causing inaccurate final registration results. To address this issue, this invention breaks through the error accumulation bottleneck of the traditional pyramid method by transforming the registration error map of the previous scale into attention weights and dynamically modulating the feature matching at the current scale. Specifically, through channel projection of the error map and voxel-level attention modulation mechanism, the network can adaptively focus on high-error regions and accurately complete the registration from coarse to fine. Through this targeted feature adjustment, the network can better handle local high-error regions, avoid the propagation and accumulation of errors, and greatly improve the registration accuracy.

[0022] (5) In existing registration methods, although a multi-scale framework is used, the lack of effective feature fusion and error propagation mechanisms often leads to problems when handling matching between large and fine scales. To solve this problem, this invention adopts a three-stage progressive modeling strategy to perform multi-scale feature fusion and deformation field prediction, thereby further improving the accuracy and stability of registration. In each scale, the feature map is effectively modulated, and the flow field estimation module accurately predicts the deformation field at the current scale by gradually fusing local and global features. Through composite transformation and accumulation methods, the deformation field is gradually refined, thereby effectively reducing the error propagation problem in multi-scale matching and improving the accuracy and stability of the final registration result. Attached Figure Description

[0023] Figure 1 This is a flowchart of the medical image registration method according to an embodiment of the present invention; Figure 2 This is a schematic diagram of the overall model framework of an embodiment of the present invention; Figure 3This is a V-CAST structure diagram according to an embodiment of the present invention. Detailed Implementation

[0024] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0025] Example 1: This embodiment provides a physically constrained deformable medical image registration method based on force-coupled voxel interaction, such as... Figure 1 , Figure 2 As shown, it includes the following steps: Step S1: Obtain the fixed image and the moving image to be registered; wherein, Step S2: Feature extraction is performed on both stationary and moving images using a parameter-shared Siamese ResNet encoder, constructing a multi-scale image pyramid and corresponding feature maps from low to high resolution. Specifically, this includes: Construct two ResNet encoders with identical structures and shared parameters, each receiving a fixed image. With moving images As input, the ResNet encoder consists of multiple concatenated residual blocks, each of which includes a convolutional layer, a batch normalization layer, and a non-linear activation function layer. During the encoding process, the spatial resolution is progressively reduced through 3D convolutions with a stride greater than 1 or 3D downsampling operations, forming feature maps from the lowest resolution scale to the highest resolution scale. Let the th... The fixed image feature map and the moving image feature map at each scale are represented as follows: and These feature maps not only preserve local structural information, but also learn cross-voxel contextual relationships through residual blocks, providing rich representations for subsequent deformation field prediction.

[0026] A multi-scale image pyramid is constructed, arranged from low to high resolution, through a progressive downsampling and feature extraction process. The lowest resolution scale corresponds to the feature map with the smallest spatial size, mainly used to capture coarse global displacement information; the highest resolution scale corresponds to the original image resolution and can reflect subtle local structural changes. At the low resolution scale, the network can obtain a larger receptive field and quickly capture the deformation trend of the overall structure, while at the high resolution scale, the network can refine the deformation field and improve voxel-level registration accuracy. Through this multi-scale pyramid strategy, the network achieves a balance between coarse localization and fine alignment, making the registration process both efficient and accurate. The multi-scale image pyramid includes five scales, which, through progressive refinement, can effectively handle large-amplitude and nonlinear deformations while reducing the computational burden at the high-resolution stage.

[0027] Step S3: At the lowest resolution scale, the flow field estimation module predicts the initial deformation field based on the corresponding scale feature maps of the moving image and the fixed image, and uses the initial deformation field as the accumulated deformation field at the lowest resolution scale. At the lowest resolution scale, the flow field estimation module predicts the initial deformation field based on the corresponding scale feature maps of the moving and stationary images, and uses this initial deformation field as the accumulated deformation field at the lowest resolution scale. Figure 3 As shown, the flow field estimation module includes a voxel-based CoupledAlignment with Structural Topology (V-CAST) module. This module can establish the coupling relationship between adjacent voxels at the voxel level, while capturing the topological information of the image structure, providing spatial geometric perception capability for deformation field prediction, thereby improving the accuracy and stability of the initial deformation field.

[0028] The specific operation process is as follows: Fix the image feature map With moving image feature map By concatenating the data along the channel dimension, joint features are obtained. This operation fuses local structural information from both fixed and moving images, providing a foundation for subsequent interaction modeling. Two layers of 3D convolution with a 3×3×3 kernel are applied to the joint features to capture voxel interaction information within the local spatial neighborhood, thereby extracting local interaction features. To separate channel and spatial information, the local interaction features are decomposed into channel branch features and spatial branch features using a 1×1×1 convolution. This helps the network independently learn different types of feature representations, improving feature expressiveness and deformation prediction sensitivity.

[0029] Global average pooling is applied to the spatial branch features to obtain global features. These global features are then fused back into the channel branch features through element-wise multiplication to obtain global enhanced features. This fusion strategy injects global structural information into local features, ensuring that local predictions not only rely on neighborhood information but also consider overall geometric constraints, thus helping to avoid local registration errors. Batch normalization and channel weight adjustment are applied to the global enhanced features, and the features are decomposed into strong and weak response features using a Sigmoid gating mechanism. Simultaneously, cross-branch interaction is achieved through channel dimension rearrangement to obtain adaptive decomposition features. This process dynamically adjusts the contribution of different feature channels to deformation prediction, allowing the network to focus more on key structural regions.

[0030] Furthermore, the statistical information from average pooling, max pooling, and spatial branching features is nonlinearly projected and fused to generate channel weight vectors. These vectors are then applied to adaptively decomposed features to obtain fused modulation features. This step, through the fusion of multi-source statistical information, enables the network to effectively focus on important structures even in regions with high errors or low contrast, improving the accuracy and robustness of the initial deformation field. Finally, the fused modulation features are mapped to the current-scale moving image deformation field through convolutional layers (with kernel size of 3×3×3). This deformation field provides the overall deformation trend at the lowest resolution scale, providing a reliable initial field for subsequent accumulation and refinement at higher resolution scales, and ensuring the convergence and registration accuracy of the multi-scale iterative process.

[0031] Step S4: Deform the moving image using a spatial transformation network based on the accumulated deformation field at the lowest resolution scale, and calculate the voxel-level error map between the deformed moving image and the fixed image, specifically including: Based on the accumulated deformation field at the lowest resolution scale, a spatial transformation network deforms the moving image and calculates the voxel-level error map between the deformed moving image and the stationary image. The spatial transformation network employs a three-dimensional differentiable spatial transformation network architecture, including voxel-level sampling and interpolation modules. It can perform three-dimensional spatial sampling and interpolation transformation on the moving image based on the input deformation field, thereby generating a deformed moving image at the corresponding scale. This network receives the moving image and the accumulated deformation field at the corresponding scale as input and outputs the deformed moving image while maintaining the voxel-level correspondence between the deformation field and the moving image. This design ensures that deformation information at each scale can be accurately transmitted and updated during multi-scale iterative registration, thereby improving the overall registration accuracy and stability.

[0032] The difference between the deformed moving image and the stationary image is calculated using voxel-level absolute differences, forming an error map. Let the moving image deformed by the spatial transformation network at the current scale be... Fixed image Voxel-level error plots of the two Defined as: in, Voxels in the error plot The value; Indicates a fixed image in voxels Pixel value at; This represents the moving image after being deformed by the spatial transformation network in voxels. Pixel value at; This represents the set of image voxels.

[0033] This voxel-level error calculation provides a clear visual representation of the differences between moving and stationary images at each spatial location, offering reference information for feature modulation and deformation field prediction at subsequent scales. The error map not only guides iterative optimization at high-resolution scales but also helps the network identify local high-error regions, enabling the network to adaptively focus on complex structural regions in subsequent processing stages, thereby improving local registration accuracy and overall anatomical consistency.

[0034] Step S5: Upsample the error map to the next high-resolution scale, modulate the feature map of the moving image at the next high-resolution scale, guide the flow field estimation module to predict the deformation field at this scale, and perform a composite transformation and accumulation of the predicted deformation field at this scale with the accumulated deformation field at the previous scale to obtain the accumulated deformation field at the current scale. Specifically, this includes: Error plot of the previous scale By upsampling to the current high-resolution scale using bicubic interpolation, an upsampling error map consistent with the spatial resolution of the current scale is obtained. To ensure that the error information can be matched with the feature map at the current scale, the following steps are taken: Input a series of 3D convolutional layers with a kernel size of 1×1×1, and expand the single-channel error map to match the feature map of the moving image at the current scale. The same channel dimension is used to generate error features. This channel expansion preserves local voxel error information in each channel, allowing the network to fully utilize the error distribution information when subsequently predicting the deformation field.

[0035] Error characteristics Feature map of moving image at current scale Joint features are formed by splicing together at the channel dimension. Then, multi-layer 3D convolutions with a kernel size of 3×3×3 are performed on the joint features to extract the fused features. Softmax normalization is then performed on the fused features at the voxel level to generate voxel-level attention weights. The formula is expressed as: This attention weight can dynamically calibrate high-error regions, enabling the network to pay more attention to voxels with large registration errors during feature modulation, thereby improving the alignment accuracy of local structures at high-resolution scales.

[0036] Voxel-level attention weights Feature map of moving image at current scale Element-wise multiplication yields the modulated motion image feature map: in, Representing scale The corresponding modulated motion image feature map; Represents element-wise product; This operation can adaptively enhance the representation capability of key regions through error-guided feature modulation, thereby guiding the flow field estimation module to generate a more accurate deformation field. Subsequently, Inputting a fixed image feature map of the corresponding scale into the flow field estimation module predicts the deformation field at the current scale. This provides a foundation for subsequent calculations of the cumulative deformation field.

[0037] This error-guided multi-scale feature modulation mechanism can overcome the error accumulation problem of the traditional pyramid method, enabling the network to adaptively focus on high-error regions during the coarse-to-fine iteration process, thereby improving the overall registration accuracy and the consistency of local anatomical structures.

[0038] The accumulated deformation field of the previous scale is sampled to the spatial resolution of the current scale using a 3D bicubic interpolation or spatial transformation network. The deformation field predicted at the current scale is then applied to the upsampled accumulated deformation field of the previous scale using a spatial transformation network (STN) to obtain the composite transformed deformation field. This composite transformed deformation field is then added element-wise to the predicted deformation field at the current scale to obtain the accumulated deformation field at the current scale, as shown in the formula: in, For scale The corresponding deformation field; For scale The corresponding cumulative deformation field; This indicates a three-dimensional upsampling operation on the accumulated deformation field of the previous scale; Represents spatial transformation network operations; For scale The corresponding cumulative deformation field. Through this multi-scale composite and cumulative mechanism, global deformation information obtained at a low-resolution scale can be combined with local fine deformation at a high-resolution scale to achieve iterative optimization from coarse to fine. This preserves the continuity of the overall registration structure while improving the alignment accuracy of local tissues, effectively preventing local folding and topological destruction, thereby ensuring voxel-level anatomical consistency and clinical interpretability.

[0039] Step S6: Repeat steps S4 and S5 until the cumulative deformation field and corresponding error map for all scales are obtained. Step S7: Input the accumulated deformation field at the highest resolution scale into the physics-based elastic propagation module, and calculate the strain tensor of each voxel by simulating the propagation of elastic forces between voxels. Specifically, this includes: The accumulated deformation field at the highest resolution scale is used as the initial displacement field input to the physics-based elastic propagation module. First, the accumulated deformation fields at all scales and their corresponding error maps are analyzed. Reliable voxels are identified as motion points based on the voxel-level error distribution, and the remaining voxels are used as response points. Specifically, the voxels... p The set of error map values ​​at five scales is denoted as... ,when All values ​​are less than or equal to the set error threshold. When the action is in motion, the voxel is designated as a motion point; otherwise, it is designated as a response point. Error threshold. It can be set to the 90th percentile of the error distribution to ensure that the selected motion point has highly reliable displacement information. The motion point serves as a rigid reference and does not update its displacement with iteration, while the response point adjusts its displacement according to the mechanical constraints of its neighborhood during iteration, thereby simulating the elastic response of a real tissue.

[0040] During the iteration process, the strain tensor of each voxel is calculated by simulating elastic deformation on the voxel mesh. (Strain tensor) It is calculated from the displacement field gradient, and the formula is: in, voxels Displacement field at that location, The gradient of the displacement field, voxels The strain tensor; Indicates transpose; The calculated strain tensor is used to further derive the stress tensor using Hooke's law, thereby obtaining the mechanical feedback at each response point. Based on the stress tensor and the stress balance equations, the displacement of the response point is updated until the iteration converges. In this way, each voxel obtains its final displacement considering the elastic coupling of its neighborhood, ensuring the continuity of the overall structure and the physical rationality of local deformations.

[0041] The final deformation field after iterative convergence not only provides high-precision registration displacement, but also generates a complete strain tensor for each voxel. The model includes three normal strain components and three shear strain components, which can be used for further calculation of physical constraint loss or clinical analysis. This method can effectively suppress local folding, topological disruption, and voxel rearrangement, improve the anatomical consistency and interpretability of medical image registration, and simulate the real elastic mechanical properties of tissues, thus enhancing the biological rationality of the registration.

[0042] Step S8: Construct a physical constraint loss function based on the strain tensor, and iteratively update the cumulative deformation field at the highest resolution scale with the goal of minimizing the physical constraint loss function until convergence; The physical constraint loss function is given by the following formula: in, The physical constraint loss function, The image similarity loss function; This represents the cumulative deformation field corresponding to the current highest resolution; Represents image transformation operations; , These represent moving and stationary images, respectively. The loss function is the smoothness regularization function. , These are the weighting coefficients; The global average elastic energy density is calculated using the following formula: in, Represents the set of image voxels The total number; For voxels Elastic energy density at the location; , Here, represents the Lamé constant, and represents the penalty intensity controlling for volume change and shear deformation, respectively. voxels The strain tensor at the point; The trace of the strain tensor; voxels Deformation energy density at the location; For the normal strain component of the strain tensor; The shear strain component of the strain tensor; The image similarity loss function is formulated as follows: in, , Representing voxels The pixel values ​​of the moving and stationary images at the location; , These are the global pixel averages for the moving image and the stationary image, respectively. The smoothness regularization loss function is given by the following formula: in, The cumulative deformation field in voxels corresponding to the current highest resolution The amount of deformation at the location; Represents the deformation field In voxels The gradient at that point.

[0043] Step S9: Apply the converged highest resolution accumulated deformation field to the moving image to complete the medical image registration.

[0044] Example 2: This embodiment provides a physically constrained deformable medical image registration system based on force-coupled voxel interaction, including an image input module, a feature extraction module, a flow field estimation module, a spatial transformation module, an accumulated deformation field generation module, a physically constrained elastic propagation module, and a loss calculation module. The image input module acquires and preprocesses both stationary and moving images; the feature extraction module constructs multi-scale feature maps and image pyramids using a parameter-shared Siamese ResNet encoder; the flow field estimation module predicts the deformation field for each scale feature map based on the V-CAST module and modulates high-resolution features using an attention mechanism combined with the error map; the spatial transformation module applies the deformation field to the moving image to generate a deformed image; the accumulated deformation field generation module performs composite accumulation of deformation fields at each scale to maintain continuity between scales; the physically constrained elastic propagation module iteratively optimizes the deformation field based on the strain tensor and Hooke's law to ensure force coupling and topological consistency between voxels; and the loss calculation module optimizes the accumulated deformation field by combining image similarity, smoothness, and global elastic energy density to achieve high-precision, anatomically consistent, and physically reasonable deformable registration.

[0045] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0046] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present invention, and these modifications or substitutions should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A physically constrained deformable medical image registration method based on force-coupled voxel interaction, characterized in that, Includes the following steps: Acquire the fixed and moving images to be registered; By using a parameter-shared Siamese ResNet encoder, features are extracted from both stationary and moving images, and a multi-scale image pyramid and corresponding feature maps are constructed from low to high resolution. At the lowest resolution scale, the flow field estimation module predicts the initial deformation field based on the corresponding scale feature maps of the moving image and the fixed image, and the initial deformation field is used as the cumulative deformation field at the lowest resolution scale. Based on the cumulative deformation field at the lowest resolution scale, the moving image is deformed through a spatial transformation network, and the voxel-level error map between the deformed moving image and the fixed image is calculated. The error map is upsampled to the next high-resolution scale, the feature map of the moving image at the next high-resolution scale is modulated, the flow field estimation module is guided to predict the deformation field at this scale, and the deformation field predicted at this scale is combined and accumulated with the accumulated deformation field at the previous scale to obtain the accumulated deformation field at the current scale. Repeated error maps and cumulative deformation fields are calculated to obtain cumulative deformation fields and corresponding error maps at all scales. The cumulative deformation field at the highest resolution scale is input into the physics-based elastic propagation module, and the strain tensor of each voxel is calculated by simulating the propagation of elastic forces between voxels. A physical constraint loss function is constructed based on the strain tensor, and the cumulative deformation field at the highest resolution scale is iteratively updated until convergence is achieved with the goal of minimizing the physical constraint loss function. The converged highest-resolution accumulated deformation field is applied to the moving image to complete medical image registration.

2. The method for physical constraint deformable medical image registration based on force-coupled voxel interaction according to claim 1, characterized in that, The method involves extracting features from both fixed and moving images using a parameter-shared Siamese ResNet encoder, constructing a multi-scale image pyramid and corresponding feature maps from low to high resolution. Specifically, this includes: Construct two ResNet encoders with identical structures and shared parameters, each receiving a fixed image. With moving images As input, the ResNet encoder is composed of multiple cascaded residual blocks, each residual block including a convolutional layer, a batch normalization layer and a nonlinear activation function layer; During the encoding process, the spatial resolution is progressively reduced through 3D convolutions with a stride greater than 1 or 3D downsampling operations, forming feature maps from the lowest resolution scale to the highest resolution scale. Let the th... The fixed image feature map and the moving image feature map at each scale are represented as follows: and ; Through a stepwise downsampling and feature extraction process, a multi-scale image pyramid is constructed, arranged from low resolution to high resolution. The lowest resolution scale corresponds to the smallest spatial size feature map, and the highest resolution scale corresponds to the original image resolution feature map. The multi-scale image pyramid includes 5 scales.

3. The method for physical constraint deformable medical image registration based on force-coupled voxel interaction according to claim 1, characterized in that, The flow field estimation module is constructed based on the inter-voxel force coupling interaction and structural topology perception mechanism, including the coupling alignment and structural topology perception modules.

4. The method for registering physically constrained deformable medical images based on force-coupled voxel interaction according to claim 3, characterized in that, The flow field estimation module employs a three-stage progressive modeling strategy to achieve multi-scale feature fusion and deformation field prediction, specifically including: Fixed image feature map at the current scale With moving image feature map Perform channel splicing to obtain joint features; The joint features are subjected to two layers of three-dimensional convolution with a kernel size of 3×3×3 to obtain local interaction features. The local interaction features are then decomposed into channel branch features and spatial branch features through a convolution with a kernel size of 1×1×1. Global average pooling is performed on the spatial branch features to obtain global features, and the global features are then fused back into the channel branch features through element-wise multiplication to obtain global enhanced features. The global enhancement features are batch normalized and channel weights are adjusted. The features are decomposed into strong response features and weak response features through the Sigmoid gating mechanism. At the same time, cross-branch interaction is performed through channel dimension rearrangement to obtain adaptive decomposed features. The statistical information of average pooling, max pooling and spatial branch features is nonlinearly projected and fused to generate channel weight vectors, and the channel weight vectors are applied to adaptive decomposition features to obtain fused modulation features. Through convolutional layers with kernel sizes of 3×3×3, the fused modulation features are mapped to the deformation field of the moving image at the current scale. .

5. The method for registering physically constrained deformable medical images based on force-coupled voxel interaction according to claim 1, characterized in that, The spatial transformation network is a three-dimensional differentiable spatial transformation network, including a voxel-level sampling and interpolation module, which is used to perform three-dimensional spatial sampling and interpolation transformation on the moving image according to the deformation field to generate a deformed moving image at the corresponding scale. The spatial transformation network receives the moving image and the accumulated deformation field at the corresponding scale as input, outputs the deformed moving image, and maintains the correspondence between the deformation field and the moving image in the network to realize the image deformation operation required for multi-scale iterative registration.

6. The method for physical constraint deformable medical image registration based on force-coupled voxel interaction according to claim 1, characterized in that, The error map represents the absolute difference between the moving image and the stationary image at the voxel level, and the specific calculation method is as follows: Let the moving image at the current scale be the result of deformation by the spatial transformation network. Fixed image Voxel-level error plots of the two Represented as: in, Voxels in the error plot The value; Indicates a fixed image in voxels Pixel value at; This represents the moving image after being deformed by the spatial transformation network in voxels. Pixel value at; This represents the set of image voxels.

7. The method for physical constraint deformable medical image registration based on force-coupled voxel interaction according to claim 1, characterized in that, The step of upsampling the error map to the next high-resolution scale, modulating the feature map of the moving image at the next high-resolution scale, and guiding the flow field estimation module to predict the deformation field at that scale specifically includes: Error map of the next higher scale Upsampled to the current scale using bicubic interpolation The spatial resolution is used to obtain the upsampling error map. ;Will The input is fed into a series of 3D convolutional layers with a kernel size of 1×1×1, expanding the single-channel error map to the current scale. Corresponding moving image feature map Error characteristics are obtained using the same channel dimensions. ; Error characteristics Compared to the current scale Corresponding moving image feature map By splicing along the channel dimension, a joint feature is formed. ; The joint features are subjected to multi-layer 3D convolution with a kernel size of 3×3×3 to obtain fused features; the fused features are then subjected to softmax normalization at the voxel level to generate voxel-level attention weights. The formula is expressed as: Attention weight Compared to the current scale Corresponding moving image feature map Element-wise multiplication yields the modulated motion image feature map: in, Representing scale The corresponding modulated motion image feature map; Represents element-wise product; The modulated motion image feature map The flow field estimation module is used to predict the deformation field at that scale by inputting a fixed image feature map at the corresponding scale. .

8. The method for physical constraint deformable medical image registration based on force-coupled voxel interaction according to claim 1, characterized in that, The step of performing a composite transformation and accumulation of the deformation field predicted at this scale and the accumulated deformation field at the previous scale to obtain the accumulated deformation field at the current scale specifically includes: The accumulated deformation field of the previous scale is sampled to the spatial resolution of the current scale using a 3D bicubic interpolation or spatial transformation network. The deformation field predicted at the current scale is then applied to the upsampled accumulated deformation field of the previous scale using a spatial transformation network (STN) to obtain the composite transformed deformation field. This composite transformed deformation field is then added element-wise to the predicted deformation field at the current scale to obtain the accumulated deformation field at the current scale, as shown in the formula: in, For scale The corresponding deformation field; For scale The corresponding cumulative deformation field; This indicates a three-dimensional upsampling operation on the accumulated deformation field of the previous scale; Represents spatial transformation network operations; For scale The corresponding cumulative deformation field.

9. The method for physical constraint deformable medical image registration based on force-coupled voxel interaction according to claim 1, characterized in that, The step of inputting the accumulated deformation field at the highest resolution scale into the physics-based elastic propagation module, and calculating the strain tensor of each voxel by simulating the propagation of elastic forces between voxels, specifically includes: The cumulative deformation field at the highest resolution scale is used as the initial displacement field; By analyzing the cumulative deformation field and its error map at all scales, reliable voxels are identified as motion points, and other voxels are identified as response points, thus obtaining a set of motion points. With the set of response points ,in, Voxel representation The set of error map values ​​at five scales, i.e., the values ​​of the error map at all five scales are less than or equal to the error threshold or greater than the error threshold. is the error threshold, and is the 90th percentile of the error distribution; By iteratively updating the displacement field, the elastic deformation on the voxel mesh is simulated, and the strain tensor of each voxel is calculated using the following formula: in, voxels Displacement field at that location, The gradient of the displacement field, voxels The strain tensor; Indicates transpose; Based on the strain tensor, the corresponding stress tensor is derived using Hooke's law; the displacement of the response point is updated using the stress equilibrium equation according to the stress tensor until convergence; wherein, the moving point remains unchanged, and only the displacement of the response point is updated; Through an iterative process, a strain tensor is provided for each voxel based on the final deformation field after convergence. .

10. The method for registering physically constrained deformable medical images based on force-coupled voxel interaction according to claim 1, characterized in that, The physical constraint loss function is formulated as follows: in, The physical constraint loss function, The image similarity loss function; This represents the cumulative deformation field corresponding to the current highest resolution; Represents image transformation operations; , These represent moving and stationary images, respectively. The loss function is the smoothness regularization function. , These are the weighting coefficients; The global average elastic energy density is calculated using the following formula: in, Represents the set of image voxels The total number; For voxels Elastic energy density at the location; , Here, represents the Lamé constant, and represents the penalty intensity controlling for volume change and shear deformation, respectively. voxels The strain tensor at the point; The trace of the strain tensor; voxels Deformation energy density at the location; For the normal strain component of the strain tensor; The shear strain component of the strain tensor; The image similarity loss function is defined by the following formula: in, , Representing voxels The pixel values ​​of the moving and stationary images at the location; , These are the global pixel averages for the moving image and the stationary image, respectively. The smoothing regularization loss function is formulated as follows: in, The cumulative deformation field in voxels corresponding to the current highest resolution The amount of deformation at the location; Represents the deformation field In voxels The gradient at that point.