Three-dimensional imaging system for fracture-assisted surgery integrating cross-modal contrastive learning and augmented reality, electronic device, and storage medium
By combining cross-modal contrastive learning with augmented reality technology, the invention constructs a three-dimensional imaging model that addresses the incomplete image information of traditional fracture surgery, improves the accuracy of image fusion and surgical navigation, and improves the safety and stability of complex fracture surgery.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HANGZHOU DIANZI UNIV
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-12
Figure CN122199812A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical image processing and computer-assisted surgery technology, specifically to a three-dimensional imaging system, electronic device, and storage medium for fracture-assisted surgery that integrates cross-modal contrastive learning and augmented reality.
Background Technology
[0002] In fracture treatment, soft tissue injuries (such as those to ligaments, muscles, and joint capsules) often occur together with the fracture. Because soft tissue injuries can significantly affect bone recovery, functional reconstruction, and postoperative rehabilitation, accurate assessment of them is crucial for surgical planning and treatment decisions. However, traditional X-ray imaging typically provides information only about the fracture site and struggles to show soft tissue detail and damage. An ideal solution is therefore to add MRI, which is sensitive to soft tissue, for soft tissue injury assessment and to combine it with the strength of X-ray imaging in depicting fracture morphology. This gives the surgeon comprehensive information on the fracture and its accompanying soft tissue injuries, allowing surgical planning to be optimized and the surgical success rate to be improved.
[0003] In recent years, the rapid development of deep learning technology, especially cross-modal contrastive learning, has provided new possibilities for the multi-dimensional analysis and fusion of medical images. By combining image data from different modalities (such as X-rays and MRI), not only can richer anatomical information be extracted, but the limitations of single-modal images can also be overcome to generate more comprehensive and accurate three-dimensional imaging models. At the same time, the introduction of augmented reality (AR) technology can combine virtual image information with the real world, displaying surgical target areas and operational suggestions in real time, thereby helping doctors perform surgical operations more intuitively and accurately.
[0004] Imaging plays an increasingly important role in the diagnosis and treatment of fractures. Imaging technologies such as X-rays and MRI have become routine means of fracture diagnosis. However, traditional two-dimensional image processing systems often cannot fully reflect the complexity of fractures, especially when dealing with complex or multiple fractures, and are unable to provide sufficient anatomical and functional information.
Summary of the Invention
[0005] To address the shortcomings of existing technologies, the present invention aims to provide a three-dimensional imaging system for fracture-assisted surgery that integrates cross-modal contrastive learning and augmented reality, in order to solve the problems of incomplete image information and insufficient navigation accuracy in traditional fracture surgery mentioned in the background art.
[0006] To achieve the above objectives, the present invention provides the following technical solution: a three-dimensional imaging system for fracture-assisted surgery that integrates cross-modal contrastive learning and augmented reality. The system executes the following steps: S1, acquiring and preprocessing multimodal image data of the fracture site; S2, generating a target modal image from the first modal image by cross-modal generation; S3, fusing the multimodal images and correcting their information; S4, constructing a three-dimensional imaging model and combining it with augmented reality technology to realize surgical navigation; S5, introducing an incremental learning module to optimize system performance.
[0007] As a further improvement of the present invention, the multimodal image data includes a first modal image and a target modal image, and the preprocessing includes grayscale normalization, automatic segmentation of the fracture region, elastic deformation processing, and injection of modality-specific noise.
[0008] As a further improvement of the present invention, the cross-modal generation of the target modal image includes extracting deep features of the first modal image using a modal feature extraction model, constructing sample pairs for contrastive learning training, and generating the target modal image based on a diffusion model.
[0009] As a further improvement of the present invention, multimodal image fusion and information correction includes constructing a generative adversarial network, using a feature point extraction algorithm to achieve nonlinear registration, and optimizing fusion accuracy through multiple loss functions.
[0010] As a further improvement of the present invention, the three-dimensional imaging model constructs a three-dimensional mesh through a three-dimensional reconstruction algorithm and performs smoothing processing. Augmented reality navigation includes spatial registration and tracking of surgical instruments through positioning sensors and providing deviation warnings.
[0011] As a further improvement of the present invention, the incremental learning module includes screening incremental data using a distribution difference metric, updating the model through a knowledge distillation mechanism, and triggering incremental training once a preset amount of surgical data has accumulated.
[0012] As a further improvement of the present invention, the system includes an image acquisition and preprocessing module, a cross-modal generation module, a fusion correction module, a three-dimensional imaging and navigation module, and an incremental learning module.
[0013] As a further improvement of the present invention, the system includes a processor and a memory, the memory storing computer-readable instructions that, when executed by the processor, implement the system described in any of the above improvements.
[0014] This solution also provides an electronic device, including a memory, a processor, and a computer program stored on and executable on the memory, wherein the processor executes the program to implement the fracture-assisted surgery three-dimensional imaging system that integrates cross-modal contrastive learning and augmented reality as described in any of the above improvements.
[0015] This solution also provides a storage medium on which a computer program is stored, which, when executed by a processor, implements the fracture-assisted surgery three-dimensional imaging system that integrates cross-modal contrastive learning and augmented reality as described in any of the above improvements.
[0016] Compared with the prior art, the three-dimensional imaging system for fracture-assisted surgery that integrates cross-modal contrastive learning and augmented reality provided by the present invention has at least the following beneficial effects:
(1) Solving the problem of incomplete information in a single image modality: when only the first modal image (such as X-ray) is acquired, the present invention generates a target modal image (such as MRI) using cross-modal contrastive learning and a diffusion generation model. This supplements the soft tissue structure information without adding to the patient's examination burden, overcoming the insufficient soft tissue display caused by relying on a single image modality in the prior art.
(2) Improving the spatial consistency and imaging accuracy of multimodal image fusion: by introducing nonlinear registration and a multi-loss joint optimization mechanism, the differences in spatial scale, imaging angle, and structural morphology between images of different modalities are effectively reduced. This avoids the structural misalignment and information distortion to which traditional fusion methods are prone and improves the imaging accuracy of the fracture area and its adjacent tissues.
(3) Improving the navigation accuracy and safety of complex fracture surgery: the present invention uses augmented reality technology to spatially register the three-dimensional reconstruction model with the real surgical scene and to track the position of surgical instruments in real time. When the operation deviation exceeds the preset threshold, an automatic warning is triggered, which reduces reliance on human experience and improves operating accuracy and safety in complex or multiple fracture surgery.
(4) Providing continuous optimization and reducing the risk of performance degradation: through incremental learning and knowledge distillation mechanisms, the system maintains the performance of the original model while continuously incorporating new surgical data. This avoids the catastrophic forgetting that occurs when traditional models are retrained and improves the stability and adaptability of the system in long-term clinical use.
Description of the Drawings
[0017] Figure 1 is a diagram of the overall architecture of the three-dimensional imaging system for fracture-assisted surgery that integrates cross-modal contrastive learning and augmented reality, as described in this invention. Figure 2 is a flowchart of the cross-modal generation of MRI images from X-ray images according to the present invention.
Detailed Implementation
[0018] As shown in the figures, an embodiment of the present invention provides a three-dimensional imaging system for fracture-assisted surgery that integrates cross-modal contrastive learning and augmented reality to achieve the above objectives. In implementation, this solution...
Acquiring and preprocessing multimodal image data of the fracture site: X-ray images (16-bit grayscale format) and MRI images (32-bit DICOM format) of the patient's fracture site are acquired, and key anatomical areas such as fracture lines, ligament tears, and muscle injuries are marked by an orthopedic surgeon. Preprocessing includes:
(1) Grayscale normalization: the grayscale values of the original image are converted to the range [0, 255] using formula (1):
$$I_{\text{norm}}(x, y) = \frac{I(x, y) - I_{\min}}{I_{\max} - I_{\min}} \times 255 \tag{1}$$
where $I(x, y)$ is the grayscale value of the original image at coordinates $(x, y)$, $I_{\min}$ and $I_{\max}$ are the minimum and maximum grayscale values of the image, respectively, and $I_{\text{norm}}(x, y)$ is the normalized grayscale value at coordinates $(x, y)$, ranging from 0 to 255. This linear transformation maps the grayscale values of the different modalities onto the 8-bit range, eliminating grayscale scale differences between modalities and facilitating subsequent feature extraction.
(2) Fracture region segmentation: a pre-trained U-Net model automatically segments the fracture area, and the cropping range is extended 20 pixels beyond the target boundary to preserve key anatomical structures.
(3) Elastic deformation and noise injection: the image undergoes elastic deformation (deformation coefficient 0.1-0.3), and modality-specific noise is injected at the same time. The noise variance follows formula (2):
$$\beta_t = \beta_{\text{start}} + \frac{t}{T}\left(\beta_{\text{end}} - \beta_{\text{start}}\right) \tag{2}$$
where $T$ is the total number of diffusion steps, set to 1000 in this experiment; $t$ is the current diffusion step, with a value between 0 and $T$; $\beta_t$ is the noise variance at step $t$; $\beta_{\text{start}}$ is the initial noise variance, set to 0.0001; and $\beta_{\text{end}}$ is the final noise variance, set to 0.02.
The linearly increasing noise variance lets the diffusion model learn progressively to recover image structure from noise during training, and the 1000 steps ensure the detail accuracy of the generated MRI.
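The grayscale normalization and the linear noise schedule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the patented implementation: the function names `normalize_gray` and `linear_beta_schedule` are hypothetical, the image is treated as a flat list of gray values, and the guard for a constant image is an added convenience that the source does not mention.

```python
def normalize_gray(pixels):
    """Min-max normalise a flat list of gray values to the range [0, 255]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # Assumption: a constant image maps to all zeros (avoids division by zero).
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) * 255.0 for p in pixels]


def linear_beta_schedule(total_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variance for steps t = 0..total_steps, increasing linearly."""
    return [
        beta_start + (t / total_steps) * (beta_end - beta_start)
        for t in range(total_steps + 1)
    ]
```

For example, `normalize_gray([0, 50, 100])` returns `[0.0, 127.5, 255.0]`, and `linear_beta_schedule()` yields 1001 values that rise from 0.0001 to approximately 0.02.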
[0019] Specifically, the process of generating MRI images across modalities from X-ray images includes the following steps:
(1) Feature extraction: an improved ResNet-50 encoder extracts deep features from the X-ray image, and the 2048-dimensional feature vector output by the fourth stage is retained. The extraction focuses on the fracture line gradient, the cortical bone thickness distribution, and the geometric parameters of the joint space.
(2) Contrastive learning training, which has two parts. Constructing sample pairs: positive samples are X-ray and MRI images of the same patient after rigid transformation, and negative samples are images of the same site from different patients or images of the same patient in different positions. Calculating the contrastive loss using formula (3):
$$\mathcal{L}_{\text{con}} = -\log \frac{\exp\left(\mathrm{sim}(f_X, f_M)/\tau\right)}{\sum_{k=1}^{N} \exp\left(\mathrm{sim}(f_X, f_k)/\tau\right)} \tag{3}$$
where $\mathrm{sim}(f_X, f_M)$ is the cosine similarity between the X-ray feature vector $f_X$ and the MRI feature vector $f_M$, $N$ is the total number of sample pairs, $f_k$ is the feature vector of the $k$-th negative sample, $\tau$ is the temperature parameter, set to 0.07, and $\mathcal{L}_{\text{con}}$ is the contrastive loss. The temperature of 0.07 controls the steepness of the similarity distribution, making the model pay more attention to highly similar sample pairs and strengthening consistent learning of cross-modal features.
(3) MRI generation: based on the diffusion model, the corresponding MRI image is generated with the X-ray features as the condition. The modality conversion accuracy is optimized over multiple rounds of iteration to ensure that the generated image retains key soft tissue features.
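The contrastive loss in step (2) above can be illustrated with a small pure-Python sketch. It is an assumption-laden example rather than the inventors' code: it uses the common InfoNCE form in which the denominator sums over all MRI candidates including the positive one, it assumes non-zero feature vectors, and the names `cosine_similarity` and `contrastive_loss` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(xray_feat, mri_feats, positive_index, temperature=0.07):
    """InfoNCE-style loss for one X-ray feature against several MRI features.

    mri_feats[positive_index] is the matching MRI feature; every other entry
    is treated as a negative sample.
    """
    logits = [cosine_similarity(xray_feat, m) / temperature for m in mri_feats]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[positive_index]
```

With a temperature as low as 0.07, a well-aligned positive pair drives the loss close to zero, while a mismatched pair is penalised heavily, which is the "steepness" effect the paragraph describes.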
Specifically, the process of fusing the multimodal images and correcting their information includes the following steps:
(1) Generative adversarial network construction: the network includes a generator with a conditional U-Net structure, which takes X-ray features and anatomical prior masks as inputs and outputs 128×128×3 MRI images, and a discriminator with a PatchGAN structure, which contains 4 layers of 4×4 convolution kernels and outputs a 30×30 real/fake probability map.
(2) Image registration: the SIFT algorithm extracts the 500 feature points with the highest response, and nonlinear registration is achieved by thin-plate spline interpolation. The objective function is formula (4):
$$\mathcal{L}_{\text{reg}} = \sum_{i} \left\| T(p_i) - q_i \right\|^2 + \lambda \iint \left\| \nabla^2 T \right\|^2 \, dx \, dy \tag{4}$$
where $T$ is the transformation function, $T(p_i)$ is the mapping result of feature point $p_i$, $q_i$ is the coordinates of the target feature point, $\lambda$ is the regularization parameter, set to 0.1, and $\mathcal{L}_{\text{reg}}$ is the registration loss. The regularization parameter of 0.1 balances feature point alignment accuracy against transformation smoothness, avoids overfitting to local noise, and ensures global spatial consistency between the X-ray and MRI images.
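The discriminator dimensions given in step (1) above can be checked with a short arithmetic sketch. The source states only the input size (128×128), the number of layers (4), the kernel size (4×4), and the output size (30×30); the strides (2, 2, 1, 1) and the padding of 1 used below are assumptions chosen because they are one configuration that reproduces those numbers, and the function names are hypothetical.

```python
def conv_output_size(size, kernel=4, stride=1, padding=1):
    """Spatial size after one convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def patchgan_map_size(input_size=128, strides=(2, 2, 1, 1)):
    """Trace the feature-map size through a stack of 4x4 convolutions."""
    sizes = [input_size]
    for stride in strides:
        sizes.append(conv_output_size(sizes[-1], kernel=4, stride=stride, padding=1))
    return sizes
```

With these assumed strides, `patchgan_map_size()` returns `[128, 64, 32, 31, 30]`, so a 128×128 input does end in a 30×30 map after four 4×4 layers.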
[0020] (3) Optimization of multiple loss functions, including: Content Loss: used to measure the structural similarity between the generated image and the real MRI image; Adversarial Loss: through adversarial training of GAN, optimize the realism of the generated image; Perceptual Loss: compare the generated image with the real image in a high-level feature space to ensure the preservation of details; Registration Loss: optimize the spatial alignment of X-ray image and MRI image to reduce geometric distortion.
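The source lists the four loss terms but does not say how they are combined. One common choice, shown here purely as an assumption, is a weighted sum; the weights $\lambda_c$, $\lambda_a$, $\lambda_p$, and $\lambda_r$ are hypothetical hyperparameters that do not appear in the original text.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{c}\,\mathcal{L}_{\text{content}}
  + \lambda_{a}\,\mathcal{L}_{\text{adv}}
  + \lambda_{p}\,\mathcal{L}_{\text{perc}}
  + \lambda_{r}\,\mathcal{L}_{\text{reg}}
```

Under this reading, raising $\lambda_r$ would favour spatial alignment over visual realism, while raising $\lambda_a$ would do the opposite.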
[0021] Specifically, the process of constructing a three-dimensional imaging model and combining it with AR technology to achieve surgical navigation includes the following steps:
(1) 3D reconstruction: the Marching Cubes algorithm constructs a three-dimensional mesh with a voxel resolution of 0.5mm×0.5mm×0.5mm. After Laplacian smoothing, the fused image is texture-mapped onto the mesh by bilinear interpolation.
(2) AR spatial registration: AprilTag visual markers are used, with a recognition speed of ≥30fps and a positioning error of <0.5mm. The camera extrinsic parameters are then calculated using the PnP algorithm, and Kalman filtering compensates for display delay.
(3) Real-time navigation: an electromagnetic positioning sensor (accuracy 0.1mm, sampling rate 100Hz) tracks the position of the surgical instruments, and the position deviation is calculated. When the position deviation is greater than 2mm, the AR interface triggers a red flashing warning.
Specifically, the process of introducing the incremental learning module includes the following steps:
(1) Incremental data screening: after each surgery is completed, the system automatically collects intraoperative images and postoperative validation data, and uses the KL divergence of formula (5) to measure the difference between the distribution of the new data and that of the training set:
$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \tag{5}$$
where $P$ is the probability distribution of the newly collected data, $Q$ is the probability distribution of the original training set, and $D_{KL}$ is the measure of the distribution difference. A threshold of 0.6 is used to select incremental data with similar distributions, avoiding the introduction of noisy samples that could affect model stability.
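The two threshold checks in this paragraph, the 2mm deviation warning and the KL-divergence screening with a threshold of 0.6, can be sketched as follows. This is a hedged illustration only: representing the distributions as discrete histograms, using the natural logarithm, accepting data strictly below the threshold, and all function names are assumptions that the source does not specify.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with P(x) = 0 contribute nothing
    )


def accept_increment(new_hist, train_hist, threshold=0.6):
    """Keep new data only if its distribution is close to the training set."""
    return kl_divergence(new_hist, train_hist) < threshold


def needs_warning(planned_mm, tracked_mm, limit_mm=2.0):
    """True when the instrument is more than limit_mm from the planned position."""
    return math.dist(planned_mm, tracked_mm) > limit_mm
```

For instance, `needs_warning((0, 0, 0), (0, 0, 2.5))` is true because the deviation of 2.5mm exceeds the 2mm limit, whereas a deviation of about 1.73mm at `(1, 1, 1)` does not trigger a warning.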
[0022] (2) Model update mechanism: knowledge distillation is used, with the previous model retained as the teacher model, and the distillation loss is... Its terms are the generators before and after the update, respectively; T, the distillation temperature, fixed at 4; and the i-th input sample. A temperature of 4 controls the smoothness of the softmax output, enabling the new model to better learn the probability distribution characteristics of the teacher model.
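Because the exact distillation formula is not recoverable from the text, the sketch below shows one widely used form as an assumption: the KL divergence between the temperature-softened softmax outputs of the teacher (the generator before the update) and the student (the generator after the update), averaged over input samples. The function names and the use of plain logit lists are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a smoother output."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Mean KL(teacher || student) over samples, on softened outputs."""
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)
        q = softmax(s_row, temperature)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return total / len(teacher_logits)
```

The smoothing effect of the temperature is easy to see: for logits `[2.0, 0.0]`, the largest softmax probability is about 0.88 at temperature 1 and about 0.62 at temperature 4.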
[0023] (3) Training scheduling: incremental training is triggered each time data from 50 surgeries has accumulated. A learning-rate warm-up strategy is adopted, with 50 training rounds, to ensure that the model's accuracy on new data improves by ≥5%.
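The trigger rule and warm-up strategy described above can be sketched as a small scheduler. The count of 50 cases comes from the source; everything else is an assumption made for illustration, including the class name `IncrementalTrainingScheduler`, the linear shape of the warm-up, the base learning rate of 1e-4, and the 5 warm-up rounds.

```python
class IncrementalTrainingScheduler:
    """Collects screened surgical cases and releases a batch every N cases."""

    def __init__(self, trigger_count=50):
        self.trigger_count = trigger_count
        self.buffer = []

    def add_case(self, case):
        """Store one case; return the accumulated batch when training is due."""
        self.buffer.append(case)
        if len(self.buffer) >= self.trigger_count:
            batch, self.buffer = self.buffer, []
            return batch
        return None


def warmup_learning_rate(epoch, base_lr=1e-4, warmup_epochs=5):
    """Linear warm-up to base_lr over warmup_epochs rounds, then constant."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr
```

In use, `add_case` returns `None` for the first 49 cases and the full list of 50 on the fiftieth call, at which point a training run of 50 rounds would query `warmup_learning_rate(epoch)` for each round.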
[0024] In one embodiment, the three-dimensional imaging system for fracture-assisted surgery of the present invention, which integrates cross-modal contrastive learning and augmented reality, can be deployed in an electronic device. The electronic device includes a processor and a memory, wherein the memory is a non-transitory computer-readable storage medium storing a computer program. When the computer program is loaded and executed by the processor, it performs the following functions: acquiring multimodal image data of the fracture site and performing preprocessing operations; generating target modal image data from the features of the first modal image through a cross-modal contrastive learning model; fusing the first modal image and the target modal image and performing information correction; constructing a three-dimensional imaging model of the fracture area and using augmented reality technology to achieve surgical navigation; and, after acquiring new surgical data, introducing an incremental learning mechanism to update the model parameters in order to optimize overall system performance.
[0025] Thus, when the electronic device executes the computer program, it is able to realize the fracture-assisted surgery three-dimensional imaging system that integrates cross-modal contrastive learning and augmented reality as described in any one of claims 1 to 8.
[0026] The above are merely preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principle of the present invention should also be considered within the scope of protection of the present invention.
Claims
1. A three-dimensional imaging system for fracture-assisted surgery integrating cross-modal contrastive learning and augmented reality, characterized in that the system executes the following steps: S1, acquiring and preprocessing multimodal image data of the fracture site; S2, generating a target modal image from the first modal image by cross-modal generation; S3, fusing the multimodal images and correcting their information; S4, constructing a three-dimensional imaging model and combining it with augmented reality technology to realize surgical navigation; S5, introducing an incremental learning module to optimize system performance.
2. The three-dimensional imaging system for fracture-assisted surgery integrating cross-modal contrastive learning and augmented reality as described in claim 1, characterized in that the multimodal image data includes a first modal image and a target modal image, and the preprocessing includes grayscale normalization, automatic segmentation of the fracture region, elastic deformation processing, and injection of modality-specific noise.
3. The three-dimensional imaging system for fracture-assisted surgery integrating cross-modal contrastive learning and augmented reality as described in claim 1, characterized in that the cross-modal generation of the target modal image includes extracting deep features of the first modal image using a modal feature extraction model, constructing sample pairs for contrastive learning training, and generating the target modal image based on a diffusion model.
4. The three-dimensional imaging system for fracture-assisted surgery integrating cross-modal contrastive learning and augmented reality as described in claim 1, characterized in that, Multimodal image fusion and information correction includes constructing a generative adversarial network, using a feature point extraction algorithm to achieve nonlinear registration, and optimizing fusion accuracy through multiple loss functions.
5. The three-dimensional imaging system for fracture-assisted surgery integrating cross-modal contrastive learning and augmented reality as described in claim 1, characterized in that, The 3D imaging model constructs a 3D mesh using a 3D reconstruction algorithm and performs smoothing. Augmented reality navigation includes spatial registration and tracking surgical instruments through positioning sensors, providing deviation alerts.
6. The three-dimensional imaging system for fracture-assisted surgery integrating cross-modal contrastive learning and augmented reality as described in claim 1, characterized in that, The incremental learning module includes using distribution difference measurement to filter incremental data, updating the model through knowledge distillation mechanism, and triggering incremental training after accumulating a preset number of surgical data.
7. The three-dimensional imaging system for fracture-assisted surgery integrating cross-modal contrastive learning and augmented reality as described in claim 1, characterized in that, It includes an image acquisition and preprocessing module, a cross-modal generation module, a fusion correction module, a 3D imaging and navigation module, and an incremental learning module.
8. The three-dimensional imaging system for fracture-assisted surgery integrating cross-modal contrastive learning and augmented reality as described in claim 1, characterized in that it includes a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, implement the steps of the system as described in claim 1.
9. An electronic device comprising a memory, a processor, and a computer program stored in and executable on the memory, characterized in that, When the processor executes the program, it implements the fracture-assisted surgery three-dimensional imaging system as described in any one of claims 1 to 8, which integrates cross-modal contrastive learning and augmented reality.
10. A storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by the processor, it implements the fracture-assisted surgery three-dimensional imaging system as described in any one of claims 1 to 8, which integrates cross-modal contrastive learning and augmented reality.