Medical image super-resolution system based on kernel prior and adaptive interpolation transformer

By using a medical image super-resolution system based on kernel prior and adaptive interpolation Transformer, the problems of low pathological segmentation accuracy and poor cross-modal adaptability in primary medical institutions are solved, and efficient sub-millimeter-level fine pathological structure reconstruction and real-time deployment are achieved.

CN122265035A · Pending · Publication Date: 2026-06-23 · JIAMUSI UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
JIAMUSI UNIVERSITY
Filing Date
2026-03-25
Publication Date
2026-06-23


Abstract

The application discloses a medical image super-resolution system based on kernel prior and adaptive interpolation Transformer, comprising a data processing module, a feature processing module, a kernel prior dynamic weighting module, a cross-modal alignment module and an adaptive interpolation module. The data processing module acquires the low-quality medical images to be processed and a multimodal pathology-degradation joint annotation dataset. The feature processing module extracts pathological semantic features and generates modality-specific degradation kernel priors through two parallel sub-networks. The kernel prior dynamic weighting module then performs pathology-risk-driven kernel prior weighting, the cross-modal alignment module performs cross-modal feature alignment, a Transformer module generates a spatial adaptive interpolation kernel, and a super-resolution high-definition image is finally output through closed-loop optimization. The application significantly improves the reconstruction accuracy of pathological microstructures, achieves cross-modal adaptation without fine-tuning, and greatly lowers the computing-power threshold for deployment in primary-care settings.

Description

Technical Field

[0001] This invention relates to the fields of medical image processing and artificial intelligence, and in particular to a medical image super-resolution system based on kernel prior and adaptive interpolation Transformer.

Background Technology

[0002] Medical imaging is the basis for clinical disease diagnosis and treatment evaluation. CT, MRI, and ultrasound are the three most widely used modalities in clinical practice, covering the diagnostic scenarios of the vast majority of diseases. With the continuous advancement of China's hierarchical medical system, primary healthcare institutions have become the front line for disease screening and chronic disease management. However, they generally suffer from outdated imaging equipment and low hardware configuration. The acquired images commonly show insufficient resolution, high noise, and numerous artifacts, and fail to clearly display sub-millimeter-level fine pathological structures such as myocardial microfibrosis and coronary artery microcalcification. This directly leads to low early detection rates and high rates of missed diagnosis and misdiagnosis at the primary level, and has become a bottleneck restricting the implementation of the hierarchical medical system.

[0003] Image super-resolution reconstruction technology can reconstruct low-resolution images into high-resolution images through algorithms, improving image quality without upgrading hardware equipment. It is the optimal path to solve the problem of insufficient image quality in primary healthcare. In recent years, super-resolution technology based on deep learning has made breakthrough progress. Models based on convolutional neural networks and Transformer architectures have achieved reconstruction accuracy far exceeding that of traditional algorithms in general image super-resolution tasks and have been gradually applied to the field of medical image super-resolution. However, existing technologies still have many insurmountable defects in clinical application and are subject to common technical biases in the industry, making them unsuitable for the multimodal, low-quality, and low-computing-power real clinical scenarios in primary healthcare institutions.

[0004] The industry generally regards degradation prior modeling and pathological semantic perception as independent technical branches, rigidly adopting a unidirectional serial architecture of pathological segmentation first and region super-resolution second. This ignores the fact that the degradation characteristics of low-quality images themselves severely restrict the accuracy of pathological segmentation. Low-precision segmentation results further amplify the structural errors of super-resolution reconstruction, forming a vicious cycle in which worse segmentation produces worse super-resolution, so that the reconstruction accuracy of fine pathological structures ultimately fails to meet clinical diagnostic requirements. At the same time, existing technologies have blind spots in their understanding of the nature of medical image degradation: they treat it as general image degradation to be fitted as a black box, without combining the imaging physics of the CT, MRI, and ultrasound modalities to build dedicated degradation prior models. This results in extremely poor cross-modal adaptability, requiring readjustment or even retraining when switching between image modalities, and makes cross-modal adaptation without fine-tuning within a single framework impossible.

Summary of the Invention

[0005] This invention overcomes the shortcomings of the prior art and provides a medical image super-resolution system based on kernel prior and adaptive interpolation Transformer.

[0006] To achieve the above objectives, the technical solution adopted by the present invention is as follows: a medical image super-resolution system based on kernel prior and adaptive interpolation Transformer, comprising: a data processing module, a feature processing module, a kernel prior dynamic weighting module, a cross-modal alignment module, and an adaptive interpolation module;

[0007] The data processing module acquires the low-quality medical images to be processed and constructs a multimodal pathology-degradation joint annotation dataset that covers the CT, MRI, and ultrasound modalities and is annotated with the physical mechanism of modality degradation, pathological regions, and clinical diagnostic risk levels.

[0008] The feature processing module extracts the initial pathological semantic features of the low-quality medical images through the pathological semantic coding sub-network and generates the clinical risk level region division results; simultaneously, through the modality-specific degradation coding sub-network, based on the degradation kernel parameter space of the multimodal pathology-degradation joint annotation dataset, it extracts modality-specific degradation features and generates the initial modality-specific degradation kernel prior.

[0009] The kernel prior dynamic weighting module is driven by the pathological risk level. Based on the clinical risk level region division results, it assigns differentiated kernel prior modeling weights and parameter dimensions to different regions and generates a weighted modality-specific degradation kernel prior.

[0010] The cross-modal alignment module uses the initial modality-specific degradation kernel prior as the domain alignment anchor point, fuses the modality-specific degradation kernel prior with the initial pathological semantic features, completes feature distribution alignment, and then inputs the result into the Transformer module to generate a spatial adaptive interpolation kernel.

[0011] The adaptive interpolation module generates upsampling features based on the spatial adaptive interpolation kernel, and inputs the upsampling features back into the pathological semantic coding sub-network to complete iterative optimization. Based on the optimized pathological semantic features, the spatial adaptive interpolation kernel is updated, and finally, super-resolution high-definition medical images are output.

[0012] In a preferred embodiment of the present invention, the process of constructing the multimodal pathology-degradation joint annotation dataset includes:

[0013] S11. Obtain the original high-definition medical image set covering CT, MRI, and ultrasound modalities. Based on the degradation physics generation mechanism, perform degradation simulation processing on the original high-definition medical image set to generate a paired low-quality medical image set.

[0014] S12. For the paired low-quality medical image set and the original high-definition medical image set, complete the annotation of modal degradation physical mechanism, pathological region, and clinical diagnosis risk level. The clinical diagnosis risk level is divided into three levels: high-risk pathological area, medium-risk structural edge area, and low-risk normal tissue area.

[0015] S13. Perform data normalization and data augmentation on the annotated, paired low-quality medical image set and original high-definition medical image set to generate the multimodal pathology-degradation joint annotation dataset.

[0016] In a preferred embodiment of the present invention, the process of generating the clinical risk level region division results includes:

[0017] S21. Input the low-quality medical image into the pathological semantic coding sub-network, and extract the multi-scale pathological semantic feature map of the low-quality medical image through the multi-layer convolutional coding unit of the pathological semantic coding sub-network.

[0018] S22. The multi-scale pathological semantic feature map is classified into pathological categories and risk levels pixel by pixel through the classification head unit of the pathological semantic coding sub-network, generating initial pathological semantic features and pixel by pixel clinical risk level masks.

[0019] S23. Based on the clinical risk level mask, complete the clinical risk level region division of the low-quality medical images and generate the clinical risk level region division result.

[0020] In a preferred embodiment of the present invention, the generation process of the initial modality-specific degradation kernel prior includes:

[0021] S211. Construct a degradation kernel parameter space that corresponds one-to-one with the CT, MRI, and ultrasound modalities. The CT modality corresponds to the convolutional degradation and motion artifact parameter space, the MRI modality corresponds to the k-space phase degradation and magnetic field inhomogeneity noise parameter space, and the ultrasound modality corresponds to the multiplicative speckle noise parameter space.

[0022] S212. Identify the modality type of the low-quality medical image, call the degradation kernel parameter space that matches the modality type, and input the low-quality medical image into the modality-specific degradation coding subnetwork;

[0023] S213. Modality-specific degradation features of low-quality medical images are extracted through a modality-specific degradation coding sub-network, and an initial modality-specific degradation kernel prior is generated based on the matching degradation kernel parameter space fitting.

[0024] In a preferred embodiment of the present invention, the generation process of the weighted modality-specific degradation kernel prior includes:

[0025] S31. Receive the clinical risk level region division results and the initial modality-specific degradation kernel prior. Through the pathological risk level-driven kernel prior dynamic weighting module, match the preset weight and parameter dimension configuration corresponding to each clinical risk level region.

[0026] S32. Assign a full-dimensional degradation kernel parameter space and a first modeling weight to high-risk pathological areas, a medium-dimensional degradation kernel parameter space and a second modeling weight to medium-risk structural edge areas, and a low-dimensional degradation kernel parameter space and a third modeling weight to low-risk normal tissue areas.

[0027] S33. Based on the assigned weights and parameter dimensions, apply region-level weighting to the initial modality-specific degradation kernel prior to generate the weighted modality-specific degradation kernel prior.
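Steps S31 to S33 can be illustrated with a minimal sketch. The source does not give the first, second, and third modeling weights or the parameter dimensions, so the values in `RISK_CONFIG`, the per-pixel parameter-vector layout of the prior, and the choice to zero out unused dimensions are all assumptions made for illustration only.

```python
# Risk labels (hypothetical integer encoding).
HIGH, MEDIUM, LOW = 0, 1, 2

# Hypothetical weights and retained parameter dimensions per risk level.
RISK_CONFIG = {
    HIGH: {"weight": 1.0, "dims": 8},    # full-dimensional, first weight
    MEDIUM: {"weight": 0.6, "dims": 4},  # medium-dimensional, second weight
    LOW: {"weight": 0.3, "dims": 2},     # low-dimensional, third weight
}


def weight_prior(prior, mask, config=RISK_CONFIG):
    """Apply region-level weighting to a kernel prior.

    prior[y][x] is a list of kernel parameters for one pixel and
    mask[y][x] is that pixel's risk label. Parameters beyond the
    retained dimension are set to zero; the rest are scaled.
    """
    weighted = []
    for prior_row, mask_row in zip(prior, mask):
        row = []
        for params, label in zip(prior_row, mask_row):
            cfg = config[label]
            row.append([
                cfg["weight"] * p if i < cfg["dims"] else 0.0
                for i, p in enumerate(params)
            ])
        weighted.append(row)
    return weighted
```

Truncating dimensions for low-risk regions is one simple way to express "lower-dimensional modeling"; a learned projection would be another.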

[0028] In a preferred embodiment of the present invention, the process of feature distribution alignment performed by the cross-modal alignment module includes:

[0029] S41. Receive the initial pathological semantic features and the modality-specific degradation kernel prior, and set the initial pathological semantic features as the fixed anchor features for cross-modal domain alignment;

[0030] S42. Extract the common feature space of the pathological semantic features corresponding to images of different modalities, and map the modality-specific degradation kernel prior into the common feature space to obtain the degradation kernel prior features;

[0031] S43. Deeply fuse the degradation kernel prior features with the initial pathological semantic features along the channel dimension to generate a fused feature map with completed feature distribution alignment.
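A minimal sketch of S42 and S43 follows. It assumes the mapping into the common feature space is a per-pixel linear projection and that fusion along the channel dimension is plain concatenation; neither choice is stated in the source, and the names `project` and `fuse` are hypothetical.

```python
def project(vec, matrix):
    """Map a prior vector of length D to C channels.

    matrix has shape (C, D): one row of coefficients per output channel.
    """
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def fuse(semantic, prior, projection):
    """Concatenate semantic features with projected prior features.

    semantic[y][x] and prior[y][x] are per-pixel lists of channels.
    The semantic features act as the fixed anchor and are left
    unchanged; only the prior is mapped before concatenation.
    """
    fused = []
    for sem_row, prior_row in zip(semantic, prior):
        fused.append([
            list(sem) + project(p, projection)
            for sem, p in zip(sem_row, prior_row)
        ])
    return fused
```

In a trained system the projection would be learned so that priors from CT, MRI, and ultrasound land in the same space as the semantic features.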

[0032] In a preferred embodiment of the present invention, the process by which the Transformer module generates the adaptive interpolation kernel includes:

[0033] S51. Receive the fused feature map with completed feature distribution alignment and input it into the Transformer module. Based on the clinical risk level region division results, dynamically adjust the channel attention weights of the module for different regions.

[0034] S52. A strong channel attention mechanism is enabled for high-risk pathological areas. Multi-scale local structural features of high-risk pathological areas are captured by multi-scale convolutional units in the Transformer module, and long-distance dependency features are captured.

[0035] S53. The captured multi-scale local structural features and long-distance dependent features are fused to generate spatial adaptive interpolation kernel parameters that match the pathological attributes and degradation characteristics of each region, and the spatial adaptive interpolation kernel is output.

[0036] In a preferred embodiment of the present invention, the process of iteratively optimizing the spatial adaptive interpolation kernel through the adaptive interpolation module includes:

[0037] S61. Based on the spatial adaptive interpolation kernel, the initial feature map of the low-quality medical image is upsampled to generate upsampled features of corresponding resolution.

[0038] S62. The upsampling features are input back into the pathological semantic coding sub-network. The optimized pathological semantic features are extracted again through the pathological semantic coding sub-network to generate an optimized clinical risk level mask and clinical risk level region division results.

[0039] S63. The optimized pathological semantic features and the results of the clinical risk level region division are fed back to the pathological risk level-driven kernel prior dynamic weighting module to iteratively update the parameters of the spatial adaptive interpolation kernel.
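The closed loop of S61 to S63 can be sketched as a control-flow skeleton. The four callables stand in for the sub-networks described above and are hypothetical placeholders; the fixed iteration count is also an assumption, since the source does not state a stopping criterion.

```python
def closed_loop(lq_features, encode, weight_prior, make_kernel, upsample,
                iterations=2):
    """Iteratively refine the spatial adaptive interpolation kernel.

    encode(features)        -> (semantic_features, risk_mask)
    weight_prior(risk_mask) -> weighted kernel prior
    make_kernel(sem, prior) -> interpolation kernel
    upsample(features, k)   -> upsampled features
    """
    semantic, mask = encode(lq_features)
    for _ in range(iterations):
        prior = weight_prior(mask)
        kernel = make_kernel(semantic, prior)
        upsampled = upsample(lq_features, kernel)   # S61
        semantic, mask = encode(upsampled)          # S62
    # S63: final update from the latest semantic features and regions.
    kernel = make_kernel(semantic, weight_prior(mask))
    return kernel, mask
```

Each pass re-segments the upsampled result, so an improved reconstruction can correct the risk mask that shapes the next kernel.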

[0040] In a preferred embodiment of the present invention, the process of the adaptive interpolation module outputting the super-resolution high-definition medical image includes:

[0041] S611. Receive the iteratively optimized spatial adaptive interpolation kernel and the multi-scale feature map of the low-quality medical image, and input them into the structure-aware upsampling module;

[0042] S612. Through the structure-aware upsampling module, based on the iteratively optimized spatial adaptive interpolation kernel, the multi-scale feature map is subjected to regionally differentiated interpolation upsampling processing. A structure-preserving interpolation strategy is executed for high-risk pathological areas and medium-risk structural edge areas, and a fast interpolation strategy is executed for low-risk normal tissue areas.

[0043] S613. Perform feature fusion and pixel reconstruction on the upsampled feature map to generate a super-resolution high-definition medical image that matches the preset super-resolution ratio, and complete the image output.
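The region-differentiated upsampling of S612 can be sketched as follows. Bilinear interpolation stands in for the structure-preserving strategy and nearest-neighbour copying for the fast strategy; both are illustrative substitutes, since the system described here uses a learned spatial adaptive interpolation kernel. The integer label encoding is hypothetical.

```python
HIGH, MEDIUM, LOW = 0, 1, 2  # hypothetical risk label encoding


def bilinear(img, fx, fy):
    """Bilinearly sample img at fractional source coordinates (fx, fy)."""
    h, w = len(img), len(img[0])
    x0, y0 = min(int(fx), w - 1), min(int(fy), h - 1)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = fx - x0, fy - y0
    top = img[y0][x0] * (1 - ax) + img[y0][x1] * ax
    bottom = img[y1][x0] * (1 - ax) + img[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def upsample(img, mask, scale):
    """Upsample by an integer factor, choosing the strategy per region."""
    h, w = len(img), len(img[0])
    out = []
    for oy in range(h * scale):
        row = []
        for ox in range(w * scale):
            sy, sx = oy // scale, ox // scale
            if mask[sy][sx] == LOW:
                row.append(img[sy][sx])  # fast path: nearest neighbour
            else:
                row.append(bilinear(img, ox / scale, oy / scale))
        out.append(row)
    return out
```

Spending the costlier interpolation only on high-risk and edge regions is what reduces computation in normal tissue areas.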

[0044] In a preferred embodiment of the present invention, the medical image super-resolution system further includes an edge-side adaptation module;

[0045] The edge-side adaptation module obtains the hardware computing power parameters of the deployment device, matches the lightweight model configuration corresponding to those parameters based on the clinical risk level region division results, and completes edge-side compilation of the system based on that configuration to generate an executable file adapted to the hardware computing power of the deployment device.
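The configuration matching performed by this module can be sketched as a tier lookup. Every number below (compute thresholds in TOPS, quantization bit widths, pruning ratios) and the names `TIERS` and `select_config` are hypothetical; the source only states that a lightweight configuration is matched to the device's computing power and differentiated by risk level.

```python
# (minimum device compute in TOPS, configuration), highest tier first.
# All values are illustrative assumptions, not taken from the source.
TIERS = [
    (10.0, {"name": "full",
            "quant_bits": {"high": 16, "medium": 16, "low": 8},
            "prune_ratio": {"high": 0.0, "medium": 0.1, "low": 0.3}}),
    (2.0, {"name": "balanced",
           "quant_bits": {"high": 16, "medium": 8, "low": 8},
           "prune_ratio": {"high": 0.0, "medium": 0.3, "low": 0.5}}),
    (0.0, {"name": "minimal",
           "quant_bits": {"high": 8, "medium": 8, "low": 4},
           "prune_ratio": {"high": 0.1, "medium": 0.5, "low": 0.7}}),
]


def select_config(device_tops):
    """Return the richest configuration the device can afford."""
    if device_tops < 0:
        raise ValueError("device compute must be non-negative")
    for min_tops, config in TIERS:
        if device_tops >= min_tops:
            return config
    raise AssertionError("unreachable: last tier accepts any value >= 0")
```

In each tier the high-risk branch keeps the highest precision and the least pruning, mirroring the priority given to pathological regions elsewhere in the system.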

[0046] This invention addresses the shortcomings of the prior art and has the following beneficial effects:

[0047] (1) This invention breaks away from the industry's entrenched practice of segmentation first and super-resolution second, and constructs a bidirectional closed-loop joint modeling architecture under the dual constraints of pathological semantics and the physical mechanism of degradation. The upsampled features are fed back into the pathological semantic coding sub-network, and the accuracy of pathological semantic extraction and of degradation kernel prior modeling is iteratively optimized. This addresses the error amplification problem of the unidirectional serial architecture at its root and greatly improves the reconstruction fidelity of fine pathological structures.

[0048] (2) This invention combines the imaging physics of the three major modalities, CT, MRI, and ultrasound, and constructs a modality-specific degradation kernel parameter space. At the same time, it uses pathological semantic features as cross-modal domain alignment anchors for feature fusion, and exploits the cross-modal invariance of pathological semantic features to eliminate feature domain shift between modalities, thus solving the poor cross-modal adaptability of existing technologies.

[0049] (3) Based on clinical diagnostic priority, this invention assigns differentiated kernel prior modeling dimensions, interpolation strategies, and attention weights to different regions. It enables full-dimensional modeling and a strong attention mechanism for high-risk pathological areas, and adopts lightweight modeling and low-intensity attention for low-risk normal tissue areas. While ensuring the reconstruction accuracy of key pathological areas, it significantly reduces the computational redundancy of the model.

[0050] (4) This invention constructs pathology-aware dynamic window partitioning and a pathology-constrained self-attention mechanism. It dynamically adjusts the window size and attention weights based on the clinical risk level, which both enhances the capture of local features of sub-millimeter-level fine pathological structures and enables accurate modeling of global long-distance dependencies. At the same time, with the dynamic pruning and hierarchical quantization scheme of the edge-side adaptation module, the model can be deployed in real time on ordinary desktop computers and portable imaging equipment at the primary-care level, removing the computing-power barrier to deployment of existing models.

Description of the Drawings

[0051] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0052] Figure 1 is a system architecture diagram of the present invention;

[0053] Figure 2 is a comparison of the super-resolution reconstruction effect of the system of the present invention on cardiac MRI images;

[0054] Figure 3 is a comparison of the super-resolution reconstruction effect of the system of the present invention on lung CT images;

[0055] Figure 4 is a comparison of the super-resolution reconstruction effect of the system of the present invention on brain CT images.

Detailed Implementation

[0056] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0057] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein. Therefore, the scope of protection of the invention is not limited to the specific embodiments disclosed below.

[0058] This system runs on an electronic device equipped with a processor, memory, and a medical image data interface. The memory stores the computer programs that implement the functions of each module of the system, and the processor executes those programs to complete the super-resolution reconstruction of medical images. The overall system architecture includes a data processing module, a feature processing module, a kernel prior dynamic weighting module, a cross-modal alignment module, an adaptive interpolation module, and an edge-side adaptation module. The modules form an end-to-end closed-loop processing chain through the data flow. After a low-quality medical image is input, the system sequentially completes data preprocessing, feature extraction, kernel prior modeling, cross-modal feature alignment, adaptive interpolation upsampling, and iterative optimization, and finally outputs a super-resolution high-definition medical image that meets clinical diagnostic standards.

[0059] The data processing module acquires the low-quality medical images to be processed and constructs a multimodal pathology-degradation joint annotation dataset that covers the CT, MRI, and ultrasound modalities and is annotated with the physical mechanism of modality degradation, pathological regions, and clinical diagnostic risk levels.

[0060] Furthermore, the construction process of the multimodal pathology-degradation joint annotation dataset includes: S11, obtaining the original high-definition medical image set covering CT, MRI, and ultrasound modalities, and performing degradation simulation processing on the original high-definition medical image set based on the degradation physics generation mechanism to generate a paired low-quality medical image set;

[0061] S12. For the paired low-quality medical image set and the original high-definition medical image set, complete the annotation of modal degradation physical mechanism, pathological region, and clinical diagnosis risk level. Among them, the clinical diagnosis risk level is divided into three levels: high-risk pathological area, medium-risk structural edge area, and low-risk normal tissue area.

[0062] S13. Perform data normalization and data augmentation on the annotated, paired low-quality medical image set and original high-definition medical image set to generate the multimodal pathology-degradation joint annotation dataset.

[0063] In the specific implementation of constructing a multimodal pathology-degradation joint annotation dataset, the data processing module first acquires the original high-definition medical image set covering CT, MRI, and ultrasound modalities. Based on the degradation physics generation mechanism specific to each modality, the original high-definition medical image set is subjected to degradation simulation processing to generate a low-quality medical image set that is paired one-to-one with the original high-definition images.

[0064] The degradation simulation for each modality follows its imaging physics to construct a mathematical model. For CT images, the degradation model is: $I_{LQ}^{CT} = I_{HQ}^{CT} \otimes K_{PSF} + N_{G} + M_{A}$. In the formula, $I_{LQ}^{CT}$ is the low-quality CT image generated by the degradation simulation, $I_{HQ}^{CT}$ is the original high-resolution CT image, $\otimes$ is the two-dimensional convolution operation, $K_{PSF}$ is the point spread function convolution kernel in CT imaging, $N_{G}$ is the Gaussian noise matrix, and $M_{A}$ is the motion artifact matrix caused by the patient's respiratory movements.

[0065] For MRI images, the degradation process is modeled on the k-space acquisition mechanism: $K_{LQ} = K_{HQ} \odot P_{\phi} + N_{K}$, with $I_{LQ}^{MRI} = \mathcal{F}^{-1}(K_{LQ})$. In the formula, $K_{LQ}$ is the k-space data corresponding to the low-quality MRI image generated by the degradation simulation, $K_{HQ}$ is the k-space data of the original high-definition image, $P_{\phi}$ is the phase perturbation matrix caused by magnetic field inhomogeneity, applied element-wise ($\odot$), and $N_{K}$ is the k-space Gaussian noise matrix. The low-quality MRI image $I_{LQ}^{MRI}$ is obtained by the inverse Fourier transform $\mathcal{F}^{-1}$ of $K_{LQ}$.

[0066] For ultrasound images, the degradation is dominated by multiplicative speckle noise and is expressed as: $I_{LQ}^{US} = I_{HQ}^{US} \odot S + N_{G}$. In the formula, $I_{LQ}^{US}$ is the low-quality ultrasound image generated by the degradation simulation, $I_{HQ}^{US}$ is the original high-definition ultrasound image, $S$ is the multiplicative speckle noise, which follows a Rayleigh distribution, and $N_{G}$ is additive Gaussian noise.
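The CT and ultrasound degradation models can be illustrated with a minimal sketch using only the Python standard library. The function names, the nested-list image representation, the zero-padding choice, and the noise parameters are assumptions for illustration; the MRI case is omitted because it additionally needs a Fourier transform.

```python
import math
import random


def conv2d_same(img, kernel):
    """True 2-D convolution with zero padding; output matches img in size."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + ph - i, x + pw - j  # flipped kernel index
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def degrade_ct(hq, psf, noise_sigma, artifact, rng):
    """Blur with the PSF, then add Gaussian noise and a motion-artifact map."""
    blurred = conv2d_same(hq, psf)
    return [
        [blurred[y][x] + rng.gauss(0.0, noise_sigma) + artifact[y][x]
         for x in range(len(hq[0]))]
        for y in range(len(hq))
    ]


def rayleigh(rng, sigma):
    """Draw one Rayleigh-distributed sample by inverse-transform sampling."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))


def degrade_ultrasound(hq, speckle_sigma, noise_sigma, rng):
    """Multiply by Rayleigh speckle, then add Gaussian noise."""
    return [
        [px * rayleigh(rng, speckle_sigma) + rng.gauss(0.0, noise_sigma)
         for px in row]
        for row in hq
    ]
```

Passing a seeded `random.Random` instance as `rng` makes the simulated low-quality images reproducible, which helps when pairing them with their high-definition originals.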

[0067] After completing the degradation simulation, the data processing module performs modal degradation physical mechanism annotation, pathological region annotation, and clinical diagnostic risk level annotation on the paired low-quality medical image set and the original high-definition medical image set. The clinical diagnostic risk level annotation is divided into three levels: high-risk pathological area, medium-risk structural edge area, and low-risk normal tissue area. The annotation process uses a pixel-by-pixel annotation method to generate corresponding masks. Finally, the data processing module performs data normalization and data augmentation on the annotated paired image set to generate a multimodal pathology-degradation joint annotation dataset.

[0068] The normalization method used is min-max normalization: $I_{norm} = \frac{I - I_{min}}{I_{max} - I_{min}}$. In the formula, $I_{norm}$ is the normalized medical image, $I$ is the raw medical image to be normalized, $I_{max}$ is the maximum pixel value of the medical image, and $I_{min}$ is the minimum pixel value of the medical image. The image pixel values are mapped to the interval $[0, 1]$. Data augmentation uses random horizontal flipping, random rotation, and elastic deformation to expand the dataset size and improve the model's generalization ability.
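Min-max normalization and the simplest of the listed augmentations can be sketched as below. The handling of a constant image (returning all zeros) is an assumption, since the formula is undefined when the maximum and minimum coincide.

```python
def min_max_normalize(img):
    """Map pixel values linearly onto [0, 1]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # Constant image: the formula would divide by zero (assumed fallback).
        return [[0.0 for _ in row] for row in img]
    scale = hi - lo
    return [[(px - lo) / scale for px in row] for row in img]


def horizontal_flip(img):
    """Mirror each row; apply identically to the paired image and its masks."""
    return [row[::-1] for row in img]
```

For example, `min_max_normalize([[0, 5], [10, 20]])` gives `[[0.0, 0.25], [0.5, 1.0]]`. Any geometric augmentation has to be applied with the same parameters to the low-quality image, the high-definition image, and the annotation masks so that the pairs stay aligned.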

[0069] The feature processing module extracts the initial pathological semantic features of low-quality medical images through the pathological semantic coding sub-network and generates the clinical risk level region division results. Simultaneously, through the modality-specific degradation coding sub-network, based on the degradation kernel parameter space of the multimodal pathology-degradation joint annotation dataset, it extracts modality-specific degradation features and generates the initial modality-specific degradation kernel prior.

[0070] Furthermore, the process of generating the clinical risk level region classification results includes:

[0071] S21. Input the low-quality medical images into the pathological semantic coding sub-network, and extract the multi-scale pathological semantic feature map of the low-quality medical images through the multi-layer convolutional coding unit of the pathological semantic coding sub-network.

[0072] S22. Through the classification head unit of the pathological semantic coding sub-network, the pathological category and risk level are classified pixel by pixel in the multi-scale pathological semantic feature map to generate the initial pathological semantic features and the pixel by pixel clinical risk level mask.

[0073] S23. Based on the clinical risk level mask, complete the clinical risk level region division of low-quality medical images and generate the clinical risk level region division results.

[0074] Furthermore, the generation process of the initial modality-specific degradation kernel prior includes:

[0075] S211. Construct a degradation kernel parameter space that corresponds one-to-one with the CT, MRI, and ultrasound modalities. The CT modality corresponds to the convolutional degradation and motion artifact parameter space, the MRI modality corresponds to the k-space phase degradation and magnetic field inhomogeneity noise parameter space, and the ultrasound modality corresponds to the multiplicative speckle noise parameter space.

[0076] S212. Identify the modality type of low-quality medical images, call the degradation kernel parameter space that matches the modality type, and input the low-quality medical images into the modality-specific degradation coding subnetwork.

[0077] S213. Modality-specific degradation features of low-quality medical images are extracted through a modality-specific degradation coding sub-network, and an initial modality-specific degradation kernel prior is generated based on the matching degradation kernel parameter space fitting.

[0078] In the specific implementation of the pathological semantic coding sub-network, the sub-network adopts a lightweight U-Net variant architecture. The encoder is composed of multiple convolutional coding units cascaded together. Each convolutional coding unit includes a 3×3 two-dimensional convolutional layer, a batch normalization layer and a ReLU activation function in sequence.

[0079] After a low-quality medical image is input, multi-scale pathological semantic features are extracted through the multi-layer convolutional coding units. The output feature map of the $l$-th convolutional coding unit is calculated as: $F_{l} = \mathrm{ReLU}(\mathrm{Conv}_{3 \times 3}(F_{l-1}) + b_{l})$. In the formula, $F_{l}$ is the multi-scale pathological semantic feature map output by the $l$-th convolutional coding unit, i.e. the final output after the current layer's convolution operation and nonlinear activation; $\mathrm{ReLU}$ (Rectified Linear Unit) is the activation function used to introduce a nonlinear mapping into the feature extraction process, fitting the complex distribution of pathological features in medical images, with $\mathrm{ReLU}(x) = \max(0, x)$, where $x$ is the output value of the convolution operation; $\mathrm{Conv}_{3 \times 3}$ is a 2D convolution operation with a kernel size of 3×3, used to extract local pathological semantic features from the input feature map, and is the feature extraction operator of the coding unit; $F_{l-1}$ is the feature map output by the $(l-1)$-th convolutional coding unit, i.e. the input feature map of the current layer's convolutional coding unit; and $b_{l}$ is the learnable bias term of the $l$-th convolutional coding unit.
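A single-channel sketch of one convolutional coding unit is given below. It follows the deep-learning convention of cross-correlation (no kernel flip) with zero padding, and it omits batch normalization for brevity; the function names and the single-channel simplification are assumptions.

```python
def relu(x):
    """ReLU(x) = max(0, x)."""
    return x if x > 0.0 else 0.0


def conv_unit(feature, kernel, bias):
    """Compute ReLU(Conv3x3(feature) + bias) for one channel.

    feature is a 2-D list, kernel is a 3x3 list of weights, and bias
    is a scalar. Zero padding keeps the output the same size.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = bias
            for i in range(3):
                for j in range(3):
                    yy, xx = y + i - 1, x + j - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * feature[yy][xx]
            out[y][x] = relu(acc)
    return out
```

Stacking several such units, each followed by downsampling, yields the multi-scale feature maps that the decoder later fuses through skip connections.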

[0080] The decoder of the pathological semantic coding sub-network fuses the multi-scale features of the encoder through upsampling layers and skip connections. Finally, it performs pixel-by-pixel classification of the pathological semantic feature map and risk level through a classification head unit composed of 1×1 convolutions, outputting a three-dimensional probability vector for each pixel location. In the formula, The two-dimensional coordinates of a pixel in a medical image, with the x-axis being... The vertical axis is It covers the entire pixel range of the input image; coordinates The three-dimensional probability vector of the clinical risk level corresponding to each pixel, with the vector dimension corresponding one-to-one with the number of preset risk level categories; coordinates The predicted probability that a pixel belongs to a high-risk pathological area ranges from [value range missing]. ; coordinates The predicted probability that a pixel belongs to the edge region of a medium-risk structure has a value range of [value range missing]. ; coordinates The predicted probability that a pixel belongs to a low-risk normal tissue area ranges from [value range missing]. .

[0081] The $\arg\max$ function yields pixel-by-pixel risk level labels and generates the clinical risk level mask: $M(x, y) = \arg\max_{k \in \{h, m, l\}} p_k(x, y)$. Based on this mask, the system segments the low-quality medical image into clinical risk level regions, generates the region segmentation results, and outputs the initial pathological semantic features. Here, $\arg\max$ is the maximum-value indexing function, which outputs the risk level label corresponding to the largest entry of the three-dimensional probability vector; that is, for each pixel, the level with the highest value among $p_h(x, y)$, $p_m(x, y)$ and $p_l(x, y)$ is taken as the final risk level label of that pixel. $M$ is the clinical risk level mask, a two-dimensional matrix of exactly the same size as the input low-quality medical image, whose element at each coordinate $(x, y)$ is the final risk level label of that pixel; it is used for precise regional division of the image.
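The per-pixel maximum-value indexing in [0081] can be sketched as follows. The integer label encoding (0 for high risk, 1 for medium risk, 2 for low risk) and the function name are assumptions made for this example; ties are resolved in favour of the first (higher-risk) level, which is also a choice made here rather than something the text specifies.

```python
HIGH, MEDIUM, LOW = 0, 1, 2  # hypothetical label encoding

def risk_mask(prob_map):
    """Return a mask with the same height and width as prob_map.

    prob_map[y][x] is a 3-tuple (p_high, p_medium, p_low); the mask entry is
    the index of the largest probability at that pixel.
    """
    return [[max(range(3), key=lambda k: p[k]) for p in row] for row in prob_map]

probabilities = [
    [(0.7, 0.2, 0.1), (0.1, 0.3, 0.6)],
    [(0.2, 0.5, 0.3), (0.4, 0.4, 0.2)],
]
mask = risk_mask(probabilities)
```

The resulting mask has one label per pixel and can be reused wherever region-dependent behaviour is needed later in the pipeline.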

[0082] In the specific implementation of the modality-specific degradation coding sub-network, a degradation kernel parameter space is first pre-constructed for each of the CT, MRI and ultrasound modalities. Specifically, the CT modality corresponds to the convolution degradation and motion artifact parameter space $\Theta_{CT}$, whose dimensions are set by the kernel spatial size $k$ and the number of degradation types $N$; the MRI modality corresponds to the k-space phase degradation and magnetic field inhomogeneity noise parameter space $\Theta_{MRI}$; and the ultrasound modality corresponds to the multiplicative speckle noise parameter space $\Theta_{US}$. Each parameter space is pre-constructed and parameterized from the annotations of the multi-modal pathology-degradation joint annotation dataset.

[0083] In the above, $\Theta_{CT}$ is the dedicated degradation kernel parameter space for the CT modality, covering the degradation kernel parameters of all typical degradation types in CT images; $k$ is the spatial size of the degradation kernel in pixels, covering the three common sizes 3×3, 5×5 and 7×7 to accommodate different degrees of spatial resolution degradation; $N$ is the total number of typical degradation types in CT images, covering the three categories of convolutional blur, Gaussian noise and motion artifacts, and can be extended according to clinical data; $\Theta_{MRI}$ is the dedicated degradation kernel parameter space for the MRI modality, covering the parameters of typical degradation types in MRI images such as k-space phase degradation, magnetic field inhomogeneity noise and motion artifacts; and $\Theta_{US}$ is the dedicated degradation kernel parameter space for the ultrasound modality, covering the parameters of typical degradation types in ultrasound images such as multiplicative speckle noise, edge blurring and acoustic artifacts.

[0084] During operation, the modality-specific degradation coding sub-network identifies the modality of the low-quality medical image with a pre-trained lightweight modality classifier, selects the degradation kernel parameter space matching that modality, and feeds the image into a degradation feature encoder composed of multi-scale convolutions to extract the modality-specific degradation features $F_d$ of the input image. The initial modality-specific degradation kernel prior $K_0$ is then generated by fitting through a fully connected layer.

[0085] In the above, $F_d$ denotes the modality-specific degradation features extracted from the input low-quality medical image: a high-dimensional feature vector that characterizes the type and degree of image degradation, produced by the multi-scale convolutional degradation feature encoder. $K_0$ is the initial modality-specific degradation kernel prior: the initial degradation kernel parameter matrix matching the modality and degradation characteristics of the input image, which characterizes the basic degradation behaviour of the input image.

[0086] The fitting process uses least-squares optimization, and the objective function is:

[0087] $K_0 = \arg\min_{K \in \Theta_m} \lVert I_{LR} - I_{HR} \otimes K \rVert_2^2$;

[0088] In the formula, $\arg\min$ is the minimum-value solver, which finds, within the admissible range $\Theta_m$, the value of the optimization variable $K$ at which the objective function $\lVert I_{LR} - I_{HR} \otimes K \rVert_2^2$ attains its minimum; $K$ is the degradation kernel parameter matrix to be optimized, that is, the optimization variable of the objective function; $\Theta_m$ is the dedicated degradation kernel parameter space of the modality of the input low-quality medical image, that is, the range of valid values of $K$; $\lVert \cdot \rVert_2^2$ is the squared L2 norm, which measures the pixel-level difference between two matrices, and for a two-dimensional matrix $A$ its expression is $\lVert A \rVert_2^2 = \sum_{i,j} A(i, j)^2$, where $(i, j)$ are the two-dimensional coordinates of the pixels within the matrix; $I_{LR}$ is the low-quality medical image matched one-to-one with $I_{HR}$, that is, the low-quality image to be fitted, generated by degradation simulation; $I_{HR}$ is the original high-definition medical image, the gold-standard reference for degradation simulation; and $\otimes$ is the two-dimensional convolution operation, which simulates the physical imaging process by which a degradation kernel turns a high-definition image into a low-quality image. Solving this objective ensures that the generated initial kernel prior accurately matches the degradation characteristics of the input image.
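To make the least-squares objective in [0086] to [0088] concrete, the sketch below searches a small, finite set of candidate kernels and returns the one whose convolution with the high-definition image is closest, in squared L2 distance, to the low-quality image. The finite candidate list is a simplifying stand-in for the modality parameter space, and all function names are hypothetical; a practical fit would optimise continuous kernel parameters. Zero padding is assumed at the borders, and the convolution is written as cross-correlation, which coincides with convolution for the symmetric kernels used here.

```python
def convolve_same(image, kernel):
    """2D 'same' filtering with zero padding (cross-correlation form)."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, kernel_row in enumerate(kernel):
                for j, kv in enumerate(kernel_row):
                    yy, xx = y + i - r, x + j - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kv * image[yy][xx]
            out[y][x] = acc
    return out

def squared_l2(a, b):
    """Sum of squared element-wise differences between two matrices."""
    return sum((av - bv) ** 2 for ra, rb in zip(a, b) for av, bv in zip(ra, rb))

def fit_kernel(i_lr, i_hr, candidates):
    """Pick the candidate kernel minimising ||I_LR - I_HR (*) K||^2."""
    return min(candidates, key=lambda k: squared_l2(i_lr, convolve_same(i_hr, k)))

# Usage: simulate degradation with a box blur, then recover that kernel.
box = [[1.0 / 9.0] * 3 for _ in range(3)]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
i_hr = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
i_lr = convolve_same(i_hr, box)
best = fit_kernel(i_lr, i_hr, [identity, box])
```

Because the low-quality image was generated with the box kernel, that candidate has zero residual and is selected.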

[0089] The kernel prior dynamic weighting module is driven by the pathological risk level. Based on the clinical risk level region division results, it assigns differentiated kernel prior modeling weights and parameter dimensions to different regions and generates a weighted modality-specific degradation kernel prior.

[0090] Furthermore, the generation process of the weighted modality-specific degradation kernel prior includes:

[0091] S31. Receive the clinical risk level region division results and the initial modality-specific degradation kernel prior. Through the pathological-risk-level-driven kernel prior dynamic weighting module, match the preset weight and parameter dimension configuration corresponding to each clinical risk level region.

[0092] S32. Assign a full-dimensional degradation kernel parameter space and a first modeling weight to high-risk pathological areas, a medium-dimensional degradation kernel parameter space and a second modeling weight to medium-risk structural edge areas, and a low-dimensional degradation kernel parameter space and a third modeling weight to low-risk normal tissue areas.

[0093] S33. Based on the assigned weights and parameter dimensions, apply region-level weighting to the initial modality-specific degradation kernel prior to generate the weighted modality-specific degradation kernel prior.

[0094] In practice, the kernel prior dynamic weighting module receives the clinical risk level region division results and the initial modality-specific degradation kernel prior output by the feature processing module. It first matches the preset weight and parameter dimension configuration for each clinical risk level region: high-risk pathological regions correspond to the full-dimensional degradation kernel parameter space and the first modeling weight; medium-risk structural edge regions correspond to the medium-dimensional degradation kernel parameter space and the second modeling weight; and low-risk normal tissue regions correspond to the low-dimensional degradation kernel parameter space and the third modeling weight. The parameter dimensions scale linearly with the weight coefficients: the medium dimension is 60% of the full dimension and the low dimension is 30% of the full dimension.

[0095] A pixel-by-pixel weight matrix is generated from the clinical risk level mask, using the following formula: $W(x, y) = w_h M_h(x, y) + w_m M_m(x, y) + w_l M_l(x, y)$;

[0096] In the formula, $W(x, y)$ is the kernel prior modeling weight of the pixel at coordinates $(x, y)$, and $W$ is a two-dimensional weight matrix of exactly the same size as the input image; $w_h$ is the preset first modeling weight for high-risk pathological areas, fixed at 1.0, the highest-priority weight; $w_m$ is the preset second modeling weight for medium-risk structural edge areas, fixed at 0.6, a medium-priority weight; $w_l$ is the preset third modeling weight for low-risk normal tissue areas, fixed at 0.3, the lowest-priority weight; $M_h(x, y)$ is the value at $(x, y)$ of the binary mask of high-risk pathological areas, equal to 1 for pixels inside the target area (high-risk pathological area) and 0 for background pixels; $M_m(x, y)$ is the value at $(x, y)$ of the binary mask of medium-risk structural edge areas, equal to 1 for pixels inside the target area (medium-risk structural edge area) and 0 for background pixels; and $M_l(x, y)$ is the value at $(x, y)$ of the binary mask of low-risk normal tissue areas, equal to 1 for pixels inside the target area (low-risk normal tissue area) and 0 for background pixels.

[0097] The initial modality-specific degradation kernel prior is weighted at the region level with the pixel-wise weight matrix to generate the weighted modality-specific degradation kernel prior: $K_w(x, y) = W(x, y) \cdot K_0(x, y)$. In the formula, $K_w(x, y)$ is the weighted modality-specific degradation kernel prior value for the pixel at $(x, y)$, and $K_w$ is the final region-weighted degradation kernel parameter matrix; $W(x, y)$ is the pixel-by-pixel kernel prior modeling weight generated by the formula above; and $K_0(x, y)$ is the initial modality-specific degradation kernel prior value for the pixel at $(x, y)$, output by the feature processing module.
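The region-level weighting in [0095] to [0097] can be sketched directly from the three binary masks and the fixed weights 1.0, 0.6 and 0.3. For brevity the prior is represented here by one scalar per pixel, whereas the text describes a kernel parameter matrix; the label encoding and function names are likewise assumptions of this example.

```python
HIGH, MEDIUM, LOW = 0, 1, 2          # hypothetical label encoding
W_HIGH, W_MEDIUM, W_LOW = 1.0, 0.6, 0.3

def weight_matrix(mask):
    """W(x, y) = w_h*M_h + w_m*M_m + w_l*M_l, built from a label mask."""
    weights = []
    for row in mask:
        out_row = []
        for label in row:
            m_h = 1 if label == HIGH else 0
            m_m = 1 if label == MEDIUM else 0
            m_l = 1 if label == LOW else 0
            out_row.append(W_HIGH * m_h + W_MEDIUM * m_m + W_LOW * m_l)
        weights.append(out_row)
    return weights

def weighted_prior(mask, k0):
    """K_w(x, y) = W(x, y) * K_0(x, y), with one scalar prior value per pixel."""
    w = weight_matrix(mask)
    return [[w[y][x] * k0[y][x] for x in range(len(k0[y]))] for y in range(len(k0))]

mask = [[HIGH, LOW], [MEDIUM, HIGH]]
k0 = [[2.0, 2.0], [5.0, 1.0]]
kw = weighted_prior(mask, k0)
```

Since exactly one binary mask is 1 at each pixel, the sum reduces to a per-pixel lookup of the weight for that pixel's risk level.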

[0098] At the same time, the kernel parameter channels are dynamically resized for each risk area. In low-risk areas the parameter dimensionality is reduced by channel pruning to cut computational overhead, while in high-risk areas the full-dimensional parameters are retained to preserve the modeling accuracy of the degradation prior.
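A small sketch of the dimensionality rule (full, 60% and 30% of the total dimension) and of channel pruning follows. Rounding down and keeping the leading channels are assumptions made for this example; the text does not say how fractional counts are rounded or which channels are pruned.

```python
SHARE_PERCENT = {"high": 100, "medium": 60, "low": 30}

def channels_for_risk(total_channels, risk):
    """Number of kernel-parameter channels kept for a risk level (floor)."""
    return total_channels * SHARE_PERCENT[risk] // 100

def prune_kernel_params(params, risk):
    """Keep only the leading channels allowed for this risk level."""
    return params[:channels_for_risk(len(params), risk)]

kept = {risk: channels_for_risk(64, risk) for risk in ("high", "medium", "low")}
pruned = prune_kernel_params(list(range(10)), "low")
```

With 64 parameter channels, this rule keeps all 64 for high-risk regions, 38 for medium-risk regions and 19 for low-risk regions.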

[0099] The cross-modal alignment module uses the initial pathological semantic features as the domain alignment anchor, fuses the weighted modality-specific degradation kernel prior with the initial pathological semantic features, and feeds the distribution-aligned features into the Transformer module, which generates the spatial adaptive interpolation kernel.

[0100] Furthermore, the cross-modal alignment module performs feature distribution alignment as follows:

[0101] S41. Receive the initial pathological semantic features and the weighted modality-specific degradation kernel prior, and set the initial pathological semantic features as the fixed anchor features for cross-modal domain alignment;

[0102] S42. Extract the common feature space of the pathological semantic features of images from different modalities, and map the weighted modality-specific degradation kernel prior into this common feature space to obtain the degradation kernel prior features;

[0103] S43. Deeply fuse the degradation kernel prior features with the initial pathological semantic features along the channel dimension to generate a fused feature map whose feature distributions are aligned.

[0104] Furthermore, the process by which the Transformer module generates an adaptive interpolation kernel includes:

[0105] S51. Receive the fused feature map with completed feature distribution alignment and input it into the Transformer module. Based on the clinical risk level region division results, dynamically adjust the channel attention weights of the module for different regions.

[0106] S52. A strong channel attention mechanism is enabled for high-risk pathological areas. Multi-scale local structural features of these areas are captured by the multi-scale convolutional units in the Transformer module, and long-range dependency features are captured by its shifted-window attention mechanism.

[0107] S53. The captured multi-scale local structural features and long-distance dependent features are fused to generate spatial adaptive interpolation kernel parameters that match the pathological attributes and degradation characteristics of each region, and the spatial adaptive interpolation kernel is output.

[0108] In practice, the cross-modal alignment module receives the initial pathological semantic features $F_p$ and the weighted modality-specific degradation kernel prior $K_w$. The initial pathological semantic features are set as the fixed anchor features for cross-modal domain alignment. The rationale is that the semantic features of pathological tissue are invariant across imaging modalities: for example, the semantic features of tissues such as myocardium and blood vessels are consistent across CT, MRI and ultrasound images. Using them as anchors effectively eliminates the feature domain shift between modalities.

[0109] A common feature space is extracted for the pathological semantic features of images from different modalities, and feature distribution alignment is achieved by minimizing the maximum mean discrepancy (MMD). The MMD loss $\mathcal{L}_{MMD}$ is built from the squared L2 norm of the differences between the feature expectations of the CT, MRI and ultrasound modalities;

[0110] Here, $\mathcal{L}_{MMD}$ is the maximum mean discrepancy loss for cross-modal alignment, which measures the distribution difference of pathological semantic features between modalities; the smaller its value, the better the feature distributions are aligned. $\lVert \cdot \rVert_2^2$ is the squared L2 norm, which measures the Euclidean distance between two feature vectors; for a feature matrix $A$ its expression is $\lVert A \rVert_2^2 = \sum_{i,j} A(i, j)^2$, where $(i, j)$ are the coordinates of the elements of the feature matrix. $\mathbb{E}[\cdot]$ is the expectation (mean) operation on features, which averages all elements of the feature matrix and represents the overall distribution centre of the features. $F_{CT}$ is the pathological semantic feature matrix of CT images, $F_{MRI}$ is that of MRI images, and $F_{US}$ is that of ultrasound images.
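The exact MMD expression is not recoverable from the text, so the following is only one plausible form, offered as an assumption: the sum of squared L2 distances between the mean feature vectors of each pair of modalities (a linear-kernel MMD). Taking the mean per feature dimension over samples, and summing over all three modality pairs, are both choices made for this sketch.

```python
def mean_embedding(features):
    """Per-dimension mean over a list of feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mmd_loss(f_ct, f_mri, f_us):
    """Assumed pairwise form: sum of ||E[F_a] - E[F_b]||^2 over modality pairs."""
    mu_ct, mu_mri, mu_us = (mean_embedding(f) for f in (f_ct, f_mri, f_us))
    return (squared_distance(mu_ct, mu_mri)
            + squared_distance(mu_ct, mu_us)
            + squared_distance(mu_mri, mu_us))

f_ct = [[0.0, 0.0], [2.0, 2.0]]   # mean (1, 1)
f_mri = [[1.0, 1.0]]              # mean (1, 1)
f_us = [[3.0, 1.0], [1.0, 1.0]]   # mean (2, 1)
loss = mmd_loss(f_ct, f_mri, f_us)
```

In this toy case the CT and MRI means coincide, so only the two pairs involving ultrasound contribute to the loss; the loss is zero exactly when all three modality means agree.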

[0111] Then, the degradation kernel prior features and the initial pathological semantic features are deeply fused along the channel dimension to generate a fused feature map whose feature distributions are aligned: $F_{fuse} = \mathrm{Concat}(F_p, \phi(K_w))$. In the formula, $F_{fuse}$ is the fused feature map after feature distribution alignment and deep fusion, which serves as the input of the subsequent Transformer module; $\mathrm{Concat}$ is the channel-dimension feature concatenation operation, which merges two feature maps of the same spatial size but different channel counts along the channel dimension without changing the spatial size; $F_p$ is the initial pathological semantic feature matrix, the fixed anchor feature for cross-modal domain alignment; $\phi$ is the feature space mapping function, which maps the weighted modality-specific degradation kernel prior into the common feature space of the pathological semantic features of the different modalities, thereby eliminating the cross-modal feature domain shift; and $K_w$ is the weighted modality-specific degradation kernel prior matrix.

[0112] After feature fusion is completed, the fused feature map is input into the pathology-perception-enhanced Transformer module. Based on the clinical risk level region segmentation results, the Transformer module dynamically adjusts the channel attention weights for different regions. The ECA channel attention weights are computed as $\omega = \sigma(\mathrm{Conv1D}_k(\mathrm{GAP}(F_{fuse})))$.

[0113] In the formula, $\omega$ is the ECA channel attention weight vector, whose dimension equals the number of channels of the fused feature map and which assigns differentiated attention weights to the feature channels; $\sigma$ is the Sigmoid activation function, which maps the attention weights to the interval $(0, 1)$ to normalize the channel weights, with expression $\sigma(z) = 1 / (1 + e^{-z})$, where $z$ is the output value of the one-dimensional convolution; $\mathrm{Conv1D}_k$ is a one-dimensional convolution with kernel size $k$ that captures local dependencies between feature channels, and in this scheme $k$ is determined adaptively from the number of feature channels; $\mathrm{GAP}$ is the global average pooling operation, which averages each channel's two-dimensional feature map over its spatial dimensions and compresses it into a single value, preserving the global feature information of each channel. For a feature map of size $H \times W \times C$, the output of GAP is a feature vector of size $1 \times 1 \times C$, where $H$ is the feature map height, $W$ is the feature map width and $C$ is the number of channels; $F_{fuse}$ is the input fused feature map.

[0114] The attention weight for high-risk pathological areas is multiplied by an amplification factor of 2.0 to enable a strong channel attention mechanism, while the original weight is retained for medium-risk areas, and the attention intensity for low-risk areas is reduced by multiplying the weight by a factor of 0.5.

[0115] Multi-scale local structural features of high-risk pathological areas are captured by the multi-scale convolutional units within the Transformer module, and long-range dependency features are captured by the Transformer's shifted-window attention mechanism. The captured multi-scale local structural features and long-range dependency features are then fused along the channel dimension, and a two-layer multilayer perceptron (MLP) generates spatially adaptive interpolation kernel parameters that match the pathological attributes and degradation characteristics of each region, outputting the spatially adaptive interpolation kernel $K_{interp}$. The interpolation kernel parameters of each pixel position are generated independently, so the interpolation strategy adapts region by region. Here, $K_{interp}$ is the spatially adaptive interpolation kernel matrix, whose element at each coordinate $(x, y)$ is the independent interpolation kernel parameter set of that pixel position, used for pixel-by-pixel differentiated interpolation.

[0116] In this embodiment, the input to the Transformer module is the fused feature map output by the cross-modal alignment module, whose feature distributions have been aligned. The module also receives, as associated inputs, the clinical risk level region segmentation results and the clinical risk level mask output by the pathological semantic coding sub-network of the feature processing module. The final output is a spatially adaptive interpolation kernel parameter matrix matched to the pathological attributes and degradation characteristics of each region of the image, together with an encoded feature map that integrates local fine structure and global context information; these are passed to the subsequent adaptive interpolation module and to the closed-loop iterative optimization branch, respectively.

[0117] The Transformer module combines local-global dual-branch parallel encoding with pathological perception constraints applied throughout the process. From top to bottom, it consists of six units: the input adaptation layer, the pathological perception ECA channel attention layer, the multi-scale local feature extraction branch, the Swin Transformer global encoding branch, the cross-branch feature fusion layer, and the spatial adaptive interpolation kernel generation head. Clinical risk level constraints are embedded throughout the process to achieve accurate encoding of medical image features.

[0118] The input adaptation layer consists of a 1×1 2D convolutional layer, a layer normalization layer and a feature concatenation unit, arranged in sequence. During operation, a 1×1 2D convolution maps the number of channels $C$ of the fused feature map $F_{fuse}$ to the preset embedding dimension $D$ of the Transformer backbone network, 96 by default, giving the dimension-adapted initial input features $X_0 = \mathrm{Conv}_{1\times 1}(F_{fuse})$.

[0119] Layer normalization is then applied to $X_0$ to stabilize the feature distribution and accelerate model convergence: $X_1 = \gamma \cdot \dfrac{X_0 - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$. In the formula, $\mu$ is the mean of the feature map over the channel dimension; $\sigma^2$ is the variance over the channel dimension; $\epsilon$ is a very small constant that prevents division by zero, whose default value is left unspecified here; and $\gamma$ and $\beta$ are the learnable scaling and bias parameters.
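The layer normalization step in [0119] can be sketched for the channel vector of a single pixel as follows. The value `eps=1e-5` is an assumption (the text leaves the default unspecified), as is the function name.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalise a channel vector to zero mean and unit variance, then
    apply the learnable scale (gamma) and bias (beta) per channel.

    eps=1e-5 is an assumed value, not taken from the source.
    """
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    scale = 1.0 / math.sqrt(var + eps)
    return [g * (v - mu) * scale + b for v, g, b in zip(x, gamma, beta)]

normalised = layer_norm([1.0, 2.0, 3.0], gamma=[1.0] * 3, beta=[0.0] * 3)
```

With unit scale and zero bias, the output is centred on zero, and the small constant only matters when the channel variance is close to zero.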

[0120] The feature concatenation unit then concatenates the clinical risk level mask $M$ with the normalized features $X_1$ along the channel dimension, embedding the pathological risk information into the feature space and giving the final module input features $X_{in} = \mathrm{Concat}(X_1, M)$. The concatenated feature map has $D + 1$ channels, where $D$ is the embedding dimension. This preserves the semantic and degradation features of the original image while anchoring the clinical priority of each region.

[0121] The pathological perception ECA channel attention layer, based on the general ECA efficient channel attention, introduces a dynamic weight scaling mechanism driven by clinical risk level. Unlike the general globally unified channel attention, it realizes regional-level differentiated control of channel attention, focuses on strengthening the characteristic channel weights of high-risk pathological areas, suppresses redundant channel calculations in low-risk areas, and balances reconstruction accuracy and computational efficiency.

[0122] The pathological perception ECA channel attention layer consists, in sequence, of a global average pooling unit, a cascaded one-dimensional convolutional unit, a Sigmoid activation layer and a pathological risk weight scaling unit. During operation, global average pooling is applied to the input features $X_{in}$, compressing the two-dimensional spatial features of each channel into a single global descriptor that preserves the channel's global semantic information: $g_c = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{in}^{c}(i, j)$. In the formula, $g_c$ is the global pooling result of the $c$-th channel and $X_{in}^{c}$ is the feature map of the $c$-th channel of the input features. Two cascaded one-dimensional convolutions with kernel size $k$ then capture the local dependencies between channels and generate the initial channel attention weight vector: $\omega_0 = \sigma(\mathrm{Conv1D}_k(\mathrm{Conv1D}_k(g)))$. In the formula, $\omega_0$ is the initial channel attention weight vector, whose dimension equals the number of channels; $\sigma$ is the Sigmoid activation function; $g$ is the feature vector formed by the global pooling results of all channels; and the kernel size $k$ is adaptively determined from the number of channels to be 3.

[0123] Then, the pathological risk weight scaling unit uses the clinical risk level mask to scale the channel attention weights differently for different spatial regions, achieving regional attention regulation: $\omega(x, y) = s(x, y) \cdot \omega_0$, where $s(x, y) = 2.0$ in high-risk pathological areas, $s(x, y) = 1.0$ in medium-risk areas and $s(x, y) = 0.5$ in low-risk areas. In the formula, $\omega(x, y)$ is the final channel attention weight vector for the pixel at $(x, y)$. For high-risk pathological areas the weights are amplified by a factor of 2 to enable a strong attention mechanism; for medium-risk areas the original weights are retained; and for low-risk areas the weights are reduced to 0.5 times to suppress redundant computation. Finally, the final channel attention weights are multiplied channel-wise with the input features $X_{in}$ to apply the attention weighting and obtain the channel-optimized feature map: $X_{att}(x, y) = \omega(x, y) \odot X_{in}(x, y)$.
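The sequence described in [0122] and [0123] (global average pooling, two cascaded one-dimensional convolutions across channels, Sigmoid, then risk-dependent scaling by 2.0, 1.0 or 0.5) can be sketched on nested lists as below. The channel-first layout, the string risk labels, zero padding in the one-dimensional convolution and all function names are assumptions of this example.

```python
import math

RISK_SCALE = {"high": 2.0, "medium": 1.0, "low": 0.5}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def global_average_pool(feature):
    """feature[c][y][x] -> one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]

def conv1d_same(vec, kernel):
    """1D filtering across channels with zero padding."""
    r = len(kernel) // 2
    out = []
    for i in range(len(vec)):
        acc = 0.0
        for j, kv in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < len(vec):
                acc += kv * vec[idx]
        out.append(acc)
    return out

def initial_channel_weights(feature, kernel_a, kernel_b):
    pooled = global_average_pool(feature)
    return [sigmoid(z) for z in conv1d_same(conv1d_same(pooled, kernel_a), kernel_b)]

def risk_scaled_attention(feature, mask, kernel_a, kernel_b):
    """Scale each pixel by (risk factor at that pixel) * (channel weight)."""
    w0 = initial_channel_weights(feature, kernel_a, kernel_b)
    return [[[RISK_SCALE[mask[y][x]] * w0[c] * value for x, value in enumerate(row)]
             for y, row in enumerate(ch)]
            for c, ch in enumerate(feature)]

feature = [[[1.0, 3.0]], [[0.0, 0.0]]]   # 2 channels, 1 x 2 pixels
mask = [["high", "low"]]
ident = [0.0, 1.0, 0.0]                   # size-3 kernel, as in the text
weighted = risk_scaled_attention(feature, mask, ident, ident)
```

The channel weights are computed once per image, while the scaling factor varies per pixel, which is what makes the attention regional rather than globally uniform.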

[0124] The channel-attention-weighted feature map $X_{att}$ is input to the main encoding body of the module, the dual-branch parallel encoding structure, in which a multi-scale local feature extraction branch and a Swin Transformer global encoding branch run in parallel. It simultaneously captures sub-millimeter-level fine local structures in medical images, such as myocardial microfibrosis and microcalcifications, and their long-range dependence on the global tissue context. This addresses the insufficient capture of local detail and the redundant global computation of general Transformer architectures, while pathological perception constraints are embedded throughout the process.

[0125] The multi-scale local feature extraction branch is designed to capture subtle local structural features within different receptive fields. Addressing the significant size variations in pathological structures in medical images, it employs a multi-scale dilated convolution design and dynamically adjusts the receptive field range based on clinical risk levels, with a focus on enhancing the capture of subtle structures in high-risk areas. This branch consists of three parallel sets of multi-scale convolutional units, a batch normalization layer, a GELU activation layer, and a multi-scale feature fusion unit. The three sets of convolutional units are a 3×3 standard convolutional unit, a 5×5 dilated convolutional unit, and a 7×7 grouped convolutional unit.

[0126] During operation, the channel-optimized feature map $X_{att}$ is input to the three parallel convolutional units, which extract local structural features at different receptive fields. The 3×3 standard convolutional unit captures basic texture and edge features and is applied to the entire image as the general computation. The 5×5 dilated convolutional unit, with a dilation rate of 2, captures medium-sized tissue structure features and acts mainly on the edge regions of medium-risk structures. The 7×7 grouped convolutional unit, with 8 groups, captures sub-millimeter-level fine pathological structures within a large receptive field; guided by the mask, it is applied to high-risk pathological areas, while invalid computation in low-risk areas is blocked.

[0127] Each convolutional unit is followed by batch normalization and a GELU activation and outputs a feature map at the corresponding scale: $F_s = \mathrm{GELU}(\mathrm{BN}(\mathrm{Conv}_s(X_{att})))$, $s \in \{3, 5, 7\}$. In the formula, $\mathrm{Conv}_s$ denotes the three convolutional units of different scales and $\mathrm{BN}$ is the batch normalization operation.

[0128] A multi-scale feature fusion unit then concatenates the three feature maps of different scales along the channel dimension and completes channel fusion through a 1×1 convolution to obtain the local feature map $F_{local} = \mathrm{Conv}_{1\times 1}(\mathrm{Concat}(F_3, F_5, F_7))$, which fully preserves the local structural information at the different scales.

[0129] The global encoding branch of the Swin Transformer, which runs parallel to the multi-scale local feature extraction branch, is used to model the long-distance feature dependence of the global image and solve the problem of strong correlation between local pathological structures and global tissue anatomical structures in medical images. Based on the general Swin Transformer, this branch has performed pathological perception dynamic window division and attention pathological constraint optimization to adapt to the clinical needs of this invention.

[0130] The Swin Transformer global encoding branch consists of four consecutive Transformer Blocks. Each Block uses a paired design of window multi-head self-attention and shifted window multi-head self-attention. Each attention unit is equipped with residual connections, layer normalization and feedforward networks. The basic embedding dimension is set to 96, the number of attention heads is set to 3, and the basic window size is 8×8.

[0131] The first optimization in this branch is pathology-aware dynamic window partitioning. Unlike the fixed-size window partitioning of the general Swin Transformer, this branch uses the clinical risk level mask to choose the window size for each risk region, balancing attention accuracy and computational efficiency. A small 4×4 window partition is used for high-risk pathological areas to improve the spatial resolution of attention computation and accurately capture the feature dependencies of subtle pathological structures; the basic 8×8 window partition is used for the edge areas of medium-risk structures to balance the continuity of edge structures and computational efficiency; and a large 16×16 window partition is used for low-risk normal tissue areas to significantly reduce redundant computation and lower computational overhead.

[0132] After window partitioning, the features within each window undergo independent self-attention computation, which significantly reduces the computational complexity of self-attention. The second optimization in this branch is self-attention computation with pathological constraints. A pathological risk mask constraint is introduced, for the first time, into the multi-head self-attention computation within a window, strengthening the attention weights of pixel pairs inside high-risk areas and suppressing ineffective cross-region attention computation. The corresponding attention formula is $\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\dfrac{Q K^{T}}{\sqrt{d_k}} + M_p\right) V$. In the formula, $Q$, $K$ and $V$ are the query, key and value matrices, each obtained by a linear projection of the input features; $d_k$ is the feature dimension of each attention head, used to scale the dot-product result to avoid vanishing gradients; $M_p$ is the pathological constraint mask matrix; and $\mathrm{Softmax}$ is the normalization function, which maps the attention weights to the interval $[0, 1]$ while ensuring that they sum to 1.
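A minimal single-head sketch of scaled dot-product attention with a constraint mask is given below. Treating the pathological constraint mask as an additive bias on the attention scores (0 to leave a pair unchanged, a large negative value to suppress it) is an assumption of this sketch; the text does not state how the mask enters the computation. Function names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention(q, k, v, mask):
    """Softmax(Q K^T / sqrt(d_k) + mask) V for one attention head.

    q, k: lists of d_k-dimensional vectors; v: list of value vectors;
    mask[i][j] is added to the score between query i and key j.
    """
    d_k = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) + mask[i][j]
                  for j, kj in enumerate(k)]
        weights = softmax(scores)
        out.append([sum(weights[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0], [3.0]]
suppressed = masked_attention(q, k, v, [[0.0, -1e9]])  # second key blocked
unmasked = masked_attention(q, k, v, [[0.0, 0.0]])
```

With the second key suppressed, the output equals the first value vector; without the mask, the output is a weighted mix of both values.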

[0133] Each Transformer Block is divided into two sub-stages, both using a pre-normalization structure and residual connections. The first sub-stage is window multi-head self-attention, computed as $\hat{X} = \mathrm{W\text{-}MSA}(\mathrm{LN}(X)) + X$, $X' = \mathrm{FFN}(\mathrm{LN}(\hat{X})) + \hat{X}$.

[0134] The second sub-stage is shifted-window multi-head self-attention. A window shifting operation offsets the window boundaries of the previous stage, enabling information exchange between adjacent windows and completing global long-range dependency modeling. The corresponding process is $\hat{X}' = \mathrm{SW\text{-}MSA}(\mathrm{LN}(X')) + X'$, $X'' = \mathrm{FFN}(\mathrm{LN}(\hat{X}')) + \hat{X}'$.

[0135] In the formulas, $X$ is the input of the block; $\mathrm{LN}$ is layer normalization; $\mathrm{W\text{-}MSA}$ and $\mathrm{SW\text{-}MSA}$ are window and shifted-window multi-head self-attention, respectively; and $\mathrm{FFN}$ is a feedforward network consisting of two linear layers and a GELU activation function, whose hidden layer has four times the embedding dimension. Finally, the features are encoded by the stack of four blocks, which outputs the global feature map $F_{global}$.

[0136] After the dual-branch feature encoding is complete, the local feature map $F_{local}$ and the global feature map $F_{global}$ are input to the cross-branch feature fusion layer, which deeply fuses the features of the two branches so that both the local fidelity of subtle pathological structures and the consistency of the global tissue anatomy are maintained. This addresses a common problem in medical image super-resolution, where local details are clear but the global structure is distorted. The layer consists of a channel attention weighting unit and a residual fusion unit.

[0137] During operation, $F_{local}$ and $F_{global}$ are first concatenated along the channel dimension, and a lightweight channel attention unit learns the importance weights of the two branches: $[\alpha, \beta] = \sigma(\mathrm{CA}(\mathrm{Concat}(F_{local}, F_{global})))$. In the formula, $\alpha$ is the fusion weight for local features, $\beta$ is the fusion weight for global features, $\mathrm{CA}$ is the lightweight channel attention unit and $\sigma$ is the Sigmoid activation function.

[0138] Feature fusion is then completed by weighted summation, with a residual connection that preserves the basic information of the original input features, giving the final fused encoded feature map: $F_{enc} = \alpha \odot F_{local} + \beta \odot F_{global} + X_{att}$, where $X_{att}$ denotes the original input features of the dual-branch structure.
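The weighted summation with a residual connection can be sketched per channel as follows. The logits passed in stand for the output of the lightweight channel attention unit before the Sigmoid; how that unit is built, and which tensor serves as the residual input, are not specified in the text, so both are hypothetical here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fuse_branches(f_local, f_global, x_res, logit_local, logit_global):
    """Per-channel fusion: alpha * local + beta * global + residual.

    logit_local / logit_global are hypothetical pre-Sigmoid outputs of the
    channel attention unit, one value per channel.
    """
    fused = []
    for loc, glo, res, zl, zg in zip(f_local, f_global, x_res,
                                     logit_local, logit_global):
        alpha, beta = sigmoid(zl), sigmoid(zg)
        fused.append(alpha * loc + beta * glo + res)
    return fused

fused = fuse_branches(
    f_local=[2.0, 4.0],
    f_global=[6.0, 8.0],
    x_res=[1.0, 1.0],
    logit_local=[0.0, 0.0],    # sigmoid(0) = 0.5
    logit_global=[0.0, 0.0],
)
```

With both logits at zero the two branches are averaged, and the residual term carries the input features through unchanged.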

[0139] Fusion coding feature map The final input space adaptive interpolation kernel generator is the output layer of the module. Its function is to map the fused encoded feature map to the independent interpolation kernel parameters corresponding to each pixel position, realizing the innovation of the present invention of matching different interpolation strategies for different regions. It directly connects to the subsequent adaptive interpolation module. This unit is composed of two 1×1 convolutional layers, a GELU activation layer and a linear projection layer in sequence.

[0140] In operation, the fused encoded features first pass through the two 1×1 convolutional layers, which perform a channel transformation that maps the number of feature channels to the interpolation kernel parameter dimension.

[0141] The resulting feature map is then mapped by the linear projection layer to the final spatial adaptive interpolation kernel parameter matrix. The interpolation kernel size is set to 3×3 by default, and the parameters at each position of the matrix form the interpolation kernel of the corresponding pixel.
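A minimal sketch of such a generation head is given here, assuming that a 1×1 convolution is implemented as a per-pixel linear map and that the output holds k×k weights per pixel. The channel counts, the parameter names in `params` and the output layout `(H, W, k*k)` are illustrative assumptions rather than values taken from the description.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU (assumed variant).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def conv1x1(x, weight, bias):
    # A 1x1 convolution is a linear map over channels applied at every pixel.
    # x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def kernel_head(fused, params):
    """Map fused features (C, H, W) to per-pixel kernel parameters (H, W, k*k)."""
    x = conv1x1(fused, params["w1"], params["b1"])   # first 1x1 convolution
    x = conv1x1(x, params["w2"], params["b2"])       # second 1x1 convolution
    x = gelu(x)                                      # GELU activation layer
    x = conv1x1(x, params["w_proj"], params["b_proj"])  # linear projection per pixel
    return np.moveaxis(x, 0, -1)                     # (H, W, k*k)

rng = np.random.default_rng(0)
c, h, w, k = 8, 5, 7, 3   # hypothetical channel count and map size; k = 3 is the default
params = {
    "w1": rng.normal(scale=0.1, size=(c, c)), "b1": np.zeros(c),
    "w2": rng.normal(scale=0.1, size=(c, c)), "b2": np.zeros(c),
    "w_proj": rng.normal(scale=0.1, size=(k * k, c)), "b_proj": np.zeros(k * k),
}
kernels = kernel_head(rng.normal(size=(c, h, w)), params)
```

Each spatial position of `kernels` carries nine values, one per tap of a 3×3 kernel, so every pixel receives its own interpolation kernel.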

[0142] The final output is fed directly to the adaptive interpolation module for subsequent upsampling reconstruction and closed-loop iterative optimization, which completes the full forward propagation of this module.

[0143] The complete forward propagation process of this Transformer module is as follows. First, the module receives the fused feature map output by the cross-modal alignment module and the clinical risk level mask output by the pathological semantic coding sub-network. The input adaptation layer performs dimensional adaptation and pathological information embedding to obtain the module input features, and the pathological perception ECA channel attention layer then applies region-differentiated channel attention weighting to strengthen the feature channels of high-risk pathological areas. The features next enter the dual-branch parallel coding structure, in which the multi-scale local feature extraction branch captures fine structural features and the pathological perception optimized Swin Transformer global coding branch models long-distance dependencies. The cross-branch feature fusion layer performs weighted fusion of the local and global features to obtain the final encoded features. Finally, the spatial adaptive interpolation kernel generation head generates an independent spatial adaptive interpolation kernel for each pixel and outputs it to the adaptive interpolation module, completing the entire process.

[0144] The adaptive interpolation module generates upsampling features based on the spatial adaptive interpolation kernel, and then inputs the upsampling features back into the pathological semantic coding sub-network to complete iterative optimization. Based on the optimized pathological semantic features, the spatial adaptive interpolation kernel is updated, and finally, super-resolution high-definition medical images are output.

[0145] Furthermore, the process of iteratively optimizing the spatial adaptive interpolation kernel through the adaptive interpolation module includes:

[0146] S61. Based on the spatial adaptive interpolation kernel, the initial feature map of low-quality medical images is upsampled to generate upsampled features of corresponding resolution.

[0147] S62. Input the upsampling features back into the pathological semantic coding sub-network, and re-extract the optimized pathological semantic features through the pathological semantic coding sub-network to generate the optimized clinical risk level mask and the clinical risk level region division results.

[0148] S63. Feed the optimized pathological semantic features and clinical risk level region division results back to the pathological risk level-driven kernel prior dynamic weighting module to iteratively update the parameters of the spatial adaptive interpolation kernel.

[0149] Furthermore, the process by which the adaptive interpolation module outputs super-resolution high-definition medical images includes:

[0150] S611: Receives the iteratively optimized spatial adaptive interpolation kernel and the multi-scale feature map of the low-quality medical image, and inputs it into the structure-aware upsampling module;

[0151] S612. Through the structure-aware upsampling module, based on the iteratively optimized spatial adaptive interpolation kernel, the multi-scale feature map is subjected to regionally differentiated interpolation upsampling processing. The structure-preserving interpolation strategy is executed for high-risk pathological areas and medium-risk structural edge areas, and the fast interpolation strategy is executed for low-risk normal tissue areas.

[0152] S613. Perform feature fusion and pixel reconstruction on the upsampled feature map to generate a super-resolution high-definition medical image that matches the preset super-resolution ratio, and complete the image output.
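The region-differentiated strategy of steps S611 to S613 can be sketched as a mask-driven selection between two upsampled results. The integer risk codes, the nearest-neighbor enlargement of the risk mask and the function names are assumptions made for illustration.

```python
import numpy as np

LOW, MEDIUM, HIGH = 0, 1, 2   # hypothetical integer codes for the three risk levels

def upsample_mask(mask, scale):
    # Nearest-neighbor enlargement so the mask matches the upsampled resolution.
    return np.repeat(np.repeat(mask, scale, axis=0), scale, axis=1)

def region_differentiated_merge(fast_up, precise_up, risk_mask_lr, scale):
    """Pick the structure-preserving result for medium and high risk pixels
    and the fast result for low risk pixels."""
    mask_hr = upsample_mask(risk_mask_lr, scale)
    use_precise = mask_hr >= MEDIUM
    return np.where(use_precise, precise_up, fast_up)

fast_up = np.zeros((4, 4))     # stand-in for the fast interpolation output
precise_up = np.ones((4, 4))   # stand-in for the structure-preserving output
risk_mask_lr = np.array([[LOW, HIGH],
                         [MEDIUM, LOW]])
merged = region_differentiated_merge(fast_up, precise_up, risk_mask_lr, scale=2)
```

This sketch computes both results everywhere and only selects between them, so it illustrates the routing logic rather than the computational saving; an implementation aiming at efficiency would evaluate the structure-preserving path only on the masked pixels.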

[0153] In practice, the adaptive interpolation module upsamples the initial feature map of the low-quality medical image based on the generated spatial adaptive interpolation kernel, generating upsampled features of the corresponding resolution. The calculation formula for adaptive interpolation is as follows:

$$F_{HR}(x, y) = \sum_{(i, j) \in \Omega(x, y)} K_{(x, y)}(i, j) \cdot F_{LR}(i, j)$$

In the formula, $F_{HR}(x, y)$ is the pixel value of the upsampled high-resolution feature map at the target coordinates $(x, y)$, which is the final output of the adaptive interpolation calculation; $\sum_{(i, j) \in \Omega(x, y)}$ denotes summation over all source pixel coordinates $(i, j)$ in the interpolation neighborhood; $(x, y)$ are the two-dimensional pixel coordinates of the target high-resolution image or feature map; $(i, j)$ are the two-dimensional coordinates of the neighboring pixels in the source low-resolution feature map; $\Omega(x, y)$ is the interpolation neighborhood, that is, the range of source low-resolution pixels that participate in calculating the target pixel value, for which a 3×3 or 5×5 square neighborhood is used in this scheme; $K_{(x, y)}(i, j)$ is the weight of the spatial adaptive interpolation kernel at neighborhood coordinates $(i, j)$, an interpolation weight generated independently for each target pixel; and $F_{LR}(i, j)$ is the pixel value of the input source low-resolution feature map at coordinates $(i, j)$.
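The per-pixel weighted summation can be written as a short, unoptimized NumPy sketch. The mapping from a target pixel to its source center by integer division, the edge padding at the borders and the single-channel input are assumptions; the text fixes only the neighborhood summation with an independent kernel per target pixel.

```python
import numpy as np

def adaptive_interpolate(lr, kernels, scale, k=3):
    """Upsample a single-channel map lr (h, w) with per-target-pixel kernels.

    kernels has shape (h*scale, w*scale, k*k): one k x k kernel per target pixel.
    """
    h, w = lr.shape
    r = k // 2
    padded = np.pad(lr, r, mode="edge")        # assumed border handling
    out = np.zeros((h * scale, w * scale), dtype=float)
    for ty in range(h * scale):
        for tx in range(w * scale):
            cy, cx = ty // scale, tx // scale  # assumed source center of the neighborhood
            patch = padded[cy:cy + k, cx:cx + k]          # k x k source neighborhood
            weights = kernels[ty, tx].reshape(k, k)       # kernel for this target pixel
            out[ty, tx] = np.sum(weights * patch)         # weighted summation
    return out

lr = np.arange(12, dtype=float).reshape(3, 4)
scale, k = 2, 3
# A kernel with all weight on the center tap reduces to nearest-neighbor upsampling.
kernels = np.zeros((3 * scale, 4 * scale, k * k))
kernels[..., (k * k) // 2] = 1.0
hr = adaptive_interpolate(lr, kernels, scale, k)
```

Using a center-only kernel is a convenient sanity check: the output must equal the nearest-neighbor enlargement of the input, while learned kernels would redistribute weight across the neighborhood differently at each pixel.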

[0154] A bidirectional closed-loop feedback optimization mechanism is then initiated: the upsampled features are fed back into the pathological semantic coding sub-network, which re-extracts optimized pathological semantic features and generates an optimized clinical risk level mask and clinical risk level region division results. These optimized features and region division results are fed back to the pathological risk level-driven kernel prior dynamic weighting module, which iteratively updates the parameters of the spatial adaptive interpolation kernel. The process runs for at most 3 iterations, or stops earlier when the feature difference between two adjacent iterations is less than a preset threshold. This addresses at its root the amplification of super-resolution errors caused by insufficiently accurate initial pathological semantic extraction from low-quality images. In the stopping criterion, the compared feature is the high-precision pathological semantic feature matrix obtained after the upsampled features are fed back for optimization, and the iteration convergence threshold, which determines whether the iteration has reached a stable converged state, is a fixed value.
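The stopping rule can be sketched as a small loop. The callable `refine_step` is a hypothetical stand-in for one pass through the sub-network and the weighting module, the mean absolute difference is an assumed difference measure, and the default threshold `eps` is a placeholder because the fixed value is not given in the text.

```python
import numpy as np

def closed_loop_refine(features, refine_step, max_iters=3, eps=1e-3):
    """Repeat refine_step until the change is below eps or max_iters is reached.

    Returns the final features and the number of iterations executed.
    """
    iterations = 0
    for _ in range(max_iters):
        updated = refine_step(features)
        iterations += 1
        diff = float(np.mean(np.abs(updated - features)))  # assumed difference measure
        features = updated
        if diff < eps:      # adjacent iterations are close enough: converged
            break
    return features, iterations

start = np.ones((2, 2))
# A step that halves the features never converges within three iterations.
halved, n_halved = closed_loop_refine(start, lambda f: 0.5 * f)
# A step that changes nothing converges after the first iteration.
same, n_same = closed_loop_refine(start, lambda f: f.copy())
```

The two calls exercise both exits of the loop: the iteration cap of 3 and the early stop on convergence.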

[0155] After iterative optimization, the adaptive interpolation module receives the iteratively optimized spatial adaptive interpolation kernel and the multi-scale feature map of the low-quality medical image and inputs them into the built-in structure-aware upsampling module, which performs region-differentiated interpolation upsampling on the multi-scale feature map based on the optimized kernel. A structure-preserving adaptive interpolation strategy is applied to high-risk pathological areas and medium-risk structural edge areas, while a fast bilinear interpolation strategy is applied to low-risk normal tissue areas, balancing reconstruction accuracy and computational efficiency. Finally, multi-scale feature fusion and pixel reconstruction are performed on the upsampled feature map. Pixel reconstruction uses a 1×1 convolutional layer with a Sigmoid activation function, as shown in the formula:

$$I_{SR} = \sigma\left(\mathrm{Conv}_{1 \times 1}\left(F_{up}\right)\right)$$

In the formula, $I_{SR}$ is the final output super-resolution high-definition medical image, that is, the result after pixel reconstruction; $\sigma$ is the Sigmoid activation function, which maps the reconstructed pixel values to the interval $[0, 1]$, consistent with the normalization interval used during training; $\mathrm{Conv}_{1 \times 1}$ is a two-dimensional convolution with a kernel size of 1×1, which maps the upsampled multi-channel feature map to a single-channel or three-channel image pixel matrix to complete pixel reconstruction; and $F_{up}$ is the final upsampled feature map after closed-loop iterative optimization, which is the input to pixel reconstruction.
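The pixel reconstruction step, a 1×1 convolution followed by a Sigmoid, can be sketched as below. The channel count, the random weights and the single-channel output are illustrative assumptions.

```python
import numpy as np

def reconstruct_pixels(features, weight, bias):
    """Map upsampled features (C, H, W) to an image (C_out, H, W) in (0, 1)."""
    # 1x1 convolution: a linear map over channels at every pixel.
    logits = np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]
    # Sigmoid squashes the reconstructed values into the normalized range.
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3, 3))            # hypothetical upsampled feature map
weight = rng.normal(scale=0.5, size=(1, 4))      # single-channel output (e.g. CT or MRI)
bias = np.zeros(1)
image = reconstruct_pixels(features, weight, bias)

# With zero weights and bias every pixel sits at the Sigmoid midpoint of 0.5.
flat = reconstruct_pixels(features, np.zeros((1, 4)), np.zeros(1))
```

Changing the first dimension of `weight` from 1 to 3 would produce a three-channel image instead.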

[0156] The medical image super-resolution system also includes an edge-side adaptation module; the edge-side adaptation module obtains the hardware computing power parameters of the deployed device, matches the model lightweight configuration with the corresponding computing power parameters based on the clinical risk level region division results, and completes the system edge-side compilation based on the model lightweight configuration to generate an executable file adapted to the hardware computing power of the deployed device.

[0157] The edge-side adaptation module is an optional extension module of the system, responsible for model lightweighting and edge-side deployment adaptation. Its execution logic is to obtain the hardware computing power parameters of the deployment device, match the model lightweighting configuration with the corresponding computing power parameters based on the clinical risk level regional division results, and complete the system edge-side compilation based on the model lightweighting configuration to generate an executable file adapted to the hardware computing power of the deployment device.

[0158] In practice, the edge-side adaptation module obtains the hardware computing power parameters of the deployed device, including the number of processors, video memory capacity, and floating-point operation capability.

[0159] Based on the clinical risk level region division results, the model lightweight configuration corresponding to the computing power parameters is matched. Dynamic pruning based on L1 regularization is performed on the model channels corresponding to low-risk normal tissue regions, removing redundant channels whose absolute weight values are less than a preset threshold. At the same time, 8-bit integer quantization is performed. The quantization formula is:

$$x_q = \mathrm{round}\left(\frac{x - x_{\min}}{x_{\max} - x_{\min}} \times 255\right)$$

In the formula, $x_q$ is the feature value quantized to an 8-bit integer, which is an integer in the range 0 to 255; $\mathrm{round}(\cdot)$ is the rounding operation that converts the floating-point result into an integer value; $x$ is the original 32-bit full-precision floating-point feature value to be quantized; $x_{\min}$ is the minimum feature value in the feature matrix to be quantized; $x_{\max}$ is the maximum feature value in the feature matrix to be quantized; and 255 is the maximum value of an 8-bit unsigned integer, used to map the normalized feature values to the 8-bit integer range.
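The min-max quantization to 8-bit unsigned integers can be sketched as follows. The dequantization helper and the handling of a constant input are additions made for illustration and are not part of the description.

```python
import numpy as np

def quantize_uint8(x):
    """Min-max quantize a float array to uint8; also return the range used."""
    x_min, x_max = float(x.min()), float(x.max())
    if x_max == x_min:
        # Degenerate case (assumed handling): a constant array maps to zeros.
        return np.zeros(x.shape, dtype=np.uint8), x_min, x_max
    q = np.round((x - x_min) / (x_max - x_min) * 255.0)
    return q.astype(np.uint8), x_min, x_max

def dequantize_uint8(q, x_min, x_max):
    # Approximate inverse, useful for checking the quantization error.
    return q.astype(np.float32) / 255.0 * (x_max - x_min) + x_min

x = np.linspace(-1.0, 1.0, 11)
q, lo, hi = quantize_uint8(x)
restored = dequantize_uint8(q, lo, hi)
max_error = float(np.max(np.abs(restored - x)))
```

The smallest value maps to 0 and the largest to 255, and the round-trip error stays within half a quantization step, here (1 − (−1)) / 255 / 2.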

[0160] 16-bit floating-point quantization is performed on the model channels corresponding to medium-risk structural edge regions, and 32-bit full-precision calculation is retained for the model channels corresponding to high-risk pathological regions. This maximizes the compression of model parameters and computational load while ensuring the reconstruction accuracy of key pathological regions.
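The risk-dependent precision assignment and the threshold-based channel pruning can be sketched together. The dictionary keys, the use of the mean absolute weight as the per-channel score and the threshold value are assumptions; the text states only that channels with absolute weight values below a preset threshold are removed.

```python
import numpy as np

# Numeric precision per clinical risk level, as described for the three regions.
PRECISION_BY_RISK = {"low": "int8", "medium": "float16", "high": "float32"}

def prune_channels_l1(weights, threshold):
    """Drop output channels whose mean absolute weight is below threshold.

    weights: (C_out, C_in, kH, kW). Returns the kept weights and a boolean mask.
    """
    scores = np.abs(weights).mean(axis=(1, 2, 3))  # assumed per-channel L1 score
    keep = scores >= threshold
    return weights[keep], keep

# Three hypothetical output channels, the first of which is nearly zero.
weights = np.array([0.001, 0.5, -0.7]).reshape(3, 1, 1, 1)
pruned, keep = prune_channels_l1(weights, threshold=0.01)
```

Only the low-risk channels would be passed through `prune_channels_l1` and then quantized to 8 bits; medium-risk and high-risk channels keep all their channels at the precision listed in `PRECISION_BY_RISK`.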

[0161] Based on lightweight configuration, the model is compiled on the edge using TensorRT or ONNX Runtime to generate an executable file adapted to the hardware computing power of the deployment device, enabling real-time super-resolution processing of low-quality medical images on the edge.

[0162] A typical application scenario for this implementation is multimodal medical image super-resolution reconstruction in primary healthcare institutions. Low-quality medical images acquired by outdated CT equipment, low-field MRI equipment, and portable ultrasound equipment in primary hospitals can be input into this system without manual intervention or model fine-tuning. The system automatically completes modality recognition, pathological semantic extraction, degradation kernel prior modeling, cross-modal feature alignment, and adaptive super-resolution reconstruction. The output high-definition images can accurately reconstruct sub-millimeter-level pathological structures such as myocardial microfibrosis, coronary artery microcalcification, and intracerebral microbleeds, and the edge preservation rate of high-risk pathological areas and the sensitivity of lesion detection are significantly improved. The system can meet the needs of early clinical diagnosis without an upgrade to high-end imaging equipment, greatly reducing the hardware investment cost of primary healthcare and supporting the implementation of the hierarchical medical system.

[0163] Figure 2 shows a comparison of the super-resolution reconstruction results for cardiac MRI images using the system described in this invention. The left side of the figure shows the original low-quality cardiac MRI image, which is generally blurry, with low contrast between myocardial microfibrosis lesions and the surrounding normal tissue and lesion boundaries that are difficult to define precisely. After super-resolution reconstruction using the system described in this invention, the image clarity is significantly improved, the lesion boundaries are sharp and clear, and the contrast between the lesions and normal tissue is greatly enhanced, providing a reliable imaging basis for the automated quantitative analysis of myocardial lesions.

[0164] Figure 3 shows a comparison of the super-resolution reconstruction results for lung CT images using the system described in this invention. The left side of the figure shows the original low-quality lung CT image, which has a low signal-to-noise ratio and insufficient contrast between tiny nodules smaller than 5 mm in diameter and the surrounding lung tissue, so that such nodules are easily missed during diagnosis. After super-resolution reconstruction using the system described in this invention, both the signal-to-noise ratio and the spatial resolution of the image are significantly improved. The outlines of tiny lung nodules are sharp and clear, and the contrast with surrounding tissues is significantly enhanced, providing solid data support for the automatic identification and accurate measurement of lung nodules.

[0165] Figure 4 shows a comparison of the super-resolution reconstruction results for brain CT images using the system described in this invention. The left side of the figure shows the original low-quality brain CT image, in which the boundaries of high-density intracranial lesions are blurred and the internal structures of the lesions are not clearly displayed, making it difficult to meet the needs of clinical qualitative diagnosis. After super-resolution reconstruction using the system described in this invention, the lesion boundaries are clear and sharp, and the density differences within the lesions (such as the characteristic manifestations of hemorrhage combined with calcification) are clearly displayed, providing effective assistance for the accurate qualitative diagnosis of intracranial lesions.

[0166] Based on the preferred embodiments of the present invention described above, those skilled in the art can make various changes and modifications without departing from the inventive concept. The technical scope of this invention is not limited to the contents of the specification, but must be determined according to the scope of the claims.

Claims

1. A medical image super-resolution system based on kernel prior and adaptive interpolation Transformer, characterized in that it comprises a data processing module, a feature processing module, a kernel prior dynamic weighting module, a cross-modal alignment module, and an adaptive interpolation module. The data processing module acquires low-quality medical images to be processed and constructs a multimodal pathology-degradation joint annotation dataset that covers CT, MRI, and ultrasound modalities and is annotated with the physical mechanisms of modal degradation, pathological regions, and clinical diagnostic risk levels. The feature processing module extracts the initial pathological semantic features of the low-quality medical images through the pathological semantic coding sub-network and generates the clinical risk level region division results; simultaneously, through the modality-specific degradation coding sub-network, based on the degradation kernel parameter space of the multimodal pathology-degradation joint annotation dataset, it extracts modality-specific degradation features and generates the initial modality-specific degradation kernel prior. The kernel prior dynamic weighting module is driven by the pathological risk level: based on the clinical risk level region division results, it assigns differentiated kernel prior modeling weights and parameter dimensions to different regions and generates a weighted modality-specific degradation kernel prior. The cross-modal alignment module uses the initial modality-specific degradation kernel prior as the domain alignment anchor point, integrates the modality-specific degradation kernel prior with the initial pathological semantic features, completes feature distribution alignment, and then inputs the result into the Transformer module to generate a spatial adaptive interpolation kernel. 
The adaptive interpolation module generates upsampling features based on the spatial adaptive interpolation kernel, and inputs the upsampling features back into the pathological semantic coding sub-network to complete iterative optimization. Based on the optimized pathological semantic features, the spatial adaptive interpolation kernel is updated, and finally, super-resolution high-definition medical images are output.

2. The medical image super-resolution system based on kernel prior and adaptive interpolation Transformer according to claim 1, characterized in that: The construction process of the multimodal pathology-degradation joint annotation dataset includes: S11. Obtain the original high-definition medical image set covering CT, MRI, and ultrasound modalities. Based on the physical mechanism by which degradation is generated, perform degradation simulation processing on the original high-definition medical image set to generate a paired low-quality medical image set. S12. For the paired low-quality medical image set and the original high-definition medical image set, complete the annotation of the modal degradation physical mechanism, pathological region, and clinical diagnosis risk level. The clinical diagnosis risk level is divided into three levels: high-risk pathological area, medium-risk structural edge area, and low-risk normal tissue area. S13. Perform data normalization and data augmentation on the annotated, paired low-quality medical image set and original high-definition medical image set to generate the multimodal pathology-degradation joint annotation dataset.

3. The medical image super-resolution system based on kernel prior and adaptive interpolation Transformer according to claim 1, characterized in that: The process of generating the clinical risk level region classification results includes: S21. Input the low-quality medical image into the pathological semantic coding sub-network, and extract the multi-scale pathological semantic feature map of the low-quality medical image through the multi-layer convolutional coding unit of the pathological semantic coding sub-network. S22. The multi-scale pathological semantic feature map is classified into pathological categories and risk levels pixel by pixel through the classification head unit of the pathological semantic coding sub-network, generating initial pathological semantic features and pixel by pixel clinical risk level masks. S23. Based on the clinical risk level mask, complete the clinical risk level region division of the low-quality medical images and generate the clinical risk level region division result.

4. The medical image super-resolution system based on kernel prior and adaptive interpolation Transformer according to claim 1, characterized in that: The generation process of the initial modality-specific degradation kernel prior includes: S211. Construct a degradation kernel parameter space that corresponds one-to-one with the CT, MRI, and ultrasound modalities. The CT modality corresponds to the convolutional degradation and motion artifact parameter space, the MRI modality corresponds to the k-space phase degradation and magnetic field inhomogeneity noise parameter space, and the ultrasound modality corresponds to the multiplicative speckle noise parameter space. S212. Identify the modality type of the low-quality medical image, call the degradation kernel parameter space that matches the modality type, and input the low-quality medical image into the modality-specific degradation coding sub-network; S213. Extract the modality-specific degradation features of the low-quality medical image through the modality-specific degradation coding sub-network, and generate the initial modality-specific degradation kernel prior by fitting based on the matching degradation kernel parameter space.

5. The medical image super-resolution system based on kernel prior and adaptive interpolation Transformer according to claim 1, characterized in that: The generation process of the weighted modality-specific degradation kernel prior includes: S31. Receive the clinical risk level region division results and the initial modality-specific degradation kernel prior. Through the pathological risk level-driven kernel prior dynamic weighting module, match the preset weight and parameter dimension configuration corresponding to each clinical risk level region. S32. Assign a full-dimensional degradation kernel parameter space and a first modeling weight to high-risk pathological areas, assign a medium-dimensional degradation kernel parameter space and a second modeling weight to medium-risk structural edge areas, and assign a low-dimensional degradation kernel parameter space and a third modeling weight to low-risk normal tissue areas. S33. Based on the assigned weights and parameter dimensions, apply region-level weighting to the initial modality-specific degradation kernel prior to generate the weighted modality-specific degradation kernel prior.

6. The medical image super-resolution system based on kernel prior and adaptive interpolation Transformer according to claim 1, characterized in that: The cross-modal alignment module performs feature distribution alignment as follows: S41. Receive the initial pathological semantic features and the modality-specific degradation kernel prior, and set the initial pathological semantic features as fixed anchor point features for cross-modal domain alignment; S42. Extract the common feature space of the pathological semantic features corresponding to different modality images, and map the modality-specific degradation kernel prior to the common feature space to obtain the degradation kernel prior features; S43. Deeply fuse the degradation kernel prior features with the initial pathological semantic features along the channel dimension to generate a fused feature map with completed feature distribution alignment.

7. The medical image super-resolution system based on kernel prior and adaptive interpolation Transformer according to claim 6, characterized in that: The process by which the Transformer module generates the adaptive interpolation kernel includes: S51. Receive the fused feature map with completed feature distribution alignment and input it into the Transformer module. Based on the clinical risk level region division results, dynamically adjust the channel attention weights of the module for different regions. S52. A strong channel attention mechanism is enabled for high-risk pathological areas. Multi-scale local structural features of high-risk pathological areas are captured by multi-scale convolutional units in the Transformer module, and long-distance dependency features are captured. S53. The captured multi-scale local structural features and long-distance dependent features are fused to generate spatial adaptive interpolation kernel parameters that match the pathological attributes and degradation characteristics of each region, and the spatial adaptive interpolation kernel is output.

8. The medical image super-resolution system based on kernel prior and adaptive interpolation Transformer according to claim 1, characterized in that: The process of iteratively optimizing the spatial adaptive interpolation kernel through the adaptive interpolation module includes: S61. Based on the spatial adaptive interpolation kernel, the initial feature map of the low-quality medical image is upsampled to generate upsampled features of corresponding resolution. S62. The upsampling features are input back into the pathological semantic coding sub-network. The optimized pathological semantic features are extracted again through the pathological semantic coding sub-network to generate an optimized clinical risk level mask and clinical risk level region division results. S63. The optimized pathological semantic features and the results of the clinical risk level region division are fed back to the pathological risk level-driven kernel prior dynamic weighting module to iteratively update the parameters of the spatial adaptive interpolation kernel.

9. The medical image super-resolution system based on kernel prior and adaptive interpolation Transformer according to claim 1, characterized in that: The process by which the adaptive interpolation module outputs the super-resolution high-definition medical image includes: S611. Receive the iteratively optimized spatial adaptive interpolation kernel and the multi-scale feature map of the low-quality medical image, and input them into the structure-aware upsampling module; S612. Through the structure-aware upsampling module, based on the iteratively optimized spatial adaptive interpolation kernel, the multi-scale feature map is subjected to regionally differentiated interpolation upsampling processing. A structure-preserving interpolation strategy is executed for high-risk pathological areas and medium-risk structural edge areas, and a fast interpolation strategy is executed for low-risk normal tissue areas. S613. Perform feature fusion and pixel reconstruction on the upsampled feature map to generate a super-resolution high-definition medical image that matches the preset super-resolution ratio, and complete the image output.

10. The medical image super-resolution system based on kernel prior and adaptive interpolation Transformer according to claim 1, characterized in that: The medical image super-resolution system also includes an edge-side adaptation module; The edge-side adaptation module obtains the hardware computing power parameters of the deployment device, matches the model lightweight configuration corresponding to the computing power parameters based on the clinical risk level region division results, and completes the system edge-side compilation based on the model lightweight configuration to generate an executable file adapted to the hardware computing power of the deployment device.