A ground image super-resolution reconstruction method, device, equipment and storage medium

By employing multi-scale feature extraction, feature matching, transfer, and fusion, and combining a composite loss function to train the model, the problem of insufficient information utilization in existing ground image super-resolution reconstruction methods is solved, achieving high-precision and robust image reconstruction results.

CN122243740A · Pending · Publication Date: 2026-06-19 · YUNNAN MINZU UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: YUNNAN MINZU UNIV
Filing Date: 2026-05-21
Publication Date: 2026-06-19

IPC: G06T3/4046; G06T3/4053; G06V10/44; G06V10/75; G06V10/80

AI Tagging

Application Domain

Geometric image transformation; Character and pattern recognition

Abstract

This invention relates to a method, apparatus, device, and storage medium for super-resolution reconstruction of ground images, belonging to the field of information recognition technology. It includes: acquiring a low-resolution ground remote sensing image and its corresponding high-resolution reference image, and extracting multi-scale features from them; performing coarse-to-fine feature matching between the ground image and the reference image based on a candidate feature selection strategy; transferring high-resolution features from the reference image and low-resolution features from the input image to the feature map to be reconstructed using a feature transfer module; fusing the multi-scale features to be reconstructed using a multi-scale feature fusion module to generate a fused feature representation; and training and inferring the model using a composite loss function to output the super-resolution reconstructed ground image. This invention improves the detail recovery and structure preservation capabilities of low-resolution ground images, enhances the imaging quality of ground images in complex environments, and has promising application prospects.

Description

Technical Field

[0001] This invention relates to a method, apparatus, device, and storage medium for super-resolution reconstruction of ground images, belonging to the field of information recognition technology.

Background Technology

[0002] With the rapid development of remote sensing imaging technology, unmanned aerial vehicle (UAV) platforms, and ground observation equipment, ground images have been widely used in fields such as geographic information acquisition, resource surveys, environmental monitoring, urban planning, and emergency response. However, due to limitations such as imaging equipment resolution, shooting altitude, sensor cost, and complex environmental conditions, the actual acquired ground images often suffer from insufficient spatial resolution, resulting in blurred image details and missing structural information, making it difficult to meet the needs of high-precision analysis and applications.

[0003] To improve the spatial resolution of ground images, traditional methods typically acquire high-resolution images by improving imaging hardware or using repeated shooting. However, in practical applications, these methods are often limited by equipment costs, deployment conditions, and shooting environments, making them difficult to apply flexibly in large-scale or complex scenarios. Therefore, ground image super-resolution reconstruction technology based on software algorithms has gradually become a research hotspot. Its core objective is to recover higher-resolution image content from low-resolution images without increasing hardware costs.

[0004] Existing super-resolution reconstruction methods for ground images mainly include traditional interpolation-based methods and learning-based reconstruction methods. Interpolation-based methods have lower computational complexity but struggle to effectively recover true high-frequency details, and the reconstruction results often suffer from over-smoothing. Learning-based methods, especially deep learning methods, improve reconstruction quality to some extent by building end-to-end models to model low-resolution images. However, existing methods still have the following shortcomings: on the one hand, some methods rely solely on a single input image, lacking effective utilization of external high-resolution information, resulting in limited detail recovery capabilities; on the other hand, degradation features at different scales are not fully characterized during the modeling process, easily leading to structural distortion or texture discontinuities. Furthermore, complex ground scenes contain factors such as viewpoint changes, illumination differences, and environmental interference, making super-resolution reconstruction tasks more difficult and placing higher demands on the robustness and generalization ability of the models.

[0005] Therefore, there is an urgent need for a ground image super-resolution reconstruction method and its corresponding device, equipment and storage medium that can make full use of low-resolution ground images and high-resolution reference image information, and take into account the ability of multi-scale feature modeling and structure preservation, so as to improve the reconstruction quality and practical application value of ground images.

Summary of the Invention

[0006] The main objective of this invention is to provide a method, apparatus, device, and storage medium for super-resolution reconstruction of ground images, aiming to solve the technical problem of insufficient super-resolution reconstruction capability of current ground images.

[0007] The technical solution of the present invention is: a method, apparatus, device, and storage medium for super-resolution reconstruction of ground images, wherein the method includes the following steps:

[0008] Acquire low-resolution ground remote sensing images and corresponding high-resolution reference images, and extract multi-scale features from them;

[0009] Based on the candidate feature selection strategy, coarse-to-fine feature matching is performed between the ground image and the reference image;

[0010] The feature transfer module transfers high-resolution features from the reference image and low-resolution features from the input image to the feature map to be reconstructed.

[0011] The multi-scale feature fusion module is used to fuse the multi-scale features to be reconstructed, generating a fused feature representation.

[0012] The model is trained and inferred using a composite loss function, and the super-resolution reconstructed ground image is output.

[0013] Optionally, acquire low-resolution ground remote sensing images and corresponding high-resolution reference images, and perform multi-scale feature extraction on them, specifically:

[0014] The input image includes a low-resolution input image and a high-resolution reference image, wherein the width and height of the low-resolution input image are both 1 / 4 of the width and height of the high-resolution reference image. A ResNetFPN network is used as the feature encoder to perform multi-scale feature extraction on the input image. Specifically: First, a first feature map with a spatial resolution of 1 / 2 of its original size is extracted from the low-resolution input image, and a second feature map with a spatial resolution of 1 / 8 of its original size is extracted from the high-resolution reference image. Since the first and second feature maps maintain the same spatial size, they are used together as input feature maps in the coarse matching stage. Second, a third feature map with a spatial resolution upsampled to twice its original size is extracted from the low-resolution input image, and a fourth feature map with a spatial resolution of 1 / 2 of its original size is extracted from the high-resolution reference image. The third and fourth feature maps also maintain the same spatial size and are used together as input feature maps in the fine matching stage. Through multi-scale feature extraction and alignment, feature matching calculations can be performed on images of different resolutions at a unified spatial scale in both the coarse and fine matching stages, thus balancing matching efficiency and accuracy.
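The scale relationships in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 512 x 512 reference size and the helper name `feature_map_size` are assumptions, not values from the application. It shows that, with the input at 1 / 4 of the reference size, the two coarse-stage maps and the two fine-stage maps end up with equal spatial sizes.

```python
from fractions import Fraction

def feature_map_size(image_hw, scale):
    """Return (height, width) of a feature map at `scale` times the image size."""
    h, w = image_hw
    return (int(h * scale), int(w * scale))

# Hypothetical sizes: the reference is 4x the input in each dimension.
hr_hw = (512, 512)
lr_hw = (hr_hw[0] // 4, hr_hw[1] // 4)

# Coarse stage: 1/2 of the low-resolution input, 1/8 of the reference.
coarse_lr = feature_map_size(lr_hw, Fraction(1, 2))
coarse_hr = feature_map_size(hr_hw, Fraction(1, 8))

# Fine stage: 2x the low-resolution input, 1/2 of the reference.
fine_lr = feature_map_size(lr_hw, Fraction(2, 1))
fine_hr = feature_map_size(hr_hw, Fraction(1, 2))

print("coarse:", coarse_lr, coarse_hr)
print("fine:  ", fine_lr, fine_hr)
```

Both stages therefore compare feature maps on a common grid, which is what allows matching to run at a unified spatial scale.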

[0015] Optionally, based on a candidate feature selection strategy, coarse-to-fine feature matching is performed between the ground image and the reference image, specifically:

[0016] This invention performs coarse matching after the initial screening of candidate features, including the following core steps: predicting the retention probability of candidate features based on the interaction results of dual image features; generating candidate validity identifiers through a differentiable approximation; and using the identifiers to apply subsequent interaction constraints to candidate features, rather than directly deleting them. This method can achieve refined screening of candidate features while ensuring differentiability during the training phase.

[0017] The candidate feature set obtained after preliminary screening is input into the dual-image feature interaction module to obtain candidate feature representations that contain dual-image correlation information. For each candidate feature, the candidate evaluation submodule predicts the corresponding retention probability. The retention probability characterizes whether the candidate feature should continue to participate in subsequent matching under the current dual-image interaction state. In this embodiment, the retention probability is calculated as follows:

[0018]

[0019] where the two functions in the formula are, in order, the candidate evaluation function and the feature normalization function.

[0020] To avoid the non-differentiability problem caused by using discrete thresholds or Top-k operations during training, this embodiment introduces a continuous approximate sampling mechanism to perform differentiable interactive selection and determine the validity of candidate features. In this embodiment, the validity identifier of each candidate feature is obtained as follows:

[0021]

[0022] This identifier is used to characterize the retention status of candidate features under dual-image interaction conditions, where a value of 1 indicates that the candidate feature is determined to be valid, and a value of 0 indicates that the candidate feature is determined to be invalid.

[0023] The validity identifier of each candidate feature is combined with the initial mask of the candidate features to obtain the final candidate validity mask, calculated as follows:

[0024]

[0025] In this way, candidate features are not removed directly; instead, the mask constrains them in subsequent processing.
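The selection steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the formulas of the application, which are not reproduced in this text: the retention probability is taken to be a sigmoid of a score, the continuous approximate sampling is taken to be a binary-concrete (Gumbel-sigmoid) relaxation, and the final mask is taken to be the elementwise product of the validity identifier and the initial mask. The names `relaxed_gate`, `final_mask`, and `temperature` are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relaxed_gate(logit, temperature=0.5, rng=None):
    """Binary-concrete (Gumbel-sigmoid) relaxation of a keep/drop decision.

    With rng=None no noise is added, giving a deterministic soft gate.
    """
    noise = 0.0
    if rng is not None:
        u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)
        noise = math.log(u) - math.log(1.0 - u)  # logistic noise sample
    return sigmoid((logit + noise) / temperature)

def final_mask(logits, initial_mask, temperature=0.5, rng=None, hard=False):
    """Assumed combination: validity identifier times initial mask, elementwise."""
    mask = []
    for logit, m0 in zip(logits, initial_mask):
        g = relaxed_gate(logit, temperature, rng)
        if hard:  # binarise to 1 (valid) or 0 (invalid), e.g. at inference
            g = 1.0 if g >= 0.5 else 0.0
        mask.append(g * m0)
    return mask

logits = [2.0, -3.0, 0.5, 4.0]   # hypothetical candidate scores
initial = [1.0, 1.0, 1.0, 0.0]   # last candidate failed the initial screening
soft_mask = final_mask(logits, initial, rng=random.Random(0))
hard_mask = final_mask(logits, initial, hard=True)
```

The soft gate stays differentiable with respect to the score, while the hard variant reproduces the 1 / 0 validity values described in the text. No candidate is deleted in either case; a rejected one simply carries a mask value at or near zero.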

[0026] Furthermore, the candidate features processed by differentiable interactive selection are used as query features and, together with the key features and value features, are input into the feature interaction module. In this embodiment, the feature interaction is calculated using linear attention, and its expression is as follows:

[0027]

[0028] where the function in the formula denotes the kernel function used for feature mapping.

[0029] The feature interaction process is constrained by combining the generated candidate validity mask.

[0030] This invention employs an implicitly constrained feature interaction method. By modulating the features and candidate validity masks element-wise, it suppresses invalid candidate features without directly removing their feature representations. In this embodiment, the feature interaction calculation method based on mask constraints is as follows:

[0031]

[0032] where the two mask terms denote the candidate validity masks corresponding to the query features and the key features, respectively.

[0033] By introducing a feature interaction calculation method based on candidate validity masks, this invention can achieve at least the following technical effects: significantly reduce the computational complexity of the feature interaction stage; effectively suppress the interference of low-quality candidate features on the matching results; and avoid the problem of irreversible information loss caused by directly pruning candidate features.
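A mask-constrained linear attention of the kind described above might look like the following sketch. The kernel feature map `elu(x) + 1` and the exact placement of the masks (key mask applied to the mapped keys, query mask applied to the output) are assumptions; the application's own formula is not reproduced in this text. The function name `masked_linear_attention` is hypothetical.

```python
import math

def phi(x):
    """Assumed kernel feature map elu(x) + 1, which keeps values positive."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def masked_linear_attention(Q, K, V, mq, mk, eps=1e-6):
    """Linear attention with elementwise mask modulation.

    Q, K: lists of d-dim vectors; V: list of dv-dim vectors;
    mq, mk: per-token validity masks with values in [0, 1].
    """
    d, dv = len(Q[0]), len(V[0])
    # Accumulate S = sum_j m_j phi(k_j) v_j^T and z = sum_j m_j phi(k_j) once.
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k, v, m in zip(K, V, mk):
        fk = [m * a for a in phi(k)]
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
    out = []
    for q, m in zip(Q, mq):
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d)) + eps
        out.append([m * sum(fq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0], [3.0]]
out = masked_linear_attention(Q, K, V, mq=[1.0, 0.0], mk=[1.0, 0.0])
```

Because the key-value summary `S` is built once and reused for every query, the cost grows linearly with the number of tokens rather than quadratically. A masked key contributes nothing to `S`, yet its feature vector is never deleted.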

[0034] Optionally, based on the results of the coarse matching, a fine matching is further performed between the ground image and the reference image, specifically:

[0035] Once the coarse matching result is determined, it is further refined to obtain a fine matching result consistent with the resolution of the original image. To this end, this invention employs a coarse-to-fine matching refinement procedure that localizes the matching position with high precision through local correlation calculation. Specifically, each pair of coarse matching points is first mapped to the fine-scale feature maps according to its correspondence in the feature space, and its corresponding positions in the first and second fine-layer feature maps are determined, namely the first position and the second position.

[0036] Subsequently, taking each corresponding position as the center, a fixed-size local feature window is cropped from the first and second fine-layer feature maps respectively. Each local feature window is then input into the fine-matching feature transformation module for feature enhancement. The module performs multiple feature transformation operations on the local feature windows, generating enhanced local feature maps centered at the first position and the second position, respectively.

[0037] Next, using the central feature vector in the first enhanced local feature map as a reference feature, its correlation with each feature vector in the second enhanced local feature map is calculated to generate a corresponding correlation response map. This correlation response map is used to characterize the matching probability relationship between each pixel position in the second enhanced local feature map and the first position.

[0038] Based on this, by calculating the expectation of the probability distribution represented by the correlation response map, the final matching position in the second image corresponding to the first position is determined, thereby obtaining a matching result with sub-pixel accuracy. The fine matching process is repeated for all coarse matching point pairs, and the collection of all fine matching point pairs constitutes the final set of fine matching results.
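The expectation step can be illustrated with a small soft-argmax sketch. It assumes that the correlation is a dot product and that the response map is turned into a probability distribution with a softmax; the text does not specify either, and the names `refine_match` and `temperature` are hypothetical.

```python
import math

def refine_match(center_vec, window, temperature=1.0):
    """Return the expected (row, col) offset of the match inside `window`.

    window[r][c] is a feature vector; offsets are relative to the window centre.
    """
    size = len(window)
    half = size // 2
    # Correlate the reference vector with every vector in the window.
    scores = [[sum(a * b for a, b in zip(center_vec, window[r][c])) / temperature
               for c in range(size)] for r in range(size)]
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    # Expectation of the position under the softmax distribution.
    dr = sum(exps[r][c] * (r - half) for r in range(size) for c in range(size)) / total
    dc = sum(exps[r][c] * (c - half) for r in range(size) for c in range(size)) / total
    return dr, dc

# Toy 3 x 3 window: the centre and its right neighbour respond equally.
window = [[[0.0] for _ in range(3)] for _ in range(3)]
window[1][1] = [5.0]
window[1][2] = [5.0]
dr, dc = refine_match([10.0], window)
```

With two equally strong responses side by side, the expected column offset falls halfway between them, which is how the expectation yields a position finer than the feature grid.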

[0039] By employing a fine-matching processing method, this invention can achieve precise positioning of the matching position at the original image resolution while maintaining the high efficiency of the coarse-matching stage. This not only improves the accuracy of the matching results but also enhances the robustness of the algorithm in complex scenarios, making it particularly suitable for image matching scenarios with scale changes, viewpoint changes, or local deformations.

[0040] Optionally, a feature transfer module can be used to transfer high-resolution features from the reference image and low-resolution features from the input image to the feature map to be reconstructed. Specifically:

[0041] This invention constructs a texture feature transfer network. By introducing a confidence discrimination mechanism, it enables low-resolution input images to adaptively acquire highly relevant and effective information from high-resolution reference images while suppressing irrelevant feature information, thereby improving the accuracy and stability of feature transfer.

[0042] In the feature transfer process, an index based on a bidirectional normalized exponential function (that is, a bidirectional softmax) is introduced as a confidence discrimination module to measure the matching reliability between low-resolution input image features and high-resolution reference image features. If the threshold on this index is set too high, more reference features are filtered out, so the model relies more heavily on low-resolution input image features during reconstruction, which weakens the role of the reference image in super-resolution reconstruction. Conversely, if the threshold is set too low, a large number of reference features with low similarity may be transferred, introducing too much irrelevant information and degrading reconstruction quality. In this embodiment of the invention, the threshold of the bidirectional normalized exponential function is set to... .

[0043] In the specific implementation process, when the matching accuracy is greater than or equal to the threshold of the bidirectional normalized exponential function, the corresponding features in the high-resolution reference image are transferred to the reconstructed features; when the matching accuracy is less than the threshold, the features in the low-resolution input image are transferred to the reconstructed features, thereby achieving adaptive selection of feature sources.
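The threshold rule can be sketched as below. The sketch assumes the confidence is the product of a row-wise and a column-wise softmax over a patch similarity matrix (a common dual-softmax form), and it uses a purely hypothetical threshold of 0.5, since the value used in the embodiment is not given in this text. The function names are illustrative.

```python
import math

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dual_softmax(sim):
    """Confidence = row-wise softmax times column-wise softmax of `sim`."""
    n_rows, n_cols = len(sim), len(sim[0])
    rows = [softmax(r) for r in sim]
    cols = [softmax([sim[i][j] for i in range(n_rows)]) for j in range(n_cols)]
    return [[rows[i][j] * cols[j][i] for j in range(n_cols)] for i in range(n_rows)]

def transfer(lr_feats, ref_feats, sim, threshold):
    """Per input patch: take the best reference patch if confident, else keep the input patch."""
    conf = dual_softmax(sim)
    out, used_ref = [], []
    for i, lr in enumerate(lr_feats):
        j = max(range(len(ref_feats)), key=lambda c: conf[i][c])
        take_ref = conf[i][j] >= threshold
        out.append(ref_feats[j] if take_ref else lr)
        used_ref.append(take_ref)
    return out, used_ref

lr_feats = [[0.1], [0.2]]
ref_feats = [[1.0], [2.0]]
sim = [[8.0, 0.0], [0.0, 0.0]]       # only the first input patch has a clear match
out, used_ref = transfer(lr_feats, ref_feats, sim, threshold=0.5)  # 0.5 is hypothetical
```

Raising `threshold` pushes more patches back to their low-resolution source, and lowering it admits weaker reference matches, which mirrors the trade-off described in the text.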

[0044] Finally, the output of the feature transfer module is obtained by performing a fold operation on the fused features, which is the inverse of the unfold operation in the block matching stage. Through the feature transfer and recombination process, the detailed information in the high-resolution reference image features and the low-resolution input image features is further enhanced and aggregated, thereby forming reconstructed features with high discriminative power at different scales. This provides a more reliable feature base for subsequent super-resolution reconstruction.

[0045] Optionally, a multi-scale feature fusion module can be used to fuse the multi-scale features to be reconstructed, generating a fused feature representation, specifically as follows:

[0046] In this invention, to obtain the final super-resolution result of the ground image, the reconstructed features at different scales must be fused. The reconstructed features from three scales are input simultaneously into the same feature fusion module, where the superscripts denote the three scales. Specifically, the feature fusion module first performs upsampling or downsampling on each input feature map to align it to a uniform spatial resolution, yielding the aligned multi-scale features:

[0047]

[0048] where the operator in the formula denotes the upsampling or downsampling operator at the corresponding scale.

[0049] Subsequently, the aligned multi-scale features are mapped to the same feature dimension through a series of convolutions to obtain feature vector representations:

[0050]

[0051] where the operator in the formula denotes a convolution mapping operation.

[0052] To characterize the importance of features at different scales in a multi-scale feature set, the similarity between each scale feature and the other scale features is modeled, and corresponding weight coefficients are generated:

[0053]

[0054] where the function in the formula denotes a similarity evaluation function (such as a relevance measure based on global features), and each resulting coefficient denotes the weight of the corresponding scale feature in multi-scale fusion.

[0055] Next, by weighted summation of features at different scales, the fused feature representation is obtained:

[0056]

[0057] Subsequently, a series of convolutional operations are used to map the fused features back to the original feature dimensions, resulting in the final fused feature map:

[0058]

[0059] where the operator in the formula denotes the feature remapping convolution module.
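The weighting and summation steps of the fusion module can be sketched as follows. The sketch assumes the inputs are already aligned and projected to a common dimension, uses mean cosine similarity to the other scales as the similarity evaluation function, and normalizes the scores with a softmax; all three choices are assumptions, and the convolution mappings are omitted. The names `fusion_weights` and `fuse` are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def fusion_weights(feats):
    """Softmax over each scale's mean cosine similarity to the other scales."""
    n = len(feats)
    scores = [sum(cosine(feats[i], feats[j]) for j in range(n) if j != i) / (n - 1)
              for i in range(n)]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(feats):
    """Weighted sum of aligned, equally sized feature vectors."""
    w = fusion_weights(feats)
    fused = [sum(w[i] * feats[i][k] for i in range(len(feats)))
             for k in range(len(feats[0]))]
    return fused, w

# Three hypothetical scale features: the first two agree, the third does not.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
fused, w = fuse(feats)
```

Under these assumptions, a scale that agrees with the others receives a larger weight, so an outlier scale is damped rather than discarded.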

[0060] On this basis, the decoder takes the fused feature map together with the reference image features and, through deconvolution and upsampling operations, outputs the final super-resolution reconstruction result of the ground image:

[0061]

[0062] where the operator in the formula denotes the decoding and reconstruction process.

[0063] Optionally, the model can be trained and inferred using a composite loss function to output a super-resolution reconstructed ground image, specifically:

[0064] In this invention, to guide the model to simultaneously consider pixel consistency, structural rationality, and visual perception during the reconstruction process, a composite loss function is used to train the image reconstruction model. This composite loss function is composed of a weighted combination of multiple sub-loss terms, used to optimize the model's output from different constraint perspectives. The overall loss function can be expressed as:

[0065]

[0066] where the three loss terms denote, in order, the pixel reconstruction constraint term, the distribution consistency constraint term, and the feature-aware constraint term, and the three coefficients are their corresponding weights.
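A weighted combination of three terms, as described above, has the following general shape. The symbol names are assumptions chosen for illustration, since the original notation is not reproduced in this text.

```latex
\mathcal{L}_{\mathrm{total}}
  = \lambda_{1}\,\mathcal{L}_{\mathrm{pix}}
  + \lambda_{2}\,\mathcal{L}_{\mathrm{dist}}
  + \lambda_{3}\,\mathcal{L}_{\mathrm{feat}}
```

Here the three loss terms stand for the pixel reconstruction, distribution consistency, and feature-aware terms, and the three lambda coefficients stand for their weights.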

[0067] The pixel reconstruction constraint term is used to measure the difference between the reconstructed image and the corresponding real image at the pixel level, so as to ensure the consistency of overall brightness and color distribution. In this embodiment, a pixel constraint form based on absolute error is adopted, and its expression is:

[0068]

[0069] where the symbols denote, in order, the true high-resolution image, the reconstructed image output by the model, and the number of image channels, the image height, and the image width. Pixel-level constraints can effectively suppress overall reconstruction errors and improve the stability of the reconstruction results.
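An absolute-error pixel term normalized by the channel count, height, and width can be sketched in a few lines. The nested-list image layout and the function name `l1_pixel_loss` are assumptions made to keep the example self-contained.

```python
def l1_pixel_loss(sr, hr):
    """Mean absolute error over a C x H x W image stored as nested lists."""
    total, count = 0.0, 0
    for sr_ch, hr_ch in zip(sr, hr):
        for sr_row, hr_row in zip(sr_ch, hr_ch):
            for s, h in zip(sr_row, hr_row):
                total += abs(h - s)
                count += 1
    return total / count

hr = [[[0.0, 1.0], [1.0, 0.0]]]   # hypothetical 1-channel, 2 x 2 ground truth
sr = [[[0.5, 1.0], [0.5, 0.0]]]   # hypothetical reconstruction
loss = l1_pixel_loss(sr, hr)
print(loss)  # 0.25
```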

[0070] The distribution consistency constraint term guides the reconstructed image to maintain consistency with the real image at the overall statistical distribution level, thereby improving the naturalness and reasonableness of the reconstruction result. In this embodiment, a discriminant scoring function is introduced to constrain the generated result, and its loss form can be expressed as:

[0071]

[0072] where the symbols denote, in order, the distribution evaluation function, a real sample, an intermediate sample obtained by proportionally mixing real samples and reconstructed samples, the expectation operation, the gradient operation, and the regularization coefficient. This constraint term can improve the stability of the training process and prevent abnormal distributions in the model output.
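The listed ingredients (a scoring function, real samples, mixed intermediate samples, an expectation, a gradient, and a regularization coefficient) match the structure of a gradient-penalty critic loss. One such form is shown below as an assumption; the application's own formula is not reproduced in this text, and the symbols are illustrative.

```latex
\mathcal{L}_{\mathrm{dist}}
  = \mathbb{E}_{\tilde{x}}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x}\big[D(x)\big]
  + \lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_{2} - 1\big)^{2}\Big],
\qquad
\hat{x} = \epsilon\,x + (1-\epsilon)\,\tilde{x},\quad \epsilon \sim U[0,1]
```

In this sketch, D is the distribution evaluation function, x a real sample, x-tilde a reconstructed sample, x-hat the mixed intermediate sample, and lambda the regularization coefficient.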

[0073] The feature-aware constraint term is used to constrain the similarity between the reconstructed image and the real image in the high-level feature space, thereby enhancing structural information and semantic consistency. In this embodiment, the image is mapped using a feature extraction operator, and its loss form is defined as:

[0074]

[0075] where the symbols denote the feature extraction mapping of a given layer and the size parameter of the corresponding feature map. By introducing this feature space constraint, the expressive power of the reconstructed image at the edge, texture, and structure levels can be effectively improved.

[0076] By combining constraints through composite loss functions, the model can simultaneously consider pixel accuracy, distribution consistency, and feature structure rationality during training, thereby generating image reconstruction results with high visual quality and structural integrity.

[0077] Furthermore, the present invention also provides a ground image super-resolution reconstruction apparatus, the ground image super-resolution reconstruction apparatus comprising:

[0078] The multi-scale feature extraction module is used to acquire low-resolution ground remote sensing images and corresponding high-resolution reference images, and to extract multi-scale features from them.

[0079] The feature matching module is used to perform coarse-to-fine feature matching between the ground image and the reference image through a candidate feature selection strategy;

[0080] The feature transfer module is used to transfer high-resolution features from the reference image and low-resolution features from the input image to the feature map to be reconstructed using a feature transfer strategy.

[0081] The feature fusion module is used to fuse the multi-scale features to be reconstructed using the multi-scale feature fusion module to generate a fused feature representation.

[0082] The super-resolution image reconstruction module is used to train and infer the model using a composite loss function, and output super-resolution reconstructed ground images.

[0083] Furthermore, the present invention also provides a ground image super-resolution reconstruction device, the device comprising: a memory, a processor, and a ground image super-resolution reconstruction program stored in the memory and executable on the processor, wherein the ground image super-resolution reconstruction program, when executed by the processor, implements the steps of a ground image super-resolution reconstruction method.

[0084] In addition, the present invention provides a storage medium storing a ground image super-resolution reconstruction program, which, when executed by a processor, implements the steps of the ground image super-resolution reconstruction method.

[0085] The beneficial effects of this invention are:

[0086] This invention proposes a method, apparatus, device, and storage medium for super-resolution reconstruction of ground images. By combining low-resolution ground images with high-resolution reference images, it achieves effective modeling of degradation features at different scales. Furthermore, it constructs a super-resolution reconstruction model by combining feature matching, feature transfer, and multi-scale feature fusion, thereby improving the detail recovery and structure preservation capabilities of low-resolution ground images and enhancing the imaging quality of ground images in complex environments. This invention has promising application prospects.

Attached Figure Description

[0087] Figure 1 is a schematic flowchart of an embodiment of the ground image super-resolution reconstruction method of the present invention.

[0088] Figure 2 is an architectural diagram of the ground image super-resolution reconstruction method according to an embodiment of the present invention.

[0089] Figure 3 is a schematic diagram of the candidate feature collaborative screening module based on image interaction in an embodiment of the present invention.

[0090] Figure 4 is a schematic diagram of the feature transfer module in an embodiment of the present invention.

[0091] Figure 5 is a schematic diagram of the feature fusion module in an embodiment of the present invention.

[0092] Figure 6 is an example diagram illustrating the effect of super-resolution reconstruction of ground images according to an embodiment of the present invention.

[0093] Figure 7 is a structural block diagram of the ground image super-resolution reconstruction device in an embodiment of the present invention.

[0094] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0095] It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

[0096] With the rapid development of remote sensing imaging technology, unmanned aerial vehicle (UAV) platforms, and ground observation equipment, ground images have been widely used in fields such as geographic information acquisition, resource surveys, environmental monitoring, urban planning, and emergency response. However, due to limitations such as imaging equipment resolution, shooting altitude, sensor cost, and complex environmental conditions, the actual acquired ground images often suffer from insufficient spatial resolution, resulting in blurred image details and missing structural information, making it difficult to meet the needs of high-precision analysis and applications.

[0097] To address this problem, various embodiments of the ground image super-resolution reconstruction method of the present invention are proposed. The ground image super-resolution reconstruction method provided by the present invention achieves effective modeling of degradation features at different scales by combining low-resolution ground images with high-resolution reference images. Furthermore, it constructs a super-resolution reconstruction model by combining feature matching, feature transfer, and multi-scale feature fusion, thereby improving the detail recovery and structure preservation capabilities of low-resolution ground images and enhancing the imaging quality of ground images in complex environments, demonstrating promising application prospects.

[0098] This invention provides a method for super-resolution reconstruction of ground images. Referring to Figure 1, Figure 1 is a schematic flowchart of an embodiment of the ground image super-resolution reconstruction method of the present invention.

[0099] The system acquires low-resolution ground remote sensing images and their corresponding high-resolution reference images, and extracts multi-scale features from them. Based on a candidate feature selection strategy, it performs coarse-to-fine feature matching between the ground image and the reference image. Through a feature transfer module, it transfers high-resolution features from the reference image and low-resolution features from the input image to the feature map to be reconstructed. The multi-scale feature fusion module fuses the multi-scale features to be reconstructed to generate a fused feature representation. The model is trained and inferred using a composite loss function to output the super-resolution reconstructed ground image.

[0100] In this embodiment, the ground image super-resolution reconstruction method includes the following steps:

[0101] Step S100 involves acquiring a low-resolution ground remote sensing image and its corresponding high-resolution reference image, and then extracting multi-scale features from them. Specifically, this includes...

[0102] As shown in Figure 2, in this embodiment, the input image includes a low-resolution input image and a high-resolution reference image, wherein the width and height of the low-resolution input image are both 1 / 4 of the width and height of the high-resolution reference image. A ResNetFPN network is used as the feature encoder to perform multi-scale feature extraction on the input image. Specifically: First, a first feature map with a spatial resolution of 1 / 2 of its original size is extracted from the low-resolution input image, and a second feature map with a spatial resolution of 1 / 8 of its original size is extracted from the high-resolution reference image. Since the first and second feature maps maintain the same spatial size, they are used together as input feature maps in the coarse matching stage. Second, a third feature map with a spatial resolution upsampled to twice its original size is extracted from the low-resolution input image, and a fourth feature map with a spatial resolution of 1 / 2 of its original size is extracted from the high-resolution reference image. The third and fourth feature maps also maintain the same spatial size and are used together as input feature maps in the fine matching stage. Through multi-scale feature extraction and alignment, feature matching calculations can be performed on images of different resolutions at a unified spatial scale in both the coarse and fine matching stages, thus balancing matching efficiency and accuracy.

[0103] Step S200 involves performing coarse-to-fine feature matching between the ground image and the reference image based on a candidate feature selection strategy. Specifically, this includes:

[0104] In detector-free image matching, candidate features are typically initially screened based on a single image. However, relying solely on single-image information can easily lead to the incorrect retention of two types of candidate features: features that respond only in one image but are not matchable in another; and candidate features lacking consistency between the two images due to occlusion, viewpoint changes, or imaging differences. Furthermore, using traditional discrete screening methods (such as fixed-ratio screening or hard thresholding) can result in gradient non-propagation during model training, affecting the overall convergence stability of the model. Therefore, this invention, building upon adaptive candidate feature screening, proposes a differentiable dual-image interactive candidate feature selection method. This method continuously and learnably determines the co-validity of candidate features during the dual-image feature interaction stage.

[0105] This method performs coarse matching after the initial screening of candidate features, including the following core steps: predicting the retention probability of candidate features based on the interaction results of dual image features; generating candidate validity identifiers through a differentiable approximation; and using the identifiers to apply subsequent interaction constraints to candidate features, rather than directly deleting them. This method can achieve refined screening of candidate features while ensuring differentiability during the training phase.

[0106] The candidate feature set obtained after preliminary screening is input into the dual-image feature interaction module to obtain candidate feature representations containing dual-image correlation information. For each candidate feature, the candidate evaluation submodule predicts the corresponding retention probability, which characterizes whether the candidate feature should continue to participate in subsequent matching in the current dual-image interaction state. In this embodiment, the retention probability is calculated as follows:

[0107]

[0108] The terms in this formula are a candidate evaluation function and a feature normalization function.

[0109] To avoid the non-differentiability caused by discrete thresholds or Top-k operations during training, this embodiment introduces a continuous approximate sampling mechanism that makes the selection and validity determination of candidate features differentiable. In this embodiment, the validity identifier of each candidate feature is obtained as follows:

[0110]

[0111] This identifier is used to characterize the retention status of candidate features under dual-image interaction conditions, where a value of 1 indicates that the candidate feature is determined to be valid, and a value of 0 indicates that the candidate feature is determined to be invalid.
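The retention probability and its continuous relaxation can be sketched as follows. This is a hedged illustration only: the source formulas are not available, so the linear-plus-sigmoid evaluation head, the L2 normalization, the binary-concrete (Gumbel-sigmoid) relaxation, the temperature `tau`, and all function names are assumptions chosen as one common way to realize a "continuous approximate sampling mechanism".

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def retention_probability(features, w, b):
    """Hypothetical evaluation head: L2-normalize each feature, then linear layer + sigmoid."""
    normed = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    return sigmoid(normed @ w + b)


def relaxed_validity(prob, tau=0.5, rng=None):
    """Binary-concrete relaxation of a Bernoulli(prob) draw (assumed mechanism)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=prob.shape)
    logistic_noise = np.log(u) - np.log1p(-u)
    logits = np.log(prob + 1e-8) - np.log1p(-prob + 1e-8)
    soft = sigmoid((logits + logistic_noise) / tau)  # continuous value in (0, 1)
    hard = (soft > 0.5).astype(soft.dtype)           # 0/1 identifier used downstream
    return soft, hard


rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 16))                     # six hypothetical candidate features
p = retention_probability(feats, rng.normal(size=16), 0.0)
soft, hard = relaxed_validity(p, rng=rng)
```

In an automatic-differentiation framework, a straight-through estimator would typically forward `hard` while back-propagating through `soft`, which keeps the 0/1 semantics of the identifier without blocking gradients.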

[0112] The validity identifier of the candidate features is combined with the initial mask of the candidate features to obtain the final candidate validity mask, calculated as follows:

[0113]

[0114] In this way, candidate features are not directly removed, but are used as masks in subsequent processing.

[0115] As shown in Figure 3, the candidate features that pass the differentiable interactive selection are then used as query features, key features, and value features and are input to the feature interaction module. In this embodiment, the feature interaction is computed with linear attention, expressed as follows:

[0116]

[0117] The expression uses a kernel function for feature mapping.
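For orientation, linear attention with a kernel feature map can be sketched as below. This is a generic formulation rather than the exact expression of this embodiment: the choice of `elu(x) + 1` as the kernel function, the normalizer, and the function names are assumptions.

```python
import numpy as np


def elu_plus_one(x):
    """Assumed kernel feature map phi(x) = elu(x) + 1, which is strictly positive."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    """q: (N, d), k: (M, d), v: (M, dv) -> (N, dv)."""
    phi_q, phi_k = elu_plus_one(q), elu_plus_one(k)
    kv = phi_k.T @ v                 # (d, dv): keys and values are aggregated once
    z = phi_q @ phi_k.sum(axis=0)    # (N,): per-query normalizer
    return (phi_q @ kv) / (z[:, None] + eps)
```

Because the keys and values are aggregated before the queries are applied, the cost grows linearly with the number of features instead of quadratically, which is the usual motivation for using linear attention in dense matching.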

[0118] The feature interaction process is constrained by combining the generated candidate validity mask. Depending on the constraint method, this embodiment provides the following two implementation methods.

[0119] Implementation method 1: Explicit constraint feature interaction method (not preferred).

[0120] In this approach, only candidate features marked as valid are allowed to participate in feature interaction computation. Specifically, before feature interaction computation, query features, key features, and value features are filtered based on a candidate validity mask, retaining only candidate features identified as valid by the mask before performing feature interaction computation. This approach can significantly reduce the number of candidates participating in the computation, but during multiple rounds of interaction, it can easily lead to the inability to recover the spatial location information of filtered candidate features.

[0121] Implementation method 2: Feature interaction method with implicit constraints (preferred).

[0122] In another preferred embodiment, the present invention employs an implicitly constrained feature interaction method. By modulating the features and candidate validity masks element-wise, invalid candidate features are suppressed without directly removing their feature representations. In this embodiment, the feature interaction calculation method based on mask constraints is as follows:

[0123]

[0124] The two mask terms in this expression are the candidate validity masks corresponding to the query features and the key features, respectively.
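One plausible reading of the implicit constraint is to scale the kernel-mapped queries and keys by their masks before aggregation, so that invalid candidates stay in place but contribute nothing. The sketch below follows that reading; where exactly the masks enter the expression, the `elu(x) + 1` kernel, and the function names are all assumptions.

```python
import numpy as np


def elu_plus_one(x):
    """Assumed kernel feature map phi(x) = elu(x) + 1."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def masked_linear_attention(q, k, v, mask_q, mask_k, eps=1e-6):
    """Implicit constraint: masks scale the kernel features; no row is deleted.

    q: (N, d), k: (M, d), v: (M, dv), mask_q: (N,), mask_k: (M,), values in [0, 1].
    """
    phi_q = elu_plus_one(q) * mask_q[:, None]
    phi_k = elu_plus_one(k) * mask_k[:, None]
    kv = phi_k.T @ v                 # masked keys add nothing to the aggregate
    z = phi_q @ phi_k.sum(axis=0)
    return (phi_q @ kv) / (z[:, None] + eps)
```

The output keeps one row per query, so the spatial layout of the candidates is preserved across repeated interaction rounds, in contrast to the explicit method, which physically removes rows.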

[0125] Compared with explicit constraints, implicit constraints have at least the following advantages:

[0126] Spatial location information of candidate features is preserved. Suppressed candidate features are not completely deleted; their spatial relationships are still preserved, avoiding information gaps during subsequent interactions.

[0127] It is more suitable for multi-round feature interaction scenarios. During multiple feature interaction iterations, the implicit constraint method can gradually reduce the influence of invalid candidate features, rather than discarding them all at once.

[0128] Overall matching stability is improved. Mask modulation makes the feature interaction process smoother, which helps the model maintain stable output in complex scenarios.

[0129] In a preferred embodiment, the present invention employs an implicit constraint-based feature interaction method.

[0130] By introducing a feature interaction calculation method based on candidate validity masks, this invention can achieve at least the following technical effects: significantly reduce the computational complexity of the feature interaction stage; effectively suppress the interference of low-quality candidate features on the matching results; and avoid the problem of irreversible information loss caused by directly pruning candidate features.

[0131] In a specific embodiment, after the coarse matching result is determined, it is further refined to obtain a fine matching result at the same resolution as the original image. To this end, this embodiment adopts a coarse-to-fine matching scheme that locates the matching position with high precision through local correlation calculation.

[0132] Specifically, for each pair of coarse matching points, the coarse matching points are first mapped to the fine-scale feature map according to the correspondence in the feature space, and their corresponding positions in the first fine-layer feature map and the second fine-layer feature map are determined respectively, namely the first position and the second position.

[0133] Subsequently, a local feature window of a given size, centered on the corresponding position, is cropped from each of the first and second fine-layer feature maps. Each local feature window is then input into the fine-matching feature transformation module for feature enhancement. The module applies multiple feature transformation operations to the local feature windows, generating enhanced local feature maps centered at the first position and the second position, respectively.

[0134] Next, using the central feature vector in the first enhanced local feature map as a reference feature, its correlation with each feature vector in the second enhanced local feature map is calculated to generate a corresponding correlation response map. This correlation response map is used to characterize the matching probability relationship between each pixel position in the second enhanced local feature map and the first position.

[0135] Based on this, the expectation of the probability distribution represented by the correlation response map is calculated to determine the final matching position in the second image corresponding to the first position, thereby obtaining a matching result with sub-pixel accuracy. The fine matching process is repeated for all coarse matching point pairs, and the collection of all resulting fine matching point pairs constitutes the final set of fine matching results.
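The correlation-then-expectation step can be illustrated with a small soft-argmax sketch. It assumes the correlation is a dot product turned into a probability map by a softmax; the temperature, the window size used in the test, and the function name are hypothetical.

```python
import numpy as np


def refine_match(center_feat, window_feats, temperature=0.1):
    """center_feat: (d,), window_feats: (w, w, d).

    Returns the expected (dy, dx) offset of the match from the window center.
    """
    w = window_feats.shape[0]
    corr = window_feats @ center_feat / temperature  # (w, w) correlation response
    corr = corr - corr.max()                         # numerical stability
    prob = np.exp(corr)
    prob /= prob.sum()                               # matching probability map

    coords = np.arange(w) - (w - 1) / 2.0            # offsets relative to the center
    dy = (prob.sum(axis=1) * coords).sum()           # expectation over rows
    dx = (prob.sum(axis=0) * coords).sum()           # expectation over columns
    return dy, dx
```

Because the result is an expectation rather than an argmax, it can fall between pixel centers, which is where the sub-pixel accuracy comes from; it also stays differentiable with respect to the features.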

[0136] By using a fine matching process, this embodiment can achieve precise positioning of the matching position at the original image resolution while maintaining the high efficiency of the coarse matching stage. This not only improves the accuracy of the matching results but also enhances the robustness of the algorithm in complex scenes, making it particularly suitable for image matching scenarios with scale changes, viewpoint changes, or local deformations.

[0137] Step S300 involves using a feature transfer module to transfer high-resolution features from the reference image and low-resolution features from the input image to the feature map to be reconstructed. Specifically, this includes:

[0138] Currently, existing super-resolution reconstruction methods typically select texture features from a high-resolution reference image that are highly similar to the low-resolution input image and transfer these texture features to the features to be reconstructed. However, in practical applications, when the overall similarity between the low-resolution input image and the high-resolution reference image is low, directly combining features from the reference image with features from the input image often fails to yield accurate reconstruction results and may even introduce irrelevant or erroneous information, thus adversely affecting the super-resolution reconstruction process.

[0139] This invention constructs a texture feature transfer network. By introducing a confidence discrimination mechanism, it enables low-resolution input images to adaptively acquire highly relevant and effective feature information from high-resolution reference images while suppressing irrelevant feature information, thereby improving the accuracy and stability of feature transfer.

[0140] As shown in Figure 4, during the feature transfer process, a score based on the bidirectional normalized exponential function (dual-softmax) is introduced as the confidence discrimination module to measure the matching reliability between low-resolution input image features and high-resolution reference image features. If the threshold of the bidirectional normalized exponential function is set too high, more reference features are filtered out and the model relies more heavily on low-resolution input image features during reconstruction, which weakens the role of the reference image in super-resolution reconstruction. Conversely, if the threshold is set too low, a large number of reference features with low similarity may be transferred, introducing irrelevant information and reducing reconstruction quality. In this embodiment of the invention, the threshold of the bidirectional normalized exponential function is set to... .

[0141] In the specific implementation process, when the matching accuracy is greater than or equal to the threshold of the bidirectional normalized exponential function, the corresponding features in the high-resolution reference image are transferred to the reconstructed features; when the matching accuracy is less than the threshold, the features in the low-resolution input image are transferred to the reconstructed features, thereby achieving adaptive selection of feature sources.
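The thresholded source selection can be sketched as follows. This is an illustration under stated assumptions: cosine similarity with a temperature feeds the dual-softmax, each low-resolution feature takes its single best reference match, and `threshold` is a free parameter because the source does not give its value.

```python
import numpy as np


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def transfer_features(lr_feats, ref_feats, threshold, temperature=0.1):
    """lr_feats: (N, d), ref_feats: (M, d) -> (N, d) transferred features, (N,) confidence."""
    lr_n = lr_feats / np.linalg.norm(lr_feats, axis=1, keepdims=True)
    ref_n = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sim = lr_n @ ref_n.T / temperature

    # Dual-softmax: a pair scores high only if it is the best match in both directions.
    conf = softmax(sim, axis=0) * softmax(sim, axis=1)
    best = conf.argmax(axis=1)
    best_conf = conf[np.arange(len(lr_feats)), best]

    # At or above the threshold take the reference feature, otherwise keep the LR feature.
    use_ref = best_conf >= threshold
    out = np.where(use_ref[:, None], ref_feats[best], lr_feats)
    return out, best_conf
```

Raising `threshold` makes the output fall back to the low-resolution features more often, and lowering it lets weaker matches through, which is the trade-off described above.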

[0142] Finally, the output of the feature transfer module is obtained by performing a fold operation on the fused features, which is the inverse of the unfold operation in the block matching stage. Through the feature transfer and recombination process, the detailed information in the high-resolution reference image features and the low-resolution input image features is further enhanced and aggregated, thereby forming reconstructed features with high discriminative power at different scales. This provides a more reliable feature base for subsequent super-resolution reconstruction.

[0143] Step S400 involves fusing the multi-scale features to be reconstructed using a multi-scale feature fusion module to generate a fused feature representation. Specifically, this includes:

[0144] In this embodiment, to obtain the final super-resolution result of the ground image, the features to be reconstructed at different scales must be fused. As shown in Figure 5, the features from three scales are input simultaneously into the same feature fusion module, where the superscript denotes the scale. Specifically, the feature fusion module first upsamples or downsamples each input feature map to align it to a uniform spatial resolution, yielding the aligned multi-scale features:

[0145]

[0146] The operator in this formula is the upsampling or downsampling operator at the corresponding scale.

[0147] Subsequently, the aligned multi-scale features are mapped to the same feature dimension through a series of convolutions to obtain feature vector representations:

[0148]

[0149] The operator in this formula is a convolution mapping operation.

[0150] To characterize the importance of features at different scales in a multi-scale feature set, the similarity between each scale feature and the other scale features is modeled, and corresponding weight coefficients are generated:

[0151]

[0152] This formula uses a similarity evaluation function (for example, a relevance measure based on global features) and yields, for each scale, the weight of that scale's feature in the multi-scale fusion.

[0153] Next, by weighted summation of features at different scales, the fused feature representation is obtained:

[0154]

[0155] Subsequently, a series of convolutional operations are used to map the fused features back to the original feature dimensions, resulting in the final fused feature map:

[0156]

[0157] The operator in this formula is the feature remapping convolution module.
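The fusion steps (align, map to a common dimension, weight by cross-scale similarity, sum, remap) can be put together in a short sketch. Everything concrete here is an assumption made for illustration: nearest-neighbour resizing, 1x1 convolutions written as matrix products, cosine similarity of globally pooled descriptors as the similarity function, and a softmax to turn scores into weights.

```python
import numpy as np


def resize_nearest(x, size):
    """x: (C, H, W) -> (C, size, size) by nearest-neighbour sampling (up or down)."""
    rows = (np.arange(size) * x.shape[1] / size).astype(int)
    cols = (np.arange(size) * x.shape[2] / size).astype(int)
    return x[:, rows][:, :, cols]


def fuse_scales(feats, proj, remap, size):
    """feats: list of (C_i, H_i, W_i); proj: list of (D, C_i); remap: (C_out, D)."""
    aligned = [resize_nearest(f, size) for f in feats]
    # 1x1 convolutions map every scale to the same feature dimension D.
    mapped = [np.einsum('dc,chw->dhw', p, a) for p, a in zip(proj, aligned)]

    # Global descriptors and their pairwise cosine similarity.
    desc = np.stack([m.mean(axis=(1, 2)) for m in mapped])
    desc = desc / (np.linalg.norm(desc, axis=1, keepdims=True) + 1e-8)
    sim = desc @ desc.T
    np.fill_diagonal(sim, 0.0)
    score = sim.sum(axis=1)              # similarity of each scale to the other scales

    weights = np.exp(score - score.max())
    weights /= weights.sum()             # softmax over scales

    fused = sum(w * m for w, m in zip(weights, mapped))
    return np.einsum('od,dhw->ohw', remap, fused), weights
```

A scale whose global descriptor agrees with the others receives a larger weight, so an outlier scale is down-weighted rather than averaged in at full strength.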

[0158] Based on this, the decoder takes the fused feature map together with the reference image features and, through deconvolution and upsampling operations, outputs the final super-resolution reconstruction result of the ground image:

[0159]

[0160] The operator in this formula is the decoding and reconstruction process.

[0161] Step S500 involves training and inference on the model using a composite loss function, outputting a super-resolution reconstructed ground image. Specifically, this includes:

[0162] In this embodiment of the invention, to guide the model to simultaneously consider pixel consistency, structural rationality, and visual perception effect during the reconstruction process, a composite loss function is used to train the image reconstruction model. The composite loss function is composed of a weighted combination of multiple sub-loss terms, used to optimize the model's output from different constraint perspectives. The overall loss function can be expressed as:

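[0163] $$\mathcal{L} = \lambda_{1}\,\mathcal{L}_{\mathrm{pix}} + \lambda_{2}\,\mathcal{L}_{\mathrm{dist}} + \lambda_{3}\,\mathcal{L}_{\mathrm{feat}}$$

[0164] where $\mathcal{L}_{\mathrm{pix}}$ is the pixel reconstruction constraint term, $\mathcal{L}_{\mathrm{dist}}$ is the distribution consistency constraint term, and $\mathcal{L}_{\mathrm{feat}}$ is the feature-aware constraint term; $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are the corresponding weighting coefficients.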

[0165] The pixel reconstruction constraint term is used to measure the difference between the reconstructed image and the corresponding real image at the pixel level, so as to ensure the consistency of overall brightness and color distribution. In this embodiment, a pixel constraint form based on absolute error is adopted, and its expression is:

[0166] $$\mathcal{L}_{\mathrm{pix}} = \frac{1}{CHW}\,\left\lVert I_{HR} - I_{SR} \right\rVert_{1}$$

[0167] where $I_{HR}$ is the real high-resolution image, $I_{SR}$ is the reconstructed image output by the model, and $C$, $H$, and $W$ are the number of image channels, the height, and the width, respectively. Pixel-level constraints can effectively suppress overall reconstruction errors and improve the stability of the reconstruction results.

[0168] The distribution consistency constraint term guides the reconstructed image to maintain consistency with the real image at the overall statistical distribution level, thereby improving the naturalness and reasonableness of the reconstruction result. In this embodiment, a discriminant scoring function is introduced to constrain the generated result, and its loss form can be expressed as:

[0169]

[0170] The terms in this formula are: the distribution evaluation function; a real sample; an intermediate sample obtained by proportionally mixing real samples and reconstructed samples; the expectation operation; the gradient operation; and the regularization coefficient. This constraint term can improve the stability of the training process and prevent abnormal distributions in the model output.
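The ingredients listed above (a scoring function, an intermediate sample mixed from real and reconstructed samples, a gradient operation, and a regularization coefficient) match the widely used Wasserstein critic loss with gradient penalty. Assuming that correspondence, and with every symbol below chosen for illustration rather than taken from the source, one common form is:

```latex
\mathcal{L}_{\mathrm{dist}}
  = \mathbb{E}_{\tilde{x}}\!\left[ D(\tilde{x}) \right]
  - \mathbb{E}_{x}\!\left[ D(x) \right]
  + \lambda\, \mathbb{E}_{\hat{x}}\!\left[ \left( \left\lVert \nabla_{\hat{x}} D(\hat{x}) \right\rVert_{2} - 1 \right)^{2} \right],
\qquad
\hat{x} = \epsilon\, x + (1 - \epsilon)\, \tilde{x}, \quad \epsilon \sim U[0, 1]
```

Here $D$ is the scoring function, $x$ a real sample, $\tilde{x}$ a reconstructed sample, $\hat{x}$ the mixed intermediate sample, and $\lambda$ the regularization coefficient. The penalty term pushes the gradient norm of $D$ toward 1 at the mixed samples, which is what stabilizes training.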

[0171] The feature-aware constraint term is used to constrain the similarity between the reconstructed image and the real image in the high-level feature space, thereby enhancing structural information and semantic consistency. In this embodiment, the image is mapped using a feature extraction operator, and its loss form is defined as:

[0172]

[0173] The terms in this formula are the feature extraction mapping of a given layer and the size parameter of the corresponding feature map. Introducing this feature-space constraint improves how well the reconstructed image represents edges, textures, and structures.

[0174] With the joint constraints of the composite loss function, the model simultaneously accounts for pixel accuracy, distribution consistency, and the rationality of feature structure during training, thereby generating reconstruction results with high visual quality and structural integrity. As shown in Figure 6, this method effectively improves the detail recovery and structure preservation capabilities for low-resolution ground images, enhances the imaging quality of ground images in complex environments, and has promising application prospects.
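How the three terms combine can be sketched in a few lines. This is a simplified illustration: the weights are placeholder values, the feature extractors are hypothetical callables standing in for layers of a pretrained network, the feature-space distance is assumed to be a mean squared difference, and the distribution term is passed in as a precomputed scalar.

```python
import numpy as np


def pixel_loss(sr, hr):
    """Mean absolute error over channels, height and width."""
    return np.abs(sr - hr).mean()


def feature_loss(sr, hr, extractors):
    """Mean squared distance between feature maps, averaged over extractor layers."""
    return sum(((f(sr) - f(hr)) ** 2).mean() for f in extractors) / len(extractors)


def composite_loss(sr, hr, dist_term, extractors, weights=(1.0, 0.1, 0.1)):
    """Weighted sum of the pixel, distribution and feature-aware terms."""
    w_pix, w_dist, w_feat = weights      # placeholder weights, not from the source
    return (w_pix * pixel_loss(sr, hr)
            + w_dist * dist_term
            + w_feat * feature_loss(sr, hr, extractors))
```

Keeping the terms as separate functions makes it easy to log each one during training and to see which constraint dominates when the weights are tuned.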

[0175] To demonstrate more clearly the improved ground image super-resolution reconstruction capability of this application, comparative experiments are conducted against other methods under the same experimental conditions to verify the reconstruction performance of the method designed in this application.

[0176] The evaluation metrics used in this application are Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS).

[0177] Peak Signal-to-Noise Ratio (PSNR) is an objective image quality assessment metric based on pixel error, used to measure the overall similarity between a reconstructed image and a reference image. It is calculated by computing the mean squared error between the two images and converting it to a logarithmic scale. A higher PSNR value indicates a smaller difference between the reconstructed and original images, and thus higher image reconstruction quality. PSNR can be specifically expressed as:

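[0178] $$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}}\right)$$

[0179] where $\mathrm{MAX}$ is the maximum possible value of an image pixel (e.g., 255 for an 8-bit image) and $\mathrm{MSE}$ is the mean squared error between the reconstructed image and the reference image.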
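PSNR as described above (ten times the base-10 logarithm of the squared peak value over the mean squared error) is straightforward to compute. A minimal sketch, with the function name and the default peak of 255 for 8-bit images as the only assumptions:

```python
import numpy as np


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; returns inf for identical images."""
    diff = reference.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Casting to floating point before subtracting matters for 8-bit inputs, since unsigned integer subtraction would wrap around instead of producing negative differences.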

[0180] Structural similarity index (SSIM) is an image quality assessment metric based on human visual perception characteristics. It measures the structural similarity between a reconstructed image and a reference image by simultaneously considering brightness, contrast, and structural information. Unlike PSNR, SSIM focuses more on the image's ability to preserve structure, resulting in higher consistency in evaluating visual quality. The structural similarity index can be specifically expressed as:

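[0181] $$\mathrm{SSIM}(x, y) = \frac{(2\mu_{x}\mu_{y} + C_{1})(2\sigma_{xy} + C_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + C_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2})}$$

[0182] where $\mu_{x}$ and $\mu_{y}$ are the mean values of the reference image and the reconstructed image, respectively; $\sigma_{x}^{2}$ and $\sigma_{y}^{2}$ are the variances of the corresponding images; $\sigma_{xy}$ is the covariance between the two images; and $C_{1}$ and $C_{2}$ are stability constants that keep the denominator from being zero.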
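The structure of the SSIM computation (means, variances, covariance, and two stabilizing constants) can be seen in a single-window sketch. Note the simplification: practical SSIM implementations evaluate these statistics over local sliding windows and average the results, whereas this version uses one set of statistics for the whole image. The constants `k1 = 0.01` and `k2 = 0.03` are the conventional defaults, and the function name is illustrative.

```python
import numpy as np


def global_ssim(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """SSIM from whole-image statistics (single window, for illustration only)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (k1 * max_value) ** 2, (k2 * max_value) ** 2

    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

The first factor compares brightness and the second compares contrast and structure, so an image can score poorly on SSIM even when its mean squared error is small.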

[0183] Learned Perceptual Image Patch Similarity (LPIPS) is a metric used to evaluate perceptual differences between images. It compares deep feature representations of image patches and uses a distance function over those features to represent the difference between images. A lower LPIPS value indicates that the reconstructed image is perceptually closer to the reference image. Its calculation method is as follows:

[0184]

[0185] This invention compares the proposed method with various image super-resolution reconstruction algorithms on the RRSSRD dataset. The algorithms compared include SRGAN, RFDN, BSRN, CrossNet, SRNTT, TTSR, DATSR, RRSR, and FRFSR. These methods are existing techniques with strong image super-resolution reconstruction capabilities; using them as a comparison method provides a more objective demonstration of the capabilities and effectiveness of this application.

[0186] Table 1 shows the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) of the restored images generated by different methods on the RRSSRD dataset.

[0187]

[0188] This embodiment uses the RRSSRD dataset to conduct comparative experiments on various image super-resolution reconstruction methods. The dataset covers a variety of typical remote sensing ground scenes and can be used to verify the applicability and stability of different image super-resolution reconstruction methods under complex ground conditions. Experimental results, as shown in Table 1, demonstrate that the method of this invention outperforms the comparative methods on the RRSSRD dataset in all objective evaluation dimensions reflecting pixel-level reconstruction accuracy. Furthermore, the method of this invention also achieves high evaluation results in both objective evaluation dimensions reflecting image structural consistency and objective evaluation dimensions measuring the similarity of perceived image patches. Table 1 shows that the method of this invention can balance pixel reconstruction accuracy and image structure preservation during super-resolution reconstruction, exhibiting relatively stable reconstruction results under complex ground scenes. Therefore, the method of this invention can effectively improve the clarity and structural integrity of low-resolution ground images, enhance the usability of image data, and thus alleviate the problem of limited geographic information acquisition quality under complex ground conditions to a certain extent, possessing practical application value.

[0189] Referring to Figure 7, Figure 7 is a structural block diagram of an embodiment of the ground image super-resolution reconstruction device of the present invention.

[0190] As shown in Figure 7, the ground image super-resolution reconstruction device proposed in this embodiment of the invention includes:

[0191] The multi-scale feature extraction module 10 is used to acquire low-resolution ground remote sensing images and corresponding high-resolution reference images, and to extract multi-scale features from them.

[0192] The feature matching module 20 is used to perform coarse-to-fine feature matching between the ground image and the reference image through a candidate feature selection strategy.

[0193] The feature transfer module 30 is used to transfer high-resolution features from the reference image and low-resolution features from the input image to the feature map to be reconstructed using a feature transfer strategy.

[0194] The feature fusion module 40 is used to fuse the multi-scale features to be reconstructed using the multi-scale feature fusion module to generate a fused feature representation.

[0195] The super-resolution image reconstruction module 50 is used to train and infer the model using a composite loss function and output a super-resolution reconstructed ground image.

[0196] Furthermore, this embodiment of the invention also provides a computer-readable storage medium storing a ground image super-resolution reconstruction program. When the program is called and executed by a processor, it is used to implement the functional steps corresponding to the ground image super-resolution reconstruction method. Since this storage medium embodiment is consistent with the ground image super-resolution reconstruction method embodiment in terms of technical solution and technical effect, its execution process will not be described again.

[0197] The beneficial effects that can be obtained by using the same technical solution will not be described again in this embodiment. For technical content not described in detail in the computer-readable storage medium embodiment of this application, it can be understood in conjunction with the relevant descriptions of the ground image super-resolution reconstruction method embodiment.

[0198] In practical applications, the program instructions can be deployed and run on a single computing device or on multiple computing devices. These multiple computing devices can be located in the same or different locations and interconnected through a communication network to complete centralized or distributed ground image super-resolution reconstruction processing.

[0199] Those skilled in the art will understand that the method flows described in each embodiment can be implemented by a computer program and completed by program instructions in conjunction with corresponding hardware resources. The program can be stored in a computer-readable storage medium, including but not limited to a disk, optical disk, read-only memory (ROM), random access memory (RAM), or other storage media that can be recognized and read by a computer. When the program is loaded and executed, the various functions described in the embodiments of the ground image super-resolution reconstruction method can be realized.

[0200] It should be noted that the device embodiments are for illustrative purposes only. The functional modules described in this specification do not necessarily correspond to physically independent units and can be implemented independently or integratedly according to actual application requirements. The functional modules can be located in the same device or distributed across multiple network devices. The connection relationships between modules indicate that they have a communication or cooperative relationship, and can be implemented through communication buses, signal lines, or other communication methods. Those skilled in the art can understand and implement the ground image super-resolution reconstruction device of this invention based on the content of this specification without any inventive effort.

[0201] In summary, the technical solution of this invention can be implemented either through a combination of software and general-purpose hardware platforms, or through dedicated hardware, such as application-specific integrated circuits (ASICs), dedicated processors, dedicated memory, or other customized circuit structures. Typically, functions implemented by a program can also be accomplished using various hardware structures, including analog circuits, digital circuits, or dedicated logic circuits. However, in most application scenarios, software-based implementations offer greater flexibility and scalability.

[0202] Based on the implementation of the ground image super-resolution reconstruction method and apparatus of the present invention, the technical solution of the present invention can also be provided in the form of a software product. The software product can be stored in a computer-readable storage medium, such as a floppy disk, USB flash drive, portable hard drive, ROM, RAM, disk, or optical disk, and includes program instructions for causing a computer device (such as a personal computer, server, or network device) to execute the ground image super-resolution reconstruction method described in the embodiments of the present invention.

Claims

1. A method for super-resolution reconstruction of a ground image, characterized in that the method includes the following steps: acquiring a low-resolution ground remote sensing image and a corresponding high-resolution reference image, and extracting multi-scale features from them; performing coarse-to-fine feature matching between the ground image and the reference image based on a candidate feature selection strategy; transferring, by a feature transfer module, high-resolution features from the reference image and low-resolution features from the input image to the feature map to be reconstructed; fusing the multi-scale features to be reconstructed using a multi-scale feature fusion module to generate a fused feature representation; and training the model and performing inference using a composite loss function, and outputting the super-resolution reconstructed ground image; wherein performing coarse-to-fine feature matching between the ground image and the reference image based on the candidate feature selection strategy specifically includes: performing coarse matching after the initial screening of candidate features, including the following core steps: predicting the retention probability of candidate features based on the interaction results of dual image features; generating candidate validity identifiers through differentiable approximation; and using the identifiers to impose subsequent interaction constraints on candidate features, rather than directly deleting them;
the candidate feature sets obtained through preliminary screening are respectively input into a dual-image feature interaction module to obtain candidate feature representations containing dual-image correlation information; for each candidate feature, a corresponding retention probability is predicted by a candidate evaluation submodule, the retention probability representing whether the candidate feature should continue to participate in subsequent matching under the current dual-image interaction state; the retention probability is calculated from a candidate evaluation function and a feature normalization function; a continuous approximate sampling mechanism is introduced to make a differentiable interactive selection decision on the validity of the candidate features, yielding a validity identifier for each candidate feature; this identifier characterizes the retention status of candidate features under dual-image interaction conditions, where a value of 1 indicates that the candidate feature is determined to be valid and a value of 0 indicates that the candidate feature is determined to be invalid; the validity identifier of the candidate features is combined with the initial mask of the candidate features to obtain the final candidate validity mask;
furthermore, the candidate features that pass the differentiable interactive selection are used as query features, key features, and value features and are input into the feature interaction module; the feature interaction is calculated using linear attention, whose expression uses a kernel function for feature mapping; the feature interaction process is constrained by combining the generated candidate validity mask; an implicitly constrained feature interaction method is adopted, which suppresses invalid candidate features by modulating the features with the candidate validity masks element by element, without directly removing their feature representations; and the mask-constrained feature interaction calculation uses the candidate validity masks corresponding to the query features and the key features, respectively.

2. The ground image super-resolution reconstruction method as described in claim 1, characterized in that acquiring a low-resolution ground remote sensing image and its corresponding high-resolution reference image, and performing multi-scale feature extraction on them, specifically comprises: the input images include a low-resolution input image and a high-resolution reference image, wherein the width and height of the low-resolution input image are each 1/4 of the width and height of the high-resolution reference image; a ResNetFPN network is used as a feature encoder to perform multi-scale feature extraction on the input images, specifically: first, a first feature map with a spatial resolution of 1/2 of the original size is extracted from the low-resolution input image, and a second feature map with a spatial resolution of 1/8 of the original size is extracted from the high-resolution reference image; the first and second feature maps have the same spatial size and are used together as the input feature maps of the coarse matching stage; second, a third feature map with a spatial resolution of 2 times the original size is extracted from the low-resolution input image, and a fourth feature map with a spatial resolution of 1/2 of the original size is extracted from the high-resolution reference image; the third and fourth feature maps also have the same spatial size and are used together as the input feature maps of the fine matching stage.
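The size relationships stated in claim 2 can be checked with a few lines of arithmetic. The sketch below only restates the ratios given in the claim; the function name `feature_map_sizes` and the example resolution are hypothetical, and the reference size is assumed to be divisible by 8.

```python
from fractions import Fraction

# Scale factors from claim 2, each relative to the image it is extracted from.
COARSE = {"lr": Fraction(1, 2), "ref": Fraction(1, 8)}
FINE = {"lr": Fraction(2, 1), "ref": Fraction(1, 2)}


def feature_map_sizes(ref_height, ref_width):
    """Return (height, width) of the four feature maps for a given reference size."""
    ref = (Fraction(ref_height), Fraction(ref_width))
    # The low-resolution input is 1/4 of the reference in both dimensions.
    lr = (ref[0] / 4, ref[1] / 4)

    def scaled(size, factor):
        return tuple(int(s * factor) for s in size)

    return {
        "coarse_lr": scaled(lr, COARSE["lr"]),     # first feature map
        "coarse_ref": scaled(ref, COARSE["ref"]),  # second feature map
        "fine_lr": scaled(lr, FINE["lr"]),         # third feature map
        "fine_ref": scaled(ref, FINE["ref"]),      # fourth feature map
    }
```

For a 512 x 512 reference image, both coarse maps come out at 1/8 of the reference size and both fine maps at 1/2 of it, which is why each pair can be matched position by position.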

3. The ground image super-resolution reconstruction method as described in claim 1, characterized in that, based on the results of the coarse matching, fine matching is further performed between the ground image and the reference image, specifically including:
for each pair of coarse matching points, the points are first mapped to the fine-scale feature maps according to their correspondence in the feature space, and their corresponding positions in the first and second fine-scale feature maps are determined, namely the first position and the second position;
subsequently, taking each corresponding position as the center, a local feature window [window size omitted in source] is cropped from the first and second fine-scale feature maps respectively; each local feature window is then input into the fine matching feature transformation module for feature enhancement; the fine matching feature transformation module performs multiple feature transformation operations on the local feature windows to generate enhanced local feature maps centered at the first position and the second position respectively;
next, the central feature vector of the first enhanced local feature map is used as a reference feature, and its correlation with each feature vector in the second enhanced local feature map is calculated to generate a corresponding correlation response map; the correlation response map characterizes the matching probability between each pixel position in the second enhanced local feature map and the first position;
on this basis, the expectation of the probability distribution represented by the correlation response map is calculated to determine the final matching position in the second image corresponding to the first position, thereby obtaining a matching result with sub-pixel accuracy; the fine matching process is repeated for all coarse matching point pairs, and all resulting fine matching point pairs constitute the final fine matching result set.
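The expectation step in claim 3 can be sketched as a soft-argmax over the correlation response map. This is a minimal illustration under stated assumptions: correlation is taken to be a dot product, the response map is normalized with a softmax, and the names `refine_match` and `temperature` are hypothetical rather than taken from the patent.

```python
import math


def softmax(values):
    # Numerically stable softmax over a flat list.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def refine_match(center_feat, window, temperature=1.0):
    """Sub-pixel offset of the match inside a w x w window of feature vectors.

    center_feat: central feature vector of the first enhanced local map.
    window: w x w grid (list of rows) of feature vectors from the second map.
    Returns (dy, dx), the expected offset in pixels from the window centre.
    """
    w = len(window)
    # Correlation response map, flattened row by row.
    scores = [
        sum(a * b for a, b in zip(center_feat, window[r][c])) / temperature
        for r in range(w) for c in range(w)
    ]
    probs = softmax(scores)
    half = (w - 1) / 2.0
    # Expectation of the row and column coordinates under the distribution.
    dy = sum(p * (i // w - half) for i, p in enumerate(probs))
    dx = sum(p * (i % w - half) for i, p in enumerate(probs))
    return dy, dx
```

Because the result is a probability-weighted average of coordinates rather than the index of the maximum, it can fall between pixel centres, which is what gives the sub-pixel accuracy mentioned in the claim.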

4. The ground image super-resolution reconstruction method as described in claim 1, characterized in that the feature transfer module transfers high-resolution features from the reference image and low-resolution features from the input image to the feature map to be reconstructed, specifically as follows: in the feature transfer process, an index based on a bidirectional normalized exponential function is introduced as a confidence discrimination module to measure the matching confidence between the low-resolution input image features and the high-resolution reference image features; a threshold is set for the bidirectional normalized exponential function [threshold value omitted in source]; when the matching confidence is greater than or equal to the threshold, the corresponding features in the high-resolution reference image are transferred to the reconstructed features; when the matching confidence is less than the threshold, the features in the low-resolution input image are transferred to the reconstructed features; finally, the output of the feature transfer module is obtained by performing a fold operation on the fused features, the fold operation being the inverse of the unfold operation in the block matching stage; through feature transfer and recombination, the detailed information in the features of the high-resolution reference image and the low-resolution input image is further enhanced and aggregated, thereby forming highly discriminative reconstructed features at different scales.
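The threshold rule in claim 4 can be illustrated as follows. The "bidirectional normalized exponential function" is read here as a dual softmax (a softmax over rows multiplied by a softmax over columns of a similarity matrix), which is an interpretation rather than something the text states explicitly. The threshold value of 0.5 and the names `dual_softmax` and `transfer` are hypothetical, and the fold step is left out.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax(sim):
    """Confidence matrix: row-wise softmax times column-wise softmax."""
    n, m = len(sim), len(sim[0])
    rows = [softmax(row) for row in sim]
    cols = [softmax([sim[i][j] for i in range(n)]) for j in range(m)]
    return [[rows[i][j] * cols[j][i] for j in range(m)] for i in range(n)]


def transfer(lr_feats, ref_feats, sim, threshold=0.5):
    """Pick, per low-resolution patch, the reference or the input feature.

    sim[i][j] is the similarity between low-resolution patch i and
    reference patch j. The threshold is an assumed value.
    """
    conf = dual_softmax(sim)
    out = []
    for i, lr_feat in enumerate(lr_feats):
        best = max(range(len(ref_feats)), key=lambda j: conf[i][j])
        if conf[i][best] >= threshold:
            out.append(ref_feats[best])  # confident match: use reference detail
        else:
            out.append(lr_feat)          # weak match: keep the input feature
    return out
```

Falling back to the input feature when confidence is low means a poorly matched reference patch cannot introduce unrelated texture into the reconstruction.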

5. The ground image super-resolution reconstruction method as described in claim 1, characterized in that the multi-scale feature fusion module is used to fuse the multi-scale features to be reconstructed and generate a fused feature representation, specifically including:
the features from three scales are simultaneously input into the same feature fusion module, the superscripts denoting the three scales [scale values omitted in source];
specifically, the feature fusion module first performs an upsampling or downsampling operation on each input feature map, using the upsampling or downsampling operator of the corresponding scale, to align it to a uniform spatial resolution, obtaining aligned multi-scale features [formula omitted in source];
subsequently, the aligned multi-scale features are mapped to the same feature dimension through a convolution mapping operation to obtain feature vector representations [formula omitted in source];
the similarity between each scale feature and the other scale features is modeled with a similarity evaluation function to generate corresponding weight coefficients, each coefficient being the weight of the respective scale feature in the multi-scale fusion [formula omitted in source];
next, the fused feature representation is obtained by weighted summation of the features at different scales [formula omitted in source];
subsequently, a feature remapping convolution module maps the fused features back to the original feature dimension, yielding the final fused feature map [formula omitted in source];
on this basis, the decoder uses the fused feature map and the reference image features and, through deconvolution and upsampling operations, outputs the final super-resolution reconstruction result of the ground image [formula omitted in source], this being the decoding and reconstruction process.
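The similarity-weighted summation in claim 5 can be sketched on plain feature vectors. The similarity evaluation function is not reproduced in this text, so the sketch assumes cosine similarity averaged over the other scales and a softmax to turn the scores into weights. Alignment and the convolution mappings are assumed to have been applied already, and the names `fusion_weights` and `fuse` are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def fusion_weights(feats):
    """One weight per scale from its mean similarity to the other scales."""
    n = len(feats)
    scores = [
        sum(cosine(feats[i], feats[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(feats):
    """Weighted sum of aligned, same-dimension feature vectors."""
    weights = fusion_weights(feats)
    dim = len(feats[0])
    return [sum(w * f[k] for w, f in zip(weights, feats)) for k in range(dim)]
```

Under these assumptions a scale that agrees with the others receives a larger weight, and three identical inputs are fused with equal weights of one third each.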

6. The ground image super-resolution reconstruction method as described in claim 1, characterized in that training the model and performing inference using a composite loss function to output the super-resolution reconstructed ground image specifically includes:
a composite loss function is used to train the image reconstruction model; the composite loss function is a weighted combination of multiple sub-loss terms that optimize the model output from different constraint perspectives; the overall loss function is the weighted sum of a pixel reconstruction constraint term, a distribution consistency constraint term, and a feature-aware constraint term, each with its corresponding weighting coefficient [formula omitted in source];
the pixel reconstruction constraint term measures the pixel-level difference between the reconstructed image and the corresponding real image; it adopts a pixel constraint based on absolute error [formula omitted in source], defined in terms of the real high-resolution image, the reconstructed image output by the model, and the number of channels, height, and width of the image;
the distribution consistency constraint term guides the reconstructed image to remain consistent with the real image at the level of the overall statistical distribution; it is constrained by introducing a discriminant scoring function [formula omitted in source], and its loss is defined in terms of the distribution evaluation function, a real sample, an intermediate sample obtained by proportionally mixing real samples and reconstructed samples, the expectation operation, the gradient operation, and a regularization coefficient;
the feature-aware constraint term constrains the similarity between the reconstructed image and the real image in the high-level feature space; the images are mapped through a feature extraction operator [formula omitted in source], and the loss is defined in terms of the feature extraction mapping of each layer and the size parameters of the corresponding feature map.
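The three terms of the composite loss in claim 6 can be sketched on flat lists of numbers. Since the formulas are missing here, several choices below are assumptions: the distribution term is written in the style of a gradient-penalty critic loss, the feature term uses a squared difference, the critic is a toy linear function so that its gradient is known in closed form, and the default weights and all function names are hypothetical.

```python
import math


def pixel_l1(sr, hr):
    """Mean absolute error over all C*H*W values (images as flat lists)."""
    return sum(abs(a - b) for a, b in zip(sr, hr)) / len(hr)


def feature_loss(feats_sr, feats_hr):
    """Per-layer mean squared difference, summed over layers (assumed norm)."""
    total = 0.0
    for f_sr, f_hr in zip(feats_sr, feats_hr):
        total += sum((a - b) ** 2 for a, b in zip(f_sr, f_hr)) / len(f_hr)
    return total


def critic_loss_linear(w, b, real, fake, mix=0.5, gp_coeff=10.0):
    """Gradient-penalty style loss for a toy linear critic D(x) = w.x + b."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    # Intermediate sample: proportional mix of a real and a reconstructed sample.
    x_hat = [mix * r + (1.0 - mix) * f for r, f in zip(real, fake)]
    # For a linear critic the gradient at x_hat (or anywhere else) is just w.
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    assert len(x_hat) == len(w)
    return score(fake) - score(real) + gp_coeff * (grad_norm - 1.0) ** 2


def composite_loss(l_pix, l_dist, l_feat, weights=(1.0, 0.001, 0.01)):
    """Weighted sum of the three terms; the default weights are placeholders."""
    w_pix, w_dist, w_feat = weights
    return w_pix * l_pix + w_dist * l_dist + w_feat * l_feat
```

In a real training setup the critic would be a neural network and its gradient at the mixed sample would come from automatic differentiation; the linear critic is used only to keep the sketch self-contained.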

7. A ground image super-resolution reconstruction apparatus, characterized in that the ground image super-resolution reconstruction apparatus includes: a multi-scale feature extraction module, used to acquire a low-resolution ground remote sensing image and its corresponding high-resolution reference image and to extract multi-scale features from them; a feature matching module, used to perform coarse-to-fine feature matching between the ground image and the reference image through a candidate feature selection strategy; a feature transfer module, used to transfer high-resolution features from the reference image and low-resolution features from the input image to the feature map to be reconstructed using a feature transfer strategy; a feature fusion module, used to fuse the multi-scale features to be reconstructed using the multi-scale feature fusion module to generate a fused feature representation; and a super-resolution image reconstruction module, used to train the model and perform inference using a composite loss function and to output the super-resolution reconstructed ground image.

8. A ground image super-resolution reconstruction device, characterized in that the ground image super-resolution reconstruction device includes: a memory, a processor, and a ground image super-resolution reconstruction program stored in the memory and executable on the processor; when the ground image super-resolution reconstruction program is executed by the processor, it implements the steps of the ground image super-resolution reconstruction method as described in any one of claims 1 to 6.

9. A storage medium, characterized in that the storage medium stores a ground image super-resolution reconstruction program which, when executed by a processor, implements the steps of the ground image super-resolution reconstruction method as described in any one of claims 1 to 6.