A multi-level single-image super-resolution spatio-temporal fusion method, device and medium

A multi-level single-image super-resolution spatiotemporal fusion method, using techniques such as the wavelet transform and an iterative self-organizing clustering algorithm, addresses remote sensing image fusion under large spatial resolution differences and achieves high-precision fusion results.

CN117611462B · Active · Publication Date: 2026-06-12 · CHINA UNIV OF GEOSCIENCES (WUHAN)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA UNIV OF GEOSCIENCES (WUHAN)
Filing Date
2023-11-15
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing remote sensing spatiotemporal fusion methods struggle to effectively fuse high temporal-low spatial resolution images and high spatial-low temporal resolution images when spatial resolution differences are significant, resulting in poor fusion performance.

Method used

A multi-level single-image super-resolution spatiotemporal fusion method is adopted. Change information is predicted in both the spatial and temporal domains using a wavelet transform, an iterative self-organizing clustering algorithm, and a cross-scale internal graph neural network combined with a thin-plate spline interpolation function. A spatiotemporal fusion model is then constructed, weight factors are determined, and the images are expanded and fused at multiple levels.

Benefits of Technology

It improves the image fusion accuracy under large spatial resolution differences, effectively controls noise accumulation, ensures the accuracy of information and the preservation of details, and solves the limitations of conventional methods in fusion under large resolution differences.

Generated by Eureka AI based on patent content.

Figure CN117611462B_ABST

Abstract

The application provides a multi-level single-image super-resolution spatio-temporal fusion method. The method comprises the following steps:

- obtaining a high-temporal, low-spatial resolution image, denoted as an HTLS image;
- obtaining a high-spatial, low-temporal resolution image, denoted as an HSLT image;
- constructing an HSLT-HTLS connection graph through a learning-based cross-scale internal graph neural network and expanding the image accordingly;
- further expanding the resulting image with an interpolation-based thin-plate spline interpolation function, the expanded image being the spatial prediction;
- processing the expanded image with a wavelet transform to determine an improved spatial prediction;
- classifying the pixels of the phase-(k-1) HSLT image with an iterative self-organizing clustering algorithm;
- estimating the temporal information change of each class;
- distributing the temporal information change to the phase-(k-1) HSLT image to obtain the phase-k image, which is the temporal prediction;
- determining a spatio-temporal fusion model according to the spatial prediction and the temporal prediction;
- determining a fused image based on the spatio-temporal fusion model and the value of the weight factor $w_q$.

Description

Technical Field

[0001] This application relates to the field of remote sensing spatiotemporal fusion, and in particular to a multi-level single-image super-resolution spatiotemporal fusion method, device and medium.

Background Technology

[0002] Currently, there is a growing focus on dense time-series satellite data and high spatial resolution imagery. More and more applications require high temporal and high spatial resolution (HTHS) imagery. Examples include land use and land cover mapping, change detection, and monitoring of ecosystem dynamics.

[0003] Many successful remote sensing spatiotemporal fusion methods have been developed. They fall broadly into five types: weight-based, unmixing-based, Bayesian, learning-based, and hybrid methods.

- Weight-based methods resample the HTLS image to the same size as the HSLT image to establish spatial correspondence. They then use weights to calculate information changes and obtain the HTHS image.
- Unmixing-based methods establish spatial correspondence from the perspective of mixed pixels. Each pixel in the HTLS image is treated as a combination of pixels in the HSLT image.
- Bayesian methods describe spatial correspondence probabilistically. They treat fusion as a maximum a posteriori problem: finding the optimal state given the observations.
- Learning-based methods assume that HTLS and HSLT images share the same or similar spatial characteristics.
- Hybrid methods combine the advantages of the above methods and achieve better fusion accuracy.

However, the spatial correspondence schemes of these methods mostly apply only to small spatial resolution differences, such as 4x or 8x.

[0004] Some applications require spatiotemporal fusion under large spatial resolution differences. When the difference is large, for example more than 16 times, the fusion result may be poor. The reasons differ by method type:

- Weight-based methods must resample a single pixel value into hundreds of values.
- Unmixing-based methods must handle a single pixel that is a mixture of hundreds of pixels. This difference in spatial information inevitably introduces errors into the fusion result.
- Bayesian methods must cope with the lack of prior knowledge caused by the resolution difference.
- Learning-based methods cannot extract HTLS image features because too little information is available.

Summary of the Invention

[0005] To address the aforementioned technical problems, this application provides a spatiotemporal fusion method, apparatus, and medium for multi-level single-image super-resolution.

[0006] The above-mentioned objective of this application is achieved through the following technical solution:

[0007] S1: Acquire the HTLS and HSLT images, and predict change information in the spatial domain to determine the spatial prediction:

[0008] Based on the HTLS image and the HSLT image, a two-level expansion is performed to generate an expanded image.

[0009] S2: Process the expanded image using the wavelet transform to determine the improved spatial prediction.

[0010] S3: Predict change information in the temporal domain to determine the temporal prediction:

[0011] The pixels of the HSLT image are classified using an iterative self-organizing clustering algorithm, the abundance of each class is calculated, and the temporal information changes of each class are estimated.

[0012] Based on the abundances and the temporal information changes, the phase-k image, i.e., the temporal prediction, is obtained.

[0013] S4: Based on the spatial prediction and the temporal prediction, determine the spatiotemporal fusion model and the weighting factor $w_q$. Then determine the fused image $F_k(R_{ij}, L_{ij})$ from the spatiotemporal fusion model and $w_q$.

[0014] Optionally, first-level expansion: an HSLT-HTLS connection graph is constructed using a learning-based cross-scale internal graph neural network to expand the image.

[0015] Second-level expansion: the first-level expanded image is further expanded using an interpolation-based thin-plate spline interpolation function. The resulting image is the spatial prediction.

[0016] In the first level of expansion:

[0017] The nonlocal graph convolutional aggregation module of the cross-scale internal graph neural network is used to match k nearest neighbor HSLT image blocks for each HTLS image block to construct the HSLT-HTLS connection graph;

[0018] The corresponding image patch information in the HSLT-HTLS connection graph is aggregated to generate an image upscaled from 1x to 4x resolution.

[0019]

[0020] Here, the symbols are defined as follows:

- $F_{k-1}$ denotes the HSLT image of the phase preceding the phase to be predicted.
- $C_{k-1}$ denotes the HTLS image of phase k-1.
- $C_k$ denotes the HTLS image of phase k, i.e., the HTLS image of the phase to be predicted.
- $\psi_{IGNN}$ denotes the IGNN network.

[0021] In the second-level extension:

[0022] The interpolation-based thin-plate spline interpolation function establishes a surface passing through all known points. The value of each coarse pixel is assigned to its center position, and a regular grid of point data is obtained by fitting the spline function.

[0023] The interpolation-based thin-plate spline minimizes an energy function. It fits a spatially correlated function to the known pixels so that the gradient changes across all known points are minimized.

[0024] Given N known points, the thin-plate spline estimate $C(R_i, L_i)$ is:

[0025]

[0026] Here, a, b, and c are scalars, and the remaining symbols are defined as follows:

- $A_i$ is the band-A coefficient at the i-th point.
- $d_i$ is the distance between the interpolation point and the i-th control point, and $d_i^2$ denotes the squared distance.
- $(R, L)$ denotes the coordinates of a known pixel in the image.

The coefficients satisfy three constraints: $\sum_i A_i = 0$, $\sum_i A_i R_i = 0$, and $\sum_i A_i L_i = 0$.

[0027]

[0028] Here, $C_2(R_i, L_i, A)$ denotes the value of the A-th band at location $(R_i, L_i)$ in the HTLS image of the phase to be predicted. $E_{TPS\text{-}b}$ denotes the interpolation result.

[0029] The parameters of the thin-plate spline interpolation function are first optimized by minimizing the energy function. The function is then used to predict the value of each HSLT pixel.

[0030] This yields an image upscaled from 4x to 16x resolution:

[0031]

[0032] Here, $\psi_{TPS}$ denotes the thin-plate spline interpolation function.
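The code below is a minimal sketch of the second-level expansion. It fits a thin-plate spline to the coarse-pixel centers and evaluates it on a finer grid.

It rests on a few assumptions:

- The spline formula is not reproduced in this text, so the code assumes the standard kernel $d_i^2 \ln d_i^2$. The FSDAF form, with a factor of 1/2, differs only by a rescaling of $A_i$.
- The input is a single band.
- `tps_fit`, `tps_eval`, and `tps_upscale` are hypothetical names.

The sketch solves the dense linear system directly, so it only suits small tiles.

```python
import numpy as np


def _tps_kernel(d2):
    """Thin-plate spline radial kernel d^2 * ln(d^2), taken as 0 at d = 0."""
    safe = np.where(d2 > 0, d2, 1.0)  # log(1) = 0 gives the correct limit at d = 0
    return d2 * np.log(safe)


def tps_fit(R, L, values):
    """Solve C(R, L) = a + b*R + c*L + sum_i A_i * K(d_i^2) for A_i and (a, b, c),
    subject to sum A_i = sum A_i R_i = sum A_i L_i = 0."""
    n = len(values)
    d2 = (R[:, None] - R[None, :]) ** 2 + (L[:, None] - L[None, :]) ** 2
    P = np.column_stack([np.ones(n), R, L])
    M = np.zeros((n + 3, n + 3))
    M[:n, :n] = _tps_kernel(d2)
    M[:n, n:] = P
    M[n:, :n] = P.T  # the three side constraints on A_i
    rhs = np.concatenate([np.asarray(values, dtype=float), np.zeros(3)])
    sol = np.linalg.solve(M, rhs)
    return sol[:n], sol[n:]


def tps_eval(R, L, A, abc, Rq, Lq):
    """Evaluate the fitted spline at query coordinates (Rq, Lq)."""
    d2 = (Rq[:, None] - R[None, :]) ** 2 + (Lq[:, None] - L[None, :]) ** 2
    return abc[0] + abc[1] * Rq + abc[2] * Lq + _tps_kernel(d2) @ A


def tps_upscale(coarse, factor):
    """Assign each coarse value to its pixel center, fit the spline, and sample it
    at the pixel centers of a grid `factor` times finer."""
    h, w = coarse.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    R = (rows.ravel() + 0.5) * factor
    L = (cols.ravel() + 0.5) * factor
    A, abc = tps_fit(R, L, coarse.ravel())
    fr, fc = np.meshgrid(np.arange(h * factor) + 0.5,
                         np.arange(w * factor) + 0.5, indexing="ij")
    fine = tps_eval(R, L, A, abc, fr.ravel(), fc.ravel())
    return fine.reshape(h * factor, w * factor)
```

For the 4x to 16x step described above, the call would be `tps_upscale(first_level_image, 4)`. The spline passes exactly through every coarse-pixel center and reproduces linear trends exactly.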

[0033] Optionally, step S2 includes:

[0034] The wavelet transform is used to extract the frame information of the HTLS image $C_k$ and the detail information of the expanded image.

[0035] The inverse wavelet transform fuses the detail information into the frame information. This preserves the accuracy of the acquired information and yields the improved spatial prediction, expressed as:

[0036]

[0037] Here, $\psi_{WAV}$ denotes the wavelet transform.
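The frame/detail fusion can be sketched with a single-level 2-D Haar transform in NumPy:

- The low-frequency (LL) band of $C_k$, resampled to the expanded grid, supplies the frame information.
- The three high-frequency bands of the expanded image supply the detail information.

This is an illustration under assumptions, not the patented configuration. The text does not specify the wavelet family, the decomposition depth, or how $C_k$ is brought onto the expanded grid, so the sketch uses Haar, one level, and nearest-neighbor resampling of $C_k$. `haar_dwt2`, `haar_idwt2`, and `wavelet_fuse` are hypothetical names.

```python
import numpy as np


def haar_dwt2(img):
    """One-level orthonormal 2-D Haar DWT: returns LL and the (LH, HL, HH) detail bands."""
    img = np.asarray(img, dtype=float)
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("image sides must be even")
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-frequency (frame) information
    lh = (a + b - c - d) / 2  # high-frequency (detail) bands
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, (lh, hl, hh)


def haar_idwt2(ll, details):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = details
    out = np.empty((ll.shape[0] * 2, ll.shape[1] * 2))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


def wavelet_fuse(c_k_on_fine_grid, expanded):
    """Keep the frame (LL) band of C_k and the detail bands of the expanded image."""
    ll_frame, _ = haar_dwt2(c_k_on_fine_grid)
    _, details = haar_dwt2(expanded)
    return haar_idwt2(ll_frame, details)
```

Usage example, for a 16x expansion: `wavelet_fuse(np.kron(c_k, np.ones((16, 16))), expanded)`. Because the fused LL band comes from $C_k$, each 2x2 block of the result keeps the block sum of the resampled $C_k$. Libraries such as PyWavelets (`pywt.dwt2` / `pywt.idwt2`) provide other wavelet families if a different choice is needed.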

[0038] Optionally, step S3 includes:

[0039] S31: Assume that the proportion of each land-cover type within an HTLS pixel does not change over time. Suppose that one HTLS pixel corresponds to m HSLT pixels, which are classified into l classes.

[0040] Among the l classes, if the c-th class contains N[c] pixels, its proportion is expressed as the abundance A:

[0041]

[0042] Here, l denotes the number of classes, $A[c]$ denotes the abundance of the c-th class, and $(R_i, L_i)$ denotes the coordinates of the i-th HTLS pixel.
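The abundance computation can be sketched as follows. The sketch assumes:

- The ISODATA step has already produced an integer class map on the HSLT grid. ISODATA itself is not shown; any clustering that outputs labels 0..l-1 would slot in.
- Each HTLS pixel covers a non-overlapping `factor` × `factor` block of HSLT pixels, so m = factor².
- `class_abundance` is a hypothetical name.

It implements $A[c](R_i, L_i) = N[c]/m$, following the definition of abundance above.

```python
import numpy as np


def class_abundance(class_map, factor, n_classes):
    """A[c](R_i, L_i) = N[c] / m for each HTLS pixel, with m = factor**2 HSLT pixels."""
    h, w = class_map.shape
    if h % factor or w % factor:
        raise ValueError("class map size must be a multiple of factor")
    blocks = class_map.reshape(h // factor, factor, w // factor, factor)
    # One-hot encode the labels, then count how many of the m sub-pixels fall in each class
    counts = (blocks[..., None] == np.arange(n_classes)).sum(axis=(1, 3))
    return counts / (factor * factor)  # shape (h // factor, w // factor, n_classes)
```

The abundances of each HTLS pixel sum to 1 by construction.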

[0043] S32: Estimate the temporal information change $\Delta F[c]$ of class c using the abundance:

[0044] Under the fundamental assumptions of linear mixing theory, the temporal information change within an HTLS pixel is defined as the cumulative effect of the temporal information changes of all HSLT pixels within it.

[0045] $\Delta C(R_i, L_i) = C_k(R_i, L_i) - C_{k-1}(R_i, L_i)$

[0046]

[0047] Here, $\Delta C$ represents the temporal information of the spatiotemporal fusion model, i.e., the difference between the two HTLS images at phases k-1 and k. $\Delta C(R_i, L_i)$ denotes the temporal change within the HTLS pixel at location $(R_i, L_i)$.
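The equation linking $\Delta C$ to the class changes is not reproduced here. The sketch below therefore assumes the linear-mixing form used in FSDAF, $\Delta C(R_i, L_i) = \sum_c A[c](R_i, L_i)\,\Delta F[c]$, and solves it for $\Delta F[c]$ by ordinary least squares over all HTLS pixels with `numpy.linalg.lstsq`.

FSDAF itself uses a constrained solution over the purest coarse pixels. Treat this version, and the hypothetical name `estimate_class_change`, as a simplified illustration.

```python
import numpy as np


def estimate_class_change(abundance, c_prev, c_k):
    """Least-squares dF[c] from dC(R_i, L_i) = sum_c A[c](R_i, L_i) * dF[c]."""
    delta_c = np.asarray(c_k, dtype=float) - np.asarray(c_prev, dtype=float)  # dC = C_k - C_{k-1}
    A = abundance.reshape(-1, abundance.shape[-1])  # one row per HTLS pixel, one column per class
    dF, *_ = np.linalg.lstsq(A, delta_c.ravel(), rcond=None)
    return dF
```

The system needs at least as many HTLS pixels as classes, with enough variety in their abundances, for the estimate to be well determined.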

[0048] S33: The phase-k image is obtained by assigning the temporal change $\Delta F[c]$ of each class to the phase-(k-1) HSLT image, as follows:

[0049]

[0050] Here, the temporal prediction term denotes the predicted value of class c at phase k for the j-th HSLT pixel within the HTLS pixel at coordinates $(R_i, L_i)$. $F_{k-1}[c](R_{ij}, L_{ij})$ denotes the value of class c for the same j-th HSLT pixel at phase k-1.
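Assigning the class changes back to the HSLT grid amounts to a lookup by class label. In this minimal sketch, each phase-(k-1) HSLT pixel receives the change of the class it belongs to. It assumes the class map from the clustering step, and `temporal_prediction` is a hypothetical name.

```python
import numpy as np


def temporal_prediction(f_prev, class_map, delta_f):
    """Temporal prediction = F_{k-1}(R_ij, L_ij) + dF[class of (R_ij, L_ij)]."""
    delta_f = np.asarray(delta_f, dtype=float)
    # Fancy indexing maps every pixel's class label to that class's temporal change
    return np.asarray(f_prev, dtype=float) + delta_f[class_map]
```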

[0051] Optionally, the spatiotemporal fusion model is as follows:

[0052] $F_k = F_{k-1} + \left(w_q\,\Delta F_{SP} + (1 - w_q)\,\Delta F_{TP}\right)$

[0053] Here, the symbols are defined as follows:

- $F_k$ denotes the phase-k HSLT image.
- $F_{k-1}$ denotes the phase-(k-1) HSLT image.
- $w_q$ denotes the weighting factor.
- $\Delta F_{SP}$ denotes the information change under the influence of spatial scale.
- $\Delta F_{TP}$ denotes the information change under the influence of temporal scale.
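The fusion model is a per-pixel convex blend of the two change estimates. The sketch rests on these assumptions:

- $\Delta F_{SP}$ and $\Delta F_{TP}$ are the differences between the spatial and temporal predictions and $F_{k-1}$. The text does not define them further.
- $w_q$ may be a scalar or an array on the HSLT grid, with values in [0, 1].
- `fuse_predictions` is a hypothetical name.

```python
import numpy as np


def fuse_predictions(f_prev, f_sp, f_tp, w_q):
    """F_k = F_{k-1} + (w_q * dF_SP + (1 - w_q) * dF_TP)."""
    f_prev = np.asarray(f_prev, dtype=float)
    w_q = np.asarray(w_q, dtype=float)
    if np.any((w_q < 0) | (w_q > 1)):
        raise ValueError("w_q is expected to lie in [0, 1]")
    d_sp = np.asarray(f_sp, dtype=float) - f_prev  # change attributed to spatial scale
    d_tp = np.asarray(f_tp, dtype=float) - f_prev  # change attributed to temporal scale
    return f_prev + (w_q * d_sp + (1.0 - w_q) * d_tp)
```

With $w_q = 1$ the result equals the spatial prediction, and with $w_q = 0$ it equals the temporal prediction.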

[0054] Optionally, step S4 includes:

[0055] S41: According to unmixing theory, one HTLS pixel is regarded as a mixture of the m corresponding HSLT pixels, as follows:

[0056]

[0057] Here, the symbols are defined as follows:

- $(R_{ij}, L_{ij})$ denotes the coordinates of the j-th HSLT pixel within the HTLS pixel at position $(R_i, L_i)$.
- $F_{k-1}(R_{ij}, L_{ij})$ denotes the value of that j-th HSLT pixel at phase k-1.
- $F_k(R_{ij}, L_{ij})$ denotes its value at phase k.
- ξ denotes the systematic difference between the two sensors caused by differences in bandwidth and solar geometry, and is a constant.
- $C_{k-1}(R_i, L_i)$ and $C_k(R_i, L_i)$ denote the pixel values of the phase-(k-1) and phase-k HTLS images at position $(R_i, L_i)$.

[0058] S42: In heterogeneous regions, the residual $E_{he}(R_i, L_i)$ of the HTLS image can be expressed as:

[0059]

[0060] From the above formula, the residual $E_{he}(R_i, L_i)$ in the heterogeneous region is obtained:

[0061]

[0062] In homogeneous regions, the spatial prediction is taken to represent the true value of the HSLT image. The residual $E_{ho}(R_{ij}, L_{ij})$ within the homogeneous region is:

[0063]

[0064] Here, the terms in the equation denote the spatial prediction value and the temporal prediction value at location $(R_{ij}, L_{ij})$, respectively.
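The residual equations themselves are not reproduced in this text. The sketch below follows the original FSDAF formulation, which this method extends:

- The heterogeneous residual is the observed HTLS change minus the change implied by the temporal prediction, averaged over the m HSLT pixels. The constant ξ cancels in this difference.
- The homogeneous residual is the spatial prediction minus the temporal prediction.

Block-mean aggregation with a fixed `factor` and the names `block_mean` and `fsdaf_residuals` are assumptions.

```python
import numpy as np


def block_mean(img, factor):
    """Average each non-overlapping factor x factor block (HSLT grid -> HTLS grid)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def fsdaf_residuals(c_prev, c_k, f_prev, f_tp, f_sp, factor):
    """Return (E_he, E_ho) on the HSLT grid, following FSDAF."""
    delta_c = np.asarray(c_k, dtype=float) - np.asarray(c_prev, dtype=float)
    # Change implied by the temporal prediction, averaged over the m sub-pixels;
    # xi appears in both C_k and C_{k-1}, so it drops out of delta_c
    implied = (block_mean(np.asarray(f_tp, dtype=float), factor)
               - block_mean(np.asarray(f_prev, dtype=float), factor))
    # Heterogeneous residual: same value for all m sub-pixels of an HTLS pixel
    e_he = np.kron(delta_c - implied, np.ones((factor, factor)))
    # Homogeneous residual: spatial prediction taken as the true value
    e_ho = np.asarray(f_sp, dtype=float) - np.asarray(f_tp, dtype=float)
    return e_he, e_ho
```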

[0065] S43: Following the FSDAF spatiotemporal fusion algorithm, the weighting factor $HI(R_{ij}, L_{ij})$ is calculated. It is related to the residual distributions of heterogeneous and homogeneous regions, and it determines how strongly the spatial prediction and the temporal prediction influence the final pixel value:

[0066]

[0067] Here, the variable $I_q$ takes values from 0 to 1. When the q-th HSLT pixel in the neighborhood of the central HSLT pixel $(R_{ij}, L_{ij})$ has the same land-cover type as the central pixel, $I_q = 1$.
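The weighting factor HI can be read as a homogeneity index: the fraction of pixels in a moving window that share the central pixel's land-cover class. Under this reading, $I_q$ is 1 for a matching class and 0 otherwise, as in FSDAF.

The sketch makes three assumptions beyond that reading:

- The central pixel itself is counted.
- Near the image border, the index averages only over the in-bounds part of the window.
- `homogeneity_index` and `half_window` are hypothetical names.

```python
import numpy as np


def homogeneity_index(class_map, half_window):
    """Fraction of in-bounds pixels in a (2*half_window+1)^2 window sharing the center's class."""
    h, w = class_map.shape
    same = np.zeros((h, w))
    total = np.zeros((h, w))
    for dr in range(-half_window, half_window + 1):
        for dc in range(-half_window, half_window + 1):
            # Centers whose (dr, dc) neighbour also lies inside the image
            r0, r1 = max(0, -dr), min(h, h - dr)
            c0, c1 = max(0, -dc), min(w, w - dc)
            if r1 <= r0 or c1 <= c0:
                continue  # offset larger than the image: no valid pairs
            center = class_map[r0:r1, c0:c1]
            neigh = class_map[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
            same[r0:r1, c0:c1] += (center == neigh)  # I_q = 1 when the classes match
            total[r0:r1, c0:c1] += 1
    return same / total
```

On a checkerboard with a 3x3 window, an interior pixel gets 5/9: the center plus its four diagonal neighbors match.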

[0068] S44: Combine weights by calculating weighting factors related to the residual distributions of heterogeneous and homogeneous regions:

[0069] $CW(R_{ij}, L_{ij}) = E_{ho}(R_{ij}, L_{ij})\,HI(R_{ij}, L_{ij}) + E_{he}(R_{ij}, L_{ij})\,\left(1 - HI(R_{ij}, L_{ij})\right)$

[0070] The weights are normalized to:

[0071]

[0072] Here, $CW(R_{ij}, L_{ij})$ denotes the merged weight of the j-th HSLT pixel within the HTLS pixel at position $(R_i, L_i)$. $W(R_{ij}, L_{ij})$ denotes the corresponding normalized weight.

[0073] S45: The distributed residual is added to the temporal change to obtain the actual change $\Delta F(R_{ij}, L_{ij})$ within an HSLT pixel:

[0074] $\Delta F(R_{ij}, L_{ij}) = m \cdot E_{he}(R_{ij}, L_{ij})\,W(R_{ij}, L_{ij}) + \Delta F[c]$
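Steps S44 and S45 can be sketched as below. The sketch makes these assumptions:

- The normalization equation is not reproduced here, so the code assumes the FSDAF form $W = CW / \sum_{j=1}^{m} CW$, taken over the m HSLT pixels of each HTLS pixel.
- Where that sum is zero, the code falls back to equal weights 1/m. This guard is added for the sketch.
- $E_{he}$ is taken as already broadcast to the HSLT grid.
- `distribute_residual` is a hypothetical name.

```python
import numpy as np


def distribute_residual(e_ho, e_he, hi, class_map, delta_f_class, factor):
    """Return dF(R_ij, L_ij) = m * E_he * W + dF[c] on the HSLT grid."""
    e_ho = np.asarray(e_ho, dtype=float)
    e_he = np.asarray(e_he, dtype=float)
    m = factor * factor
    # S44: merge homogeneous and heterogeneous residuals with the weighting factor HI
    cw = e_ho * hi + e_he * (1.0 - hi)
    h, w = cw.shape
    block_sum = cw.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))
    denom = np.kron(block_sum, np.ones((factor, factor)))
    # Normalize within each HTLS pixel; equal weights 1/m where the block sum is zero
    w_norm = np.divide(cw, denom, out=np.full_like(cw, 1.0 / m), where=denom != 0)
    # S45: residual share plus the temporal change of the pixel's class
    return m * e_he * w_norm + np.asarray(delta_f_class, dtype=float)[class_map]
```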

[0075] S46: A window centered on the prediction point is set, and the information of neighboring pixels within the window is integrated. The influence weight of each neighboring pixel on the prediction point depends on the distance between the two pixels. The relative spatial distance $D_q$ of the q-th similar pixel is:

[0076]

[0077] Here, w is the window size. $R_q$ and $L_q$ are the horizontal and vertical coordinates of the q-th similar pixel, and $R_{ij}$ and $L_{ij}$ are the horizontal and vertical coordinates of the prediction point.

[0078] The weighting factor $w_q$ and the fused image $F_k(R_{ij}, L_{ij})$ are therefore expressed as:

[0079]

[0080]

[0081] Here, n denotes the total number of pixels within the window. The remaining two terms denote the spatial prediction value and the temporal prediction value, respectively.
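The distance and weight equations are missing from this text, so the sketch assumes the FSDAF definitions:

$D_q = 1 + \sqrt{(R_q - R_{ij})^2 + (L_q - L_{ij})^2}\,/\,(w/2)$

$w_q = (1/D_q)/\sum_{q=1}^{n}(1/D_q)$

It then forms the final value FSDAF-style, as $F_{k-1}(R_{ij}, L_{ij})$ plus the weighted sum of the similar pixels' changes ΔF. This combination rule is an assumption borrowed from FSDAF; the exact final equation of this method is not shown. `distance_weights` and `weighted_prediction` are hypothetical names.

```python
import numpy as np


def distance_weights(rq, lq, r_ij, l_ij, window):
    """Normalized inverse relative-distance weights w_q of the n similar pixels."""
    rq = np.asarray(rq, dtype=float)
    lq = np.asarray(lq, dtype=float)
    d = 1.0 + np.hypot(rq - r_ij, lq - l_ij) / (window / 2.0)  # D_q >= 1
    inv = 1.0 / d
    return inv / inv.sum()  # weights sum to 1


def weighted_prediction(f_prev_center, weights, delta_f_similar):
    """F_k(R_ij, L_ij) = F_{k-1}(R_ij, L_ij) + sum_q w_q * dF(q), FSDAF-style."""
    return float(f_prev_center) + float(np.dot(weights, delta_f_similar))
```

Closer similar pixels receive larger weights, and the center pixel itself has $D_q = 1$.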

[0082] An electronic device includes a processor, a memory, a user interface, and a network interface. The memory is used to store instructions, the user interface and the network interface are used to communicate with other devices, and the processor is used to execute the instructions stored in the memory to enable the electronic device to perform a multi-level single-image super-resolution spatiotemporal fusion method.

[0083] A computer-readable storage medium storing instructions that, when executed, perform a multi-level single-image super-resolution spatiotemporal fusion method.

[0084] The beneficial effects of the technical solution provided in this application are:

[0085] Change information is predicted in the spatial domain to determine the spatial prediction. The image is expanded through the HSLT-HTLS connection graph, and the expanded image is then processed with the wavelet transform. A learning-based cross-scale internal graph neural network (IGNN) and an interpolation-based thin-plate spline interpolation function (TPS) are combined to address the noise accumulation caused by multi-level random sampling. In addition, the wavelet transform extracts the frame information of the HTLS image and the detail information of the 16x image obtained through multi-level single-image super-resolution, which further controls noise.

[0086] The pixels of the phase-(k-1) HSLT image are classified using the Iterative Self-Organizing Clustering Algorithm (ISODATA) to obtain the phase-k image. In this way, the temporal prediction is determined by predicting changes in the temporal domain. Finally, the spatiotemporal fusion model fuses the temporal and spatial predictions to determine the final fused image.

By extending the traditional FSDAF spatiotemporal fusion algorithm, this application proposes a multi-level single-image super-resolution spatiotemporal fusion algorithm. The algorithm:

- addresses the limitations of conventional single-level super-resolution methods under large spatial resolution differences;
- avoids mutual interference between temporal and spatial information;
- improves the accuracy of spatial information fusion between high temporal-low spatial resolution (HTLS) images and high spatial-low temporal resolution (HSLT) images with significant spatial resolution differences.

Attached Figure Description

[0087] The present application will be further described below with reference to the accompanying drawings and embodiments. In the accompanying drawings:

[0088] Figure 1 is an example diagram of the network structure of the multi-level single-image super-resolution spatiotemporal fusion method in the embodiments of this application;

[0089] Figure 2 is an example diagram of a remote sensing spatiotemporal fusion task using the multi-level single-image super-resolution spatiotemporal fusion method in the embodiments of this application;

[0090] Figure 3 is a schematic diagram of the electronic device structure for the multi-level single-image super-resolution spatiotemporal fusion method in the embodiments of this application.

Detailed Implementation

[0091] To provide a clearer understanding of the technical features, objectives, and effects of this application, the specific embodiments of this application will now be described in detail with reference to the accompanying drawings.

[0092] The embodiments of this application provide a spatiotemporal fusion method for multi-level single-image super-resolution, specifically including the following steps:

[0093] S1: Acquire the HTLS and HSLT images, and predict change information in the spatial domain to determine the spatial prediction:

[0094] Based on the HTLS image and the HSLT image, a two-level expansion is performed to generate an expanded image.

[0095] S2: Process the expanded image using the wavelet transform to determine the improved spatial prediction.

[0096] S3: Predict change information in the temporal domain to determine the temporal prediction:

[0097] The pixels of the HSLT image are classified using an iterative self-organizing clustering algorithm, the abundance of each class is calculated, and the temporal information changes of each class are estimated.

[0098] Based on the abundances and the temporal information changes, the phase-k image, i.e., the temporal prediction, is obtained.

[0099] S4: Based on the spatial prediction and the temporal prediction, determine the spatiotemporal fusion model and the weighting factor $w_q$. Then determine the fused image $F_k(R_{ij}, L_{ij})$ from the spatiotemporal fusion model and $w_q$.

[0100] Specifically, endmembers are a concept from spectral unmixing; they represent the basic components of each mixed pixel. Once the number of endmembers is determined, the abundance of each class can be calculated. Under the assumptions of linear mixing theory, the temporal change within an HTLS pixel is the cumulative effect of the temporal changes of all HSLT pixels within it.

[0101] Specifically, high temporal-low spatial resolution images are denoted as HTLS images; high spatial-low temporal resolution images are denoted as HSLT images.

[0102] Specifically, the methods employed include cross-scale internal graph neural networks (IGNN), thin-plate spline interpolation (TPS), and iterative self-organizing clustering algorithms (ISODATA). This invention extends the classic hybrid algorithm FSDAF by first predicting change information from both the spatial and temporal domains, then calculating the actual change information through weights to obtain a fusion result. IGNN is used as the first-level single-frame super-resolution extension method to search for similar blocks to acquire information. TPS is used as the second-level single-frame super-resolution extension method to supplement and transmit gain information. Wavelet transform is used to process the spatial prediction results after multi-level extension to ensure the effectiveness of the gain information.

[0103] Specifically, this application uses multi-level single-image super-resolution and constraint information to address the difficulty of accurately predicting spatial information when spatial resolution differences are large. Such differences inherently make it difficult to fuse heterogeneous remote sensing data sources. When integrating high-resolution and low-resolution sensor data, this application preserves more spatial detail and spectral features, producing a more accurate fused image even with significant resolution differences. Compared with state-of-the-art spatiotemporal fusion methods, the method performs better in root mean square error, correlation coefficient, structural similarity, and spectral angle.

[0104] Step S1 includes:

[0105] First-level expansion: An HSLT-HTLS connection graph is constructed using a learning-based cross-scale internal graph neural network, expanding the image to obtain...

[0106] Second-level expansion: the first-level expanded image is further expanded using an interpolation-based thin-plate spline interpolation function. The resulting image is the spatial prediction.

[0107] In the first level of expansion:

[0108] The nonlocal graph convolutional aggregation module of the cross-scale internal graph neural network is used to match k nearest neighbor HSLT image blocks for each HTLS image block to construct the HSLT-HTLS connection graph;

[0109] The corresponding image patch information in the HSLT-HTLS connection graph is aggregated to generate an image upscaled from 1x to 4x resolution.

[0110]

[0111] Here, the symbols are defined as follows:

- $F_{k-1}$ denotes the HSLT image of the phase preceding the phase to be predicted.
- $C_{k-1}$ denotes the HTLS image of phase k-1.
- $C_k$ denotes the HTLS image of phase k, i.e., the HTLS image of the phase to be predicted.
- $\psi_{IGNN}$ denotes the IGNN network.

[0112] In the second-level extension:

[0113] The interpolation-based thin-plate spline interpolation function establishes a surface passing through all known points. The value of each coarse pixel is assigned to its center position, and a regular grid of point data is obtained by fitting the spline function.

[0114] The interpolation-based thin-plate spline minimizes an energy function. It fits a spatially correlated function to the known pixels so that the gradient changes across all known points are minimized.

[0115] Given N known points, the thin-plate spline estimate $C(R_i, L_i)$ is:

[0116]

[0117] Here, a, b, and c are scalars, and the remaining symbols are defined as follows:

- $A_i$ is the band-A coefficient at the i-th point.
- $d_i$ is the distance between the interpolation point and the i-th control point, and $d_i^2$ denotes the squared distance.
- $(R, L)$ denotes the coordinates of a known pixel in the image.

The coefficients satisfy three constraints: $\sum_i A_i = 0$, $\sum_i A_i R_i = 0$, and $\sum_i A_i L_i = 0$.

[0118]

[0119] Here, $C_2(R_i, L_i, A)$ denotes the value of the A-th band at location $(R_i, L_i)$ in the HTLS image of the phase to be predicted. $E_{TPS\text{-}b}$ denotes the interpolation result.

[0120] The parameters of the thin-plate spline interpolation function are first optimized by minimizing the energy function. The function is then used to predict the value of each HSLT pixel.

[0121] This yields an image upscaled from 4x to 16x resolution:

[0122]

[0123] Here, $\psi_{TPS}$ denotes the thin-plate spline interpolation function.

[0124] Step S2 includes:

[0125] The wavelet transform is used to extract the frame information of the HTLS image $C_k$ and the detail information of the expanded image.

[0126] The inverse wavelet transform fuses the detail information into the frame information. This preserves the accuracy of the acquired information and yields the improved spatial prediction, expressed as:

[0127]

[0128] Here, $\psi_{WAV}$ denotes the wavelet transform.

[0129] Specifically, the cross-scale internal graph neural network (IGNN) introduces a nonlocal graph convolutional aggregation module (GraphAgg). The network uses the Enhanced Deep Residual Network for single-image super-resolution (EDSR) as its backbone and inserts the GraphAgg module to aggregate high-resolution patches across scales. Cross-scale connections pass the aggregated high-resolution features directly to the later high-scale network layers. This lets the network perceive the high-resolution textures hidden in the image features. The GraphAgg module consists of two steps: graph construction and patch aggregation.

[0130] Graph Construction: The HTLS image is downsampled. For each image patch in the HTLS image, the k nearest neighbor HSLT image patches (NN blocks) are searched from the downsampled image using block matching. Each NN block is considered a vertex, and each edge represents the direct similarity between the HSLT image patch and the k NN blocks.

[0131] Block aggregation: Edge aggregation weights are defined based on the similarity between the HSLT image and the NN blocks. Inspired by Adaptive Instance Normalization (AdaIN), IGNN proposes an Adaptive Patch Normalization (AdaPN) method for image blocks, aligning the low-frequency signals of adjacent blocks and the HSLT image without altering high-frequency texture information. Finally, IGNN is used to obtain an image with a 4x resolution improvement.
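The NumPy sketch below makes the graph-construction and patch-aggregation steps concrete with a non-learned version of the cross-scale idea:

- Patches of the input are matched against patches of its downsampled copy.
- The full-resolution counterparts of the k best matches are averaged with softmax weights to fill a 2x larger output.

It is not IGNN: there is no EDSR backbone, no learned features, and no AdaPN. Block-mean downsampling, the Euclidean patch distance, the softmax `temperature`, and all function names are assumptions for illustration. Here `k` is the neighbor count, not the time phase. Image sides must be divisible by both `scale` and `patch`.

```python
import numpy as np


def extract_patches(img, size, step):
    """Flatten every size x size patch taken at the given stride; also return top-left corners."""
    h, w = img.shape
    coords = [(r, c) for r in range(0, h - size + 1, step)
              for c in range(0, w - size + 1, step)]
    patches = np.stack([img[r:r + size, c:c + size].ravel() for r, c in coords])
    return patches, coords


def cross_scale_aggregate(lr, k=3, patch=2, scale=2, temperature=1.0):
    """Upscale `lr` by `scale` by aggregating cross-scale counterparts of k nearest patches."""
    lr = np.asarray(lr, dtype=float)
    h, w = lr.shape
    # Downsampled copy used as the search space (block mean is an assumption)
    down = lr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    queries, q_coords = extract_patches(lr, patch, patch)  # non-overlapping query patches
    keys, k_coords = extract_patches(down, patch, 1)       # candidate patches (graph vertices)
    hr = patch * scale
    out = np.zeros((h * scale, w * scale))
    for q, (r, c) in zip(queries, q_coords):
        dist = ((keys - q) ** 2).sum(axis=1)
        nn = np.argsort(dist)[:k]  # k nearest neighbours = graph edges
        weights = np.exp(-(dist[nn] - dist[nn].min()) / temperature)
        weights /= weights.sum()   # edge aggregation weights
        agg = np.zeros((hr, hr))
        for wgt, idx in zip(weights, nn):
            u, v = k_coords[idx]
            # A match at (u, v) in `down` maps to a larger patch at (u*scale, v*scale) in `lr`
            agg += wgt * lr[u * scale:u * scale + hr, v * scale:v * scale + hr]
        out[r * scale:r * scale + hr, c * scale:c * scale + hr] = agg
    return out
```

Each output patch is a convex combination of input pixels, so the result stays within the input's value range.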

[0132] Step S3 includes:

[0133] S31: Assume that the proportion of each land-cover type within an HTLS pixel does not change over time. Suppose that one HTLS pixel corresponds to m HSLT pixels, which are classified into l classes.

[0134] Among the l classes, if the c-th class contains N[c] pixels, its proportion is expressed as the abundance A:

[0135]

[0136] Here, l denotes the number of classes, $A[c]$ denotes the abundance of the c-th class, and $(R_i, L_i)$ denotes the coordinates of the i-th HTLS pixel.

[0137] S32: Estimate the temporal information change $\Delta F[c]$ of class c using the abundance:

[0138] Under the fundamental assumptions of linear mixing theory, the temporal information change within an HTLS pixel is defined as the cumulative effect of the temporal information changes of all HSLT pixels within it.

[0139] $\Delta C(R_i, L_i) = C_k(R_i, L_i) - C_{k-1}(R_i, L_i)$

[0140]

[0141] Here, $\Delta C$ represents the temporal information of the spatiotemporal fusion model, i.e., the difference between the two HTLS images at phases k-1 and k. $\Delta C(R_i, L_i)$ denotes the temporal change within the HTLS pixel at location $(R_i, L_i)$.

[0142] S33: The phase-k image is obtained by assigning the temporal change $\Delta F[c]$ of each class to the phase-(k-1) HSLT image, as follows:

[0143]

[0144] Here, the temporal prediction term denotes the predicted value of class c at phase k for the j-th HSLT pixel within the HTLS pixel at coordinates $(R_i, L_i)$. $F_{k-1}[c](R_{ij}, L_{ij})$ denotes the value of class c for the same j-th HSLT pixel at phase k-1.

[0145] Specifically, to match similar blocks, a step that downsamples the HTLS image is added to the cross-scale internal graph neural network (IGNN). This downsampling produces mixed pixels that affect the selection of similar blocks, and different objects with the same spectrum also affect the acquired information. To address this issue and ensure the effectiveness of the gain information, the wavelet transform further processes the spatial prediction to obtain the improved spatial prediction.

The wavelet transform decomposes the original function using a series of wavelets at different scales, yielding the coefficients of the original function at each scale. In digital image processing, the continuous wavelet transform must be discretized: discretizing its scale and shift parameters by powers of two yields the discrete wavelet transform. The discrete wavelet transform decomposes the image into low-frequency and high-frequency signals through low-pass and high-pass filters. The low-frequency signals represent the frame information of the image, and the high-frequency signals represent its detail information.
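The filter-bank description above can be illustrated in one dimension with the Haar wavelet: correlate the signal with a low-pass and a high-pass filter, then keep every second sample. Repeating the step on the low-pass output gives the power-of-two scales. This only illustrates the general DWT mechanism; the text does not state which wavelet the method uses, and the function names are hypothetical. Images are handled by applying the same step along rows and then along columns.

```python
import numpy as np

LOW = np.array([1.0, 1.0]) / np.sqrt(2.0)    # low-pass filter: frame (approximation) information
HIGH = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass filter: detail information


def haar_analysis_1d(signal):
    """One DWT level: filter, then downsample by 2 (dyadic discretization)."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = np.correlate(x, LOW, mode="valid")[::2]   # (x[2n] + x[2n+1]) / sqrt(2)
    detail = np.correlate(x, HIGH, mode="valid")[::2]  # (x[2n] - x[2n+1]) / sqrt(2)
    return approx, detail


def haar_wavedec_1d(signal, levels):
    """Multi-level decomposition: re-apply the analysis step to the approximation."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_analysis_1d(approx)
        details.append(detail)
    return approx, details
```

Because the Haar filters are orthonormal, the energy of the approximation and detail coefficients together equals the energy of the input signal.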

[0146] The spatiotemporal fusion model is as follows:

[0147] $F_k = F_{k-1} + \left(w_q\,\Delta F_{SP} + (1 - w_q)\,\Delta F_{TP}\right)$

[0148] Here, the symbols are defined as follows:

- $F_k$ denotes the phase-k HSLT image.
- $F_{k-1}$ denotes the phase-(k-1) HSLT image.
- $w_q$ denotes the weighting factor.
- $\Delta F_{SP}$ denotes the information change under the influence of spatial scale.
- $\Delta F_{TP}$ denotes the information change under the influence of temporal scale.

[0149] Step S4 includes:

[0150] S41: According to unmixing theory, one HTLS pixel is regarded as a mixture of the m corresponding HSLT pixels, as follows:

[0151]

[0152] Here, the symbols are defined as follows:

- $(R_{ij}, L_{ij})$ denotes the coordinates of the j-th HSLT pixel within the HTLS pixel at position $(R_i, L_i)$.
- $F_{k-1}(R_{ij}, L_{ij})$ denotes the value of that j-th HSLT pixel at phase k-1.
- $F_k(R_{ij}, L_{ij})$ denotes its value at phase k.
- ξ denotes the systematic difference between the two sensors caused by differences in bandwidth and solar geometry, and is a constant.
- $C_{k-1}(R_i, L_i)$ and $C_k(R_i, L_i)$ denote the pixel values of the phase-(k-1) and phase-k HTLS images at position $(R_i, L_i)$.

[0153] S42: Within the heterogeneous region, the residual $E_{he}(R_i, L_i)$ of the HTLS image can be expressed as:

[0154]

[0155] From the above formula, the residual $E_{he}(R_i, L_i)$ in the heterogeneous region can be obtained:

[0156]

[0157] In homogeneous regions, the spatial prediction represents the actual value of the HSLT image, and the residual $E_{ho}(R_{ij}, L_{ij})$ within the homogeneous region is as follows:

[0158]

[0159] where the two terms denote the spatial prediction value and the temporal prediction value at location $(R_{ij}, L_{ij})$, respectively;
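Equations [0154] to [0158] are not reproduced in this text. As a hedged illustration, the sketch below assumes the residual definitions of the original FSDAF algorithm: the heterogeneous residual is the observed HTLS change minus the mean change implied by the temporal prediction over the m HSLT pixels, and the homogeneous residual is the spatial prediction minus the temporal prediction. The function names are hypothetical.

```python
from typing import Sequence


def heterogeneous_residual(delta_c: float, f_tp: Sequence[float], f_prev: Sequence[float]) -> float:
    """Assumed FSDAF-style coarse residual: observed HTLS change minus the mean
    change implied by the temporal prediction over the m HSLT pixels."""
    m = len(f_tp)
    if m == 0 or len(f_prev) != m:
        raise ValueError("f_tp and f_prev must list the same m HSLT pixels")
    predicted_change = sum(t - p for t, p in zip(f_tp, f_prev)) / m
    return delta_c - predicted_change


def homogeneous_residual(f_sp: float, f_tp: float) -> float:
    """Assumed residual in homogeneous regions: spatial minus temporal prediction."""
    return f_sp - f_tp
```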

[0160] S43: Following the FSDAF spatiotemporal fusion algorithm, a weighting factor $HI(R_{ij}, L_{ij})$ related to the residual distributions of the heterogeneous and homogeneous regions is calculated. This factor determines how strongly the spatial prediction and the temporal prediction each influence the final pixel value, as follows:

[0161]

[0162] where the variable $I_q$ takes values from 0 to 1; when the q-th HSLT pixel neighboring the central HSLT pixel $(R_{ij}, L_{ij})$ has the same land cover type as the central pixel, $I_q$ is 1;
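Equation [0161] is not reproduced here. In the original FSDAF algorithm, the homogeneity index is the fraction of pixels in a moving window that share the central pixel's land cover class, with $I_q$ taking the value 0 or 1. The sketch assumes that definition, a square window clipped at the image border, and a class map such as the one produced by the clustering in step S3; the function name is hypothetical.

```python
from typing import List


def homogeneity_index(class_map: List[List[int]], r: int, c: int, half: int) -> float:
    """Fraction of pixels in the (2*half+1)^2 window around (r, c), clipped at the
    image border, whose class equals that of the central pixel."""
    center = class_map[r][c]
    same = total = 0
    for i in range(max(0, r - half), min(len(class_map), r + half + 1)):
        for j in range(max(0, c - half), min(len(class_map[0]), c + half + 1)):
            total += 1
            same += 1 if class_map[i][j] == center else 0  # I_q is 1 for a matching class
    return same / total
```

A value near 1 marks a homogeneous neighborhood, where the spatial prediction is trusted more in the combined weight of S44.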

[0163] S44: Combined weights are calculated from the weighting factors related to the residual distributions of the heterogeneous and homogeneous regions:

[0164] $CW(R_{ij}, L_{ij}) = E_{ho}(R_{ij}, L_{ij})\, HI(R_{ij}, L_{ij}) + E_{he}(R_{ij}, L_{ij})\left(1 - HI(R_{ij}, L_{ij})\right)$

[0165] The weights are normalized to:

[0166]

[0167] where $CW(R_{ij}, L_{ij})$ denotes the combined weight corresponding to the j-th HSLT image pixel within the HTLS image pixel at position $(R_i, L_i)$, and $W(R_{ij}, L_{ij})$ denotes the normalized weight corresponding to the same pixel.
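The combined weight follows directly from [0164]. The normalization formula [0166] is not reproduced in this text, so the sketch below assumes the FSDAF form $W = CW / \sum CW$ over the m HSLT pixels of one HTLS pixel. The uniform fallback for a zero sum is a choice made only for this sketch, and the function names are hypothetical.

```python
from typing import List, Sequence


def combined_weights(e_ho: Sequence[float], e_he: Sequence[float], hi: Sequence[float]) -> List[float]:
    """CW = E_ho * HI + E_he * (1 - HI) for each HSLT pixel (formula [0164])."""
    return [ho * h + he * (1.0 - h) for ho, he, h in zip(e_ho, e_he, hi)]


def normalize_weights(cw: Sequence[float]) -> List[float]:
    """Assumed normalization W = CW / sum(CW) over the m HSLT pixels of one HTLS pixel."""
    total = sum(cw)
    if total == 0.0:
        # Fallback used only in this sketch: spread the weight evenly.
        return [1.0 / len(cw)] * len(cw)
    return [w / total for w in cw]
```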

[0168] S45: The residual distribution and the temporal change are added to obtain the actual change $\Delta F(R_{ij}, L_{ij})$ within a pixel of the HSLT image, calculated as follows:

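[0169] $\Delta F(R_{ij}, L_{ij}) = m \cdot E_{he}(R_{ij}, L_{ij})\, W(R_{ij}, L_{ij}) + \Delta F[c]$, where $\Delta F[c]$ denotes the temporal information change of class c.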
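Formula [0169] translates directly into code. The sketch assumes that the weights W have already been normalized to sum to 1 over the m HSLT pixels, in which case the redistributed residual averages back to $E_{he}$ over those pixels. $E_{he}$ is passed as a single value for the whole HTLS pixel, an assumption that follows FSDAF. The function name is hypothetical.

```python
from typing import List, Sequence


def fine_change(e_he: float, weights: Sequence[float], class_change: Sequence[float]) -> List[float]:
    """Delta F(R_ij, L_ij) = m * E_he * W(R_ij, L_ij) + Delta F[c] for the m HSLT pixels
    of one HTLS pixel; class_change[j] is the temporal change of pixel j's class."""
    m = len(weights)
    return [m * e_he * w + dfc for w, dfc in zip(weights, class_change)]
```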

[0170] S46: A window centered on the prediction point is set, and the information of the neighboring pixels within the window is integrated. The weight with which a neighboring pixel influences the prediction point depends on the distance between the pixels. The spatial relative distance $D_q$ of the q-th similar pixel is as follows:

[0171]

[0172] where w is the window size; $R_q$ represents the x-coordinate of the q-th similar pixel; $L_q$ represents the y-coordinate of the q-th similar pixel; $R_{ij}$ represents the x-coordinate of the prediction point; and $L_{ij}$ represents the y-coordinate of the prediction point;
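Equation [0171] is not reproduced in this text. The sketch below assumes the relative-distance form used by FSDAF, inherited from STARFM. This form divides the Euclidean distance by half the window size and adds 1, so the prediction point itself gets $D_q = 1$. The function name is hypothetical.

```python
import math


def relative_distance(r_q: float, l_q: float, r_ij: float, l_ij: float, window: int) -> float:
    """Assumed form: D_q = 1 + sqrt((R_q - R_ij)^2 + (L_q - L_ij)^2) / (w / 2)."""
    return 1.0 + math.hypot(r_q - r_ij, l_q - l_ij) / (window / 2.0)
```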

[0173] Therefore, the weighting factor $w_q$ and the fused image $F_k(R_{ij}, L_{ij})$ are expressed as:

[0174]

[0175]

[0176] where n represents the total number of pixels within the window, and the remaining two terms denote the spatial prediction value and the temporal prediction value, respectively.
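Equations [0174] and [0175] are not reproduced here. According to [0176], the patent's fused-image formula also involves the spatial and temporal prediction values, but its exact form is not given in this text. As a hedged reference point, the sketch below implements the corresponding step of the original FSDAF algorithm. It computes inverse-distance weights normalized over the n similar pixels in the window and adds the weighted sum of their changes to $F_{k-1}$. The names are hypothetical.

```python
from typing import List, Sequence


def inverse_distance_weights(distances: Sequence[float]) -> List[float]:
    """w_q = (1 / D_q) / sum_q (1 / D_q) over the n similar pixels in the window."""
    inv = [1.0 / d for d in distances]
    total = sum(inv)
    return [v / total for v in inv]


def fuse_neighbourhood(f_prev: float, distances: Sequence[float], changes: Sequence[float]) -> float:
    """Assumed FSDAF-style final step: F_k = F_{k-1} + sum_q w_q * Delta F_q."""
    weights = inverse_distance_weights(distances)
    return f_prev + sum(w * dfq for w, dfq in zip(weights, changes))
```

Closer similar pixels (smaller $D_q$) receive proportionally larger weights, and the weights always sum to 1.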

[0177] Specifically, the final pixel value is the result of prediction in both the spatial and temporal domains.

[0178] Figure 2 shows an example of a remote sensing spatiotemporal fusion task. The inputs are the high spatial, low temporal resolution (HSLT) image of phase k-1, the high temporal, low spatial resolution (HTLS) image of phase k-1, and the HTLS image of phase k. The task is to predict the unobserved HSLT image of phase k by establishing the correspondence between the image pair of phase k-1. Spatiotemporal fusion is typically expressed by writing the image to be predicted, $F_k$, as the image at the previous moment, $F_{k-1}$, plus the change in information content $\Delta F$ over that period: $F_k = F_{k-1} + \Delta F$. However, $\Delta F$ cannot be calculated directly, because this change information depends heavily on the HTLS images. Therefore, $F_k = F_{k-1} + \lambda \Delta C$, where $\lambda$ represents the degradation relationship between the HSLT and HTLS images, which is related to the spatial model, and $\Delta C$ represents the difference between the two HTLS images at phases k-1 and k, which is related to the temporal model. Accordingly, $F_k = F_{k-1} + \left( w \Delta F_{SP} + (1 - w)\,\Delta F_{TP} \right)$.

[0179] Figure 1 shows an example network architecture for spatial-domain prediction. The non-local graph convolutional aggregation module (GraphAgg) proposed in IGNN finds, for each HTLS image patch, its k nearest neighbor (k-NN) HSLT image patches, thereby constructing an HSLT-HTLS connection graph. The texture information of the k corresponding HSLT image patches is then aggregated. Through cross-scale connections, the aggregated high-resolution features can be passed directly to the later high-scale network layers. This enables the network to directly perceive the high-resolution textures hidden in the image features. IGNN is used to obtain images at 1x to 4x resolution, and TPS is then used to obtain images at 4x to 16x resolution.
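The graph-construction step can be made concrete with the sketch below. It performs a brute-force k-nearest-neighbor search over flattened patches and averages the high-resolution counterparts of the matches. It is not IGNN itself. The learned aggregation of GraphAgg is replaced by fixed exponential distance weights, and each low-resolution candidate (for example, a downsampled patch) is assumed to have a known high-resolution counterpart. All function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence

Patch = Sequence[float]  # a flattened image patch


def sq_dist(a: Patch, b: Patch) -> float:
    """Squared Euclidean distance between two flattened patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_patches(query: Patch, candidates: Sequence[Patch], k: int) -> List[int]:
    """Indices of the k candidate patches closest to the query (brute force, L2)."""
    scored = [(sq_dist(query, cand), idx) for idx, cand in enumerate(candidates)]
    return [idx for _, idx in heapq.nsmallest(k, scored)]


def aggregate_hr(query: Patch, lr_candidates: Sequence[Patch],
                 hr_patches: Sequence[Patch], k: int, temperature: float = 1.0) -> List[float]:
    """Weighted average of the high-resolution counterparts of the k nearest patches.
    Weights are exp(-(d - d_min) / temperature), a simplified stand-in for the
    learned aggregation in IGNN's GraphAgg module."""
    idxs = knn_patches(query, lr_candidates, k)
    dists = [sq_dist(query, lr_candidates[i]) for i in idxs]
    d_min = min(dists)  # shift so the best match has weight 1 (avoids underflow)
    weights = [math.exp(-(d - d_min) / temperature) for d in dists]
    total = sum(weights)
    size = len(hr_patches[idxs[0]])
    return [sum(w * hr_patches[i][t] for w, i in zip(weights, idxs)) / total
            for t in range(size)]
```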

[0180] It should be noted that, in the above apparatus embodiments, the division into the functional modules described above is merely illustrative. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.

[0181] This application also discloses an electronic device. Figure 3 is a schematic structural diagram of an electronic device disclosed in an embodiment of this application. Referring to Figure 3, the electronic device 500 may include: at least one processor 501, at least one network interface 504, a user interface 503, a memory 505, and at least one communication bus 502.

[0182] The communication bus 502 is used to enable communication between these components.

[0183] The user interface 503 may include a display screen and a camera. Optionally, the user interface 503 may also include a standard wired interface and a wireless interface.

[0184] The network interface 504 may optionally include a standard wired interface or a wireless interface (such as a Wi-Fi interface).

[0185] The processor 501 may include one or more processing cores. The processor 501 connects to various parts of the server using various interfaces and lines, and performs various server functions and processes data by running or executing instructions, programs, code sets, or instruction sets stored in memory 505, and by calling data stored in memory 505. Optionally, the processor 501 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 501 may integrate one or a combination of several of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), and modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content required for display; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 501 and may be implemented as a separate chip.

[0186] The memory 505 may include random access memory (RAM) or read-only memory.

[0187] Optionally, the memory 505 includes a non-transitory computer-readable storage medium. The memory 505 can be used to store instructions, programs, code, code sets, or instruction sets. The memory 505 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the various method embodiments described above, and the like. The data storage area may store data involved in the various method embodiments described above. Optionally, the memory 505 may also be at least one storage device located remotely from the aforementioned processor 501. As shown in Figure 3, the memory 505, serving as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program for the multi-level single-image super-resolution spatiotemporal fusion method.

[0188] In the electronic device 500 shown in Figure 3, the user interface 503 is mainly used to provide an input interface for the user and to acquire user input data. The processor 501 can be used to invoke the application program for the multi-level single-image super-resolution spatiotemporal fusion method stored in the memory 505. When the application program is executed by one or more processors 501, the electronic device 500 performs one or more of the methods described in the above embodiments. It should be noted that, for simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, because according to this application some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily essential to this application.

[0189] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.

[0190] In the various embodiments provided in this application, it should be understood that the disclosed apparatus can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of units is only a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections between apparatuses or units through some service interfaces, and may be electrical or take other forms.

[0191] The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of these units can be selected according to actual needs to achieve the purpose of this embodiment.

[0192] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or may exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0193] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned memory includes various media capable of storing program code, such as USB flash drives, portable hard drives, magnetic disks, or optical disks.

[0194] The above are merely exemplary embodiments of this disclosure and should not be construed as limiting its scope. Any equivalent changes and modifications made in accordance with the teachings of this disclosure shall still fall within its scope. Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein.

[0195] This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not described in this disclosure. The specification and embodiments are to be considered exemplary only, and the scope and spirit of this disclosure are defined by the claims.

Claims

1. A spatiotemporal fusion method for multi-level single-image super-resolution, characterized in that the method comprises the following steps: S1: acquiring HTLS and HSLT images; predicting change information from the spatial domain to determine a spatial prediction: based on the HTLS image and the HSLT image, performing a two-level expansion to generate an expanded image; first-level expansion: constructing an HSLT-HTLS connection graph using a learning-based cross-scale internal graph neural network and expanding the image to obtain a first-level expanded image; second-level expansion: expanding the first-level expanded image using an interpolation-based thin-plate spline interpolation function to obtain a second-level expanded image, which is the spatial prediction; S2: processing the expanded image using the wavelet transform to determine an improved spatial prediction; S3: predicting change information from the time domain to determine a temporal prediction: classifying the pixels of the HSLT image using an iterative self-organizing clustering algorithm, calculating the abundance of each class, estimating the temporal information change of each class, and obtaining the HTLS image of phase k based on the abundance and the temporal information change; S4: determining a spatiotemporal fusion model and weighting factors based on the spatial prediction and the temporal prediction, and determining the fused image based on the spatiotemporal fusion model and the weighting factors.

2. The spatiotemporal fusion method for multi-level single-image super-resolution as described in claim 1, characterized in that step S1 comprises: in the first-level expansion: using the non-local graph convolutional aggregation module of the cross-scale internal graph neural network to match k nearest-neighbor HSLT image blocks for each HTLS image block, so as to construct the HSLT-HTLS connection graph; and aggregating the corresponding image patch information in the HSLT-HTLS connection graph to generate an image at 1 to 4 times the resolution, where the terms denote, respectively, the HSLT image of the phase preceding the phase to be predicted, the HTLS image of phase k-1, the HTLS image of phase k (i.e., the HTLS image of the phase to be predicted), and the IGNN network; in the second-level expansion: using the interpolation-based thin-plate spline interpolation function, a surface passing through all known points is established: the value of each coarse pixel is assigned to its center position, and a regular set of point data is obtained by fitting the spline function. The interpolation-based thin-plate spline interpolation function minimizes an energy function, fitting a spatially correlated function to the known pixels so as to minimize the gradient changes over all known points. Given N known points, the estimated value of the thin-plate spline interpolation function is as follows: where a, b, and c are scalars; one term is associated with band A at the i-th point; the distance term is the distance between the coordinates of the interpolation point and the control point, and three constraints are satisfied; one term denotes the square of the distance; another denotes the coordinates of a known pixel in the image; the value of band A at a given location in the HTLS image of the phase to be predicted and the interpolation result are also denoted; after the parameters of the thin-plate spline interpolation function are optimized by minimization, the value of each HSLT image pixel is predicted using the thin-plate spline interpolation function, yielding an image at 4 to 16 times the resolution, as follows: where the operator denotes the thin-plate spline interpolation function.
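The second-level expansion can be sketched as an exact thin-plate spline interpolant of the form $f(x, y) = a + b x + c y + \sum_i w_i U(r_i^2)$, with kernel $U(r^2) = r^2 \log r^2$ and the three side constraints $\sum w_i = \sum w_i x_i = \sum w_i y_i = 0$. The claim's own equations are not reproduced in this text. This standard formulation, the dense Gaussian-elimination solver, the placement of known points at coarse-pixel centers, and all function names are therefore assumptions. A dense solve is practical only for small tiles.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def _kernel(r2: float) -> float:
    """TPS radial basis U(r^2) = r^2 * log(r^2), with U(0) = 0."""
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a dense square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (e.g. fewer than 3 non-collinear points)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_tps(points: Sequence[Point], values: Sequence[float]) -> Callable[[float, float], float]:
    """Fit f(x, y) = a + b*x + c*y + sum_i w_i * U(|p - p_i|^2) through all known points."""
    n = len(points)
    size = n + 3
    mat = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            mat[i][j] = _kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        mat[i][n], mat[i][n + 1], mat[i][n + 2] = 1.0, xi, yi
        # Side constraints: sum w_i = sum w_i * x_i = sum w_i * y_i = 0.
        mat[n][i], mat[n + 1][i], mat[n + 2][i] = 1.0, xi, yi
        rhs[i] = values[i]
    coeffs = _solve(mat, rhs)
    weights = coeffs[:n]
    a, b, c = coeffs[n:]

    def f(x: float, y: float) -> float:
        bend = sum(w * _kernel((x - px) ** 2 + (y - py) ** 2)
                   for w, (px, py) in zip(weights, points))
        return a + b * x + c * y + bend

    return f


def tps_upsample(coarse: List[List[float]], factor: int) -> List[List[float]]:
    """Place each coarse value at its pixel center (in fine-grid units) and
    evaluate the spline at every fine-pixel center."""
    pts: List[Point] = []
    vals: List[float] = []
    for i, row in enumerate(coarse):
        for j, v in enumerate(row):
            pts.append(((j + 0.5) * factor, (i + 0.5) * factor))
            vals.append(v)
    f = fit_tps(pts, vals)
    height, width = len(coarse) * factor, len(coarse[0]) * factor
    return [[f(jj + 0.5, ii + 0.5) for jj in range(width)] for ii in range(height)]
```

The spline passes exactly through every known point and reproduces any linear trend exactly, because the affine part absorbs it and the bending weights become zero.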

3. The spatiotemporal fusion method for multi-level single-image super-resolution as described in claim 2, characterized in that step S2 comprises: extracting the frame information of the HTLS image and the detail information of the expanded image using the wavelet transform; and fusing the detail information into the frame information by the inverse wavelet transform to ensure the accuracy of the acquired information, thereby obtaining the improved spatial prediction, which can be expressed as: where the operator denotes the wavelet transform.

4. The spatiotemporal fusion method for multi-level single-image super-resolution as described in claim 3, characterized in that step S3 comprises: S31: assuming that the percentage of each substance class within a pixel of the HTLS image does not change over time, and that one pixel of the HTLS image corresponds to m pixels of the HSLT image, the pixels are divided into a number of classes; in the classification result, if the c-th class contains a given number of pixels, its proportion is represented by the abundance, denoted by the variable A, as follows: where the terms denote the number of classes, the abundance of class c, and the coordinates of the i-th pixel in the HTLS image; S32: the temporal information change of class c is estimated using the abundance: based on the fundamental assumption of linear mixing theory, the temporal information change within an HTLS pixel is defined as the cumulative effect of the temporal information changes of all HSLT pixels within that HTLS pixel, where one term represents the temporal information of the spatiotemporal fusion model, i.e., the difference between the two HTLS images at phases k-1 and k, and the other represents the temporal variation information within the HTLS image at the given location; S33: the temporal information change of each class is assigned to the HSLT image of phase k-1 to obtain the image of phase k, as follows: where the terms denote, respectively, the temporal prediction of class c for the j-th HSLT pixel within the HTLS pixel at the given coordinates at phase k, and the pixel value of class c for the j-th HSLT pixel within the HTLS pixel at the given coordinates at phase k-1.
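Steps S31 to S33 can be sketched as follows. The abundance is the per-class pixel fraction inside each HTLS pixel. The per-class temporal changes are then recovered from the linear-mixing relation $\Delta C_i = \sum_c A_c(i)\, \Delta F(c)$ across several HTLS pixels, and each class change is added to the phase k-1 HSLT values. This sketch makes several assumptions: an ordinary, unconstrained least-squares solve via the normal equations, integer class labels from a clustering step such as ISODATA, and hypothetical function names. FSDAF, for comparison, uses a constrained least-squares solve, and the claim's own equations are not reproduced in this text.

```python
from typing import List, Sequence


def abundances(labels: Sequence[int], n_classes: int) -> List[float]:
    """A_c = (number of HSLT pixels of class c) / m within one HTLS pixel."""
    counts = [0] * n_classes
    for c in labels:
        counts[c] += 1
    return [cnt / len(labels) for cnt in counts]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Small dense solver (Gaussian elimination with partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("abundance matrix is rank-deficient")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def class_changes(abund: List[List[float]], delta_c: Sequence[float]) -> List[float]:
    """Unconstrained least squares for Delta C_i = sum_c A_c(i) * Delta F(c),
    solved through the normal equations (A^T A) x = A^T b."""
    n_cls = len(abund[0])
    ata = [[sum(row[p] * row[q] for row in abund) for q in range(n_cls)] for p in range(n_cls)]
    atb = [sum(row[p] * d for row, d in zip(abund, delta_c)) for p in range(n_cls)]
    return _solve(ata, atb)


def temporal_prediction(f_prev: Sequence[float], labels: Sequence[int],
                        changes: Sequence[float]) -> List[float]:
    """Step S33: add each pixel's class change to its phase k-1 HSLT value."""
    return [v + changes[c] for v, c in zip(f_prev, labels)]
```

At least as many HTLS pixels as classes, with sufficiently varied abundances, are needed for the normal equations to have a unique solution.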

5. The spatiotemporal fusion method for multi-level single-image super-resolution as described in claim 4, characterized in that the spatiotemporal fusion model is as follows: $F_k = F_{k-1} + \left( w_q \Delta F_{SP} + (1 - w_q)\,\Delta F_{TP} \right)$, where $F_k$ represents the HSLT image of phase k; $F_{k-1}$ represents the HSLT image of phase k-1; $w_q$ represents the weighting factor; $\Delta F_{SP}$ represents the amount of information change under the influence of the spatial scale; and $\Delta F_{TP}$ represents the amount of information change under the influence of the temporal scale.

6. The spatiotemporal fusion method for multi-level single-image super-resolution as described in claim 5, characterized in that step S4 comprises: S41: according to unmixing theory, one pixel of the HTLS image is regarded as a mixture of the m corresponding pixels of the HSLT image, as follows: where the terms denote, respectively, the coordinates of the j-th HSLT image pixel within the HTLS image pixel at a given position; the value of the j-th HSLT pixel within the HTLS pixel at the given coordinates at phase k-1; the value of the j-th HSLT pixel within the HTLS pixel at the given coordinates at phase k; the systematic difference between the two sensors caused by differences in bandwidth and solar geometry, which is a constant; the pixel value of the phase k-1 HTLS image at the given location; and the pixel value of the phase k HTLS image at the given location; S42: within the heterogeneous region, the residual of the heterogeneous region of the HTLS image can be expressed as follows, and the residual in the heterogeneous region is obtained from these formulas: in homogeneous regions, the spatial prediction represents the actual value of the HSLT image, and the residual within the homogeneous region is as follows: where the terms denote the spatial prediction and the temporal prediction at the given location, respectively; S43: using the FSDAF spatiotemporal fusion algorithm, a factor related to the residual distributions of the heterogeneous and homogeneous regions is calculated to determine the degree of influence of the spatial prediction and the temporal prediction on the final pixel value, as follows: where the variable takes values from 0 to 1, and takes the value 1 when the q-th pixel in the HSLT image and the central HSLT image pixel correspond to the same land cover type; S44: combined weights are calculated from the residual distributions of the heterogeneous and homogeneous regions, and the weights are normalized as follows: where the terms denote, respectively, the combined weight and the normalized weight corresponding to the j-th HSLT image pixel within the HTLS pixel at a given position; S45: the residual distribution and the temporal change are added to obtain the actual change within a pixel of the HSLT image, calculated as follows: S46: a window centered on the prediction point is set, and the information of neighboring pixels within the window is integrated, the weight with which a neighboring pixel influences the prediction point depending on the distance between the pixels; the spatial relative distance of the q-th similar pixel is as follows: where the terms denote the window size, the x-coordinate and y-coordinate of the q-th similar pixel, and the x-coordinate and y-coordinate of the prediction point; accordingly, the weighting factor and the fused image are expressed as: where n denotes the total number of pixels within the window, and the remaining terms denote the spatial prediction value and the temporal prediction value.

7. An electronic device, characterized in that the device comprises a processor (501), a memory (505), a user interface (503), and a network interface (504), wherein the memory (505) is used to store instructions; the user interface (503) and the network interface (504) are used to communicate with other devices; and the processor (501) is used to execute the instructions stored in the memory (505) to cause the electronic device (500) to perform the spatiotemporal fusion method for multi-level single-image super-resolution as described in any one of claims 1-6.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when executed, perform the spatiotemporal fusion method for multi-level single-image super-resolution as described in any one of claims 1-6.