An image fusion method, device and storage medium
By constructing a feature analysis network and a vector field construction network to process hyperspectral and multispectral images, the nonlinear modeling and robustness problems of spectral-spatial fusion in existing technologies are solved, achieving image fusion with high spectral fidelity and spatial clarity, and avoiding problems of spectral distortion and insufficient stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- WUHAN INST OF TECH
- Filing Date
- 2026-02-13
- Publication Date
- 2026-06-23
AI Technical Summary
Existing hyperspectral-multispectral image fusion techniques face trade-offs between nonlinear expressive power, long-range dependency modeling, generalization robustness, and computational efficiency. In particular, they are prone to spectral shifts, structural artifacts, and loss of detail in complex scenes with co-occurring ground features, rich textures, or drastic spectral variations.
By constructing a training model, including a feature analysis network, a target vector field construction network, and a prediction vector field construction network, downsampling, conditional vector analysis, target vector field analysis, and prediction vector field analysis are performed on high-resolution hyperspectral and multispectral images to generate an image fusion model, thereby achieving consistency optimization of spectral features and spatial structure.
It improves the spectral fidelity and spatial clarity of fused images, avoids the spectral distortion problem caused by differences in sensor spectral response in traditional methods, and overcomes the shortcomings of single-step regression methods, such as lack of constraints in the intermediate process and insufficient stability of results.
Figure CN122265060A_ABST
Description
Technical Field
[0001] This invention relates to the field of image fusion technology, specifically to an image fusion method, apparatus, and storage medium.
Background Technology
[0002] Remote sensing, as an advanced high-tech method of Earth observation, is one of the important ways for humankind to understand the objective world. Hyperspectral imaging (HSI) can accurately describe the characteristics of ground objects in continuous and fine spectral dimensions. By capturing rich spectral subdivision information, it provides a solid data foundation for remote sensing applications such as image classification, target recognition, environmental monitoring, and change detection. However, due to the limitations of sensor imaging principles, hyperspectral images inevitably sacrifice spatial resolution while pursuing high spectral resolution. In contrast, multispectral imaging (MSI) technology has significant advantages in preserving spatial details, but its spectral resolution is lower and it lacks the rich spectral information possessed by hyperspectral images. Therefore, in practical remote sensing systems, multispectral images are often used to accurately supplement the spatial information of a scene. Under this natural complementary characteristic, HSI–MSI fusion becomes an effective and feasible way to improve the spatial resolution of hyperspectral images. The core objective of this fusion task is to fully utilize the high spatial resolution details provided by MSI to enhance the low spatial resolution of HSI, thereby generating a reconstructed image with both high spectral fidelity and high spatial resolution. Essentially, hyperspectral images provide precise and rich spectral information, while multispectral images contribute clear spatial structure. The key to HSI–MSI fusion lies in achieving efficient alignment and reliable information reconstruction between the two, so as to preserve their respective advantages to the greatest extent and suppress information loss during the fusion process.
[0003] Early image fusion methods primarily focused on linear modeling paradigms, such as matrix factorization, sparse representation, Bayesian statistics, and subspace projection. These methods typically model the physical imaging process and strive to recover true spectral information, possessing strong interpretability and theoretical foundation. However, the linear assumption fails to adequately characterize the nonlinear components in complex scenes, especially under practical conditions such as inconsistencies in cross-modal features, noise interference, and mixed pixels, where recovery performance is often significantly limited. With the rapid development of deep learning, end-to-end fusion models based on convolutional neural networks (CNNs) have significantly improved spatial detail compensation capabilities, achieving higher-quality spatial-spectral reconstruction through powerful feature extraction and nonlinear mapping capabilities. However, the local receptive field characteristics of convolutional structures limit their ability to model long-range spectral dependencies, easily leading to problems such as spectral shift, structural artifacts, and loss of detail in scenes with complex co-occurrence of ground features, rich textures, or drastic spectral variations. In recent years, the Transformer architecture has been gradually introduced into hyperspectral-multispectral fusion tasks, effectively alleviating the bottleneck of insufficient long-range dependency modeling. Its self-attention mechanism can capture global contextual information, achieving significant performance improvements on many benchmark datasets. 
However, the Transformer method also has several limitations: the model's generalization ability is highly dependent on the distribution stability and scale of the training data, the computational complexity is high, the hardware resource requirements are large, and its practicality in large-scale high-resolution remote sensing data processing still faces challenges.
[0004] Overall, from early linear methods to end-to-end CNN models, and then to the global modeling stage dominated by Transformers, hyperspectral-multispectral fusion technology has made great progress. However, there are still challenges in balancing nonlinear expressive power, long-range dependency modeling, generalization robustness, and computational efficiency, which provides important directions for future research.
Summary of the Invention
[0005] The technical problem to be solved by the present invention is to provide an image fusion method, apparatus and storage medium to address the shortcomings of the prior art.
[0006] The technical solution of the present invention to solve the above-mentioned technical problems is as follows: an image fusion method, comprising the following steps:
importing the high-resolution hyperspectral image to be fused, the high-resolution multispectral image to be fused, multiple original high-resolution hyperspectral images, and the original high-resolution multispectral image corresponding to each of the original high-resolution hyperspectral images;
downsampling each of the original high-resolution hyperspectral images to obtain the original low-resolution hyperspectral image corresponding to each of the original high-resolution hyperspectral images;
constructing a training model, which includes a feature analysis network, a target vector field construction network, and a prediction vector field construction network;
performing conditional vector analysis on each of the original high-resolution multispectral images through the feature analysis network to obtain the target condition vector corresponding to each of the original high-resolution hyperspectral images;
performing target vector field analysis, through the target vector field construction network, on each of the original high-resolution hyperspectral images and its corresponding original low-resolution hyperspectral image to obtain the target time step vector and the target vector field corresponding to each of the original high-resolution hyperspectral images;
performing prediction vector field analysis, through the prediction vector field construction network, on each of the target time step vectors, the original high-resolution multispectral image corresponding to each of the original high-resolution hyperspectral images, and the target condition vector corresponding to each of the original high-resolution hyperspectral images to obtain the prediction vector field corresponding to each of the original high-resolution hyperspectral images;
performing parameter update analysis on the training model based on all the target vector fields and all the prediction vector fields to obtain the image fusion model;
fusing the high-resolution hyperspectral image to be fused and the high-resolution multispectral image to be fused through the image fusion model to obtain the image fusion result.
[0007] Another technical solution of the present invention to solve the above-mentioned technical problems is as follows: an image fusion device, comprising:
an import module, used to import the high-resolution hyperspectral image to be fused, the high-resolution multispectral image to be fused, multiple original high-resolution hyperspectral images, and the original high-resolution multispectral image corresponding to each of the original high-resolution hyperspectral images;
a downsampling processing module, used to downsample each of the original high-resolution hyperspectral images to obtain the original low-resolution hyperspectral image corresponding to each of the original high-resolution hyperspectral images;
a model building module, used to build a training model, which includes a feature analysis network, a target vector field construction network, and a prediction vector field construction network;
a conditional vector analysis module, used to perform conditional vector analysis on each of the original high-resolution multispectral images through the feature analysis network to obtain the target condition vector corresponding to each of the original high-resolution hyperspectral images;
a target vector field analysis module, used to perform target vector field analysis, through the target vector field construction network, on each of the original high-resolution hyperspectral images and its corresponding original low-resolution hyperspectral image to obtain the target time step vector and the target vector field corresponding to each of the original high-resolution hyperspectral images;
a prediction vector field analysis module, used to perform prediction vector field analysis, through the prediction vector field construction network, on each of the target time step vectors, the original high-resolution multispectral image corresponding to each of the original high-resolution hyperspectral images, and the target condition vector corresponding to each of the original high-resolution hyperspectral images to obtain the prediction vector field corresponding to each of the original high-resolution hyperspectral images;
a parameter update analysis module, used to perform parameter update analysis on the training model based on all the target vector fields and all the prediction vector fields to obtain the image fusion model;
an image fusion result acquisition module, used to fuse the high-resolution hyperspectral image to be fused and the high-resolution multispectral image to be fused through the image fusion model to obtain the image fusion result.
[0008] Based on the above-mentioned image fusion method, the present invention also provides an image fusion system.
[0009] Another technical solution of the present invention to solve the above-mentioned technical problems is as follows: an image fusion system, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the image fusion method described above is implemented.
[0010] Based on the above-described image fusion method, the present invention also provides a computer-readable storage medium.
[0011] Another technical solution of the present invention to solve the above-mentioned technical problems is as follows: a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the image fusion method as described above.
[0012] The beneficial effects of this invention are as follows: the original high-resolution hyperspectral image is downsampled to obtain a low-resolution hyperspectral image; the feature analysis network performs conditional vector analysis on the original high-resolution multispectral image to obtain a target condition vector; the target vector field construction network performs target vector field analysis on the original high-resolution hyperspectral image and the original low-resolution hyperspectral image to obtain a target time step vector and a target vector field; the prediction vector field construction network performs prediction vector field analysis on the target time step vector, the original high-resolution multispectral image, and the target condition vector to obtain a prediction vector field; the parameters of the training model are updated based on the target vector field and the prediction vector field to obtain an image fusion model; and the image fusion model fuses the high-resolution hyperspectral image to be fused and the high-resolution multispectral image to be fused to obtain the image fusion result. This achieves consistent optimization of spectral features and spatial structure, improving the spectral fidelity of the fused image while ensuring spatial clarity. It effectively avoids the spectral distortion caused by differences in sensor spectral response in traditional fusion methods and overcomes the shortcomings of existing single-step regression methods, such as the lack of constraints on the intermediate process and the insufficient stability of the results.
Attached Figure Description
[0013] Figure 1 is a schematic flowchart of the image fusion method provided in an embodiment of the present invention;
Figure 2 is a schematic diagram of the training model for the image fusion method provided in an embodiment of the present invention;
Figure 3 is a schematic diagram of the CAVE dataset visualization results of the image fusion method provided in an embodiment of the present invention;
Figure 4 is a connection diagram of the apparatus for the image fusion method provided in an embodiment of the present invention;
Figure 5 is a block diagram of an image fusion apparatus provided in an embodiment of the present invention.
Detailed Implementation
[0014] The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given are only for explaining the present invention and are not intended to limit the scope of the present invention.
[0015] Figure 1 is a schematic flowchart of an image fusion method provided in an embodiment of the present invention.
[0016] As shown in Figure 1, an image fusion method includes the following steps:
S1: Import the high-resolution hyperspectral image to be fused, the high-resolution multispectral image to be fused, multiple original high-resolution hyperspectral images, and the original high-resolution multispectral image corresponding to each of the original high-resolution hyperspectral images;
S2: Downsample each of the original high-resolution hyperspectral images to obtain the original low-resolution hyperspectral image corresponding to each of the original high-resolution hyperspectral images;
S3: Construct a training model, which includes a feature analysis network, a target vector field construction network, and a prediction vector field construction network;
S4: Perform conditional vector analysis on each of the original high-resolution multispectral images through the feature analysis network to obtain the target condition vector corresponding to each of the original high-resolution hyperspectral images;
S5: Perform target vector field analysis, through the target vector field construction network, on each of the original high-resolution hyperspectral images and its corresponding original low-resolution hyperspectral image to obtain the target time step vector and the target vector field corresponding to each of the original high-resolution hyperspectral images;
S6: Perform prediction vector field analysis, through the prediction vector field construction network, on each of the target time step vectors, the original high-resolution multispectral image corresponding to each of the original high-resolution hyperspectral images, and the target condition vector corresponding to each of the original high-resolution hyperspectral images to obtain the prediction vector field corresponding to each of the original high-resolution hyperspectral images;
S7: Perform parameter update analysis on the training model based on all the target vector fields and all the prediction vector fields to obtain the image fusion model;
S8: Fuse the high-resolution hyperspectral image to be fused and the high-resolution multispectral image to be fused through the image fusion model to obtain the image fusion result.
[0017] It should be understood that data processing is performed on the acquired hyperspectral images (i.e., the high-resolution hyperspectral image to be fused and the multiple original high-resolution hyperspectral images) and multispectral images (i.e., the high-resolution multispectral image to be fused and the multiple original high-resolution multispectral images). A low-resolution hyperspectral image $X_{LR}$ (i.e., the low-resolution hyperspectral image to be fused and the multiple original low-resolution hyperspectral images) is generated by four-fold spatial downsampling of the high-resolution hyperspectral image $X_{HR}$ (i.e., the high-resolution hyperspectral image to be fused and the multiple original high-resolution hyperspectral images), and is paired with the corresponding high-resolution multispectral image $Y$ (i.e., the high-resolution multispectral image to be fused and the multiple original high-resolution multispectral images).
[0018] Specifically, a low-spatial-resolution hyperspectral image of size $h \times w \times S$ is denoted $X_{LR} \in \mathbb{R}^{h \times w \times S}$, and its corresponding high-resolution hyperspectral and multispectral images are denoted $X_{HR} \in \mathbb{R}^{H \times W \times S}$ and $Y \in \mathbb{R}^{H \times W \times s}$, where $H$ and $W$ are the height and width of the image and $S$ is the number of bands. $X_{HR}$ and $X_{LR}$ have the same spectral resolution, but the spatial resolution of $X_{HR}$ is 4 times that of $X_{LR}$, so $X_{LR}$ has size $\frac{H}{4} \times \frac{W}{4} \times S$ (that is, $h = H/4$ and $w = W/4$). $X_{HR}$ and $Y$ have the same spatial resolution, but the number of bands of $Y$ is $s$, so $Y$ has size $H \times W \times s$.
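The size relationships described above can be checked with a minimal NumPy sketch. The text only states that the downsampling is four-fold, so the block-averaging operator used here is an assumption, and the array sizes (64 x 64 pixels, 31 hyperspectral bands, 3 multispectral bands) and the name `downsample_4x` are hypothetical values chosen for illustration.

```python
import numpy as np

def downsample_4x(hr_hsi: np.ndarray, factor: int = 4) -> np.ndarray:
    """Block-average an (H, W, S) cube to (H/factor, W/factor, S).

    Block averaging is an assumed stand-in; the source only says "four-fold".
    """
    H, W, S = hr_hsi.shape
    if H % factor or W % factor:
        raise ValueError("height and width must be divisible by the factor")
    # Split each spatial axis into (coarse index, offset inside the block).
    blocks = hr_hsi.reshape(H // factor, factor, W // factor, factor, S)
    return blocks.mean(axis=(1, 3))

rng = np.random.default_rng(0)
hr_hsi = rng.random((64, 64, 31))  # high-resolution hyperspectral cube (hypothetical size)
hr_msi = rng.random((64, 64, 3))   # high-resolution multispectral image (hypothetical size)
lr_hsi = downsample_4x(hr_hsi)

print(lr_hsi.shape)  # (16, 16, 31)
```

The low-resolution cube keeps all spectral bands and loses a factor of four along each spatial axis, while the multispectral image keeps the full spatial grid with fewer bands.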
[0019] In the above embodiments, the original high-resolution hyperspectral image is downsampled to obtain a low-resolution hyperspectral image. The feature analysis network performs conditional vector analysis on the original high-resolution multispectral image to obtain a target condition vector. The target vector field construction network performs target vector field analysis on the original high-resolution hyperspectral image and the original low-resolution hyperspectral image to obtain a target time step vector and a target vector field. The prediction vector field construction network performs prediction vector field analysis on the target time step vector, the original high-resolution multispectral image, and the target condition vector to obtain a prediction vector field. The parameters of the training model are updated based on the target vector field and the prediction vector field to obtain an image fusion model. The image fusion model fuses the high-resolution hyperspectral image to be fused and the high-resolution multispectral image to be fused to obtain the image fusion result. This achieves consistent optimization of spectral features and spatial structure, improving the spectral fidelity of the fused image while ensuring spatial clarity. It effectively avoids the spectral distortion caused by differences in sensor spectral response in traditional fusion methods and overcomes the defects of existing single-step regression methods, such as the lack of constraints on the intermediate process and the insufficient stability of the results.
[0020] Optionally, as an embodiment of the present invention, the feature analysis network includes multiple downsampling blocks, a global average pooling layer, a first linear projection layer, a second linear projection layer, and a third linear projection layer;
The process of performing conditional vector analysis on each of the original high-resolution multispectral images through the feature analysis network to obtain the target condition vector corresponding to each of the original high-resolution hyperspectral images includes:
importing multiple frequency indices, and calculating the time vector from all the frequency indices using the first equation: $e_t = [\sin(\omega_k t), \cos(\omega_k t)]_{k=1}^{K}$, where $e_t$ is the time vector, $t$ is the time step, $\omega_k$ is the frequency parameter of the $k$-th frequency index and is computed from $k$ by the exponential function $\exp(\cdot)$, $k$ is the frequency index, and $K$ is the number of frequency indices;
downsampling each of the original high-resolution multispectral images through the multiple downsampling blocks to obtain the original high-resolution multispectral vector corresponding to each of the original high-resolution hyperspectral images;
pooling each of the original high-resolution multispectral vectors through the global average pooling layer to obtain the pooled high-resolution multispectral vector corresponding to each of the original high-resolution hyperspectral images;
projecting each of the pooled high-resolution multispectral vectors through the first linear projection layer to obtain the global feature vector corresponding to each of the original high-resolution hyperspectral images;
concatenating the time vector with each of the global feature vectors to obtain the first concatenated vector corresponding to each of the original high-resolution hyperspectral images;
projecting each of the first concatenated vectors through the second linear projection layer to obtain the projected vector corresponding to each of the original high-resolution hyperspectral images;
projecting each of the projected vectors through the third linear projection layer to obtain the global condition vector corresponding to each of the original high-resolution hyperspectral images;
performing dimension matching on each of the global condition vectors to obtain the target condition vector corresponding to each of the original high-resolution hyperspectral images.
[0021] Specifically, since the ordinary differential equation solver needs to smoothly interpolate the image state between different time points in the continuous time domain during the inference process, this invention uses a fixed sinusoidal embedding to encode the time step $t$ as $e_t$ (i.e., the time vector), as shown in the following formula: $e_t = [\sin(\omega_k t), \cos(\omega_k t)]_{k=1}^{K}$, where $e_t$ denotes the temporal embedding and $\omega_k$, computed from the frequency index $k$ by an exponential function, controls the frequency (i.e., the frequency parameter) of each sine and cosine component. Simultaneously, the encoder encodes $Y$ (i.e., the original high-resolution multispectral image) into a global feature vector through spatial pooling, as shown in the following formula: $g = \mathrm{Linear}_1(\mathrm{GAP}(\Phi(Y)))$, where $\Phi$ consists of downsampling blocks and performs three downsampling operations, $\mathrm{GAP}$ is the global average pooling layer, $\mathrm{Linear}_1$ is a linear projection layer, and $g$ is the embedding of global multispectral information. After $e_t$ (i.e., the time vector) and the global feature vector $g$ are obtained, they are concatenated along the channel dimension and then linearly projected to match the channel size of the bottleneck module, as shown in the following formulas: $z = \mathrm{Linear}_2([e_t; g])$, $c_g = \mathrm{Linear}_3(z)$, where $c_g$ is the global condition vector, which is broadcast spatially to form the condition vector $c$ (i.e., the target condition vector).
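The conditioning path described above (time embedding, three downsampling blocks, global average pooling, three linear projections, spatial broadcast) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the patented network: the frequency schedule `exp(-ln(10000) * k / K)` is the common Transformer-style choice and is assumed here because the source does not reproduce the exact expression, the downsampling blocks are replaced by average pooling plus a channel-mixing matrix, all weights are random placeholders, and every name and size is hypothetical.

```python
import numpy as np

def sinusoidal_embedding(t: float, dim: int, max_period: float = 10000.0) -> np.ndarray:
    """Fixed sin/cos embedding of a scalar time step t (assumed frequency schedule)."""
    half = dim // 2
    k = np.arange(half)
    omega = np.exp(-np.log(max_period) * k / half)  # one frequency per index k
    return np.concatenate([np.sin(omega * t), np.cos(omega * t)])

def avg_pool_2x(x: np.ndarray) -> np.ndarray:
    """Halve the spatial size of an (H, W, C) array by 2x2 averaging."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def condition_vector(msi, t, params, bottleneck_hw):
    feat = msi
    for w in params["down"]:                        # three simplified downsampling blocks
        feat = np.maximum(avg_pool_2x(feat) @ w, 0.0)
    pooled = feat.mean(axis=(0, 1))                 # global average pooling
    g = pooled @ params["proj1"]                    # first projection: global feature vector
    e_t = sinusoidal_embedding(t, params["time_dim"])
    z = np.concatenate([e_t, g]) @ params["proj2"]  # second projection on the concatenation
    c_g = z @ params["proj3"]                       # third projection: global condition vector
    h, w_ = bottleneck_hw
    return np.broadcast_to(c_g, (h, w_, c_g.shape[0]))  # spatial broadcast (dimension matching)

rng = np.random.default_rng(0)
time_dim, feat_dim, bott_ch = 16, 32, 64            # hypothetical sizes
params = {
    "time_dim": time_dim,
    "down": [rng.normal(size=s) * 0.1 for s in [(3, 8), (8, 16), (16, 32)]],
    "proj1": rng.normal(size=(32, feat_dim)) * 0.1,
    "proj2": rng.normal(size=(time_dim + feat_dim, bott_ch)) * 0.1,
    "proj3": rng.normal(size=(bott_ch, bott_ch)) * 0.1,
}
msi = rng.random((64, 64, 3))
c = condition_vector(msi, t=0.3, params=params, bottleneck_hw=(8, 8))
print(c.shape)  # (8, 8, 64)
```

Because the condition vector is global, every spatial position of the broadcast result carries the same channel vector; only its channel count has to agree with the bottleneck features it is later added to.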
[0022] In the above embodiments, conditional vector analysis is performed on each original high-resolution multispectral image through a feature analysis network to obtain the target conditional vector, thereby achieving consistency optimization of spectral features and spatial structure. This improves the spectral fidelity of the fused image while also ensuring spatial clarity.
[0023] Optionally, as an embodiment of the present invention, the target vector field construction network includes a bicubic upsampling layer;
The process of performing target vector field analysis, through the target vector field construction network, on each of the original high-resolution hyperspectral images and its corresponding original low-resolution hyperspectral image to obtain the target time step vector and the target vector field corresponding to each of the original high-resolution hyperspectral images includes:
upsampling each of the original low-resolution hyperspectral images through the bicubic upsampling layer to obtain the initial time step vector corresponding to each of the original high-resolution hyperspectral images;
calculating, using the second equation, the target time step vector corresponding to each of the original high-resolution hyperspectral images from each of the initial time step vectors and the corresponding original high-resolution hyperspectral image, the second equation being: $x_t^i = (1 - t)\,x_0^i + t\,x_1^i$, where $x_t^i$ is the target time step vector corresponding to the $i$-th original high-resolution hyperspectral image, $t$ is the time step, $x_0^i$ is the initial time step vector corresponding to the $i$-th original high-resolution hyperspectral image, and $x_1^i$ is the $i$-th original high-resolution hyperspectral image;
calculating, using the third equation, the target vector field corresponding to each of the original high-resolution hyperspectral images from the target time step vector, the third equation being: $u^i = \frac{\mathrm{d}x_t^i}{\mathrm{d}t}$, where $u^i$ is the target vector field corresponding to the $i$-th original high-resolution hyperspectral image, $x_t^i$ is the target time step vector corresponding to the $i$-th original high-resolution hyperspectral image, and $t$ is the time step.
[0024] Specifically, this invention describes the fusion of hyperspectral and multispectral images as a continuous transport process based on conditional flow matching. The initial state is set to $x_0$, obtained by bicubic upsampling of $X_{LR}$ so as to preserve its spectral structure, while $Y$ is used as a guiding condition that drives the continuous process to evolve gradually toward the target $x_1 = X_{HR}$. Given the initial state $x_0$ and the target $x_1$, a linear interpolation path is constructed as shown in the following formula: $x_t = (1 - t)\,x_0 + t\,x_1$, where $t \sim \mathcal{U}(0, 1)$ is sampled uniformly during training and $x_t$ is the interpolated state at time $t$. The target vector field is the time derivative of this path, as shown in the following equation: $u_t = \frac{\mathrm{d}x_t}{\mathrm{d}t} = x_1 - x_0$, where $u_t$ is a constant vector field, which simplifies the learning objective. The continuous evolution is governed by the following ordinary differential equation: $\frac{\mathrm{d}x_t}{\mathrm{d}t} = v_\theta(x_t, t, Y)$.
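The linear interpolation path and its constant time derivative can be illustrated with a short NumPy sketch. The arrays below are random stand-ins for the bicubic-upsampled low-resolution image and the high-resolution target (no actual bicubic upsampling is performed), and the fixed-step Euler integrator and all function names are assumptions made for illustration; the source only states that an ordinary differential equation solver is used at inference.

```python
import numpy as np

def interpolate(x0: np.ndarray, x1: np.ndarray, t: float) -> np.ndarray:
    """State on the straight path from x0 (t = 0) to x1 (t = 1)."""
    return (1.0 - t) * x0 + t * x1

def target_vector_field(x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """Time derivative of the linear path; it does not depend on t."""
    return x1 - x0

def euler_integrate(x0: np.ndarray, velocity_fn, steps: int = 10) -> np.ndarray:
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x, dt = x0.copy(), 1.0 / steps
    for n in range(steps):
        x = x + dt * velocity_fn(x, n * dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.random((16, 16, 31))  # stand-in for the upsampled low-resolution hyperspectral image
x1 = rng.random((16, 16, 31))  # stand-in for the high-resolution hyperspectral target

t = rng.uniform()              # time step sampled uniformly in [0, 1)
x_t = interpolate(x0, x1, t)
u = target_vector_field(x0, x1)

# A central finite difference of the path reproduces the analytic derivative.
eps = 1e-6
fd = (interpolate(x0, x1, t + eps) - interpolate(x0, x1, t - eps)) / (2 * eps)

# With the exact (oracle) field, integrating the ODE from x0 arrives at x1.
x_end = euler_integrate(x0, lambda x, t: u)
```

In practice the oracle field is replaced by the learned prediction, so the quality of the integrated result depends on how closely the prediction matches `x1 - x0` along the path.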
[0025] In the above embodiments, the target vector field construction network performs target vector field analysis on the original high-resolution hyperspectral image and the original low-resolution hyperspectral image to obtain the target time step vector and the target vector field. This achieves consistent optimization of spectral features and spatial structure, improving the spectral fidelity of the fused image while ensuring spatial clarity.
[0026] Optionally, as an embodiment of the present invention, the prediction vector field construction network includes an encoder, a bottleneck module, and a decoder;
The process of performing prediction vector field analysis, through the prediction vector field construction network, on each of the target time step vectors, the original high-resolution multispectral image corresponding to each of the original high-resolution hyperspectral images, and the target condition vector corresponding to each of the original high-resolution hyperspectral images to obtain the prediction vector field corresponding to each of the original high-resolution hyperspectral images includes:
concatenating each of the target time step vectors with the original high-resolution multispectral image corresponding to each of the original high-resolution hyperspectral images to obtain the second concatenated vector corresponding to each of the original high-resolution hyperspectral images;
downsampling each of the second concatenated vectors through the encoder to obtain the downsampled concatenated vector corresponding to each of the original high-resolution hyperspectral images;
adjusting the channel dimension of each of the downsampled concatenated vectors through the bottleneck module to obtain the adjusted concatenated vector corresponding to each of the original high-resolution hyperspectral images;
adding each of the adjusted concatenated vectors and the target condition vector corresponding to each of the original high-resolution hyperspectral images element by element to obtain the summed vector corresponding to each of the original high-resolution hyperspectral images;
upsampling each of the summed vectors through the decoder to obtain the prediction vector field corresponding to each of the original high-resolution hyperspectral images.
[0027] It should be understood that the learnable time-varying vector field network $v_\theta$ (i.e., the prediction vector field construction network) adopts an encoder-bottleneck-decoder structure, in which the time step $t$ and the conditional information obtained from the original high-resolution multispectral image are injected into the bottleneck layer to modulate the feature representation. Let $d$ denote the base hidden dimension, and let the encoder, bottleneck, and decoder modules be denoted $\mathcal{E}$, $\mathcal{B}$, and $\mathcal{D}$. The formula is expressed as follows: $v_\theta(x_t, t, Y) = \mathcal{D}(\mathcal{B}(\mathcal{E}([x_t, Y])) + c)$, where $[\cdot, \cdot]$ indicates channel concatenation and $c$ represents the conditional embedding (i.e., the target condition vector), which encodes temporal and spatial information. At the encoder input, $Y$ (i.e., the original high-resolution multispectral image) and the features at the current time step $x_t$ (i.e., the target time step vector) are concatenated, and at the bottleneck layer the condition vector $c$ (i.e., the target condition vector) is used for global modulation.
[0028] Specifically, the encoder $\mathcal{E}$ takes the concatenated input $[X_t, Y]$ and applies convolutional blocks that progressively downsample the spatial resolution while expanding the channel dimension: $F_{\mathrm{enc}} = \mathcal{E}([X_t, Y])$. The bottleneck module $\mathcal{B}$ then further expands the feature channel dimension and serves as the injection point for the conditional information: $F_{\mathrm{bot}} = \mathcal{B}(F_{\mathrm{enc}})$. The condition vector $c$ (i.e., the target condition vector) and the features $F_{\mathrm{bot}}$ (i.e., the adjusted concatenated vector) are added element by element, and the decoder $\mathcal{D}$ upsamples the resulting features (i.e., the summed vector) to restore the original resolution: $v_\theta(X_t, t, Y) = \mathcal{D}(F_{\mathrm{bot}} + c)$. Here $v_\theta$ is the predicted vector field, which provides the update direction for the ordinary differential equation dynamics during generation.
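The data flow described above (concatenate, downsample, adjust channels, add the condition vector, upsample) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the helper names, the use of 2x2 average pooling and nearest-neighbour upsampling in place of the convolutional blocks, the fixed 1x1 channel-mixing weights, and the final projection `w_out` back to the hyperspectral band count are hypothetical and are not taken from the specification.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # layout: [channels][height][width]


def concat_channels(a: Tensor3, b: Tensor3) -> Tensor3:
    """Channel-wise concatenation of two feature maps with equal H and W."""
    return a + b


def avg_pool2(x: Tensor3) -> Tensor3:
    """Halve the spatial resolution with 2x2 average pooling (toy encoder step)."""
    return [
        [
            [
                (ch[2 * i][2 * j] + ch[2 * i][2 * j + 1]
                 + ch[2 * i + 1][2 * j] + ch[2 * i + 1][2 * j + 1]) / 4.0
                for j in range(len(ch[0]) // 2)
            ]
            for i in range(len(ch) // 2)
        ]
        for ch in x
    ]


def mix_channels(x: Tensor3, weights: List[List[float]]) -> Tensor3:
    """1x1 mixing: each output channel is a weighted sum of the input channels."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wk * x[k][i][j] for k, wk in enumerate(w_row)) for j in range(w)]
         for i in range(h)]
        for w_row in weights
    ]


def add_condition(x: Tensor3, cond: List[float]) -> Tensor3:
    """Element-wise addition of a per-channel condition vector, broadcast over H and W."""
    return [[[v + c for v in row] for row in ch] for ch, c in zip(x, cond)]


def upsample2(x: Tensor3) -> Tensor3:
    """Nearest-neighbour 2x upsampling (toy decoder step)."""
    out = []
    for ch in x:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]
            rows.extend([wide, list(wide)])
        out.append(rows)
    return out


def predict_vector_field(x_t, msi, cond, w_bottleneck, w_out):
    z = concat_channels(x_t, msi)      # second concatenated vector
    z = avg_pool2(z)                   # encoder: downsample
    z = mix_channels(z, w_bottleneck)  # bottleneck: adjust channel dimension
    z = add_condition(z, cond)         # inject the target condition vector
    z = upsample2(z)                   # decoder: restore the input resolution
    return mix_channels(z, w_out)      # assumed projection to the band count


# Toy inputs: 2 hyperspectral bands and 1 multispectral band on a 4x4 grid.
x_t = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
msi = [[[2.0] * 4 for _ in range(4)]]
cond = [0.5] * 4
w_bottleneck = [[1.0, 1.0, 1.0] for _ in range(4)]
w_out = [[0.25] * 4 for _ in range(2)]
v_pred = predict_vector_field(x_t, msi, cond, w_bottleneck, w_out)
```

The point of the sketch is the injection site: the condition vector has one entry per bottleneck channel and is broadcast over all spatial positions, so it modulates the features globally, while the multispectral image enters at full resolution through the encoder input.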
[0029] In the above embodiments, the prediction vector field is obtained by performing prediction vector field analysis on the target time step vector, the original high-resolution multispectral image, and the target condition vector through the prediction vector field construction network. This effectively avoids the spectral distortion caused by differences in sensor spectral response in traditional fusion methods, and also overcomes the defects of existing single-step regression methods, such as the lack of constraints on the intermediate process and insufficient stability of the results.
[0030] Optionally, as an embodiment of the present invention, the process of performing parameter update analysis on the training model based on all the target vector fields and all the prediction vector fields to obtain the image fusion model includes: calculating, using a fourth equation, on the target vector field and the prediction vector field corresponding to each original high-resolution hyperspectral image to obtain the loss function corresponding to each original high-resolution hyperspectral image, the fourth equation being: $\mathcal{L}_i = \mathbb{E}\left[\left\| v_\theta\big(X_t^{(i)}, t, Y^{(i)}\big) - u_t^{(i)} \right\|_2^2\right]$, where $\mathcal{L}_i$ is the loss function corresponding to the $i$-th original high-resolution hyperspectral image, $\mathbb{E}$ is the expectation operation, $v_\theta$ is the prediction vector field corresponding to the $i$-th original high-resolution hyperspectral image, $u_t^{(i)}$ is the target vector field corresponding to the $i$-th original high-resolution hyperspectral image, $t$ is the time step, $X_t^{(i)}$ is the target time step vector corresponding to the $i$-th original high-resolution hyperspectral image, $Y^{(i)}$ is the original high-resolution multispectral image corresponding to the $i$-th original high-resolution hyperspectral image, and $\|\cdot\|_2$ is the Euclidean norm; and updating the parameters of the training model according to all the loss functions to obtain an updated training model, this process being repeated until a preset number of iterations is reached, at which point the updated training model is used as the image fusion model.
[0031] Specifically, when training the model-based deep learning network, the mean square error between the learnable time-varying vector field $v_\theta$ (i.e., the predicted vector field) and the true vector field $u_t$ (i.e., the target vector field) is minimized: $\mathcal{L}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1]}\left[\left\| v_\theta(X_t, t, Y) - u_t \right\|_2^2\right]$, where $t \sim \mathcal{U}[0,1]$ denotes uniform sampling of the time step. Because the expectation is taken over the training dataset and the sampled time steps, costly ordinary differential equation simulation is avoided during backpropagation, and training is efficient.
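A simulation-free training objective of this kind can be sketched as a Monte Carlo estimate over uniformly sampled time steps. The sketch below is illustrative only and rests on stated assumptions: images are flattened to plain lists, the interpolation path is taken to be linear, $X_t = (1-t)X_0 + tX_1$, so that the target field is $X_1 - X_0$, and the function and parameter names (`flow_matching_loss`, `v_theta`, `num_samples`) are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]


def flow_matching_loss(
    x0: Vector,
    x1: Vector,
    cond: Vector,
    v_theta: Callable[[Vector, float, Vector], Vector],
    num_samples: int = 64,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of E_t || v_theta(x_t, t, cond) - u_t ||^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        t = rng.random()  # uniform sample of the time step in [0, 1)
        # Assumed linear path between the initial state and the target image.
        x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
        # Target vector field: time derivative of the assumed linear path.
        u_t = [b - a for a, b in zip(x0, x1)]
        v = v_theta(x_t, t, cond)
        total += sum((p - q) ** 2 for p, q in zip(v, u_t))
    return total / num_samples


# Toy check: a field that already equals the target incurs zero loss.
x0 = [0.0, 1.0, 2.0]   # stands in for the upsampled low-resolution image
x1 = [1.0, 3.0, 2.0]   # stands in for the high-resolution reference
oracle_loss = flow_matching_loss(x0, x1, [], lambda x, t, c: [1.0, 2.0, 0.0])
```

No ordinary differential equation is solved anywhere in this loop: each sample only needs one interpolation and one forward pass of the field, which is what keeps backpropagation inexpensive.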
[0032] In the above embodiments, parameter update analysis is performed on the training model based on all the target vector fields and all the prediction vector fields to obtain the image fusion model. This avoids expensive simulation of ordinary differential equations during backpropagation and achieves efficient training, overcoming the defects of existing single-step regression methods, such as the lack of constraints on the intermediate process and insufficient stability of the results.
[0033] Optionally, as an embodiment of the present invention, after the process of using the updated training model as the image fusion model, the method further includes: calculating, using a fifth equation, on each prediction vector field and the initial time step vector corresponding to each original high-resolution hyperspectral image to obtain the target fused image corresponding to each original high-resolution hyperspectral image, the fifth equation being: $\hat{X}_1^{(i)} = X_0^{(i)} + \int_0^1 v_\theta\big(X_t^{(i)}, t, Y^{(i)}\big)\,dt$, where $\hat{X}_1^{(i)}$ is the target fused image corresponding to the $i$-th original high-resolution hyperspectral image, $X_0^{(i)}$ is the initial time step vector corresponding to the $i$-th original high-resolution hyperspectral image, $v_\theta$ is the prediction vector field corresponding to the $i$-th original high-resolution hyperspectral image, $t$ is the time step, $X_t^{(i)}$ is the target time step vector corresponding to the $i$-th original high-resolution hyperspectral image, and $Y^{(i)}$ is the original high-resolution multispectral image corresponding to the $i$-th original high-resolution hyperspectral image.
[0034] Specifically, the continuous evolution process is governed by an ordinary differential equation. After the learnable time-varying vector field is obtained, the ordinary differential equation is defined and solved as follows: $\frac{dX_t}{dt} = v_\theta(X_t, t, Y)$, where $v_\theta$ denotes the learnable time-varying vector field (i.e., the prediction vector field), parameterized by a neural network with parameters $\theta$ and conditioned on $Y$ (i.e., the original high-resolution multispectral image). During the inference phase, the learned dynamics are numerically integrated, starting from the initial estimate $X_0$ (i.e., the initial time step vector) and evolving step by step, to generate the fusion result $\hat{X}_1$ (i.e., the target fused image): $\hat{X}_1 = X_0 + \int_0^1 v_\theta(X_t, t, Y)\,dt$.
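The inference step can be sketched as a numerical integration of the learned field from $t = 0$ to $t = 1$. The sketch below deliberately swaps in a fixed-step classical Runge-Kutta (RK4) integrator rather than an adaptive Dormand-Prince (Dopri5) solver, simply to stay dependency-free; the function names and the toy field are hypothetical, and states are flattened to plain lists.

```python
from typing import Callable, List

Vector = List[float]
Field = Callable[[Vector, float], Vector]  # (state, time) -> velocity


def axpy(x: Vector, a: float, d: Vector) -> Vector:
    """Return x + a * d, element by element."""
    return [xi + a * di for xi, di in zip(x, d)]


def integrate_rk4(v: Field, x0: Vector, steps: int = 10) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step RK4."""
    h = 1.0 / steps
    x = list(x0)
    for n in range(steps):
        t = n * h
        k1 = v(x, t)
        k2 = v(axpy(x, h / 2, k1), t + h / 2)
        k3 = v(axpy(x, h / 2, k2), t + h / 2)
        k4 = v(axpy(x, h, k3), t + h)
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


# Toy stand-in for a trained, conditioned field: it ignores the condition and
# always points from the initial state to a known target.
x0 = [0.2, 0.4]
target = [1.0, 0.0]
msi = [0.5, 0.5]  # placeholder condition, unused by the toy field


def toy_field(x: Vector, t: float, y: Vector) -> Vector:
    return [b - a for a, b in zip(x0, target)]


fused = integrate_rk4(lambda x, t: toy_field(x, t, msi), x0)
```

With a constant field the integration reaches the target up to floating-point rounding. In practice an adaptive solver chooses its own step sizes, trading the number of network evaluations against integration accuracy.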
[0035] In the above embodiments, the target fused image is obtained by calculating each prediction vector field and the initial time step vector through the fifth equation, thereby achieving consistency optimization of spectral features and spatial structure. While improving the spectral fidelity of the fused image, it also ensures the spatial clarity, effectively avoiding the spectral distortion problem caused by the difference in sensor spectral response in traditional fusion methods, and overcoming the defects of existing single-step regression methods such as lack of constraints in the intermediate process and insufficient stability of results.
[0036] Optionally, as another embodiment of the present invention, the present invention provides a hyperspectral and multispectral image fusion method based on conditional flow matching. This method transforms the fusion problem into a deterministic generative reconstruction task conditioned on multispectral images. By constructing a deterministic continuous flow from the interpolation initial state to the high-fidelity fusion result, the consistency optimization of spectral features and spatial structure is achieved, thereby improving the spectral fidelity of the fused image while ensuring spatial clarity.
[0037] Optionally, as another embodiment of the present invention, the present invention introduces conditional flow matching theory into the task of hyperspectral-multispectral image fusion of remote sensing images, and establishes a new conditional flow matching network model framework that does not require simulation, by learning the continuous transformation between distributions through a learnable time-varying vector field.
[0038] Optionally, as another embodiment of the present invention, the specific steps of the present invention are as follows: Step a: process the acquired hyperspectral and multispectral images by performing four-fold spatial downsampling on the original high-resolution hyperspectral image to generate a low-resolution hyperspectral image, which is paired with the corresponding original high-resolution multispectral image; Step b: construct the conditional embedding system, integrating the temporal process and the spatial structure; Step c: construct a conditional flow matching network model framework based on conditional flow matching theory; Step d: obtain the learnable time-varying vector field and perform iterative processing; Step e: define the ordinary differential equation and solve it using the Dopri5 solver; Step f: construct a loss function and optimize the parameters of the remote sensing image fusion network until the upper limit of the number of training iterations is reached, thereby obtaining a trained remote sensing image fusion network.
[0039] Optionally, as another embodiment of the present invention, compared with the prior art, the present invention models the fusion process of hyperspectral and multispectral images as a continuous-time evolution process, thereby achieving spatial resolution enhancement of the hyperspectral image. Specifically, using a bicubic interpolated hyperspectral image that retains complete spectral information as the initial state, the image state is gradually updated in the continuous time domain, and the spatial structure information of the multispectral image is injected into the evolution process as a constraint by introducing time-related conditional embedding. In this way, the present invention uses a conditional vector field to constrain the image evolution direction, so that the spectral information is mainly maintained by the initial state, while the spatial details are gradually enhanced during the evolution process, thereby achieving relatively independent control of spectral fidelity and spatial resolution enhancement, effectively avoiding the spectral distortion problem caused by differences in sensor spectral response in traditional fusion methods. In addition, the present invention learns the deterministic continuous evolution trajectory from the initial state to the target high-resolution hyperspectral image and applies constraints in the continuous time dimension, making the changes in the fusion result in the spectral and spatial dimensions smoother and more controllable, overcoming the shortcomings of existing single-step regression methods that lack constraints in the intermediate process and have insufficient stability.
[0040] Optionally, as another embodiment of the present invention, the environment used in this embodiment is as follows: the server CPU is an Intel Xeon E5-2665, the GPU is an NVIDIA RTX 2080 Ti, the operating system is Ubuntu 18.04, and the software environment is PyTorch 1.1.0, Python 3.5, CUDA 9.0, and cuDNN 7.1.
[0041] Optionally, as another embodiment of the present invention, Figure 3 presents the subjective results of this invention on the CAVE dataset. The comparison methods selected are representative deep learning methods from recent years. In Figure 3, the first and third rows show the fusion results, while the second and fourth rows show the corresponding error maps. Warm colors indicate larger deviations from the reference image with high spatial and spectral resolution. The scene contains narrow connection regions between balloons. In the error maps, classical methods (such as PNN and MSDCNN) show large areas of high error on the balloon surface, indicating significant spectral distortion. Although SSR and TFNet mitigate some errors, noticeable warm-colored patches are still observed in the shadow areas. The error maps generated by MOG, PSRT, and DCT are generally cooler in tone, but still retain some local yellow spots. In contrast, the error map generated by the proposed method is the darkest and most spatially uniform on the balloon surface. Crucially, the connection structures keep clear boundary contours in both the fused image and the error maps, rather than diffuse transitions. These results indicate that the present invention, based on a continuous evolution framework driven by ordinary differential equations, can effectively preserve fine structural details and maintain high spectral fidelity.
[0042] Optionally, as another embodiment of the present invention, compared with existing methods, the advantages and positive effects of the present invention are as follows: the fusion process of hyperspectral and multispectral images is described as a continuous transport process based on conditional flow matching theory. The initial state is set to the bicubically upsampled low-resolution hyperspectral image so as to preserve its spectral structure, and the high-resolution multispectral image is used to guide the evolution towards the target high-resolution hyperspectral image. This continuous, regression-based trajectory facilitates smooth spectral-spatial evolution and helps maintain spectral fidelity throughout the image fusion process.
[0043] Optionally, as another embodiment of the present invention, as shown in Figure 4, the present invention includes the following modules: the first module performs data processing on the acquired hyperspectral and multispectral images; the second module constructs the conditional embedding system; the third module constructs the conditional flow matching fusion network model; the fourth module obtains the learnable time-varying vector field and performs iterative processing; the fifth module defines the ordinary differential equation and solves it using the Dopri5 solver; the sixth module constructs a loss function to optimize the network model and updates the weights and biases in the network.
[0044] Figure 5 is a block diagram of an image fusion apparatus provided in an embodiment of the present invention.
[0045] Optionally, as another embodiment of the present invention, as shown in Figure 5, an image fusion apparatus includes: an import module, used to import the high-resolution hyperspectral image to be fused, the high-resolution multispectral image to be fused, multiple original high-resolution hyperspectral images, and the original high-resolution multispectral image corresponding to each original high-resolution hyperspectral image; a downsampling processing module, used to downsample each original high-resolution hyperspectral image to obtain the original low-resolution hyperspectral image corresponding to each original high-resolution hyperspectral image; a model building module, used to build a training model that includes a feature analysis network, a target vector field construction network, and a prediction vector field construction network; a conditional vector analysis module, used to perform conditional vector analysis on each original high-resolution multispectral image through the feature analysis network to obtain the target condition vector corresponding to each original high-resolution hyperspectral image; a target vector field analysis module, used to perform target vector field analysis on each original high-resolution hyperspectral image and its corresponding original low-resolution hyperspectral image through the target vector field construction network to obtain the target time step vector and the target vector field corresponding to each original high-resolution hyperspectral image; a prediction vector field analysis module, used to perform prediction vector field analysis on each target time step vector, the corresponding original high-resolution multispectral image, and the corresponding target condition vector through the prediction vector field construction network to obtain the prediction vector field corresponding to each original high-resolution hyperspectral image; a parameter update analysis module, used to perform parameter update analysis on the training model based on all the target vector fields and all the prediction vector fields to obtain the image fusion model; and an image fusion result acquisition module, used to fuse the high-resolution hyperspectral image to be fused and the high-resolution multispectral image to be fused through the image fusion model to obtain the image fusion result.
[0046] Optionally, as an embodiment of the present invention, the feature analysis network includes multiple downsampling blocks, a global average pooling layer, a first linear projection layer, a second linear projection layer, and a third linear projection layer. The conditional vector analysis module is specifically used for: importing multiple frequency indices and calculating a time vector from all the frequency indices using the first equation, which defines the time vector in terms of the time step, the frequency parameter of each frequency index, the frequency index, and the exponential function; downsampling each original high-resolution multispectral image through the multiple downsampling blocks to obtain the original high-resolution multispectral vector corresponding to each original high-resolution hyperspectral image; pooling each original high-resolution multispectral vector through the global average pooling layer to obtain the pooled high-resolution multispectral vector corresponding to each original high-resolution hyperspectral image; projecting each pooled high-resolution multispectral vector through the first linear projection layer to obtain the global feature vector corresponding to each original high-resolution hyperspectral image; concatenating the time vector with each global feature vector to obtain the first concatenated vector corresponding to each original high-resolution hyperspectral image; projecting each first concatenated vector through the second linear projection layer to obtain the projected vector corresponding to each original high-resolution hyperspectral image; projecting each projected vector through the third linear projection layer to obtain the global condition vector corresponding to each original high-resolution hyperspectral image; and performing dimension matching on each global condition vector to obtain the target condition vector corresponding to each original high-resolution hyperspectral image.
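The construction of the condition vector (time vector, global pooling, concatenation, and successive linear projections) can be sketched as follows. The exact first equation is not legible in this text, so the sinusoidal form below, with frequencies $\exp(-k \ln(10000)/K)$, is an assumption borrowed from common time-embedding practice rather than the specification's formula. The function names, the omission of the downsampling blocks, and the absence of nonlinearities between projections are likewise illustrative choices.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weight rows, bias)


def time_embedding(t: float, num_freqs: int = 4, max_period: float = 10000.0) -> Vector:
    """Assumed sinusoidal time vector built from exponentially spaced frequencies."""
    freqs = [math.exp(-math.log(max_period) * k / num_freqs) for k in range(num_freqs)]
    return [math.sin(w * t) for w in freqs] + [math.cos(w * t) for w in freqs]


def global_average_pool(feat: List[Matrix]) -> Vector:
    """Collapse a [C][H][W] feature map to one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Plain linear projection: weight @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def condition_vector(t: float, feat: List[Matrix],
                     proj1: Layer, proj2: Layer, proj3: Layer) -> Vector:
    g = linear(global_average_pool(feat), *proj1)  # global feature vector
    z = time_embedding(t) + g                      # first concatenated vector
    z = linear(z, *proj2)                          # projected vector
    return linear(z, *proj3)                       # global condition vector


# Toy multispectral features: 2 channels on a 2x2 grid.
feat = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
proj1 = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
proj2 = ([[0.1] * 10 for _ in range(3)], [0.0, 0.0, 0.0])
proj3 = ([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], [0.0, 1.0])
c = condition_vector(0.0, feat, proj1, proj2, proj3)
```

Dimension matching then amounts to viewing the resulting vector as one value per bottleneck channel, so that it can be broadcast over the spatial positions of the bottleneck features during the element-wise addition.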
[0047] Optionally, another embodiment of the present invention provides an image fusion system, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the image fusion method described above. This system can be a computer or similar system.
[0048] Optionally, another embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the image fusion method as described above.
[0049] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0050] Those skilled in the art will clearly understand that, for convenience and brevity, reference may be made to the corresponding process in the foregoing method embodiments for the specific working process of the above-described apparatus and units, which is not repeated here.
[0051] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
[0052] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of the embodiments of the present invention, depending on actual needs.
[0053] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0054] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0055] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. An image fusion method, characterized in that it comprises the following steps: importing a high-resolution hyperspectral image to be fused, a high-resolution multispectral image to be fused, multiple original high-resolution hyperspectral images, and the original high-resolution multispectral image corresponding to each original high-resolution hyperspectral image; downsampling each original high-resolution hyperspectral image to obtain the original low-resolution hyperspectral image corresponding to each original high-resolution hyperspectral image; constructing a training model that includes a feature analysis network, a target vector field construction network, and a prediction vector field construction network; performing conditional vector analysis on each original high-resolution multispectral image through the feature analysis network to obtain the target condition vector corresponding to each original high-resolution hyperspectral image; performing target vector field analysis on each original high-resolution hyperspectral image and its corresponding original low-resolution hyperspectral image through the target vector field construction network to obtain the target time step vector and the target vector field corresponding to each original high-resolution hyperspectral image; performing prediction vector field analysis on each target time step vector, the corresponding original high-resolution multispectral image, and the corresponding target condition vector through the prediction vector field construction network to obtain the prediction vector field corresponding to each original high-resolution hyperspectral image; performing parameter update analysis on the training model based on all the target vector fields and all the prediction vector fields to obtain an image fusion model; and fusing the high-resolution hyperspectral image to be fused and the high-resolution multispectral image to be fused through the image fusion model to obtain an image fusion result.
2. The image fusion method according to claim 1, characterized in that the feature analysis network includes multiple downsampling blocks, a global average pooling layer, a first linear projection layer, a second linear projection layer, and a third linear projection layer; and the process of performing conditional vector analysis on each original high-resolution multispectral image through the feature analysis network to obtain the target condition vector corresponding to each original high-resolution hyperspectral image includes: importing multiple frequency indices and calculating a time vector from all the frequency indices using a first equation, the first equation defining the time vector in terms of the time step, the frequency parameter of each frequency index, the frequency index, and the exponential function; downsampling each original high-resolution multispectral image through the multiple downsampling blocks to obtain the original high-resolution multispectral vector corresponding to each original high-resolution hyperspectral image; pooling each original high-resolution multispectral vector through the global average pooling layer to obtain the pooled high-resolution multispectral vector corresponding to each original high-resolution hyperspectral image; projecting each pooled high-resolution multispectral vector through the first linear projection layer to obtain the global feature vector corresponding to each original high-resolution hyperspectral image; concatenating the time vector with each global feature vector to obtain the first concatenated vector corresponding to each original high-resolution hyperspectral image; projecting each first concatenated vector through the second linear projection layer to obtain the projected vector corresponding to each original high-resolution hyperspectral image; projecting each projected vector through the third linear projection layer to obtain the global condition vector corresponding to each original high-resolution hyperspectral image; and performing dimension matching on each global condition vector to obtain the target condition vector corresponding to each original high-resolution hyperspectral image.
3. The image fusion method according to claim 1, characterized in that the target vector field construction network includes a bicubic upsampling layer; and the process of performing target vector field analysis on each original high-resolution hyperspectral image and its corresponding original low-resolution hyperspectral image through the target vector field construction network to obtain the target time step vector and the target vector field corresponding to each original high-resolution hyperspectral image includes: upsampling each original low-resolution hyperspectral image through the bicubic upsampling layer to obtain the initial time step vector corresponding to each original high-resolution hyperspectral image; calculating, using a second equation, on each initial time step vector and the corresponding original high-resolution hyperspectral image to obtain the target time step vector corresponding to each original high-resolution hyperspectral image, the second equation being: $X_t^{(i)} = (1 - t)\,X_0^{(i)} + t\,X_1^{(i)}$, where $X_t^{(i)}$ is the target time step vector corresponding to the $i$-th original high-resolution hyperspectral image, $t$ is the time step, $X_0^{(i)}$ is the initial time step vector corresponding to the $i$-th original high-resolution hyperspectral image, and $X_1^{(i)}$ is the $i$-th original high-resolution hyperspectral image; and calculating, using a third equation, on each target time step vector to obtain the target vector field corresponding to each original high-resolution hyperspectral image, the third equation being: $u_t^{(i)} = \frac{dX_t^{(i)}}{dt}$, where $u_t^{(i)}$ is the target vector field corresponding to the $i$-th original high-resolution hyperspectral image, $X_t^{(i)}$ is the target time step vector corresponding to the $i$-th original high-resolution hyperspectral image, and $t$ is the time step.
4. The image fusion method according to claim 1, characterized in that the prediction vector field construction network includes an encoder, a bottleneck module, and a decoder; and the process of performing prediction vector field analysis on each target time step vector, the original high-resolution multispectral image corresponding to each original high-resolution hyperspectral image, and the target condition vector corresponding to each original high-resolution hyperspectral image through the prediction vector field construction network, to obtain the prediction vector field corresponding to each original high-resolution hyperspectral image, includes: concatenating each target time step vector with the corresponding original high-resolution multispectral image to obtain a second concatenated vector corresponding to each original high-resolution hyperspectral image; downsampling each second concatenated vector through the encoder to obtain a downsampled concatenated vector corresponding to each original high-resolution hyperspectral image; adjusting the channel dimension of each downsampled concatenated vector through the bottleneck module to obtain an adjusted concatenated vector corresponding to each original high-resolution hyperspectral image; adding each adjusted concatenated vector and the corresponding target condition vector element by element to obtain a summed vector corresponding to each original high-resolution hyperspectral image; and upsampling each summed vector through the decoder to obtain the prediction vector field corresponding to each original high-resolution hyperspectral image.
5. The image fusion method according to claim 3, characterized in that the process of performing parameter update analysis on the training model based on all the target vector fields and all the prediction vector fields to obtain the image fusion model includes: calculating, using a fourth equation, on the target vector field and the prediction vector field corresponding to each original high-resolution hyperspectral image to obtain the loss function corresponding to each original high-resolution hyperspectral image, the fourth equation being: $\mathcal{L}_i = \mathbb{E}\left[\left\| v_\theta\big(X_t^{(i)}, t, Y^{(i)}\big) - u_t^{(i)} \right\|_2^2\right]$, where $\mathcal{L}_i$ is the loss function corresponding to the $i$-th original high-resolution hyperspectral image, $\mathbb{E}$ is the expectation operation, $v_\theta$ is the prediction vector field corresponding to the $i$-th original high-resolution hyperspectral image, $u_t^{(i)}$ is the target vector field corresponding to the $i$-th original high-resolution hyperspectral image, $t$ is the time step, $X_t^{(i)}$ is the target time step vector corresponding to the $i$-th original high-resolution hyperspectral image, $Y^{(i)}$ is the original high-resolution multispectral image corresponding to the $i$-th original high-resolution hyperspectral image, and $\|\cdot\|_2$ is the Euclidean norm; and updating the parameters of the training model according to all the loss functions to obtain an updated training model, this process being repeated until a preset number of iterations is reached, at which point the updated training model is used as the image fusion model.
6. The image fusion method according to claim 5, characterized in that, after the process of using the updated training model as the image fusion model, the method further includes: calculating, using the fifth equation, the predicted vector field and the initial time step vector corresponding to each of the original high-resolution hyperspectral images to obtain the target fused image corresponding to each of the original high-resolution hyperspectral images; the fifth equation is: $\hat{X}_i = x_0^i + \int_0^1 v_i(t, x_t^i, M_i)\,\mathrm{d}t$, where $\hat{X}_i$ is the target fused image corresponding to the $i$-th original high-resolution hyperspectral image, $x_0^i$ is the initial time step vector corresponding to the $i$-th original high-resolution hyperspectral image, $v_i$ is the predicted vector field corresponding to the $i$-th original high-resolution hyperspectral image, $t$ is the time step, $x_t^i$ is the target time step vector corresponding to the $i$-th original high-resolution hyperspectral image, and $M_i$ is the original high-resolution multispectral image corresponding to the $i$-th original high-resolution hyperspectral image.
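Claim 6 obtains the fused image from an initial time step vector and the predicted vector field. One common way to evaluate such a relation numerically is fixed-step Euler integration over the time interval from 0 to 1. The sketch below assumes that reading; the function name `euler_fuse`, the step count, and the `vector_field(x, t, msi)` call signature are all hypothetical, and images are flattened to lists of floats.

```python
def euler_fuse(x0, vector_field, msi, num_steps=10):
    # Integrate dx/dt = vector_field(x, t, msi) from t = 0 to t = 1
    # with fixed-step explicit Euler updates (an assumed solver).
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = vector_field(x, t, msi)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

For a field that is constant along a straight path, the Euler result equals the initial vector plus the field, regardless of the step count. For a learned, state-dependent field, more steps trade computation for accuracy.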
7. An image fusion apparatus, characterized by comprising: an import module, used to import the high-resolution hyperspectral image to be fused, the high-resolution multispectral image to be fused, multiple original high-resolution hyperspectral images, and the original high-resolution multispectral image corresponding to each of the original high-resolution hyperspectral images; a downsampling processing module, used to downsample each of the original high-resolution hyperspectral images to obtain the original low-resolution hyperspectral image corresponding to each of the original high-resolution hyperspectral images; a model building module, used to build a training model, which includes a feature analysis network, a target vector field construction network, and a prediction vector field construction network; a condition vector analysis module, used to perform condition vector analysis on each of the original high-resolution multispectral images through the feature analysis network to obtain the target condition vector corresponding to each of the original high-resolution hyperspectral images; a target vector field analysis module, used to perform target vector field analysis, through the target vector field construction network, on each of the original high-resolution hyperspectral images and the original low-resolution hyperspectral image corresponding to each of the original high-resolution hyperspectral images, so as to obtain the target time step vector corresponding to each of the original high-resolution hyperspectral images and the target vector field corresponding to each of the original high-resolution hyperspectral images;
a prediction vector field analysis module, used to perform prediction vector field analysis, through the prediction vector field construction network, on each of the target time step vectors, the original high-resolution multispectral image corresponding to each of the original high-resolution hyperspectral images, and the target condition vector corresponding to each of the original high-resolution hyperspectral images, so as to obtain the predicted vector field corresponding to each of the original high-resolution hyperspectral images; a parameter update analysis module, used to perform parameter update analysis on the training model based on all the target vector fields and all the predicted vector fields to obtain the image fusion model; and an image fusion result acquisition module, used to perform image fusion on the high-resolution hyperspectral image to be fused and the high-resolution multispectral image to be fused through the image fusion model to obtain the image fusion result.
8. The image fusion apparatus according to claim 7, characterized in that the feature analysis network includes multiple downsampling blocks, a global average pooling layer, a first linear projection layer, a second linear projection layer, and a third linear projection layer; the condition vector analysis module is specifically used for: importing multiple frequency indices, and calculating all the frequency indices using the first equation to obtain the time vector, the first equation being: [equation not recoverable from the source text], where the quantities defined for the first equation are the time vector, the time step, the frequency parameter of each frequency index, the frequency index, and the exponential function; downsampling each of the original high-resolution multispectral images through the multiple downsampling blocks to obtain the original high-resolution multispectral vector corresponding to each of the original high-resolution hyperspectral images; pooling each of the original high-resolution multispectral vectors through the global average pooling layer to obtain the pooled high-resolution multispectral vector corresponding to each of the original high-resolution hyperspectral images; projecting each pooled high-resolution multispectral vector through the first linear projection layer to obtain the global feature vector corresponding to each original high-resolution hyperspectral image; concatenating the time vector with each of the global feature vectors to obtain the first concatenated vector corresponding to each of the original high-resolution hyperspectral images; projecting each of the first concatenated vectors through the second linear projection layer to obtain the projected vector corresponding to each of the original high-resolution hyperspectral images; projecting each of the projected vectors through the third linear projection layer to obtain the global condition vector corresponding to each of the original high-resolution hyperspectral images; and
performing dimension matching on each of the global condition vectors to obtain the target condition vector corresponding to each of the original high-resolution hyperspectral images.
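The condition vector pipeline of claim 8 can be sketched in plain Python. The formula for the time vector is not available in this text, so the sketch assumes the widely used sinusoidal embedding, with frequency parameters `exp(-ln(max_period) * k / num_frequencies)`. That choice, the constant `max_period`, and all function names are assumptions. The downsampling blocks are omitted, so `msi_features` stands for their output in `[channel][pixel]` layout. Each projection is passed as a hypothetical `(weights, bias)` pair.

```python
import math


def time_embedding(t, num_frequencies=4, max_period=10000.0):
    # Assumed sinusoidal form: one sine and one cosine per frequency index k.
    freqs = [
        math.exp(-math.log(max_period) * k / num_frequencies)
        for k in range(num_frequencies)
    ]
    return [math.sin(w * t) for w in freqs] + [math.cos(w * t) for w in freqs]


def global_average_pool(feature_map):
    # [channel][pixel] -> one mean value per channel.
    return [sum(channel) / len(channel) for channel in feature_map]


def linear(x, weights, bias):
    # Plain linear projection: weights is [out][in], bias is [out].
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def condition_vector(t, msi_features, proj1, proj2, proj3):
    pooled = global_average_pool(msi_features)        # global average pooling layer
    global_feature = linear(pooled, *proj1)           # first linear projection layer
    concatenated = time_embedding(t) + global_feature  # first concatenated vector
    projected = linear(concatenated, *proj2)          # second linear projection layer
    return linear(projected, *proj3)                  # third layer: global condition vector
```

The returned global condition vector would still need the dimension matching step of claim 8 (for example, broadcasting over spatial positions) before it can be added element by element to a feature map.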
9. An image fusion apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, the image fusion method according to any one of claims 1 to 6 is implemented.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the image fusion method according to any one of claims 1 to 6 is implemented.