A multi-source image fusion method based on residual dense block network

CN117745559B · Active · Publication Date: 2026-06-23 · XIDIAN UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: XIDIAN UNIV
Filing Date: 2023-12-20
Publication Date: 2026-06-23


Abstract

The application discloses a multi-source image fusion method based on a residual dense block network, comprising the following steps: acquiring a first input image and a second input image, the two being images of different types; dividing the first input image and the second input image into blocks to obtain a plurality of first sub-input images and a plurality of second sub-input images; and, for each pair of first and second sub-input images, preprocessing the pair to obtain a first intermediate input image and a second intermediate input image, and inputting the first and second intermediate input images into a fusion network to obtain an image fusion result. The fusion network comprises a first feature extraction network, a second feature extraction network, an information transmission network and an image reconstruction network, and the first and second feature extraction networks each comprise a residual dense block. The method improves the fusion result and enhances the information content of the fused image.


Description

Technical Field

[0001] This invention belongs to the field of image processing technology, specifically relating to a multi-source image fusion method based on residual dense block networks.

Background Technology

[0002] Due to hardware limitations, information captured by a single type of sensor or a single shooting setup cannot effectively and comprehensively describe an imaging scene. On the one hand, different types of sensors typically capture specific information from multiple angles. On the other hand, sensors with different shooting settings usually obtain limited information from the imaging scene. Images captured by different sensors or multiple shooting settings often contain complementary information; merging these complementary features into a single image can greatly enhance the ability to describe the scene. Therefore, research on image fusion technology is crucial. Multi-source image fusion extracts the most meaningful information from images acquired by different sensors and combines this information into a single image. It contains richer information, is more beneficial for subsequent processing or decision-making, and is currently a research hotspot in the field of image processing technology.

[0003] Multi-source image fusion includes pixel-level, feature-level, and decision-level fusion. Pixel-level fusion makes the fullest use of source image information and is the most actively researched direction. Depending on how the source images are decomposed, pixel-level multi-source image fusion methods include methods based on multi-scale transformation, methods based on sparse representation, and methods based on fuzzy logic. Specific examples include the wavelet transform, the nonsubsampled contourlet transform (NSCT), the nonsubsampled shearlet transform (NSST), and latent low-rank representation (LatLRR).

[0004] Wavelet transform-based fusion algorithms typically select different fusion rules to calculate the high-frequency and low-frequency coefficients of the fused image, based on the different characteristics of those coefficients. The entire wavelet transform-based fusion process can be represented in three steps: first, wavelet decomposition is used to decompose the two registered source images into different frequency bands; then, the high-frequency and low-frequency sub-images of the two images are fused according to the chosen fusion rules; finally, the inverse transform is applied to the fused wavelet pyramid to obtain the reconstructed fused image, which is the final result. NSCT possesses not only excellent multi-directional and multi-scale properties but also translation invariance, and effectively extracts image contour features. It is an image sparse representation technique composed of a non-subsampled pyramid filter bank and a non-subsampled directional filter bank. First, the non-subsampled pyramid filter bank is used to perform a first-level scale decomposition of the original image, obtaining a low-frequency subband coefficient and a bandpass subband coefficient. Then, the non-subsampled directional filter bank is used to perform multi-directional decomposition on the obtained bandpass subband coefficient. Subsequently, the non-subsampled pyramid filter bank is applied repeatedly to perform scale decomposition on the low-frequency subband coefficients obtained from the previous level, and the non-subsampled directional filter bank is used again to perform directional decomposition on the bandpass coefficients at the current decomposition scale to generate high-frequency subband coefficients, thus completing the NSCT transformation of the image. NSST, based on non-subsampled pyramid filters and translation-invariant shearlet filter banks, has no limitations on decomposition direction and size, achieving flexibility and efficiency in scale, position, and orientation. Combining the non-subsampled pyramid transform with different shearlet filters provides multi-scale and multi-directional characteristics. Furthermore, since the shearlet filter is smaller than the directional filter, NSST can represent smaller scales, making it superior to NSCT. The implementation of the NSST transform mainly includes two steps: (1) performing n-level multi-scale decomposition of the source image using a non-subsampled pyramid to obtain n high-frequency sub-images and one low-frequency sub-image, each of the same size as the source image; (2) mapping the shearlet from the pseudo-polar coordinate system to the Cartesian coordinate system, and then performing an inverse Fourier transform to achieve translation-invariant NSST directional localization. The core idea of latent low-rank representation (LatLRR) is to represent the data matrix as a linear superposition of low-rank components, salient components, and noise. As an unsupervised algorithm, LatLRR can extract salient features from the data; compared with low-rank decomposition, it makes better use of the original data vectors as the basis for classification features, does not change the rank of the data matrix, and is more robust to noise.
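The three-step wavelet fusion procedure described in [0004] (decompose, fuse by rule, invert) can be illustrated with a minimal sketch. It assumes a single-level 2-D Haar transform, averaging for the low-frequency band, and a maximum-absolute-value rule for the high-frequency bands; these choices and all function names are illustrative assumptions, not rules specified by the source.

```python
import numpy as np

def haar2d(img):
    """Single-level orthonormal 2-D Haar transform (even height and width assumed)."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0          # low-frequency sub-image
    lh = (a - b + c - d) / 2.0          # high-frequency sub-images
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, (lh, hl, hh)

def ihaar2d(ll, highs):
    """Inverse of haar2d."""
    lh, hl, hh = highs
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=np.float64)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

def wavelet_fuse(img1, img2):
    """Step 1: decompose; step 2: fuse with hand-chosen rules; step 3: invert."""
    ll1, highs1 = haar2d(img1)
    ll2, highs2 = haar2d(img2)
    ll = (ll1 + ll2) / 2.0              # assumed rule: average the low band
    highs = tuple(                      # assumed rule: keep the larger magnitude
        np.where(np.abs(x) >= np.abs(y), x, y) for x, y in zip(highs1, highs2)
    )
    return ihaar2d(ll, highs)
```

The hand-written rules in `wavelet_fuse` are exactly the kind of manually designed fusion rule that the learned fusion network is meant to replace.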

[0005] However, while the above fusion methods can achieve good results in certain fusion tasks, they still have shortcomings. First, they usually require manually designed fusion rules, and given the diversity of source images, the design of these rules becomes increasingly complex. Second, most of them are applicable only to specific fusion tasks and are not general-purpose. Third, they involve a large amount of computation and a large number of parameters, and are often not competitive in running time.

Summary of the Invention

[0006] To address the aforementioned problems in the existing technology, this invention provides a multi-source image fusion method based on residual dense block networks. The technical problem to be solved by this invention is achieved through the following technical solution:

[0007] This invention provides a multi-source image fusion method based on residual dense block networks, comprising:

[0008] Acquire a first input image and a second input image; the first input image and the second input image are images of different types.

[0009] The first input image and the second input image are divided into blocks to obtain several first sub-input images and several second sub-input images;

[0010] For each pair of a first sub-input image and a second sub-input image, the following is performed:

[0011] Preprocessing the first sub-input image and the second sub-input image yields a first intermediate input image and a second intermediate input image;

[0012] The first intermediate input image and the second intermediate input image are input into a fusion network to obtain the image fusion result; wherein...

[0013] The fusion network includes a first feature extraction network, a second feature extraction network, an information transmission network, and an image reconstruction network. Both the first feature extraction network and the second feature extraction network include a residual dense block. The first feature extraction network is used to extract gradient information of the first intermediate input image based on the residual dense block, and the second feature extraction network is used to extract grayscale information of the second intermediate input image based on the residual dense block. The information transmission network is used to realize information exchange during the extraction of the gradient information and the grayscale information. The image reconstruction network is used to reconstruct the extracted gradient information and the grayscale information to obtain the image fusion result.

[0014] In one embodiment of the present invention, preprocessing the first sub-input image and the second sub-input image to obtain a first intermediate input image and a second intermediate input image includes:

[0015] The first intermediate input image is obtained by stitching together two first sub-input images and one second sub-input image;

[0016] The second intermediate input image is obtained by stitching together one first sub-input image and two second sub-input images.

[0017] In one embodiment of the present invention, the first feature extraction network and the second feature extraction network have the same network structure; the first feature extraction network includes a convolutional module and a residual dense block connected in sequence; wherein...

[0018] The convolutional module includes a convolutional layer with a 3×3 kernel;

[0019] The residual dense block comprises two 3×3 convolutional layers and one 1×1 convolutional layer connected in sequence. The input of the first 3×3 convolutional layer in the residual dense block is connected to the output of the 3×3 convolutional layer in the convolutional module. The input of the second 3×3 convolutional layer in the residual dense block is connected to the output of the 3×3 convolutional layer in the convolutional module and to the output of the first 3×3 convolutional layer in the residual dense block. The input of the 1×1 convolutional layer in the residual dense block is connected to the output of the 3×3 convolutional layer in the convolutional module, the output of the first 3×3 convolutional layer in the residual dense block, and the output of the second 3×3 convolutional layer in the residual dense block. The output of the 1×1 convolutional layer in the residual dense block is connected to the image reconstruction network. A residual gradient module is connected between the input of the first 3×3 convolutional layer in the residual dense block and the output of the second 3×3 convolutional layer in the residual dense block.

[0020] In one embodiment of the present invention, the information transmission network includes a first sub-information transmission block and a second sub-information transmission block; wherein,

[0021] The input and output of the first sub-information transmission block are both connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network and the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network.

[0022] The input and output of the second sub-information transmission block are both connected to the output of the second convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network and the output of the second convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network.

[0023] In one embodiment of the present invention, the network structures of the first sub-information transmission block and the second sub-information transmission block are identical; the first sub-information transmission block includes a splicing layer and two convolutional layers with 1×1 kernels; wherein,

[0024] The input of the splicing layer in the first sub-information transmission block is connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network, and the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network; the output of the splicing layer in the first sub-information transmission block is connected to the input of the first convolutional layer with a 1×1 kernel in the first sub-information transmission block, and the input of the second convolutional layer with a 1×1 kernel in the first sub-information transmission block; the output of the first convolutional layer with a 1×1 kernel in the first sub-information transmission block is connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network; the output of the second convolutional layer with a 1×1 kernel in the first sub-information transmission block is connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network.

[0025] In one embodiment of the present invention, the image reconstruction network includes a stitching layer, two convolutional layers with 3×3 kernels, and a convolutional layer with 1×1 kernel connected in sequence; wherein,

[0026] The input of the stitching layer in the image reconstruction network is connected to the output of the first feature extraction network and the output of the second feature extraction network; the output of the stitching layer in the image reconstruction network is sequentially connected to two convolutional layers with a kernel size of 3×3 and one convolutional layer with a kernel size of 1×1 in the image reconstruction network.

[0027] In one embodiment of the present invention, the fusion network is a pre-trained network structure; the corresponding fusion network training process includes:

[0028] Obtain the training input image set;

[0029] Each training input image in the training input image set is divided into blocks to obtain several corresponding training sub-input images, and all training sub-input images are combined to form a new training input image set.

[0030] The initial fusion network is trained by using every two training sub-input images from the new training input image set as inputs to the initial fusion network. During the training process, the construction of the total loss function takes into account grayscale loss and structural similarity loss.

[0031] In one embodiment of the present invention, the total loss function formula is expressed as:

[0032] $L = \lambda L_{ssim} + L_p$;

[0033] where $L$ represents the total loss function, $\lambda$ represents the balance factor, and $L_{ssim}$ represents the structural similarity loss function, $L_{ssim} = \omega_1 \cdot (1 - \mathrm{SSIM}(O, I_1)) + \omega_2 \cdot (1 - \mathrm{SSIM}(O, I_2))$, in which $\mathrm{SSIM}(\cdot)$ represents the structural similarity function, $O$ represents the fusion result of the training images after each training iteration, $I_1$ and $I_2$ represent the two training sub-input images input during training, and $\omega_1$ and $\omega_2$ represent the weight factors. $L_p$ represents the grayscale loss function; in its expression, $\|\cdot\|_1$ denotes the L1 norm, $\max(\cdot)$ denotes taking the maximum value, and $H$ and $W$ represent the height and width of each training sub-input image.
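As a rough illustration of how such a total loss could be evaluated, the sketch below makes two assumptions that are not confirmed by the text: SSIM is computed once over the whole image rather than over local windows, and the grayscale loss takes the form $L_p = \frac{1}{HW}\|O - \max(I_1, I_2)\|_1$, which is one common choice consistent with the symbols listed in [0033]. The default values of `lam`, `w1` and `w2` are hypothetical placeholders.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """SSIM computed over the whole image as one window (a simplification)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

def total_loss(o, i1, i2, lam=1.0, w1=0.5, w2=0.5):
    """L = lam * L_ssim + L_p for a fused image o and source images i1, i2."""
    l_ssim = w1 * (1.0 - global_ssim(o, i1)) + w2 * (1.0 - global_ssim(o, i2))
    h, w = o.shape
    # Assumed form of the grayscale loss: mean L1 distance to the pixel-wise maximum.
    l_p = np.abs(o - np.maximum(i1, i2)).sum() / (h * w)
    return lam * l_ssim + l_p
```

Under these assumptions the loss is zero when the fused image and both source images are identical, and it grows as the fused image departs from either source in structure or from their pixel-wise maximum in intensity.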

[0034] In one embodiment of the present invention, before training the initial fusion network based on a new training image set, the method further includes:

[0035] For every two training sub-input images in the new training input image set, including:

[0036] Two intermediate training input images are obtained by preprocessing the two training sub-input images.

[0037] In one embodiment of the present invention, preprocessing two training sub-input images to obtain two intermediate training input images includes:

[0038] The first intermediate training input image is obtained by stitching together two first training sub-input images and one second training sub-input image;

[0039] The second intermediate training input image is obtained by stitching together one first training sub-input image and two second training sub-input images.

[0040] The beneficial effects of this invention are:

[0041] The multi-source image fusion method based on residual dense block networks proposed in this invention can achieve fast and high-quality multi-source image fusion. The method includes: acquiring a first input image and a second input image; the first and second input images are images of different types; dividing the first and second input images into blocks to obtain several first sub-input images and several second sub-input images; for each pair of first and second sub-input images, preprocessing the first and second sub-input images to obtain a first intermediate input image and a second intermediate input image; inputting the first and second intermediate input images into a fusion network to obtain an image fusion result; wherein the fusion network includes a first feature extraction network, a second feature extraction network, an information transmission network, and an image reconstruction network. Both the first and second feature extraction networks include a residual dense block; the first feature extraction network is used to extract gradient information of the first intermediate input image based on the residual dense block; the second feature extraction network is used to extract grayscale information of the second intermediate input image based on the residual dense block; the information transmission network is used to realize information exchange during the extraction of gradient and grayscale information; and the image reconstruction network is used to reconstruct the extracted gradient and grayscale information to obtain the image fusion result. As can be seen, this invention solves the problems of traditional image fusion methods, which only exchange information between adjacent pixels, cannot perceive the global environment, and require manual design of complex fusion rules. It can achieve high-quality multi-source image fusion. 
During the image fusion process, the fusion network extracts information from the gradient path and grayscale path respectively. On the same path, it performs dense connections based on residual dense blocks to achieve feature reuse, avoiding the information loss problem caused by convolution and enhancing the fusion network's ability to describe image details. At the same time, an information transmission network is introduced during the feature extraction process to exchange information between different paths. This can not only pre-fuse gradient and grayscale information, but also enhance the image information in subsequent processing, thereby improving the image fusion effect. Compared with traditional image fusion methods, the fusion network proposed in this invention has strong versatility and significantly reduces the amount of computation, enabling rapid multi-source image fusion.

[0042] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0043] Figure 1 is a flowchart illustrating a multi-source image fusion method based on residual dense block networks provided in an embodiment of the present invention;

[0044] Figure 2 is a schematic diagram of the structure of the fusion network provided in an embodiment of the present invention;

[0045] Figure 3 is a schematic diagram of the structure of the first feature extraction network provided in an embodiment of the present invention;

[0046] Figure 4 is a schematic diagram of the structure of the information transmission network provided in an embodiment of the present invention;

[0047] Figure 5 is a schematic diagram of the structure of the first sub-information transmission block provided in an embodiment of the present invention;

[0048] Figure 6 is a schematic diagram of the image reconstruction network provided in an embodiment of the present invention;

[0049] Figure 7 is a schematic diagram of the training process of the fusion network provided in an embodiment of the present invention;

[0050] Figures 8(a) and 8(b) are schematic diagrams of two input test images provided in an embodiment of the present invention;

[0051] Figures 9(a) to 9(d) are schematic diagrams of the fusion results of LatLRR, NSCT, NSST and the method proposed in this invention.

Detailed Implementation

[0052] The present invention will be further described in detail below with reference to specific embodiments, but the implementation of the present invention is not limited thereto.

[0053] Referring to Figure 1, this invention provides a multi-source image fusion method based on residual dense block networks, comprising the following steps:

[0054] S10. Obtain the first input image and the second input image; the first input image and the second input image are images of different types.

[0055] In this embodiment of the invention, the first input image and the second input image are two images of different types. For example, the first input image can be a visible light image and the second input image can be an infrared image. The specific image types of the first input image and the second input image are not limited, and two different types of images can be selected according to the actual situation.

[0056] S20. Divide the first input image and the second input image into blocks to obtain several first sub-input images and several second sub-input images.

[0057] To improve image processing efficiency, in this embodiment of the invention, before image fusion, the first input image and the second input image are first divided into blocks. For example, the first input image and the second input image are respectively cropped into several image blocks of 120 pixels × 120 pixels, thus obtaining several first sub-input images and several second sub-input images.
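The block division in step S20 can be sketched as follows. This is a minimal illustration assuming non-overlapping 120×120 tiles and registered inputs of equal size; the source does not say how partial tiles at the image border are handled, so discarding them here is an assumption, and the array names are hypothetical.

```python
import numpy as np

def split_into_blocks(image, block=120):
    """Crop an (H, W) or (H, W, C) image into non-overlapping block x block tiles.

    Border regions that do not fill a whole tile are dropped (an assumption).
    """
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, h - block + 1, block):
        for left in range(0, w - block + 1, block):
            tiles.append(image[top:top + block, left:left + block])
    return tiles

# Hypothetical registered inputs of the same size, e.g. visible and infrared.
visible = np.zeros((240, 360), dtype=np.float32)
infrared = np.ones((240, 360), dtype=np.float32)

first_subs = split_into_blocks(visible)
second_subs = split_into_blocks(infrared)
# Tiles at the same position form the pairs that step S30 processes.
pairs = list(zip(first_subs, second_subs))
```

Because both inputs are tiled on the same grid, the i-th first sub-input image and the i-th second sub-input image cover the same scene region.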

[0058] S30. For each pair of first sub-input images and second sub-input images, the process includes: preprocessing the first sub-input images and second sub-input images to obtain a first intermediate input image and a second intermediate input image; and inputting the first intermediate input image and the second intermediate input image into a fusion network to obtain an image fusion result.

[0059] In this embodiment of the invention, preprocessing the first sub-input image and the second sub-input image to obtain the first intermediate input image and the second intermediate input image includes:

[0060] The first intermediate input image is obtained by stitching together two first sub-input images and one second sub-input image; the second intermediate input image is obtained by stitching together one first sub-input image and two second sub-input images.
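One plausible reading of the preprocessing in [0060] is a channel-wise concatenation in which one sub-image of the pair is repeated, giving each path a 2:1 mix of the two sources. The sketch below follows that reading; treating "stitching" as channel concatenation, assuming single-channel sub-images, and the function name are all assumptions.

```python
import numpy as np

def make_intermediate_inputs(first_sub, second_sub):
    """Build the two 3-channel intermediate inputs from one pair of sub-images.

    first_sub, second_sub: arrays of shape (H, W), assumed single-channel.
    Returns two arrays of shape (3, H, W).
    """
    a = first_sub[np.newaxis, ...]
    b = second_sub[np.newaxis, ...]
    # Two copies of the first sub-image and one of the second.
    first_intermediate = np.concatenate([a, a, b], axis=0)
    # One copy of the first sub-image and two of the second.
    second_intermediate = np.concatenate([a, b, b], axis=0)
    return first_intermediate, second_intermediate
```

Under this reading, the first intermediate input is weighted toward the first source and the second intermediate input toward the second source, while each path still sees both sources.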

[0061] Further, referring to Figure 2, the fusion network used in this embodiment of the invention includes a first feature extraction network, a second feature extraction network, an information transmission network, and an image reconstruction network. Both the first and second feature extraction networks include a residual dense block. The first feature extraction network is used to extract gradient information from a first intermediate input image based on the residual dense block; the second feature extraction network is used to extract grayscale information from a second intermediate input image based on the residual dense block; the information transmission network is used to exchange information during the gradient and grayscale information extraction process; and the image reconstruction network is used to reconstruct the extracted gradient and grayscale information to obtain the image fusion result. The specific design for each network module is as follows:

[0062] The first feature extraction network and the second feature extraction network in this embodiment of the invention have the same network structure. Referring to Figure 3 and Figure 4, the first feature extraction network includes a convolutional module and a residual dense block connected in sequence. The convolutional module includes a convolutional layer with a 3×3 kernel. The residual dense block includes two convolutional layers with 3×3 kernels and one convolutional layer with a 1×1 kernel, connected in sequence. The input of the first 3×3 convolutional layer in the residual dense block is connected to the output of the 3×3 convolutional layer in the convolutional module. The input of the second 3×3 convolutional layer in the residual dense block is connected to the output of the 3×3 convolutional layer in the convolutional module and to the output of the first 3×3 convolutional layer in the residual dense block. The input of the 1×1 convolutional layer in the residual dense block is connected to the outputs of the 3×3 convolutional layer in the convolutional module, the first 3×3 convolutional layer in the residual dense block, and the second 3×3 convolutional layer in the residual dense block. The output of the 1×1 convolutional layer in the residual dense block is connected to the image reconstruction network. A residual gradient module is connected between the input of the first 3×3 convolutional layer in the residual dense block and the output of the second 3×3 convolutional layer in the residual dense block. In Figure 4, 3×3Covn represents a convolutional layer with a 3×3 kernel, BN (Batch Normalization) represents a batch normalization layer, LReLU (Leaky Rectified Linear Unit) represents a leaky rectified linear unit activation layer, and 1×1Covn represents a convolutional layer with a 1×1 kernel. Similar notation is used in subsequent figures without further explanation.

[0063] The information transmission network of this embodiment includes a first sub-information transmission block and a second sub-information transmission block. Referring to Figure 4, the input and output of the first sub-information transmission block are both connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network and the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network; the input and output of the second sub-information transmission block are both connected to the output of the second convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network and the output of the second convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network.

[0064] In this embodiment of the invention, the network structures of the first sub-information transmission block and the second sub-information transmission block are the same. Referring to Figure 5, the first sub-information transmission block includes a splicing layer and two convolutional layers with 1×1 kernels. The input of the splicing layer in the first sub-information transmission block is connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network, and the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network. The output of the splicing layer in the first sub-information transmission block is connected to the input of the first convolutional layer with a 1×1 kernel and the input of the second convolutional layer with a 1×1 kernel in the first sub-information transmission block. The output of the first convolutional layer with a 1×1 kernel in the first sub-information transmission block is connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network. The output of the second convolutional layer with a 1×1 kernel in the first sub-information transmission block is connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network.
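The data flow of one sub-information transmission block in [0064] can be sketched with plain array operations: the features from the two paths are spliced along the channel axis, and two separate 1×1 convolutions produce one output per path. The channel counts, weight shapes and function names are hypothetical, normalisation and activation layers are left out for brevity, and how each output is merged back into its path (for example by concatenation) is not stated in this paragraph, so the sketch stops at the two outputs.

```python
import numpy as np

def conv1x1(features, weights):
    """1x1 convolution: mix channels independently at every pixel.

    features: (C_in, H, W); weights: (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weights, features)

def info_transmission_block(grad_feat, gray_feat, w_to_grad, w_to_gray):
    """Splice both paths' features, then produce one output for each path."""
    spliced = np.concatenate([grad_feat, gray_feat], axis=0)  # splicing layer
    to_grad_path = conv1x1(spliced, w_to_grad)   # first 1x1 convolutional layer
    to_gray_path = conv1x1(spliced, w_to_gray)   # second 1x1 convolutional layer
    return to_grad_path, to_gray_path
```

Since each 1×1 convolution sees the spliced features of both paths, every output channel is a per-pixel mixture of gradient-path and grayscale-path information.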

[0065] Referring to Figure 6, the image reconstruction network of this invention includes a stitching layer, two convolutional layers with 3×3 kernels and one convolutional layer with a 1×1 kernel, connected in sequence. The input of the stitching layer in the image reconstruction network is connected to the output of the first feature extraction network and the output of the second feature extraction network. The output of the stitching layer in the image reconstruction network is connected in sequence to the two convolutional layers with 3×3 kernels and the one convolutional layer with a 1×1 kernel in the image reconstruction network.

[0066] The preprocessed first and second intermediate input images in this embodiment of the invention can better enhance the image information of the first and second input images. The fusion network then extracts corresponding information from the gradient path and grayscale path, respectively. Specifically, the first feature extraction network extracts gradient information from the first intermediate input image at the gradient path level, and the second feature extraction network extracts grayscale information from the second intermediate input image at the grayscale path level. The gradient information gives the fused image more accurate texture details, and the grayscale information gives the fused image a histogram distribution more similar to the source image. The information transmission network exchanges information between the gradient path and the grayscale path, which can pre-fuse the gradient and grayscale information of the image to achieve the purpose of enhancing image information. More specifically:

[0067] In both the gradient path and the grayscale path, a convolutional module is first used for feature extraction, followed by residual dense block processing on the same path to achieve feature reuse. Information loss is unavoidable during convolution; residual dense block processing can reduce this loss and improve feature utilization to some extent. Furthermore, the first and second sub-information transmission blocks provide information exchange between the gradient path and the grayscale path during residual dense block processing. In this embodiment, the input of a 3×3 convolutional layer in the residual dense block depends not only on the outputs of all preceding 3×3 convolutional layers on its own path but also on the output of the 3×3 convolutional layer on the other path. A batch normalization (BN) layer and an activation layer are connected in sequence after the 3×3 convolutional layer in the convolutional module and after each 3×3 convolutional layer in the residual dense block; a BN layer and an LReLU activation layer are connected in sequence after the 1×1 convolutional layer in the residual dense block. As shown in Figure 5, the first and second sub-information transmission blocks use regular convolutional layers with 1×1 kernels, each followed by a BN layer and an LReLU activation layer.

[0068] This invention utilizes residual dense blocks to enhance the fusion network's ability to describe image details and improve the quality of the fused image. In the residual dense block, the main dense stream employs dense connections, while the residual gradient stream integrates gradient operations. The main dense stream uses two 3×3 convolutional layers, an LReLU activation function, and a regular convolutional layer with a 1×1 kernel. Dense connections are introduced into the main dense stream to make full use of the features extracted by each convolutional layer. The residual gradient stream is implemented by a residual gradient module, which uses the Sobel operator as the gradient operator to calculate the gradient magnitude of the image and a regular convolutional layer with a 1×1 kernel to eliminate differences in channel dimension. The outputs of the main dense stream and the residual gradient stream are then combined by element-wise addition to integrate the deep features and the detail features of the image.
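The gradient-magnitude step of the residual gradient stream can be illustrated with a small, dependency-free sketch. It assumes a single-channel image stored as a list of rows, zero padding at the border, and a magnitude computed as the Euclidean norm of the horizontal and vertical Sobel responses; none of these details is specified in the description, and the function name `sobel_magnitude` is hypothetical.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return the Sobel gradient magnitude of a 2-D image (list of rows).

    Assumptions (not from the source): zero padding at the border and
    magnitude = sqrt(gx^2 + gy^2).
    """
    h, w = len(img), len(img[0])

    def pixel(r, c):
        # Pixels outside the image are treated as zero.
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    v = pixel(r + dr, c + dc)
                    gx += SOBEL_X[dr + 1][dc + 1] * v
                    gy += SOBEL_Y[dr + 1][dc + 1] * v
            out[r][c] = math.hypot(gx, gy)
    return out


# A vertical step edge: interior pixels next to the edge respond strongly.
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
grad = sobel_magnitude(step)
```

The output has the same height and width as the input, which is what allows it to be added element-wise to the main dense stream once a 1×1 convolution has matched the channel count.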

[0069] This invention employs a strategy of concatenation and convolutional reconstruction to fuse the image features extracted from the gradient path and the grayscale path. Following the idea of feature reuse, it concatenates the intermediate feature maps from the two paths along the channel direction. Specifically, the image reconstruction network in this invention consists of two cascaded convolutional layers with 3×3 kernels, with a batch normalization (BN) layer and an LReLU activation layer, followed by a regular convolutional layer with a 1×1 kernel, with a BN layer and a hyperbolic tangent (Tanh) activation layer.

[0070] In this embodiment of the invention, all convolutional layers in the fusion network use "same" padding with the stride set to 1, meaning that the output size equals the input size. Therefore, the convolutional layers in the image reconstruction network do not change the size of the intermediate feature maps.
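The size-preserving property described above follows from the usual convolution output-size formula, out = floor((n + 2p - k) / s) + 1. The short sketch below checks it for the 1×1 and 3×3 kernels used in the fusion network; the helper names and the 120-pixel example size are illustrative choices.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that preserves the size for an odd kernel at stride 1."""
    return (kernel - 1) // 2


for k in (1, 3):
    p = same_padding(k)
    # With stride 1 and this padding, the size stays at 120.
    print(k, p, conv_output_size(120, k, stride=1, padding=p))
```

Without padding, a 3×3 kernel at stride 1 would shrink each dimension by 2 per layer, so feature maps from layers at different depths could no longer be concatenated directly.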

[0071] Furthermore, the fusion network shown in Figure 2, as employed in the embodiments of the present invention, is a pre-trained network structure. Referring to Figure 7, the corresponding fusion network training process includes:

[0072] S301. Obtain the training input image set;

[0073] S302. Divide each training input image in the training input image set into blocks to obtain several corresponding training sub-input images, and form a new training input image set from all the training sub-input images.

[0074] S303. Take every two training sub-input images from the new training input image set as input to the initial fusion network, and train the initial fusion network to obtain the trained fusion network; wherein, during the training process, the construction of the total loss function takes into account grayscale loss and structural similarity loss.

[0075] In embodiments of the present invention, before training the initial fusion network based on a new training image set, the method further includes:

[0076] For each pair of training sub-input images in the new training input image set, the process includes: preprocessing the two training sub-input images to obtain two intermediate training input images. In this embodiment, preprocessing the two training sub-input images to obtain two intermediate training input images includes: concatenating two first training sub-input images and one second training sub-input image to obtain a first intermediate training input image; and concatenating one first training sub-input image and two second training sub-input images to obtain a second intermediate training input image. Therefore, during training, each training input image in the training input image set undergoes the same processing as the actual input image in S10, specifically as described in S20 and S30.
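The 2:1 and 1:2 concatenation described above can be sketched as follows. The sketch assumes that "concatenating" means stacking single-channel sub-images along the channel axis, so that each intermediate input has three channels, and it assumes a particular channel order; the function name `preprocess_pair` is hypothetical.

```python
def preprocess_pair(first, second):
    """Build the two intermediate inputs from one pair of sub-images.

    Assumptions (not confirmed by the source): each sub-image is a
    single-channel 2-D list, concatenation is along the channel axis,
    and the channel order is as written below.
    """
    first_intermediate = [first, first, second]    # two first + one second
    second_intermediate = [first, second, second]  # one first + two second
    return first_intermediate, second_intermediate


# Toy 2x2 sub-images standing in for, e.g., a visible and an infrared block.
a = [[0.1, 0.2], [0.3, 0.4]]
b = [[0.9, 0.8], [0.7, 0.6]]
x1, x2 = preprocess_pair(a, b)
```

Under this reading, both paths see both modalities, with the ratio biased toward one source in each path.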

[0077] The loss function determines the type of information extracted and the proportional relationship between various types of information. Each batch of training input images is fed into the fusion network, and predicted values are output through forward propagation. The difference between the predicted and true values, i.e., the loss value, is calculated using the loss function. The fusion network updates its parameters through backpropagation, reducing the loss between the predicted and true values, thus making the predicted values generated by the fusion network move closer to the true values, thereby achieving the learning objective. According to the inventors' research, the total loss function used in the training part of the fusion network in this embodiment consists of two types of loss terms: grayscale loss and structural similarity loss, both constructed for the two source images. Furthermore, grayscale constraints provide a coarse pixel distribution, while structural similarity constraints enhance texture details. The joint constraints of both can achieve a better fusion effect, resulting in a fused image with a reasonable grayscale distribution and rich texture details. The fused image cannot retain all the information of the source image; a trade-off must be made between grayscale distribution and texture details to retain more image information. Therefore, the proportion of various types of information can be changed by adjusting the weight of each loss term, thereby adapting to different image fusion tasks. Specifically:

[0078] The total loss function in this embodiment of the invention comprises two terms, which are expressed as follows:

[0079] $L = \lambda L_{ssim} + L_p$;

[0080] where $L$ represents the total loss function, $\lambda$ represents the balance factor, and $L_{ssim}$ represents the structural similarity loss function, $L_{ssim} = \omega_1 \cdot (1 - \mathrm{SSIM}(O, I_1)) + \omega_2 \cdot (1 - \mathrm{SSIM}(O, I_2))$, where $\mathrm{SSIM}(\cdot)$ represents the similarity function, $O$ represents the fusion result of the training images after each training iteration, $I_1$ and $I_2$ represent the two training sub-input images input during training, and $\omega_1$ and $\omega_2$ represent the weight factors. $L_p$ represents the grayscale loss function, in which $\|\cdot\|_1$ denotes the L1 norm, that is, the sum of the absolute values of all elements; $\max(\cdot)$ denotes taking the maximum value; and $H$ and $W$ represent the height and width of each training sub-input image. This invention defines a unified total loss function based on gradient and grayscale information. By adjusting the balance factor $\lambda$, the proportion of the various types of information in the fused image is changed. Using a unified loss function allows adaptation to different fusion tasks, which gives the method strong versatility and improves the robustness of the algorithm.
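As a rough illustration of how the total loss of paragraph [0079] combines its two terms, the sketch below evaluates it on small images stored as lists of rows. Two simplifications are assumptions and not taken from the source: SSIM is computed from whole-image statistics rather than over sliding windows, and the grayscale loss is assumed to take the form (1/(HW))·‖O − max(I1, I2)‖₁, which is consistent with the symbols listed in paragraph [0080] but is not stated there. The default weights are placeholders.

```python
def _flatten(img):
    return [v for row in img for v in row]


def _mean(values):
    return sum(values) / len(values)


def global_ssim(x, y, data_range=1.0):
    """SSIM from whole-image statistics (a simplification of windowed SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    fx, fy = _flatten(x), _flatten(y)
    mx, my = _mean(fx), _mean(fy)
    vx = _mean([(a - mx) ** 2 for a in fx])
    vy = _mean([(b - my) ** 2 for b in fy])
    cov = _mean([(a - mx) * (b - my) for a, b in zip(fx, fy)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def ssim_loss(o, i1, i2, w1, w2):
    """L_ssim = w1 * (1 - SSIM(O, I1)) + w2 * (1 - SSIM(O, I2))."""
    return w1 * (1 - global_ssim(o, i1)) + w2 * (1 - global_ssim(o, i2))


def grayscale_loss(o, i1, i2):
    """ASSUMED form of L_p: (1 / (H * W)) * || O - max(I1, I2) ||_1."""
    h, w = len(o), len(o[0])
    total = sum(abs(o[r][c] - max(i1[r][c], i2[r][c]))
                for r in range(h) for c in range(w))
    return total / (h * w)


def total_loss(o, i1, i2, lam, w1=0.5, w2=0.5):
    """L = lam * L_ssim + L_p; w1 and w2 defaults are placeholders."""
    return lam * ssim_loss(o, i1, i2, w1, w2) + grayscale_loss(o, i1, i2)


i1 = [[0.0, 1.0], [0.0, 1.0]]
i2 = [[1.0, 0.0], [1.0, 0.0]]
fused = [[1.0, 1.0], [1.0, 1.0]]  # element-wise maximum of i1 and i2
loss = total_loss(fused, i1, i2, lam=1.0)
```

In this toy case the assumed grayscale term is zero, because the fused image equals the pixel-wise maximum, while the structural term remains positive; raising λ would therefore push training toward preserving structure rather than intensity.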

[0081] For example, the training input image set used in this embodiment of the invention is the publicly available TNO dataset, a dataset used for research on infrared and visible light image fusion. To obtain more training input images, a cropping and decomposition strategy is adopted: each image in the TNO dataset is cropped into image blocks of 120 × 120 pixels, and all the cropped image blocks constitute the new training input image set. During training, the gradient and grayscale information obtained by the first and second feature extraction networks is processed by the image reconstruction network to generate a fused image. The fusion loss is calculated from this fused image and the two input images using the total loss function, and is backpropagated through the fusion network to guide training. The fusion network updates its parameters in the direction of decreasing fusion loss. After multiple iterations, the total loss function gradually converges, and the network parameters at that point are taken as the final parameters of the fusion network, yielding the trained fusion network. Finally, the first and second input images, after being divided into blocks and preprocessed, are input into the trained fusion network to obtain the final image fusion result.
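The cropping strategy can be sketched as a simple tiling routine. The description gives only the 120 × 120 block size; the sketch assumes non-overlapping blocks by default and discards any partial block at the right and bottom edges, and the function name `crop_blocks` and its `stride` parameter are hypothetical.

```python
def crop_blocks(img, block=120, stride=None):
    """Split a 2-D image (list of rows) into block x block tiles.

    Assumptions (not from the source): tiles do not overlap unless a
    smaller stride is given, and incomplete edge tiles are dropped.
    """
    stride = stride or block
    h, w = len(img), len(img[0])
    blocks = []
    for top in range(0, h - block + 1, stride):
        for left in range(0, w - block + 1, stride):
            blocks.append([row[left:left + block]
                           for row in img[top:top + block]])
    return blocks


# A 240 x 360 placeholder image yields 2 x 3 = 6 non-overlapping blocks.
image = [[0.0] * 360 for _ in range(240)]
tiles = crop_blocks(image)
```

Passing a stride smaller than the block size produces overlapping blocks, which is one common way to enlarge a small training set further.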

[0082] To verify the effectiveness of the multi-source image fusion method based on residual dense block network provided in this embodiment of the invention, the following experiments were conducted.

[0083] Experiment 1 is a subjective evaluation experiment, which uses two images from the public dataset "TNO" for testing. One is a visible light image, as shown in Figure 8(a), and the other is an infrared image, as shown in Figure 8(b). Figure 9(a) is the fused image obtained using the LatLRR method, Figure 9(b) is the fused image obtained using the NSCT method, Figure 9(c) is the fused image obtained using the NSST method, and Figure 9(d) is the fused image obtained using the method proposed in this invention. As can be seen from Figures 9(a) to 9(d), the image fusion results of LatLRR, NSCT, and NSST are not ideal: with the LatLRR method, as shown in Figure 9(a), the overall image contrast is low and the edges are blurred; with the NSCT method, as shown in Figure 9(b), the image is generally dark and partially distorted; with the NSST method, as shown in Figure 9(c), the edge details are not prominent enough. The fusion result of the method proposed in this invention is relatively ideal: as shown in Figure 9(d), the edge structure is clear, the target is prominent, and the texture information is complete. It better preserves the detailed information in the two source images without producing defects such as artifacts or strong noise.

[0084] Experiment 2 is an objective quantitative evaluation experiment, using standard deviation (SD), average gradient (AG), and spatial frequency (SF) as metrics. Standard deviation is an indicator reflecting the contrast and distribution of the fused image. It represents the degree to which the value of a single pixel in the image deviates from the mean pixel value. The human visual system is often more attracted to areas with high contrast. The larger the standard deviation, the more dispersed the gray levels, indicating that the image has higher contrast and better visual effect. Average gradient is an indicator that measures the gradient information of the fused image and uses it to characterize the texture details of the fused image. A higher average gradient of the fused image means that it contains richer gradient information. Spatial frequency is an indicator that reveals the details and texture information of the fused image by measuring the gradient distribution. A higher spatial frequency means that the image has richer edge and texture details. The method proposed in this invention was compared with LatLRR, NSCT, and NSST methods. The quantitative experimental results are shown in Table 1. The test images are 20 sets of images from the public dataset "TNO". These 20 sets of images contain different scene information. The values in Table 1 are the average values of the corresponding indicators of the fusion results of these 20 sets of images.
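The three metrics can be made concrete with a short sketch. The description does not give their formulas, so the definitions below are commonly used ones and are an assumption: SD is the population standard deviation of the pixel values, AG averages sqrt((Δx² + Δy²)/2) over forward differences, and SF is sqrt(RF² + CF²) with row and column frequencies normalized by the pixel count. The normalization used in the experiments may differ.

```python
import math


def standard_deviation(img):
    """Population standard deviation of all pixel values."""
    vals = [v for row in img for v in row]
    mu = sum(vals) / len(vals)
    return math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))


def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences."""
    h, w = len(img), len(img[0])
    total = 0.0
    for r in range(h - 1):
        for c in range(w - 1):
            dx = img[r][c + 1] - img[r][c]
            dy = img[r + 1][c] - img[r][c]
            total += math.sqrt((dx * dx + dy * dy) / 2)
    return total / ((h - 1) * (w - 1))


def spatial_frequency(img):
    """sqrt(RF^2 + CF^2), from squared row and column differences."""
    h, w = len(img), len(img[0])
    rf = sum((img[r][c] - img[r][c - 1]) ** 2
             for r in range(h) for c in range(1, w))
    cf = sum((img[r][c] - img[r - 1][c]) ** 2
             for r in range(1, h) for c in range(w))
    return math.sqrt((rf + cf) / (h * w))


# A tiny image with one vertical edge; a flat image scores 0 on all three.
edge = [[0.0, 1.0], [0.0, 1.0]]
scores = (standard_deviation(edge), average_gradient(edge),
          spatial_frequency(edge))
```

All three metrics increase with contrast and edge content, which is why higher values are read as better in the comparison that follows; none of them uses the source images as a reference.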

[0085] Table 1 Quantitative Evaluation Results

[0086]

| Metric | LatLRR | NSCT | NSST | Method of the present invention |
|---|---|---|---|---|
| SD | 8.58852 | 8.62650 | 8.68704 | 9.01087 |
| AG | 6.59239 | 6.27221 | 6.98100 | 7.26317 |
| SF | 0.03499 | 0.05789 | 0.04315 | 0.07403 |

[0087] As shown in Table 1, the method proposed in this invention achieves the highest values for standard deviation (SD), average gradient (AG), and spatial frequency (SF) among the four compared methods, indicating better performance than the other existing fusion methods. This demonstrates that the proposed method performs well in edge structure preservation, texture information extraction, and contrast enhancement, resulting in superior visual effects. The effectiveness of the proposed method is therefore verified. The LatLRR, NSCT, and NSST methods do not adequately handle noise interference during image decomposition, exhibit severe edge blurring, and lack detail representation; their overall performance is inferior to that of the method proposed in this invention.
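To put the comparison in Table 1 in relative terms, the following sketch computes the gain of the proposed method over the best of the three baselines for each metric, using only the values reported in Table 1. The variable and function names are illustrative.

```python
# Values copied from Table 1 (averages over 20 TNO image sets).
baselines = {
    "SD": {"LatLRR": 8.58852, "NSCT": 8.62650, "NSST": 8.68704},
    "AG": {"LatLRR": 6.59239, "NSCT": 6.27221, "NSST": 6.98100},
    "SF": {"LatLRR": 0.03499, "NSCT": 0.05789, "NSST": 0.04315},
}
proposed = {"SD": 9.01087, "AG": 7.26317, "SF": 0.07403}


def relative_gain(metric):
    """Gain of the proposed method over the best baseline, as a fraction."""
    best = max(baselines[metric].values())
    return (proposed[metric] - best) / best


gains = {metric: relative_gain(metric) for metric in proposed}
for metric, gain in gains.items():
    print(f"{metric}: {gain:.2%}")
```

On these figures, the margin over the strongest baseline is roughly 4% for SD and AG and considerably larger for SF, where the strongest baseline is NSCT rather than NSST.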

[0088] By combining subjective and objective evaluation criteria, the method proposed in this invention has better fusion effect and effectively achieves image information enhancement.

[0089] In summary, the multi-source image fusion method based on residual dense block networks proposed in this invention can achieve multi-source image fusion quickly and with high quality. The method includes: acquiring a first input image and a second input image; the first input image and the second input image are images of different types; dividing the first input image and the second input image into blocks to obtain several first sub-input images and several second sub-input images; for each pair of first sub-input images and second sub-input images, preprocessing the first sub-input images and the second sub-input images to obtain a first intermediate input image and a second intermediate input image; inputting the first intermediate input image and the second intermediate input image into a fusion network to obtain an image fusion result; wherein, the fusion network includes a first feature extraction network, a second feature extraction network, an information transmission network, and an image reconstruction network, and both the first feature extraction network and the second feature extraction network include a residual dense block; the first feature extraction network is used to extract gradient information of the first intermediate input image based on the residual dense block, the second feature extraction network is used to extract grayscale information of the second intermediate input image based on the residual dense block, the information transmission network is used to realize information exchange during the extraction of gradient information and grayscale information, and the image reconstruction network is used to reconstruct the extracted gradient information and grayscale information to obtain the image fusion result. 
As can be seen, the embodiments of the present invention address the problems of traditional image fusion methods, which only exchange information between adjacent pixels, cannot perceive the global context, and require the manual design of complex fusion rules, and can achieve multi-source image fusion quickly and with high quality. In the image fusion process, the fusion network extracts information along the gradient path and the grayscale path respectively. On each path, it performs dense connections based on residual dense blocks to achieve feature reuse, mitigating the information loss caused by convolution and enhancing the fusion network's ability to describe image details. At the same time, an information transmission network is introduced in the feature extraction process to exchange information between the two paths. This not only pre-fuses the gradient information and grayscale information, but also enhances the image information in subsequent processing, thereby improving the image fusion effect. Compared with traditional image fusion methods, the fusion network proposed in the embodiments of the present invention has strong versatility and significantly reduces the amount of computation, enabling rapid multi-source image fusion.

[0090] In the description of this invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0091] Although the invention has been described herein in conjunction with various embodiments, those skilled in the art, by reviewing the specification and accompanying drawings, will understand and implement other variations of the disclosed embodiments in carrying out the claimed invention. In the specification, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. While certain measures are described in different embodiments, this does not mean that these measures cannot be combined to produce good results.

[0092] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.

Claims

1. A multi-source image fusion method based on a residual dense block network, characterized in that it comprises: acquiring a first input image and a second input image, the first input image and the second input image being images of different types; dividing the first input image and the second input image into blocks to obtain several first sub-input images and several second sub-input images; and, for each pair of a first sub-input image and a second sub-input image: preprocessing the first sub-input image and the second sub-input image to obtain a first intermediate input image and a second intermediate input image; and inputting the first intermediate input image and the second intermediate input image into a fusion network to obtain an image fusion result; wherein the fusion network includes a first feature extraction network, a second feature extraction network, an information transmission network, and an image reconstruction network; both the first feature extraction network and the second feature extraction network include a residual dense block; the first feature extraction network extracts gradient information of the first intermediate input image based on the residual dense block, and the second feature extraction network extracts grayscale information of the second intermediate input image based on the residual dense block; the information transmission network exchanges information during the extraction of the gradient information and the grayscale information; the image reconstruction network reconstructs the extracted gradient information and grayscale information to obtain the image fusion result; the information transmission network includes a first sub-information transmission block and a second sub-information transmission block; wherein the input and output of the first sub-information transmission block are both connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network and the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network; and the input and output of the second sub-information transmission block are both connected to the output of the second convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network and the output of the second convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network.

2. The multi-source image fusion method based on a residual dense block network according to claim 1, characterized in that preprocessing the first sub-input image and the second sub-input image to obtain a first intermediate input image and a second intermediate input image comprises: concatenating two first sub-input images and one second sub-input image to obtain the first intermediate input image; and concatenating one first sub-input image and two second sub-input images to obtain the second intermediate input image.

3. The multi-source image fusion method based on a residual dense block network according to claim 1, characterized in that the first feature extraction network has the same network structure as the second feature extraction network; the first feature extraction network includes a convolutional module and a residual dense block connected in sequence; wherein the convolutional module includes a convolutional layer with a 3×3 kernel; the residual dense block comprises two 3×3 convolutional layers and one 1×1 convolutional layer connected in sequence; the input of the first 3×3 convolutional layer in the residual dense block is connected to the output of the 3×3 convolutional layer in the convolutional module; the input of the second 3×3 convolutional layer in the residual dense block is connected to the output of the 3×3 convolutional layer in the convolutional module and to the output of the first 3×3 convolutional layer in the residual dense block; the input of the 1×1 convolutional layer in the residual dense block is connected to the output of the 3×3 convolutional layer in the convolutional module, the output of the first 3×3 convolutional layer in the residual dense block, and the output of the second 3×3 convolutional layer in the residual dense block; the output of the 1×1 convolutional layer in the residual dense block is connected to the image reconstruction network; and a residual gradient module is connected between the input of the first 3×3 convolutional layer in the residual dense block and the output of the second 3×3 convolutional layer in the residual dense block.

4. The multi-source image fusion method based on a residual dense block network according to claim 1, characterized in that the first sub-information transmission block and the second sub-information transmission block have the same network structure; the first sub-information transmission block includes a concatenation layer and two convolutional layers with 1×1 kernels; wherein the input of the concatenation layer in the first sub-information transmission block is connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network and to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network; the output of the concatenation layer in the first sub-information transmission block is connected to the input of the first convolutional layer with a 1×1 kernel in the first sub-information transmission block and to the input of the second convolutional layer with a 1×1 kernel in the first sub-information transmission block; the output of the first convolutional layer with a 1×1 kernel in the first sub-information transmission block is connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the first feature extraction network; and the output of the second convolutional layer with a 1×1 kernel in the first sub-information transmission block is connected to the output of the first convolutional layer with a 3×3 kernel in the residual dense block of the second feature extraction network.

5. The multi-source image fusion method based on a residual dense block network according to claim 1, characterized in that the image reconstruction network comprises a concatenation layer, two 3×3 convolutional layers, and a 1×1 convolutional layer connected in sequence; wherein the input of the concatenation layer in the image reconstruction network is connected to the output of the first feature extraction network and the output of the second feature extraction network; and the output of the concatenation layer in the image reconstruction network is sequentially connected to the two convolutional layers with 3×3 kernels and the one convolutional layer with a 1×1 kernel in the image reconstruction network.

6. The multi-source image fusion method based on a residual dense block network according to claim 1, characterized in that the fusion network is a pre-trained network structure, and the corresponding fusion network training process comprises: obtaining a training input image set; dividing each training input image in the training input image set into blocks to obtain several corresponding training sub-input images, and forming a new training input image set from all the training sub-input images; and taking every two training sub-input images from the new training input image set as input to an initial fusion network and training the initial fusion network to obtain the trained fusion network; wherein, during the training process, the construction of the total loss function takes into account grayscale loss and structural similarity loss.

7. The multi-source image fusion method based on a residual dense block network according to claim 6, characterized in that the total loss function is expressed as follows: $L = \lambda L_{ssim} + L_p$; where $L$ represents the total loss function; $\lambda$ represents the balance factor; $L_{ssim}$ represents the structural similarity loss function, $L_{ssim} = \omega_1 \cdot (1 - \mathrm{SSIM}(O, I_1)) + \omega_2 \cdot (1 - \mathrm{SSIM}(O, I_2))$; $\mathrm{SSIM}(\cdot)$ represents the similarity function; $O$ represents the fused training image output after each training iteration; $I_1$ and $I_2$ represent the two training sub-input images input during the training process; $\omega_1$ and $\omega_2$ represent the weight factors; $L_p$ represents the grayscale loss function, in which $\|\cdot\|_1$ denotes the L1 norm and $\max(\cdot)$ denotes taking the maximum value; and $H$ and $W$ represent the height and width of each training sub-input image.

8. The multi-source image fusion method based on a residual dense block network according to claim 6, characterized in that, before training the initial fusion network based on the new training input image set, the method further comprises: for every two training sub-input images in the new training input image set, preprocessing the two training sub-input images to obtain two intermediate training input images.

9. The multi-source image fusion method based on a residual dense block network according to claim 8, characterized in that preprocessing the two training sub-input images to obtain two intermediate training input images comprises: concatenating two first training sub-input images and one second training sub-input image to obtain a first intermediate training input image; and concatenating one first training sub-input image and two second training sub-input images to obtain a second intermediate training input image.