Image fusion method and device based on feedback type light adaptation
By adopting a feedback-based illumination-adaptive image fusion method, and combining the joint training of the illumination-adaptive network and the fusion network, the quality problem of infrared and visible light image fusion under low illumination is solved, achieving high-quality image fusion and illumination level optimization, which is suitable for image processing in terminal devices.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: WUHAN UNIV
- Filing Date: 2024-11-15
- Publication Date: 2026-06-16
AI Technical Summary
Existing infrared and visible light image fusion methods struggle to maintain high-quality fusion in low-light or nighttime environments, exhibiting problems such as low brightness, insufficient contrast, loss of texture details, and color distortion. Furthermore, existing methods are not ideal in terms of lighting adjustment, resulting in uneven lighting levels in the fused images.
A feedback-based illumination-adaptive image fusion method is adopted. By jointly training the illumination-adaptive network and the fusion network, the feedback loss function is calculated using the prediction probability array of the feedback network. Combined with spatial consistency, exposure control, color constancy and smoothing loss functions, the illumination level of the image is optimized. Sobel/Laplacian residual blocks and multi-scale spatial/channel attention modules are designed to extract and fuse features.
It achieves high-quality image fusion under various lighting conditions, improves the brightness, texture and color performance of the fused image, enhances the quality of the fused image, has strong adaptability, and can promote applications such as pedestrian detection in advanced computer vision tasks.
Figure CN119323522B_ABST
Description
Technical Field
[0001] This application relates to the field of image enhancement technology, and in particular to an image fusion method, apparatus, storage medium, and electronic device based on feedback-based adaptive illumination.
Background Technology
[0002] Infrared and visible light image fusion can effectively combine the advantages of both source images, preserving important target information and rich texture details. In recent years, significant progress has been made in the field of infrared and visible light image fusion, with main methods falling into two categories: traditional methods and deep learning-based methods.
[0003] Traditional methods rely on multi-scale transformations, sparse representations, subspace clustering, optimization algorithms, and hybrid methods. Although existing image fusion methods have achieved certain results, several problems remain. For example, due to ambient lighting limitations, visible light images often exhibit low brightness, insufficient contrast, loss of texture details, and color distortion. Existing methods are therefore mostly applicable only to well-lit scenes and struggle to maintain high-quality image fusion in nighttime or low-light environments.
Summary of the Invention
[0004] This application provides an image fusion method, apparatus, storage medium, and electronic device based on feedback-based adaptive illumination, which can obtain high-quality fused images.
[0005] This application provides an image fusion method based on feedback-based adaptive illumination, including:
[0006] Obtain visible light image datasets and infrared image datasets;
[0007] The visible light image dataset is input into the illumination adaptive network to obtain the enhanced image dataset;
[0008] The infrared image dataset and the enhanced image dataset are input into the fusion network to obtain the fused image dataset;
[0009] The fused image dataset is input into a pre-trained feedback network to obtain multiple sets of probability arrays. A feedback loss function is calculated based on the multiple sets of probability data. The illumination adaptive network and the fusion network are jointly trained based on the feedback loss function.
[0010] A visible light image and an infrared image to be fused are acquired. The visible light image to be fused is input into the jointly trained illumination adaptive network to obtain an enhanced image. The infrared image to be fused and the enhanced image are input into the jointly trained fusion network to obtain a fused image.
[0011] Furthermore, in the above-mentioned image fusion method based on feedback-based adaptive illumination, the feedback network includes a multi-scale convolutional module, convolutional layers, a global average pooling layer, fully connected layers, and an output layer;
[0012] The fused image dataset is input into a pre-trained feedback network to obtain multiple probability arrays, including:
[0013] The fused image dataset is input into a multi-scale convolution module for feature extraction and stitching to obtain a first feature map.
[0014] The first feature map is sequentially passed through multiple convolutional layers to extract features, resulting in a second feature map;
[0015] The second feature map is input into the global average pooling layer for dimensionality reduction.
[0016] The second feature map after dimensionality reduction is linearly transformed through multiple fully connected layers;
[0017] The second feature map after linear transformation is input into the output layer, and the values of the output nodes are converted into the probability array through an activation function. The probability array includes the probability that the fused image is in an overexposed state, the probability that it is in a normally exposed state, and the probability that it is underexposed.
[0018] Furthermore, in the above-mentioned image fusion method based on feedback-based adaptive illumination, the step of calculating the feedback loss function based on the multiple sets of probability data includes:
[0019]
[0020] Here the formula uses the probability of overexposure, the probability of normal exposure, and the probability of underexposure; r is the number of iterations during training; a small constant prevents the argument of the logarithm from being 0; and H and W are the height and width of the fused image, respectively.
[0021] Furthermore, in the above-described image fusion method based on feedback-based illumination adaptation, the joint training of the illumination adaptation network and the fusion network based on the feedback loss function includes:
[0022] A first loss function is calculated based on the feedback loss function. This first loss function is used to train the illumination adaptive network. The calculation method for the first loss function is as follows:
[0023]
[0024] Here the weighting coefficients are hyperparameters, and the terms are the first loss function, the spatial consistency loss function, the exposure control loss function, the color constancy loss function, the smoothing loss function, and the feedback loss function.
[0025] Furthermore, in the above-mentioned image fusion method based on feedback-based adaptive illumination, the spatial consistency loss function is calculated as follows:
[0026]
[0027] where M is the number of local regions, and the size of each local region is set to [value missing]; the neighborhood symbol denotes the four regions (top, bottom, left, and right) adjacent to a given local region; and I and E represent the average intensity values of local regions in the infrared image and the enhanced image, respectively.
[0028] The exposure control loss function is calculated as follows:
[0029]
[0030] where N is the number of local regions, and the size of each local region is set to [value missing]; E represents the average intensity value of a local region in the enhanced image; and the preset good exposure level has a value range of [value range missing];
[0031] The color constancy loss function is calculated as follows:
[0032]
[0033] where the three symbols denote the mean values of the red, green, and blue channels, respectively.
[0034] The smoothing loss function is calculated as follows:
[0035]
[0036] where the two symbols denote the x and y coordinates of the pixel, respectively.
[0037] Furthermore, in the above-described image fusion method based on feedback-based illumination adaptation, the joint training of the illumination adaptation network and the fusion network based on the feedback loss function includes:
[0038] A second loss function is calculated based on the feedback loss function. This second loss function is used to train the fusion network. The calculation method for the second loss function is as follows:
[0039]
[0040] Here the weighting coefficients are hyperparameters, and the terms are the second loss function, the gradient loss function, the structural similarity loss function, the color constancy loss function, and the feedback loss function.
[0041] Furthermore, in the above-mentioned image fusion method based on feedback-based adaptive illumination, the gradient loss function is calculated as follows:
[0042]
[0043] where H and W are the height and width of the infrared image, respectively; the gradient symbol denotes the Sobel gradient operator, used to calculate the spatial derivative of image intensity; and the three image terms are the fused image, the infrared image, and the enhanced image output by the illumination adaptive network, respectively.
[0044] The structural similarity loss function is calculated as follows:
[0045]
[0046] where SSIM denotes the structural similarity index measure.
[0047] This application also provides an image fusion device based on feedback-based adaptive illumination, including:
[0048] The training module is used to acquire visible light image datasets and infrared image datasets; input the visible light image datasets into an illumination adaptive network to obtain an enhanced image dataset; input the infrared image datasets and the enhanced image datasets into a fusion network to obtain a fused image dataset; and input the fused image datasets into a pre-trained feedback network to obtain multiple sets of probability arrays, calculate a feedback loss function based on the multiple sets of probability data, and jointly train the illumination adaptive network and the fusion network based on the feedback loss function.
[0049] The application module is used to acquire a visible light image and an infrared image to be fused, input the visible light image to be fused into the jointly trained illumination adaptive network to obtain an enhanced image, and input the infrared image to be fused and the enhanced image into the jointly trained fusion network to obtain a fused image.
[0050] This application also provides a computer-readable storage medium storing a plurality of instructions adapted for loading by a processor to execute any of the above-described image fusion methods based on feedback-based adaptive illumination.
[0051] This application also provides an electronic device, including a processor and a memory, wherein the processor is electrically connected to the memory, the memory is used to store instructions and data, and the processor is configured to execute the steps of the image fusion method based on feedback-based adaptive illumination described above.
[0052] This application provides an image fusion method, apparatus, storage medium, and electronic device based on feedback-based adaptive illumination. The application establishes a feedback network, calculates a feedback loss function using the probability array predicted by the feedback network, and jointly trains the illumination adaptive network and the fusion network based on the feedback loss function and other loss functions to obtain the final fused image. Using the feedback network to guide the training of both networks enables the output fused image to reach a reasonable illumination level and improves the quality of the fused image.
Attached Figure Description
[0053] The technical solution and other beneficial effects of this application will become apparent from the following detailed description of specific embodiments in conjunction with the accompanying drawings.
[0054] Figure 1 is a flowchart of an image fusion method based on feedback-based adaptive illumination provided in an embodiment of this application.
[0055] Figure 2 is another flowchart of the image fusion method based on feedback-based adaptive illumination provided in the embodiments of this application.
[0056] Figure 3 is a schematic diagram illustrating the process of generating enhanced images provided in an embodiment of this application.
[0057] Figure 4 is a flowchart for generating a fused image provided in an embodiment of this application.
[0058] Figure 5 is a schematic diagram of the structure of the residual block provided in an embodiment of this application.
[0059] Figure 6 is a schematic diagram of the structure of the multi-scale spatial / channel attention fusion module provided in the embodiments of this application.
[0060] Figure 7 is a flowchart for generating probability arrays provided in an embodiment of this application.
[0061] Figure 8 is a schematic diagram comparing the results of image fusion tests using 12 methods provided in the embodiments of this application.
[0062] Figure 9 is another comparative diagram showing the results of image fusion tests using 12 methods provided in the embodiments of this application.
[0063] Figure 10 is a schematic diagram of the structure of the image fusion device based on feedback-based adaptive illumination provided in an embodiment of this application.
[0064] Figure 11 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
[0065] Figure 12 is another structural schematic diagram of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0066] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0067] Currently, traditional methods rely on multi-scale transformation, sparse representation, subspace clustering, optimization algorithms, and hybrid methods. Although existing image fusion methods have achieved certain results, several problems still exist: (1) Ambient lighting limitations: Visible light images often exhibit low brightness, insufficient contrast, and lack of texture details. They are mostly only applicable to scenes with good lighting, and it is difficult to maintain high-quality image fusion in nighttime or low-light environments. (2) Pixel intensity constraints: Existing nighttime image fusion methods generally adopt pixel intensity constraints as the core mechanism, resulting in deficiencies in texture clarity and color performance of the fused image. Pixel intensity constraints emphasize brightness consistency, leading to smoothing of texture regions. That is, texture information with high frequency changes is low-pass filtered during the fusion process, reducing the local contrast and edge clarity of the image. (3) Existing enhancement-based image fusion methods only focus on the fine design of enhancement and fusion networks. Although these methods can enhance the visibility of images under certain conditions, when dealing with low-light or high-contrast scenes, they may cause the brightness distribution in the fused image to be skewed, resulting in underexposure or overexposure, which is not conducive to obtaining ideal lighting levels in the fused image. To address these issues, this invention presents a feedback-based, illumination-adaptive infrared and visible light image fusion method.
[0068] To address the aforementioned problems, embodiments of this application provide an image fusion method, apparatus, storage medium, and electronic device based on feedback-based adaptive illumination. The image fusion apparatus based on feedback-based adaptive illumination provided in this application can be integrated into an electronic device, which may be a terminal, server, or other device. The terminal may include a tablet computer, laptop computer, personal computer (PC), microprocessor box, or other devices.
[0069] Please see Figure 1 and Figure 2. Figure 1 is a flowchart of an image fusion method based on feedback-based adaptive illumination provided in an embodiment of this application, and Figure 2 is another flowchart of the same method. The method is applied in an electronic device and includes the following steps:
[0070] S1, acquire the visible light image dataset and the infrared image dataset.
[0071] The visible light image dataset includes several visible light images, each labeled with a corresponding ground truth label. The infrared image dataset includes several infrared images, each labeled with a corresponding ground truth label. Both datasets are used to train the illumination adaptive network, the fusion network, and the feedback network; the LLVIP and MSRS datasets can be used for this purpose.
[0072] S2, input the visible light image dataset into the illumination adaptive network to obtain the enhanced image dataset.
[0073] Figure 3 is a schematic diagram of the process for generating enhanced images provided in an embodiment of this application. As shown in Figure 3, the illumination adaptive network consists of multiple convolutional layers (Conv) and multiple rectified linear unit activation layers (ReLU). First, the illumination adaptive network receives a visible light image, which is processed by a convolutional layer and a ReLU activation function. The result is then passed sequentially through three identical convolutional layers and activation functions. The illumination adaptive network uses residual connections: after the 2nd, 3rd, and 4th convolutional layers, the feature maps from the previous layer are passed directly to the subsequent layers. These residual connections help alleviate the vanishing gradient problem in deep networks, enhance information flow, and retain more of the original image information. Finally, an enhanced image is obtained through a 3×3 convolutional layer and a Tanh activation function, with values ranging from [value range missing]. The stride and padding of each convolutional kernel are both set to 1.
[0074] The embodiments of this application use illumination enhancement curves as a basic method to adaptively adjust the brightness of visible light images, effectively restoring the brightness, texture, and color of visible light images.
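As a minimal sketch of how an illumination enhancement curve can brighten dark pixels, the following assumes the quadratic curve x + alpha * x * (1 - x) known from curve-estimation approaches to low-light enhancement. The text here does not state the exact curve, so the curve form, the names `apply_curve` and `enhance`, the parameter `alpha`, and the iteration count are all illustrative assumptions.

```python
def apply_curve(x, alpha, iterations=1):
    """Apply the assumed quadratic enhancement curve to an intensity in [0, 1].

    alpha in [-1, 1] controls the strength; positive values brighten.
    """
    for _ in range(iterations):
        x = x + alpha * x * (1.0 - x)
    return x


def enhance(image, alpha, iterations=4):
    """Apply the curve to every pixel of a 2D list of intensities."""
    return [[apply_curve(p, alpha, iterations) for p in row] for row in image]


dark = [[0.05, 0.10], [0.20, 0.40]]
bright = enhance(dark, alpha=0.8)
```

With this curve form the endpoints 0 and 1 are fixed points, so repeated application raises mid-tones without pushing values outside [0, 1].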
[0075] S3, input the infrared image dataset and the enhanced image dataset into the fusion network to obtain the fused image dataset.
[0076] Figure 4 is a flowchart for generating a fused image, Figure 5 is a schematic diagram of the structure of the residual block, and Figure 6 is a schematic diagram of the structure of the multi-scale spatial / channel attention fusion module, all provided in embodiments of this application. As shown in Figures 4-6, the fusion network consists of three parts: an encoder, a multi-scale spatial / channel attention fusion module, and a decoder. The encoder employs multiple convolutional layers, batch normalization, and the Leaky ReLU activation function, and also includes two densely connected Sobel / Laplacian residual blocks. The encoder processes the infrared and enhanced images separately to obtain their respective feature maps. The multi-scale spatial / channel attention fusion module performs multi-scale convolutions across 128 channels, extracting features at different scales from the feature maps and integrating these features using spatial and channel attention mechanisms to obtain fused features. The decoder converts the fused features back to image format through multiple convolutional layers, generating the final fused image. The stride of each convolutional kernel is set to 1.
[0077] S4. Input the fused image dataset into the pre-trained feedback network to obtain multiple sets of probability arrays. Calculate the feedback loss function based on the multiple sets of probability data. Jointly train the illumination adaptation network and the fusion network based on the feedback loss function.
[0078] Figure 7 is a flowchart for generating the probability array provided in the embodiments of this application. As shown in Figure 7, the feedback network includes a multi-scale convolutional module, convolutional layers, a global average pooling layer, fully connected layers, and an output layer. Step S4 includes the following steps:
[0079] S41, pre-train the feedback network.
[0080] Specifically, the feedback network is trained using the cross-entropy loss function. To obtain accurate prediction results, it is crucial that the feedback network correctly outputs the three probabilities (probability of overexposure, probability of normal exposure, and probability of underexposure). The cross-entropy loss is defined as follows:
[0081]
[0082] where N is the number of categories, set to 3 here; the label term is the one-hot encoding of the true label; and the prediction term is the probability, predicted by the feedback network, that the sample belongs to category i.
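The definitions above match the standard cross-entropy between a one-hot label and a predicted probability array, that is, the negative sum over the N categories of the label entry times the logarithm of the predicted probability. The sketch below assumes that standard form; the function name `cross_entropy`, the `eps` guard, and the class ordering are illustrative choices rather than details taken from the text.

```python
import math


def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))


# Classes ordered as (overexposed, normal, underexposed); the ordering is an assumption.
label = [0, 1, 0]
confident = cross_entropy(label, [0.05, 0.90, 0.05])
unsure = cross_entropy(label, [0.40, 0.30, 0.30])
```

A prediction that places more probability on the true class gives a smaller loss, which is what drives the feedback network toward correct exposure classification during pre-training.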
[0083] S42, the fused image dataset is input into the multi-scale convolution module for feature extraction and stitching to obtain the first feature map.
[0084] S43, the first feature map is sequentially passed through multiple convolutional layers for feature extraction to obtain the second feature map.
[0085] S44, the second feature map is input into the global average pooling layer for dimensionality reduction.
[0086] S45, the dimensionality-reduced second feature map is linearly transformed through multiple fully connected layers.
[0087] S46, the second feature map after linear transformation is input into the output layer, and the values of the output nodes are converted into a probability array through the activation function. The probability array includes the probability that the fused image is in an overexposed state, the probability that it is in a normally exposed state, and the probability that it is underexposed.
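The tail of steps S44 to S46 can be sketched in plain Python: global average pooling reduces each feature map to one value, a fully connected layer maps the pooled vector to three output nodes, and an activation function turns the node values into a probability array. The text only says "an activation function", so softmax is an assumption here, as are the toy feature maps, the weights, and the ordering of the three classes.

```python
import math


def global_average_pool(feature_maps):
    """Reduce each 2D feature map (list of rows) to its mean value."""
    return [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]


def linear(vector, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two toy 2x2 feature maps and hypothetical weights for three output nodes.
features = [[[0.2, 0.4], [0.6, 0.8]], [[1.0, 1.0], [1.0, 1.0]]]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
bias = [0.0, 0.0, 0.0]

pooled = global_average_pool(features)          # one value per channel
probs = softmax(linear(pooled, weights, bias))  # (over, normal, under), order assumed
```

The resulting array is non-negative and sums to 1, so its entries can be read directly as the three exposure-state probabilities.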
[0088] S47, calculate the feedback loss function based on the multiple sets of probability arrays.
[0089] Specifically, it is calculated using the following formula:
[0090]
[0091] Here the formula uses the probability of overexposure, the probability of normal exposure, and the probability of underexposure; r is the number of iterations during training; a small constant, set here to 1 × 10^-6, prevents the argument of the logarithm from being 0; and H and W are the height and width of the fused image, respectively.
[0092] S48, Calculate the first loss function based on the feedback loss function. The first loss function is used to train the illumination adaptive network. The calculation method of the first loss function is as follows:
[0093]
[0094] Here the weighting coefficients are hyperparameters used to balance the different losses (their values are [values missing]), and the terms are the first loss function, the spatial consistency loss function, the exposure control loss function, the color constancy loss function, the smoothing loss function, and the feedback loss function.
[0095] Specifically, to quantify and minimize the inconsistency between predicted values of adjacent pixels in the enhanced image, and to ensure that the semantic and structural features of the enhanced image within local regions remain highly consistent with the original image, this application introduces a spatial consistency loss function. The spatial consistency loss function is calculated as follows:
[0096]
[0097] where M is the number of local regions, and the size of each local region is set to [value missing]; the neighborhood symbol denotes the four regions (top, bottom, left, and right) adjacent to a given local region; and I and E represent the average intensity values of local regions in the infrared image and the enhanced image, respectively.
[0098] To control the exposure level of an image and suppress underexposed or overexposed areas, thereby achieving a more balanced image brightness, this application introduces an exposure control loss function. The exposure control loss function is calculated as follows:
[0099]
[0100] where N is the number of local regions, and the size of each local region is set to [value missing]; E represents the average intensity value of a local region in the enhanced image; and the preset good exposure level has a value range of [value range missing] and is set to 0.6 here.
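A minimal sketch of an exposure control loss consistent with the description above: the image is split into local regions, each region's average intensity is compared with the good exposure level of 0.6, and the distances are averaged over the N regions. The exact formula is not preserved in this text, so the absolute-difference form, the non-overlapping 2×2 regions, and the names `patch_means` and `exposure_loss` are assumptions.

```python
def patch_means(image, size):
    """Mean intensity of each non-overlapping size x size patch of a 2D list."""
    h, w = len(image), len(image[0])
    means = []
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            patch = [image[r][c]
                     for r in range(top, top + size)
                     for c in range(left, left + size)]
            means.append(sum(patch) / len(patch))
    return means


def exposure_loss(image, target=0.6, size=2):
    """Average absolute distance between patch means and the target level."""
    means = patch_means(image, size)
    return sum(abs(m - target) for m in means) / len(means)


well_exposed = [[0.6] * 4 for _ in range(4)]
under_exposed = [[0.1] * 4 for _ in range(4)]
```

An image whose regions all sit at the target level incurs no penalty, while a uniformly dark image is penalized in proportion to its distance from 0.6.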
[0101] To maintain the naturalness and consistency of colors during the enhancement process, especially under varying lighting conditions, this invention introduces a color constancy loss function. The color constancy loss function is calculated as follows:
[0102]
[0103] where the three symbols denote the mean values of the red, green, and blue channels, respectively.
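One common way to build a color constancy loss from the three channel means is to sum the squared differences between every pair of means, which is zero when the channels are balanced on average. The formula itself is missing from this text, so the pairwise squared-difference form below is an assumption, and `channel_means` and `color_constancy_loss` are illustrative names.

```python
from itertools import combinations


def channel_means(image):
    """Mean of each channel for an image given as rows of (r, g, b) tuples."""
    pixels = [px for row in image for px in row]
    n = len(pixels)
    return tuple(sum(px[c] for px in pixels) / n for c in range(3))


def color_constancy_loss(image):
    """Sum of squared differences between every pair of channel means."""
    means = channel_means(image)
    return sum((a - b) ** 2 for a, b in combinations(means, 2))


gray = [[(0.5, 0.5, 0.5), (0.2, 0.2, 0.2)]]
reddish = [[(0.5, 0.3, 0.1), (0.5, 0.3, 0.1)]]
```

A gray image gives a loss of zero, while a color cast that separates the channel means increases the loss.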
[0104] To smooth the transitions between adjacent pixels in the enhanced image and reduce noise and discontinuities, this invention introduces a smoothing loss function. The smoothing loss function is calculated as follows:
[0105]
[0106] where the two symbols denote the x and y coordinates of the pixel, respectively.
[0107] S49, Calculate the second loss function based on the feedback loss function. The second loss function is used to train the fusion network. The calculation method of the second loss function is as follows:
[0108]
[0109] Here the weighting coefficients are hyperparameters used to balance the different losses (their values are [values missing]), and the terms are the second loss function, the gradient loss function, the structural similarity loss function, the color constancy loss function, and the feedback loss function.
[0110] Specifically, in order to preserve key edge information and texture details in the source images (infrared images and enhanced images), i.e., focusing on the high-frequency components of the images, this application introduces a gradient loss function. The gradient loss function is calculated as follows:
[0111]
[0112] where H and W are the height and width of the infrared image, respectively; the gradient symbol denotes the Sobel gradient operator, used to calculate the spatial derivative of image intensity; and the three image terms are the fused image, the infrared image, and the enhanced image output by the illumination adaptive network, respectively.
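The following sketch illustrates a gradient loss built on the Sobel operator. It assumes a form that is common in the fusion literature, where the fused image's gradient magnitude is pulled toward the element-wise maximum of the two source gradient magnitudes; the formula in this text is missing, so that form is not confirmed here. For brevity the sketch skips border pixels and averages over interior pixels rather than over H × W.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """|Gx| + |Gy| at each interior pixel of a 2D list (borders are skipped)."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            row.append(abs(gx) + abs(gy))
        out.append(row)
    return out


def gradient_loss(fused, infrared, enhanced):
    """Mean absolute gap between fused gradients and the stronger source gradient."""
    gf, gi, ge = (sobel_magnitude(x) for x in (fused, infrared, enhanced))
    diffs = [abs(f - max(i, e))
             for rf, ri, re_ in zip(gf, gi, ge)
             for f, i, e in zip(rf, ri, re_)]
    return sum(diffs) / len(diffs)


flat = [[0.5] * 4 for _ in range(4)]
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
```

A fused image that keeps the vertical edge present in one source incurs no loss, whereas a flat fused image that discards the edge is penalized.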
[0113] To ensure that the fused image is structurally consistent with the reference image, this application introduces a structural similarity loss function. The structural similarity loss function is calculated as follows:
[0114]
[0115] where SSIM denotes the structural similarity index measure.
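For illustration, the structural similarity index can be computed from the means, variances, and covariance of two images. The sketch below uses global statistics over the whole image instead of the sliding windows used in practice, and it assumes the loss form 1 - SSIM; how the patent combines the fused image with its two sources is not recoverable from this text, so the example compares a single pair of images only.

```python
def ssim_global(x, y, data_range=1.0):
    """SSIM computed from global statistics of two equally sized 2D lists."""
    a = [p for row in x for p in row]
    b = [p for row in y for p in row]
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((p - mu_b) ** 2 for p in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1 = (0.01 * data_range) ** 2  # conventional stabilizing constants
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


def ssim_loss(fused, reference):
    """One common loss form: 1 - SSIM (an assumption, not taken from the text)."""
    return 1.0 - ssim_global(fused, reference)


img = [[0.1, 0.4], [0.7, 0.9]]
inverted = [[1.0 - p for p in row] for row in img]
```

Identical images give an SSIM of 1 and therefore a loss of 0; structurally opposite images, such as an image and its inverse, score much lower.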
[0116] The training data for the above-mentioned illumination adaptive network, fusion network, and feedback network are shown in Table 1 below:
[0117] Table 1 Training data for illumination-adaptive network, fusion network, and feedback network
[0118]
[0119] S5. Obtain the visible light image and the infrared image to be fused. Input the visible light image to be fused into the jointly trained illumination adaptation network to obtain the enhanced image. Input the infrared image to be fused and the enhanced image into the jointly trained fusion network to obtain the fused image.
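The two-stage inference in step S5 can be sketched as a simple composition: the visible light image is enhanced first, and the result is fused with the infrared image. The callables `toy_enhance` and `toy_fusion` are hypothetical stand-ins so that the sketch runs; in practice they would be the jointly trained illumination adaptive network and fusion network.

```python
def fuse_pair(visible, infrared, enhance_net, fusion_net):
    """Two-stage inference: enhance the visible image, then fuse with infrared."""
    enhanced = enhance_net(visible)
    return fusion_net(infrared, enhanced)


# Stand-in callables so the sketch runs; real networks would be trained models.
def toy_enhance(image):
    return [[min(1.0, 2.0 * p) for p in row] for row in image]


def toy_fusion(infrared, enhanced):
    return [[max(i, e) for i, e in zip(ri, re_)]
            for ri, re_ in zip(infrared, enhanced)]


visible = [[0.1, 0.2], [0.3, 0.6]]
infrared = [[0.9, 0.0], [0.0, 0.5]]
fused = fuse_pair(visible, infrared, toy_enhance, toy_fusion)
```

The feedback network does not appear here because it is used only to compute the feedback loss during training.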
[0120] To verify the superior performance of this invention in the field of image fusion, this application selected 50 pairs of images and 361 pairs of images on the LLVIP and MSRS datasets, respectively, for fusion testing. This application selected nine advanced comparison methods, two of which are based on autoencoders, including RFN-Nest and SDCFusion. Six of these methods are based on convolutional neural networks, including SDNet, SwinFusion, UMF-CMGR, MURF, LRRNet, and PTET. The last method is based on a generative adversarial network, namely TarDAL. Furthermore, this invention uses five metrics to objectively evaluate the fusion performance: information entropy, spatial frequency, standard deviation, visual information fidelity, and average gradient.
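Two of the five metrics named above can be sketched briefly. Information entropy is the Shannon entropy of the gray-level histogram, and spatial frequency combines the root mean square of horizontal and vertical first differences. Normalization conventions for spatial frequency vary between implementations; dividing by the total pixel count below is one such convention and is an assumption, as are the function names.

```python
import math
from collections import Counter


def entropy(image):
    """Shannon entropy (bits) of the gray-level histogram of a 2D list."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def spatial_frequency(image):
    """sqrt(RF^2 + CF^2) from horizontal and vertical first differences."""
    h, w = len(image), len(image[0])
    rf = sum((image[r][c] - image[r][c - 1]) ** 2
             for r in range(h) for c in range(1, w)) / (h * w)
    cf = sum((image[r][c] - image[r - 1][c]) ** 2
             for r in range(1, h) for c in range(w)) / (h * w)
    return math.sqrt(rf + cf)


flat = [[128, 128], [128, 128]]
split = [[0, 255], [0, 255]]
```

A constant image scores zero on both metrics, while an image split evenly between two gray levels has an entropy of 1 bit and a nonzero spatial frequency; higher values of both generally indicate more information and detail in a fused image.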
[0121] The overall framework of the image fusion method based on feedback-based adaptive illumination provided in this application includes three networks: an illumination adaptive network, a fusion network, and a feedback network. First, this invention uses an illumination enhancement curve as the basic method to adaptively adjust the brightness of the visible light image, effectively restoring the brightness, texture, and color of the visible light image. Second, the fusion network uses Sobel / Laplacian residual blocks as feature extraction blocks for infrared and visible light images, and combines a multi-scale spatial / channel attention mechanism to ensure that the fused image contains rich semantic and spatial information. Next, the feedback network evaluates the illumination level of the fused image by calculating the probabilities of underexposure, normal exposure, and overexposure. Finally, these probabilities are fed back to the illumination adaptive network and the fusion network in the form of a loss function, guiding their joint training so that the illumination level of the fused image reaches the optimal state. This application has the following advantages: (1) This application calculates the feedback loss function from the prediction results of the feedback network and jointly trains the illumination adaptive network and the fusion network based on the feedback loss function, so that the two networks reinforce each other and the fused image can reach the ideal illumination level. (2) In joint training, different loss functions are set according to the respective needs of the illumination adaptive network and the fusion network. For example, a spatial consistency loss function is set so that semantic and structural features remain consistent with the original image, and an exposure control loss function is set to obtain more balanced image brightness. These loss functions improve the performance of the model, adapt to various scenarios, and improve the quality of the fused image. (3) This application designs two novel modules in the fusion network: the Sobel / Laplacian residual block effectively fuses the deep features and the strong and weak texture details of the source images, and the multi-scale spatial / channel attention module makes the fused image contain rich semantic and spatial information. (4) This application conducted pedestrian detection experiments on the fused images, and the superior detection performance verified that the present invention can effectively promote advanced computer vision tasks.
[0122] Figure 8 is a comparative diagram showing the results of image fusion tests performed using 12 methods provided in the embodiments of this application. As shown in Figure 8, six typical infrared and visible light image pairs were selected, including scenes such as pedestrians, inscriptions, buildings, and license plates. The thermal targets within the red boxes of RFN-Nest, SDNet, UMF-CMGR, MURF, and LRRNet are relatively blurry, with insufficient detail. SwinFusion has low overall brightness and insufficient edge detail processing. TarDAL has low overall contrast and insufficient detail gradation. While PTET effectively highlights the thermal infrared target, its color information is not realistic enough, and its detail representation is poor. SDCFusion suffers some loss of texture detail within the green box, and its detail representation is average. In contrast, the image fusion method provided in this application effectively highlights the infrared thermal target while preserving and enhancing rich texture details, which is more in line with human visual perception.
[0123] The quantitative results of this application on the LLVIP and MSRS datasets are shown in Tables 2 and 3. This invention maintains a leading position across all metrics. Notably, this invention significantly outperforms nine other state-of-the-art comparative methods in spatial frequency, visual information fidelity, and average gradient metrics.
[0124] Table 2 Quantitative results on the LLVIP dataset
[0125]
[0126] Table 3 Quantitative results on the MSRS dataset
[0127]
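Spatial frequency and average gradient, two of the metrics named above, are not defined in this section. The following minimal Python sketch uses their commonly cited definitions on a grayscale image stored as a list of rows. The function names and the exact normalization are assumptions for illustration and are not the evaluation code used in this application.

```python
import math


def spatial_frequency(img):
    """sqrt(RF^2 + CF^2), built from mean squared horizontal (row)
    and vertical (column) first differences."""
    h, w = len(img), len(img[0])
    rf_sq = sum((img[y][x] - img[y][x - 1]) ** 2
                for y in range(h) for x in range(1, w)) / (h * w)
    cf_sq = sum((img[y][x] - img[y - 1][x]) ** 2
                for y in range(1, h) for x in range(w)) / (h * w)
    return math.sqrt(rf_sq + cf_sq)


def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over the (h-1) x (w-1) positions
    where both forward differences exist."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(h - 1):
        for x in range(w - 1):
            dx = img[y][x + 1] - img[y][x]
            dy = img[y + 1][x] - img[y][x]
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((h - 1) * (w - 1))


flat = [[0.5] * 4 for _ in range(4)]                # no texture
stripes = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]  # strong vertical edges

print(spatial_frequency(flat), average_gradient(flat))
print(spatial_frequency(stripes), average_gradient(stripes))
```

Both metrics are zero for a featureless image and grow with edge strength, which is why higher values are read as richer texture detail in the fused image.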
[0128] To verify that this invention can facilitate high-level vision tasks, the YOLOv7 algorithm was applied to 50 image pairs selected from the LLVIP dataset to conduct pedestrian detection experiments. Figure 9 is another comparative diagram showing the results of image fusion tests using the 12 methods provided in the embodiments of this application. As shown in Figure 9, two typical scenarios were selected. It can be seen from Figure 9 that the image fusion method provided in this application has the highest detection accuracy and detection confidence. In Table 4, AP@0.60 denotes the average precision calculated at an intersection-over-union (IoU) threshold of 0.60, and mAP@[0.50:0.95] is computed over IoU thresholds from 0.50 to 0.95 with a step size of 0.05. The image fusion method provided by this invention achieves the highest average precision.
[0129] Table 4 Quantitative pedestrian detection results on the LLVIP dataset
[0130]
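The threshold notation used for the detection metrics can be made concrete with a short Python sketch. It shows only the IoU matching criterion and the threshold grid behind mAP@[0.50:0.95], not a full average-precision computation. The box format `(x1, y1, x2, y2)` and the helper names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 (step size 0.05)
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def is_true_positive(pred_box, gt_box, threshold):
    """A detection counts as correct when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


pred = (0, 0, 10, 10)   # hypothetical predicted pedestrian box
gt = (0, 0, 10, 8.2)    # hypothetical ground-truth box
hits = [t for t in THRESHOLDS if is_true_positive(pred, gt, t)]
print(iou(pred, gt), hits)
```

In this toy case the detection is accepted at AP@0.60 but rejected at the stricter thresholds of 0.85 and above, which is why averaging over the full grid rewards tighter localization.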
[0131] Based on the method described in the above embodiments, this embodiment will further describe it from the perspective of a feedback-based adaptive illumination image fusion device. This feedback-based adaptive illumination image fusion device can be implemented as an independent entity or integrated into an electronic device. The electronic device can be a terminal, server, or other devices. The terminal can include a tablet computer, a laptop computer, a personal computer (PC), a microprocessor box, or other devices.
[0132] Please refer to Figure 10, which specifically illustrates the image fusion device based on feedback-adaptive illumination provided in this application, as applied in electronic devices. The image fusion device based on feedback-adaptive illumination may include:
[0133] The training module is used to acquire visible light image datasets and infrared image datasets; input the visible light image datasets into the illumination adaptation network to obtain the enhanced image dataset; input the infrared image datasets and enhanced image datasets into the fusion network to obtain the fused image dataset; input the fused image datasets into the pre-trained feedback network to obtain multiple sets of probability arrays; calculate the feedback loss function based on the multiple sets of probability arrays; and jointly train the illumination adaptation network and the fusion network based on the feedback loss function.
[0134] The application module is used to acquire the visible light image and the infrared image to be fused. The visible light image to be fused is input into the jointly trained illumination adaptation network to obtain the enhanced image. The infrared image to be fused and the enhanced image are input into the jointly trained fusion network to obtain the fused image.
[0135] In specific implementation, the above modules and / or units can be implemented as independent entities, or they can be arbitrarily combined and implemented as the same or several entities. For the specific implementation of the above modules and / or units, please refer to the previous method embodiments. For the specific beneficial effects that can be achieved, please also refer to the beneficial effects in the previous method embodiments, which will not be repeated here.
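The data flow of the application module described above can be sketched in Python as follows. The class name, the `Image` type alias, and the two toy networks are hypothetical stand-ins chosen so that the example runs on its own; they are not the jointly trained networks of this application.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values in [0, 1]


class ApplicationModule:
    """Inference path: visible image -> illumination adaptation -> fusion."""

    def __init__(self,
                 illum_net: Callable[[Image], Image],
                 fusion_net: Callable[[Image, Image], Image]):
        self.illum_net = illum_net
        self.fusion_net = fusion_net

    def fuse(self, visible: Image, infrared: Image) -> Image:
        enhanced = self.illum_net(visible)          # enhanced image
        return self.fusion_net(infrared, enhanced)  # fused image


def toy_illum_net(img: Image) -> Image:
    """Toy stand-in: doubles brightness and clips at 1.0."""
    return [[min(1.0, 2.0 * p) for p in row] for row in img]


def toy_fusion_net(infrared: Image, enhanced: Image) -> Image:
    """Toy stand-in: keeps the brighter of the two pixels."""
    return [[max(a, b) for a, b in zip(r_ir, r_en)]
            for r_ir, r_en in zip(infrared, enhanced)]


visible = [[0.1, 0.2], [0.3, 0.6]]
infrared = [[0.9, 0.0], [0.0, 0.0]]
fused = ApplicationModule(toy_illum_net, toy_fusion_net).fuse(visible, infrared)
print(fused)
```

Passing the networks in as callables mirrors the statement that the modules can be implemented as independent entities or combined into one.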
[0136] In addition, embodiments of this application also provide an electronic device, which may be a computer, tablet computer, or other similar device. As shown in Figure 11, the electronic device 400 includes a processor 401 and a memory 402. The processor 401 and the memory 402 are electrically connected.
[0137] The processor 401 is the control center of the electronic device 400. It connects various parts of the electronic device through various interfaces and lines. By running or loading the application program stored in the memory 402 and calling the data stored in the memory 402, it performs various functions of the electronic device and processes data, thereby monitoring the electronic device as a whole.
[0138] In this embodiment, the processor 401 in the electronic device 400 loads the instructions corresponding to the processes of one or more application programs into the memory 402 according to the following steps, and the processor 401 runs the application programs stored in the memory 402 to realize various functions:
[0139] Obtain visible light image datasets and infrared image datasets;
[0140] The visible light image dataset is input into the illumination adaptive network to obtain the enhanced image dataset;
[0141] The infrared image dataset and the enhanced image dataset are input into the fusion network to obtain the fused image dataset;
[0142] The fused image dataset is input into a pre-trained feedback network to obtain multiple sets of probability arrays. A feedback loss function is calculated based on the multiple sets of probability arrays. The illumination adaptive network and the fusion network are jointly trained based on the feedback loss function.
[0143] A visible light image and an infrared image to be fused are acquired. The visible light image to be fused is input into the jointly trained illumination adaptive network to obtain an enhanced image. The infrared image to be fused and the enhanced image are input into the jointly trained fusion network to obtain a fused image.
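The training steps listed above can be outlined with a minimal Python sketch of one forward pass. The actual feedback loss formula is not reproduced in this section, so the sketch substitutes a clearly labeled stand-in: the mean negative log-probability of the normal-exposure class, with a small constant keeping the logarithm's argument above 0. All function names, the class ordering (overexposed, normal, underexposed), and the toy networks are assumptions for illustration.

```python
import math

EPS = 1e-8  # small constant keeping the logarithm's argument above 0


def softmax(logits):
    """Convert output-node values into a probability array."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stand_in_feedback_loss(prob_arrays):
    """Hypothetical loss, NOT the formula of this application.
    Each array is ordered (overexposed, normal, underexposed)."""
    return sum(-math.log(p[1] + EPS) for p in prob_arrays) / len(prob_arrays)


def training_iteration(visible_batch, infrared_batch,
                       illum_net, fusion_net, feedback_net):
    """Forward pass of the joint-training data flow. The parameter update
    of the two trainable networks is left to an autograd framework."""
    enhanced = [illum_net(v) for v in visible_batch]
    fused = [fusion_net(ir, e) for ir, e in zip(infrared_batch, enhanced)]
    probs = [softmax(feedback_net(f)) for f in fused]  # feedback net is fixed
    return stand_in_feedback_loss(probs)


def toy_feedback_net(img):
    """Toy stand-in: logits driven by mean brightness only."""
    mean = sum(map(sum, img)) / (len(img) * len(img[0]))
    return [10 * (mean - 0.5), 5 - 20 * abs(mean - 0.5), 10 * (0.5 - mean)]


dark = [[0.05, 0.05], [0.05, 0.05]]
ir = [[0.0, 0.0], [0.0, 0.0]]
identity = lambda img: img
brighten = lambda img: [[min(1.0, p + 0.45) for p in row] for row in img]
keep_enhanced = lambda ir_img, enh: enh

loss_dark = training_iteration([dark], [ir], identity, keep_enhanced,
                               toy_feedback_net)
loss_lit = training_iteration([dark], [ir], brighten, keep_enhanced,
                              toy_feedback_net)
print(loss_dark, loss_lit)
```

With these toy components, the loss falls when the illumination stage brings the fused image closer to normal exposure, which is the kind of signal the feedback network is described as providing to the two trainable networks.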
[0144] This electronic device can implement the steps of any embodiment of the image fusion method based on feedback-adaptive illumination provided in the embodiments of this application. Therefore, it can achieve the beneficial effects that any image fusion method based on feedback-adaptive illumination provided in the embodiments of this invention can achieve. For details, please refer to the previous embodiments, which will not be repeated here.
[0145] Figure 12 shows a specific structural block diagram of an electronic device provided in an embodiment of the present invention. This electronic device can be used to implement the image fusion method based on feedback-based adaptive illumination provided in the above embodiments. The electronic device 500 can be a terminal, server, or other device. The terminal can include a tablet computer, laptop computer, personal computer (PC), microprocessor box, or other devices.
[0146] RF circuit 510 is used to receive and transmit electromagnetic waves, converting electromagnetic waves into electrical signals and vice versa, thereby enabling communication with communication networks or other devices. RF circuit 510 may include various existing circuit elements used to perform these functions, such as antennas, radio frequency transceivers, digital signal processors, encryption / decryption chips, subscriber identity modules (SIM cards), memory, etc. RF circuit 510 can communicate with various networks such as the Internet, corporate intranets, and wireless networks, or communicate with other devices via wireless networks. The aforementioned wireless networks may include cellular telephone networks, wireless local area networks (WLANs), or metropolitan area networks (MANs). The aforementioned wireless networks may use various communication standards, protocols, and technologies, including but not limited to Global System for Mobile Communication (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (WCDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Wireless Fidelity (Wi-Fi) (such as IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and / or IEEE 802.11n), Voice over Internet Protocol (VoIP), Worldwide Interoperability for Microwave Access (Wi-Max), other protocols for email, instant messaging, and short messages, and any other suitable communication protocols, including those that have not yet been developed.
[0147] The memory 520 can be used to store software programs and modules, such as the program instructions / modules corresponding to those in the above embodiments. The processor 580 executes various functional applications and data processing by running the software programs and modules stored in the memory 520, such as taking pictures with the front-facing camera, processing the captured images, and switching the display colors of the content displayed on the screen. The memory 520 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 520 may further include memory remotely located relative to the processor 580, and these remote memories can be connected to the electronic device 500 via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0148] The input unit 530 can be used to receive input numeric or character information, and to generate keyboard or mouse signal inputs related to user settings and function control.
[0149] Display unit 540 can be used to display information input by the user or information provided to the user, as well as various graphical user interfaces, which can be composed of graphics, text, icons, video, and any combination thereof. Display unit 540 may include display panel 541, which may optionally be configured in the form of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other similar forms.
[0150] Audio circuitry 560, speaker 561, and microphone 562 provide an audio interface between the user and electronic device 500. Audio circuitry 560 converts received audio data into electrical signals and transmits them to speaker 561, where speaker 561 converts them into sound signals for output. Conversely, microphone 562 converts collected sound signals into electrical signals, which are then received by audio circuitry 560, converted back into audio data, and processed by processor 580. The audio data is then transmitted via RF circuitry 510 to, for example, another terminal, or output to memory 520 for further processing. Audio circuitry 560 may also include an earphone jack to facilitate communication between external headphones and electronic device 500.
[0151] Electronic device 500, through transmission module 570 (e.g., Wi-Fi module), can help users receive requests, send information, etc., providing users with wireless broadband internet access. Although transmission module 570 is shown in the figure, it is understood that it is not an essential component of electronic device 500 and can be omitted as needed without changing the essence of the invention.
[0152] The processor 580 is the control center of the electronic device 500. It connects to various parts of the electronic device 500 via various interfaces and lines, and performs various functions and processes data of the electronic device 500 by running or executing software programs and / or modules stored in the memory 520, and by calling data stored in the memory 520, thereby providing overall monitoring of the electronic device. Optionally, the processor 580 may include one or more processing cores; in some embodiments, the processor 580 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 580.
[0153] Electronic device 500 also includes a power supply 590 (such as a battery) that supplies power to various components. In some embodiments, the power supply may be logically connected to processor 580 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system. The power supply 590 may also include one or more DC or AC power supplies, recharging systems, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components.
[0154] Although not shown, the electronic device 500 also includes cameras (such as front-facing cameras and rear-facing cameras), Bluetooth modules, etc., which will not be described in detail here. Specifically, in this embodiment, the display unit of the electronic device is a touch screen display, and the electronic device also includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors. The one or more programs contain instructions for performing the following operations:
[0155] Obtain visible light image datasets and infrared image datasets;
[0156] The visible light image dataset is input into the illumination adaptive network to obtain the enhanced image dataset;
[0157] The infrared image dataset and the enhanced image dataset are input into the fusion network to obtain the fused image dataset;
[0158] The fused image dataset is input into a pre-trained feedback network to obtain multiple sets of probability arrays. A feedback loss function is calculated based on the multiple sets of probability arrays. The illumination adaptive network and the fusion network are jointly trained based on the feedback loss function.
[0159] A visible light image and an infrared image to be fused are acquired. The visible light image to be fused is input into the jointly trained illumination adaptive network to obtain an enhanced image. The infrared image to be fused and the enhanced image are input into the jointly trained fusion network to obtain a fused image.
[0160] In practice, the above modules can be implemented as independent entities or combined in any way to be implemented as the same or several entities. For the specific implementation of the above modules, please refer to the previous method implementation examples, which will not be repeated here.
[0161] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by instructions, or by instructions controlling related hardware. These instructions can be stored in a computer-readable storage medium and loaded and executed by a processor. Therefore, embodiments of the present invention provide a storage medium storing multiple instructions that can be loaded by a processor to execute the steps of any embodiment of the image fusion method based on feedback-based adaptive illumination provided by the present invention.
[0162] The computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.
[0163] Since the instructions stored in the storage medium can execute the steps in any embodiment of the image fusion method based on feedback-adaptive illumination provided in the embodiments of the present invention, the beneficial effects that any image fusion method based on feedback-adaptive illumination provided in the embodiments of the present invention can achieve can be realized. For details, please refer to the previous embodiments, which will not be repeated here.
[0164] The foregoing has provided a detailed description of an image fusion method, apparatus, storage medium, and electronic device based on feedback-based adaptive illumination provided in the embodiments of this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the embodiments above are only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. An image fusion method based on feedback-based adaptive illumination, characterized in that the method includes: obtaining visible light image datasets and infrared image datasets; inputting the visible light image dataset into the illumination adaptive network to obtain the enhanced image dataset; inputting the infrared image dataset and the enhanced image dataset into the fusion network to obtain the fused image dataset; inputting the fused image dataset into a pre-trained feedback network to obtain multiple probability arrays, each probability array including the probability that the fused image is overexposed, the probability that it is normally exposed, and the probability that it is underexposed; calculating a feedback loss function based on the multiple probability arrays, and jointly training the illumination adaptive network and the fusion network based on the feedback loss function, wherein the feedback loss function is: [formula missing], where [symbols missing] denote, respectively, the probability of overexposure, the probability of normal exposure, and the probability of underexposure; r is the number of iterations during training; [symbol missing] is a very small value used to prevent the argument of the logarithmic function from being 0; and H and W are the height and width of the fused image, respectively; and acquiring a visible light image and an infrared image to be fused, inputting the visible light image to be fused into the jointly trained illumination adaptive network to obtain an enhanced image, and inputting the infrared image to be fused and the enhanced image into the jointly trained fusion network to obtain a fused image.
2. The image fusion method based on feedback-based adaptive illumination according to claim 1, characterized in that the feedback network includes a multi-scale convolutional module, convolutional layers, a global average pooling layer, fully connected layers, and an output layer; and inputting the fused image dataset into the pre-trained feedback network to obtain multiple probability arrays includes: inputting the fused image dataset into the multi-scale convolution module for feature extraction and concatenation to obtain a first feature map; passing the first feature map sequentially through multiple convolutional layers to extract features, resulting in a second feature map; inputting the second feature map into the global average pooling layer for dimensionality reduction; linearly transforming the dimensionality-reduced second feature map through multiple fully connected layers; and inputting the linearly transformed second feature map into the output layer, where the values of the output nodes are converted into the probability array by the activation function.
3. The image fusion method based on feedback-based adaptive illumination according to claim 1, characterized in that the joint training of the illumination adaptive network and the fusion network based on the feedback loss function includes: calculating a first loss function based on the feedback loss function, the first loss function being used to train the illumination adaptive network; wherein the first loss function is calculated as follows: [formula missing], where the weighting coefficients [symbols missing] are hyperparameters, and the remaining terms denote, respectively, the first loss function, the spatial consistency loss function, the exposure control loss function, the color constancy loss function, the smoothing loss function, and the feedback loss function.
4. The image fusion method based on feedback-based adaptive illumination according to claim 3, characterized in that the spatial consistency loss function is calculated as follows: [formula missing], where M is the number of local regions, and the size of the local region is set to [value missing]; [symbol missing] denotes the four regions (top, bottom, left, and right) adjacent to a given local region; and I and E represent the average intensity values of local regions in the infrared image and the enhanced image, respectively; the exposure control loss function is calculated as follows: [formula missing], where N is the number of local regions, and the size of the local region is set to [value missing]; E represents the average intensity value of a local region in the enhanced image; and the preset good exposure level [symbol missing] has a value range of [value range missing]; the color constancy loss function is calculated as follows: [formula missing], where [symbols missing] represent the mean values of the red, green, and blue channels, respectively; and the smoothing loss function is calculated as follows: [formula missing], where [symbols missing] are the x and y coordinates of the pixel, respectively.
5. The image fusion method based on feedback-based adaptive illumination according to claim 1, characterized in that the joint training of the illumination adaptive network and the fusion network based on the feedback loss function includes: calculating a second loss function based on the feedback loss function, the second loss function being used to train the fusion network; wherein the second loss function is calculated as follows: [formula missing], where the weighting coefficients [symbols missing] are hyperparameters, and the remaining terms denote, respectively, the second loss function, the gradient loss function, the structural similarity loss function, the color constancy loss function, and the feedback loss function.
6. The image fusion method based on feedback-based adaptive illumination according to claim 5, characterized in that the gradient loss function is calculated as follows: [formula missing], where H and W are the height and width of the infrared image, respectively; [symbol missing] represents the Sobel gradient operator, used to calculate the spatial derivative of image intensity; and [symbols missing] represent the fused image, the infrared image, and the enhanced image output by the illumination adaptive network, respectively; and the structural similarity loss function is calculated as follows: [formula missing], where [symbol missing] represents the structural similarity index measure.
7. An image fusion device based on feedback-adaptive illumination, wherein the image fusion device based on feedback-adaptive illumination is used to implement the image fusion method based on feedback-adaptive illumination as described in claim 1, characterized in that the device includes: a training module, used to acquire visible light image datasets and infrared image datasets; input the visible light image dataset into the illumination adaptive network to obtain the enhanced image dataset; input the infrared image dataset and the enhanced image dataset into the fusion network to obtain a fused image dataset; input the fused image dataset into a pre-trained feedback network to obtain multiple probability arrays, each probability array including the probability that the fused image is overexposed, the probability that it is normally exposed, and the probability that it is underexposed; calculate a feedback loss function based on the multiple probability arrays; and jointly train the illumination adaptive network and the fusion network based on the feedback loss function, wherein the feedback loss function is: [formula missing], where [symbols missing] denote, respectively, the probability of overexposure, the probability of normal exposure, and the probability of underexposure; r is the number of iterations during training; [symbol missing] is a very small value used to prevent the argument of the logarithmic function from being 0; and H and W are the height and width of the fused image, respectively; and an application module, used to acquire a visible light image and an infrared image to be fused, input the visible light image to be fused into the jointly trained illumination adaptive network to obtain an enhanced image, and input the infrared image to be fused and the enhanced image into the jointly trained fusion network to obtain a fused image.
8. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a plurality of instructions adapted for loading by a processor to execute the image fusion method based on feedback-adaptive illumination as described in any one of claims 1 to 6.
9. An electronic device, characterized in that the electronic device includes a processor and a memory, the processor being electrically connected to the memory, the memory being used to store instructions and data, and the processor being used to execute the steps of the image fusion method based on feedback-adaptive illumination as described in any one of claims 1 to 6.