Image multi-frame fusion method and device, electronic equipment and storage medium
By combining image alignment and local light source diffusion weighting with Laplacian pyramid fusion and Retinex technology, the problem of insufficient dynamic range capture of images under complex lighting conditions is solved, achieving image fusion effects with high dynamic range and low halo.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BLACK SESAME TECH CO LTD
- Filing Date
- 2022-09-19
- Publication Date
- 2026-06-19
AI Technical Summary
In complex lighting or backlit scenarios, existing technologies cannot effectively capture the full dynamic range of an image, resulting in the discarding of some highlight or low-light information during image generation, leading to differences in human visual perception, as well as issues such as halo and contrast compression during multi-frame fusion.
By aligning the images based on the frame with the highest exposure sharpness, combining local light source diffusion weight information and Laplacian pyramid fusion, the image fusion region is calculated, and Retinex technology is used for post-processing to suppress halos and maintain the image's high dynamic range and low halos.
It achieves high dynamic range and low halo in complex lighting environments, improving image quality and aesthetics, especially maintaining natural transitions between portrait brightness and background in nighttime scenes.
Figure CN115578273B_ABST
Description
Technical Field
[0001] The embodiments described in this specification relate to computer graphics, and in particular to a method, apparatus, electronic device, and storage medium for multi-frame image fusion.
Background Technology
[0002] When capturing images, especially in environments with a large dynamic range, such as those with complex lighting or backlighting, the sensor modules on mobile phones cannot always capture the entire dynamic range of an image, particularly in nighttime scenes, due to limitations in their precision. Therefore, during image generation, to ensure the quality of the generated image remains within an acceptable range, the dynamic range is typically kept low to capture most of the scene information. This results in the discarding of some highlight or low-light information, leading to discrepancies between human vision and photographic representation. To guarantee consistently good visual quality in nighttime image acquisition, fusing multiple images with different exposures becomes the primary choice.
[0003] However, fusing multi-frame exposure images presents significant challenges, such as motion scenes, halos, and contrast compression. During image fusion, halos may be generated, and highlighted areas may be further amplified; existing halos around light sources cannot be effectively eliminated.
Summary of the Invention
[0004] This specification provides a method, apparatus, electronic device, and storage medium for multi-frame image fusion through various embodiments. In the image fusion process, local information is introduced into the image using a light source diffusion weighting method, so that the fused image has a higher dynamic range and a lower halo.
[0005] One embodiment of this specification provides a method for multi-frame image fusion, comprising:
[0006] Image alignment is performed based on the frame with the highest exposure sharpness in the image; image alignment includes extracting feature points from each frame of the image so that each frame of the image has the same spatial layout.
[0007] Based on the local light source diffusion weight information in the spatial domain, the fusion region of each frame of the image is calculated.
[0008] Based on the fusion region, images with different exposures in each frame are fused; image fusion is to use a pre-defined algorithm to fuse multiple frames of images with different exposures into a fused image;
[0009] The fused image is compressed to a pre-defined dynamic range.
[0010] One embodiment of this specification provides an apparatus for multi-frame image fusion, comprising:
[0011] The acquisition module is used to perform image alignment based on the frame with the highest exposure sharpness in the image; image alignment includes extracting feature points of each frame of the image so that each frame of the image has the same spatial layout.
[0012] The calculation module is used to calculate the fusion region of each frame of the image based on the local light source diffusion weight information in the spatial domain.
[0013] The fusion module is used to fuse images with different exposures in each frame based on the fusion region; image fusion is the process of merging multiple frames of images with different exposures into a fused image using a pre-defined algorithm;
[0014] The mapping module is used to compress the fused image to a pre-defined dynamic range.
[0015] One embodiment of this specification provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method as described in any of the foregoing descriptions.
[0016] One embodiment of this specification provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any of the methods described above.
[0017] The various implementation methods provided in this specification introduce local information into the image by using light source diffusion weights during the image fusion process, resulting in a fused image with a higher dynamic range and lower halo.
Attached Figure Description
[0018] To more clearly illustrate the technical solutions in the embodiments of this specification, the drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0019] Figure 1 is a flowchart of an image multi-frame fusion method provided by one embodiment of this specification;
[0020] Figure 2 is a schematic diagram of an image multi-frame fusion apparatus provided by one embodiment of this specification;
[0021] Figure 3 is a schematic diagram of image fusion weight calculation provided by one embodiment of this specification;
[0022] Figure 4 is a schematic diagram of image light source diffusion weight calculation provided by one embodiment of this specification;
[0023] Figure 5 is a schematic diagram of human image detection provided by one embodiment of this specification;
[0024] Figure 6 is a schematic diagram of image fusion provided by one embodiment of this specification;
[0025] Figure 7 is a schematic diagram of image enhancement provided by one embodiment of this specification.
Detailed Implementation
[0026] The technical solutions in the embodiments provided in this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments, not all of them. Based on the embodiments provided in this specification, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this invention.
[0027] When capturing images, especially in environments with a large dynamic range, such as those with complex lighting or backlighting, the sensor modules on mobile phones cannot always capture the entire dynamic range of an image, particularly in nighttime scenes, due to limitations in their precision. Therefore, during image generation, to ensure the quality of the generated image remains within an acceptable range, the dynamic range is typically kept low to capture most of the scene information. This results in the discarding of some highlight or low-light information, leading to discrepancies between human vision and photographic representation. To guarantee consistently good visual quality in nighttime image acquisition, fusing multiple images with different exposures becomes the primary choice.
[0028] However, fusing multi-frame exposure images presents significant challenges, such as motion scenes, halos, and contrast compression. During image fusion, halos may be generated, and highlighted areas may be further amplified; existing halos around light sources cannot be effectively eliminated.
[0029] In view of this, this specification provides a method, apparatus, electronic device, and storage medium for multi-frame image fusion.
[0030] Referring to Figure 1, this specification provides a method for multi-frame image fusion, which may include the following steps.
[0031] S101: Perform image alignment based on the frame with the highest exposure sharpness in the image; the image alignment includes extracting feature points of each frame of the image so that each frame of the image has the same spatial layout.
[0032] In this embodiment, after all input images have been captured at their respective exposures, sharpness detection is performed on each of them. Sharpness is one of the most important factors affecting image quality, since it determines how much image detail a system can reproduce; it is characterized by the boundaries between regions of different tone or color. After sharpness detection, the frame with the highest sharpness is found, and image alignment is performed with this frame as the reference. That is, feature points are extracted from each frame, the feature point sets of the frames are matched to obtain the optimal match, and the correspondence between frames is then optimized using an affine or perspective transformation, thereby obtaining the transformation parameters. Finally, the optimized parameters are used to give every frame the same spatial layout. This ensures spatial consistency, prevents ghosting caused by external factors such as camera shake from degrading the fused image, and lays the foundation for image fusion. Sharpness detection methods include the 10-90% rise distance technique, relative contrast (output contrast / input contrast), frequency domain methods, and the slanted-edge method. This embodiment does not limit the sharpness detection method; different methods can be selected as required.
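The reference-frame selection and the transformation fit described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the variance of a Laplacian response stands in for the sharpness measure (the specification leaves the measure open), the matched feature points are assumed to come from some external detector and matcher, and the function names `laplacian_variance`, `pick_reference_frame`, and `estimate_affine` are hypothetical.

```python
import numpy as np

def laplacian_variance(gray):
    """Sharpness proxy (assumption): variance of a 4-neighbour Laplacian."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def pick_reference_frame(frames):
    """Return the index of the frame with the highest sharpness score."""
    scores = [laplacian_variance(f) for f in frames]
    return int(np.argmax(scores))

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping matched src points to dst points."""
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    design = np.hstack([src, np.ones((len(src), 1))])      # N x 3
    solution, *_ = np.linalg.lstsq(design, dst, rcond=None)  # 3 x 2
    return solution.T                                        # 2 x 3
```

A perspective transformation would need a homography fit instead of the affine fit shown here, and a robust estimator would normally be used to reject bad matches.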
[0033] For example, after image alignment based on the frame with the highest exposure sharpness in the image, it may also include: detecting moving objects between images with different exposures in each frame after image alignment; mapping images with different exposures in each frame to the same brightness range; calculating the differences between different frames and finding the interference region in images with different exposures in each frame.
[0034] Specifically, after the input images are aligned, moving objects between the differently exposed frames must be detected. When the background subtraction method is used to detect moving targets, interference areas, also known as ghosting regions, often appear in the exposed images. Most high dynamic range imaging techniques require the target scene to remain still during shooting. If the scene changes during shooting, or a moving object enters it, blurred or semi-transparent content appears in the motion areas of the final fused image; this is generally referred to as "ghosting." When the background model is initialized, a moving target may be part of the background, and its later movement produces ghosting; in another case, ghosting occurs when a moving target in the scene comes to rest and then starts moving again. Other similar situations include objects left in the background or moving targets that have stopped moving. Considering the differences in exposure between the images, dynamic frame selection is used to ensure the brightness of the input images. Histogram mapping can be used to map the differently exposed images to the same brightness range, after which the differences between frames are calculated to find the ghosting region in each exposure. When objects in the scene are moving, there are obvious differences between frames. Two frames are subtracted to obtain the absolute value of their brightness difference, which is then compared with a threshold to analyze the motion characteristics of the image sequence, determine whether any object is moving, and thus find the ghosting region.
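The histogram mapping and frame differencing described above can be sketched as follows. This is a minimal sketch assuming 8-bit single-channel frames that are already aligned; the CDF-based histogram matching, the function names, and the default threshold of 25 are illustrative assumptions rather than values taken from the specification.

```python
import numpy as np

def match_histogram(src, ref):
    """Remap src grey levels so that its CDF follows the CDF of ref (uint8 inputs)."""
    src_cdf = np.cumsum(np.bincount(src.ravel(), minlength=256)) / src.size
    ref_cdf = np.cumsum(np.bincount(ref.ravel(), minlength=256)) / ref.size
    # For each source level, find the first reference level with at least the same CDF.
    lut = np.searchsorted(ref_cdf, src_cdf, side="left").clip(0, 255)
    return lut.astype(np.uint8)[src]

def ghost_mask(frame, ref, threshold=25):
    """Mark pixels whose brightness still differs from ref after histogram mapping."""
    mapped = match_histogram(frame, ref)
    diff = np.abs(mapped.astype(np.int16) - ref.astype(np.int16))
    return diff > threshold
```

In practice the raw mask would usually be cleaned up (for example with morphological filtering) before it is used to exclude regions from fusion.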
[0035] S102: Calculate the fusion region for each frame of the image based on the local light source diffusion weight information in the spatial domain.
[0036] For example, step S102 specifically includes calculating the fusion weight of each frame of the image based on brightness, and assigning the weight of the image with high exposure in the light source area to the image with low exposure.
[0037] In this embodiment, as shown in Figure 3, the maximum RGB value and the grayscale value of the multi-frame exposure images can be used together as the joint input luminance level. Then, using the joint input luminance level as a benchmark, the usable range of each exposure is divided. The fusion weight of each exposed image is calculated, and local luminance diffusion weight information in the spatial domain is introduced into the calculation of the global weight, which can effectively suppress the appearance of halos. Multi-frame exposure fusion uses three or more images of the same scene at different exposures and performs image processing operations in the image transform domain or spatial domain to fuse them into a single image with high clarity and rich color detail.
[0038] In the fusion of multi-frame exposure images, considering only global information makes it difficult to balance image contrast and can easily introduce issues such as halos. Therefore, it is necessary to incorporate local information during processing to protect or specially process difficult-to-process areas such as bright areas. Thus, light source diffusion weights can be used to specially process the light source region during post-processing such as weight calculation or mapping, in order to reduce halos and improve overall image contrast.
[0039] A light source diffusion model based on guided filtering can be used to calculate the light source diffusion weights. For example, as shown in Figure 4, after the multiple exposure images are input, the location of the light source must first be determined. The biggest difference between a light source and a reflecting object is that a light source is self-illuminating: it is bright and not easily affected by ambient brightness, so even a short exposure produces a high sensor response. Thus, during light source detection, max pooling and a slicing mask are used to obtain the main light source image; that is, the brightest part of the shortest-exposure image that reaches the threshold is selected as the light source. The threshold can be adjusted according to the specific exposure ratio. Due to environmental factors, the same light source has different halo diffusion characteristics under different atmospheric conditions. Correspondingly, a longer-exposure image can characterize, to some extent, the light source's diffusion capability in the current scene. On the other hand, introducing too much of the light source's bright area into the low-frequency part during fusion also amplifies the halo. Therefore, an image with a longer exposure time that is similar in shape to the input image can be selected as the guide image, and its brightness information is used to guide the direction and intensity of the light source's outward diffusion. During diffusion, directions that are closer to the light source in the spatial domain and closer to the light source's brightness level obtain a greater diffusion intensity. Through multiple iterations, a light diffusion map is obtained. Given the smooth nature of light source diffusion, these calculations can be performed on a downsampled small image, which reduces computation time.
Sampling here refers to downsampling, that is, decimation of a signal. Upsampling and downsampling both resample a digital signal; the resampling rate is compared with the original sampling rate used to obtain the digital signal (e.g., sampled from an analog signal). A rate higher than the original is called upsampling and a lower rate is called downsampling; upsampling is the reverse process of downsampling. The methods for calculating light source diffusion weights are not limited to those mentioned above; different calculation methods can be selected as required.
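The diffusion idea in this paragraph can be illustrated with a deliberately simplified sketch. It is not the guided-filtering model the specification refers to: here the light source is seeded by thresholding the short exposure, and the map is spread iteratively to 4-neighbours, attenuated by a per-step decay and by how close the long-exposure (guide) brightness is to the source brightness. The function name and the `threshold`, `iterations`, `decay`, and `sigma` parameters are all assumptions, and inputs are assumed to be luminance images in [0, 1].

```python
import numpy as np

def light_diffusion_map(short_exp, long_exp, threshold=0.9,
                        iterations=8, decay=0.8, sigma=0.2):
    """Spread a light-source seed outward, guided by long-exposure brightness."""
    seed = (short_exp >= threshold).astype(np.float64)
    if not seed.any():
        return seed  # no light source detected
    source_level = long_exp[seed > 0].mean()
    # Pixels whose guide brightness is close to the source level accept more light.
    affinity = np.exp(-((long_exp - source_level) ** 2) / (2.0 * sigma ** 2))
    diffusion = seed.copy()
    for _ in range(iterations):
        p = np.pad(diffusion, 1, mode="edge")
        neighbours = np.maximum.reduce(
            [p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]])
        diffusion = np.maximum(diffusion, decay * affinity * neighbours)
    return diffusion
```

With these settings the map equals 1 at the source, falls off with distance, and falls off much faster in directions where the guide image is dark. As the text notes, such a map could be computed on a downsampled image and upsampled afterwards.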
[0040] To ensure the best visual presentation in every region of the image, unaffected by factors such as scene lighting, the most suitable regions must be selected from the differently exposed images for fusion. This involves jointly considering brightness, color, and local spatial information to select the most suitable exposure for each local area. Weight calculation primarily selects the most appropriate multi-exposure fusion region based on the image's brightness and related color information. In pixel-level weight calculation, because adjacent exposures in the input images are relatively close, more than one frame may fall within the appropriate brightness range. In such cases, color information is incorporated as a reference in weight allocation, favoring the frame with the more vivid color. Furthermore, using a grayscale image alone as the basis for weight calculation can make weight allocation unfair for objects of a single color, because of the way the grayscale image is generated. Therefore, as shown in Figure 3, the brightness information used in the calculation is composed of the grayscale image and the bright channel image, where the bright channel image is generated from the maximum of the R, G, and B values of each pixel in RGB color space. The grayscale calculation rule makes the brightness of a saturated color always lower, to varying degrees, than its brightness in the real scene; adding the bright channel keeps the colors of the fused image vivid and consistent with the natural image, and prevents low color saturation in the fused image caused by weight allocation. Secondly, local brightness diffusion information in the spatial domain is introduced into the weight calculation, mainly by using the light source diffusion image to lower the weight allocation threshold in the local area around the light source.
The specific procedure can be expressed by transforming formula (1) into formula (2):
[0041] $$W = \frac{Y - Y_{low}}{Y_{high} - Y_{low}}, \quad Y_{high} > Y > Y_{low} \tag{1}$$
[0042] $$W = \frac{Y - Y_{p\_low}}{Y_{p\_high} - Y_{p\_low}}, \quad Y_{p\_high} > Y > Y_{p\_low} \tag{2}$$
[0043] $$Y_{p\_low} = Y_{low} - a, \quad Y_{p\_high} = Y_{high} - a \tag{3}$$
[0044] In formula (1), $W$ is the image fusion weight, $Y$ is the image brightness, $Y_{high}$ is the high image brightness, and $Y_{low}$ is the low image brightness. In formulas (2) and (3), $W$ is the image fusion weight calculated after the grayscale image and the bright channel image are jointly constructed, $Y$ is the image brightness after the grayscale image and the bright channel image are jointly constructed, $Y_{p\_low}$ and $Y_{p\_high}$ are the low and high brightness of the image jointly constructed from the grayscale image and the bright channel image, and $a$ is the pixel-level light source diffusion weight. In conventional weight calculation, when the brightness is below $Y_{low}$ the weight is allocated entirely to the current frame, and when it is above $Y_{high}$ no weight is assigned to the current frame; both thresholds are constant globally.
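Formulas (1) to (3) can be exercised with a short sketch. Several details are assumptions made only for illustration: the grayscale image uses BT.601 luma coefficients, the grayscale and bright channel images are combined by a plain average (the specification does not state how they are combined), and the ramp is clamped to [0, 1] outside the interval where the formulas are defined. The function names are hypothetical.

```python
import numpy as np

def joint_luminance(rgb):
    """Joint brightness from grayscale and bright channel (equal blend is an assumption)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    bright_channel = rgb.max(axis=-1)
    return 0.5 * (gray + bright_channel)

def fusion_weight(y, y_low, y_high, a=0.0):
    """Formula (2) with thresholds shifted by the diffusion weight a as in formula (3).

    With a = 0 this reduces to formula (1). Values outside the ramp are clamped
    to [0, 1], which is an assumption, since the formulas only cover the ramp.
    """
    yp_low = y_low - a
    yp_high = y_high - a
    w = (np.asarray(y, dtype=np.float64) - yp_low) / (yp_high - yp_low)
    return np.clip(w, 0.0, 1.0)
```

For example, with `y_low = 0.4` and `y_high = 0.8`, a pixel of brightness 0.5 gets weight 0.25 away from any light source (`a = 0`) and 0.5 where the diffusion weight is `a = 0.1`, showing how the thresholds are lowered near the light source.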
[0045] S103: Based on the fusion region, each frame of images with different exposures is fused; the image fusion is to use a pre-set algorithm to fuse multiple frames of images with different exposures into a fused image.
[0046] For example, step S103 specifically includes fusing images with different exposures in each frame using a Laplacian pyramid fusion method.
[0047] In this embodiment, a Laplacian pyramid-based fusion method is used to fuse the differently exposed frames. First, the fusion region mask of each exposure is calculated from components such as brightness or detail. Then each exposed frame is decomposed into frequency bands at different scales using the Laplacian operator, separating out the high frequencies, and the bands are fused to obtain a fused image whose regions transition naturally. The Laplacian pyramid method achieves a smooth transition between the optimally selected regions of the differently exposed images. Here, a mask is a selected image, graphic, or object used to cover the processed image (fully or partially), thereby controlling the area or process of image processing.
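The Laplacian pyramid fusion described above can be sketched in plain NumPy for single-channel images. This is a generic multi-resolution blend, offered as an illustration rather than the specific algorithm of the embodiment; the 5-tap binomial kernel, the nearest-neighbour upsampling followed by blurring, and all function names are assumptions.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(img):
    """Separable 5-tap binomial blur with edge replication."""
    h, w = img.shape
    p = np.pad(img, 2, mode="edge")
    horiz = sum(KERNEL[i] * p[:, i:i + w] for i in range(5))
    return sum(KERNEL[i] * horiz[i:i + h, :] for i in range(5))

def down(img):
    return blur(img)[::2, ::2]

def up(img, shape):
    big = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return blur(big[:shape[0], :shape[1]])

def gaussian_pyramid(img, levels):
    pyr = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels - 1):
        pyr.append(down(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    g = gaussian_pyramid(img, levels)
    lap = [g[k] - up(g[k + 1], g[k].shape) for k in range(levels - 1)]
    return lap + [g[-1]]  # coarsest level keeps the low-pass residual

def fuse(frames, weights, levels=4):
    """Blend each Laplacian band with the Gaussian pyramid of its weight map."""
    total = np.sum(weights, axis=0) + 1e-12
    fused = None
    for frame, weight in zip(frames, weights):
        lap = laplacian_pyramid(frame, levels)
        wgt = gaussian_pyramid(weight / total, levels)
        bands = [l * w for l, w in zip(lap, wgt)]
        fused = bands if fused is None else [f + b for f, b in zip(fused, bands)]
    out = fused[-1]
    for k in range(levels - 2, -1, -1):
        out = fused[k] + up(out, fused[k].shape)
    return out
```

Because the weight maps are smoothed at each level, hard region masks turn into gradual transitions in the low-frequency bands while fine detail is still taken from the locally preferred exposure.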
[0048] Based on the Laplacian pyramid fusion, a "light source diffusion weight" associated with local spatial domain information is added. This aims to use local light source information in the spatial domain to exert influence in fusion and post-processing, so that the image has less halo and higher contrast during processing. The light source diffusion weight has been described in the above implementation and will not be repeated here.
[0049] Synthesizing a high dynamic range (HDR) image from multiple images of the same scene at different exposures is a mainstream approach to restoring the dynamic range of an image. This isn't limited to Laplacian pyramid fusion; other multi-exposure fusion algorithms can also be used, including HDR image synthesis based on physical exposure ratios, block-based structural decomposition algorithms, and Poisson fusion-based multi-exposure fusion algorithms. In exposure ratio-based HDR image synthesis, the main approach is to fill in overexposed pixels with appropriate values based on the original exposure ratios, then map them within a predetermined range. Block-based methods decompose the image's structural information into different modules, such as color, signal intensity, and signal structure, then fuse these modules separately, finally combining them into a single color image. Poisson fusion algorithms smooth the transitions and preserve edges in bright areas while pasting back information from underexposed areas of the same region.
[0050] In one feasible implementation, after fusing images of different exposures in each frame based on the fusion region, the method further includes: detecting whether there is a human image and / or the area where the human image is located; if the human image and / or the area where the human image is located exists, the human image is protected so that the brightness of the human image is kept within a suitable range.
[0051] In this embodiment, when the portrait brightness is low relative to the dynamic range of the whole scene (which often occurs in nighttime scenes with artificial lighting), the result of Laplacian multi-exposure fusion often shows a dark portrait and therefore very poor image quality. It is thus necessary to protect the portrait in scenes with people and keep its brightness within a suitable range. As shown in Figure 5, the first step is to detect whether a human face is present and where it is located. For more refined processing, the face also needs to be segmented separately. The PFLD face detection model, based on MobileNet v2, can quickly detect the face region, and a U-Net-based segmentation network can accurately segment it. The PFLD model automatically locates a set of predefined facial reference points (such as the corners of the eyes and mouth), offering high detection accuracy and fast processing speed. U-Net is a CNN-based image segmentation network primarily used for medical image segmentation. Initially proposed for cell segmentation, it has since demonstrated excellent performance in lung nodule detection and retinal blood vessel extraction.
[0052] For example, the step of detecting whether a human face and / or the area containing it is present, and protecting the face by keeping its brightness within a suitable range, includes using artificial intelligence face detection to detect the face and / or its area and calculating a gamma mapping curve from the brightness change of the face portion before and after fusion. This consistently keeps the face brightness within a good range after fusion, with the advantage of maintaining comfortable face brightness at minimal computational cost. Alternatively, AI face matting can be used together with separate foreground and background gamma mapping fusion. This primarily involves calculating different mapping curves for the foreground and the background, so that the face remains within a comfortable range while the background is mapped normally. This method offers high accuracy: processing the face and the background separately allows more refined face processing and accurately protects the details, color, and brightness of the face without affecting the normal mapping of the background. Both methods can effectively protect the human face. The gamma curve is a special type of tone curve. Within the 0-1 range, when gamma equals 1 the output equals the input and the signal is not distorted. When gamma is greater than 1, the output signal is always less than the input signal, so the output is darker than the input.
When gamma is less than 1, the output signal is always greater than the input signal, so the output is brighter than the input. Face protection methods are not limited to these; other face protection methods can be used as needed.
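The gamma-based face protection can be illustrated with a small sketch. It assumes luminance in (0, 1), a boolean face mask supplied by some detector or matting network, and a target brightness taken for example from the face region before fusion; the function names and the choice of matching the region mean are assumptions, not details given in the specification.

```python
import numpy as np

def gamma_for_target(current_mean, target_mean, eps=1e-6):
    """Gamma g such that current_mean ** g == target_mean (values in (0, 1))."""
    current = np.clip(current_mean, eps, 1.0 - eps)
    target = np.clip(target_mean, eps, 1.0 - eps)
    return float(np.log(target) / np.log(current))

def protect_face(y, face_mask, target_mean):
    """Apply a gamma curve inside the face mask so its brightness moves to target_mean."""
    gamma = gamma_for_target(y[face_mask].mean(), target_mean)
    out = y.astype(np.float64).copy()
    out[face_mask] = out[face_mask] ** gamma
    return out, gamma
```

A face that came out of fusion at 0.2 and should sit at 0.4 yields a gamma below 1, which brightens it, consistent with the behaviour of the gamma curve described above. A real pipeline would feather the mask edge so that the corrected region blends into the background.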
[0053] S104: Compress the fused image to a preset dynamic range.
[0054] In this implementation, first, multiple different curves are used to map the main part, the highlight part, and the dark part; second, taking local spatial information into account, pixel-level Y-value fusion is performed on the mapped images; finally, the color is corrected in the normal brightness range of the image based on the color of the main frame, using its Laplacian pyramid fusion weight as a mask, to ensure color consistency before and after fusion. The fused high dynamic range image is thus compressed to the specified dynamic range, ultimately yielding a high-quality image with a higher dynamic range. YUV is a color encoding method commonly used in video processing components. When encoding photos or videos, YUV takes human perception into account and allows the chroma bandwidth to be reduced. "Y" represents luminance, which is the grayscale value, i.e., the range from black to white, so an image with only Y values is black and white. "U" and "V" are the two chrominance components (the blue-difference and red-difference signals), which together specify the color of a pixel.
[0055] If fusion causes the dynamic range of the image to exceed the predetermined range, it must be compressed to the given range. Because the brightness distribution varies across regions of the image, relying too heavily on global information compresses the dynamic range into a fixed, small range and flattens the overall contrast of the compressed image. Therefore, this step gives more consideration to local information in the spatial domain to minimize the compression of local contrast. As shown in Figure 6, the global dynamic range of the high dynamic range image is first divided and mapped into multiple intervals, such as a primary display interval and a highlighting interval. The highlighting interval can include a bright display interval and a dark display interval. The primary display interval is obtained by cutting off the information at both ends of the histogram and is mapped linearly (a straight diagonal line), mainly to ensure that the contrast in areas of normal brightness is not excessively compressed. To ensure that the information and contrast in bright and dark areas are not compressed, gamma curves with gamma greater than 1 and less than 1 can be used for mapping. The human eye's perception of brightness follows a gamma curve: it is more sensitive to changes in dark areas and less sensitive to changes in bright areas. Some data is lost during image acquisition and encoding. By increasing the bit width and using an encoding that follows the response curve of the human eye (i.e., inverse gamma), the impact of data loss on image quality can be reduced.
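The interval mapping described above can be sketched as follows. The percentile cut points, the gamma values, and the pairing of gamma greater than 1 with the bright interval and gamma less than 1 with the dark interval are assumptions chosen for illustration; the specification only states that the histogram ends are cut, that the primary interval is mapped linearly, and that gamma curves above and below 1 are used. The input is assumed to contain more than one distinct brightness value.

```python
import numpy as np

def split_tone_map(hdr_y, low_pct=1.0, high_pct=99.0,
                   gamma_bright=2.0, gamma_dark=0.5):
    """Map an HDR luminance image into three [0, 1] renditions."""
    hdr_y = np.asarray(hdr_y, dtype=np.float64)
    lo, hi = np.percentile(hdr_y, [low_pct, high_pct])
    # Primary display interval: cut both histogram ends, map the rest linearly.
    main = np.clip((hdr_y - lo) / (hi - lo), 0.0, 1.0)
    # Full-range normalisation used by the two gamma renditions.
    norm = (hdr_y - hdr_y.min()) / (hdr_y.max() - hdr_y.min())
    bright = norm ** gamma_bright  # steep near 1: keeps highlight contrast
    dark = norm ** gamma_dark      # steep near 0: keeps shadow contrast
    return main, bright, dark
```

The three renditions would then be blended per pixel, for instance with weights derived from brightness and from the light source diffusion image, to form the final compressed result.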
[0056] During the mapping process, to better handle the dynamic range, halo, and highlight details near the light source, a light source diffusion image can be used. This image diffuses outward from the light source, making brighter neighboring areas easier to diffuse based on the brightness information of nearby pixels in the spatial domain. Correspondingly, in our image fusion process involving multiple curve mappings, this diffusion image will be used as one of our fusion weight benchmarks. For the light source portion, more information from the main display area image will be used; the closer to the light source, the more highlight display area information will be used. By fusing the main image, the light source diffusion image, and the highlight image in this way, an image with a higher dynamic range can be obtained. This fusion method, which introduces local information from the spatial domain, has the advantages of preventing local dynamic range compression, improving image contrast, and suppressing halo during the mapping process.
[0057] In one feasible implementation, after compressing the fused image to a pre-defined dynamic range, the method further includes: using an image enhancement method to make the local brightness of different regions of each frame of the image with different exposures approach each other.
[0058] In this embodiment, to specifically address halos or fog that may be present in the input image, a halo-suppression post-processing technique based on the Retinex concept can be used. Retinex is a commonly used image enhancement method built upon scientific experiments and analysis; its main function is to adjust the local brightness of different regions of an image to a similar level. Based on this function, a negative image combined with Retinex processing can be used to reduce the brightness of areas that may contain fog to a level as consistent as possible with normal areas, thereby suppressing the fog. Specifically, as shown in Figure 7, a negative of the Y image in the YUV image is first obtained. In photography, a negative is the image obtained after exposure and development; its brightness is opposite to that of the subject, and its color is the complementary color of the subject. It must be printed onto photographic paper to be restored to a positive image. For example, in black and white film, a person's dark hair appears white on the negative, while white clothing appears black. In color film, the colors on the film are complementary to the actual colors of the scene; for example, red clothing appears cyan on the film. Secondly, the illuminance of the negative image is estimated, and the illuminance is separated from the negative image based on the illuminance image. Illuminance refers to the energy of visible light received per unit area; it indicates the intensity of light and the degree to which an object's surface is illuminated. Finally, the negative image after illuminance separation is restored to the normal Y domain, and after the UV values are mapped proportionally, the local brightness of different areas of the image is adjusted to a similar level.
[0059] When calculating the Retinex illuminance image, a morphological closing operation can be used to smooth images with halos in a more targeted manner, while also covering up some darker information. The closing operation consists of dilation followed by erosion; it fills small holes within objects and connects neighboring objects to smooth boundaries. In the negative image, the local brightness in halo areas is relatively flat and dark, so these areas are not significantly affected by the closing operation. Other areas, after the morphological calculation, tend toward 1 in the [0, 1] interval. This allows halo and non-halo areas to be separated effectively. However, the edges produced by the closing operation do not always align with natural edges, which introduces artifacts. Guided filtering can therefore be used to correct these artifacts. Guided filtering is mainly used for image enhancement, image fusion, image dehazing, image denoising, feathering, beautification, and 3D reconstruction, and it is fast and efficient.
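The behavior of the closing operation described here can be checked with a small NumPy sketch. The square structuring element, its radius, and the toy image are assumptions chosen for illustration; the guided-filter correction step is not shown.

```python
import numpy as np

def _sliding_extreme(img, radius, reducer, pad_value):
    """Apply max or min over a (2r+1) x (2r+1) square window."""
    img = np.asarray(img, dtype=np.float64)
    padded = np.pad(img, radius, mode="constant", constant_values=pad_value)
    h, w = img.shape
    k = 2 * radius + 1
    windows = [padded[dy:dy + h, dx:dx + w]
               for dy in range(k) for dx in range(k)]
    return reducer(np.stack(windows), axis=0)

def dilate(img, radius):
    # Padding with -inf keeps the border from influencing the maximum.
    return _sliding_extreme(img, radius, np.max, -np.inf)

def erode(img, radius):
    # Padding with +inf keeps the border from influencing the minimum.
    return _sliding_extreme(img, radius, np.min, np.inf)

def closing(img, radius):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img, radius), radius)

# Toy negative image: bright background, one small dark hole,
# and one large flat dark region standing in for a halo area.
img = np.full((12, 12), 0.8)
img[2, 2] = 0.1
img[6:, 6:] = 0.1
closed = closing(img, radius=1)
```

The small dark hole is filled by the closing, while the large flat dark region is left as it was, which is the property the paragraph relies on to separate halo from non-halo areas.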
[0060] As can be seen from the above, the image multi-frame fusion method provided in this specification fuses multiple images of the same scene captured with different exposures. During fusion, a light source diffusion weighting method introduces local information into the image, yielding a higher dynamic range and less halo, so that the image is more aesthetically pleasing and of higher quality. Retinex post-processing further reduces halo from the input image while maintaining image contrast, which improves the effect and stability of halo suppression, especially when light fog is present in the input image. In addition, dynamic frame selection ensures the brightness of the input images. Masks of the bright areas of the short exposure and the dark areas of the long exposure are used to calculate the histogram cutting thresholds at both ends of the fused image, ensuring optimal dynamic range cutting. The main frame's color is used as the reference when mapping to the fused image, ensuring color consistency. Together these measures yield robust fusion in complex scenes. The portrait protection module ensures stable output in portrait scenes, and natural transitions between normal areas and light sources are maintained even in complex lighting environments.
[0061] Referring to Figure 2, this specification also provides an apparatus for multi-frame image fusion, which may include the following modules.
[0062] The acquisition module 210 is used to perform image alignment based on the frame with the highest exposure sharpness in the image; image alignment includes extracting feature points of each frame of the image so that each frame of the image has the same spatial layout.
[0063] The calculation module 220 is used to calculate the fusion region of each frame of the image based on the local light source diffusion weight information in the spatial domain.
[0064] The fusion module 230 is used to fuse the frames of images with different exposures based on the fusion region; image fusion means fusing multiple frames of images with different exposures into a fused image using a preset algorithm.
[0065] The mapping module 240 is used to compress the fused image to a preset dynamic range.
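As a rough illustration of what the mapping module does, the sketch below clips the fused image at both ends of its histogram and rescales it to 8 bits. The specification derives the cutting thresholds from exposure masks; the fixed percentiles, the function name `compress_range`, and the 8-bit output here are simplifying assumptions.

```python
import numpy as np

def compress_range(fused, low_pct=1.0, high_pct=99.0):
    """Clip a high-dynamic-range image at two percentiles and map to 8 bits.

    low_pct / high_pct are hypothetical stand-ins for the histogram
    cutting thresholds at the dark and bright ends.
    """
    lo, hi = np.percentile(fused, [low_pct, high_pct])
    scaled = (np.clip(fused, lo, hi) - lo) / max(hi - lo, 1e-12)
    return np.round(scaled * 255.0).astype(np.uint8)

# Toy fused image whose values exceed the displayable range.
fused = np.linspace(0.0, 4.0, 101).reshape(1, 101)
ldr = compress_range(fused)
```

Values below the lower threshold map to 0 and values above the upper threshold map to 255; everything in between is scaled linearly.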
[0066] In one feasible implementation, the calculation module 220 is further configured to:
[0067] Detect moving objects between images with different exposures in each frame after image alignment; map each image with different exposures to the same brightness range; calculate the differences between different frames and find the interference regions in each image with different exposures.
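A minimal sketch of this interference detection follows. It assumes linear luma values, a known exposure ratio between the two frames, and hypothetical `threshold` and `sat` parameters; none of these specifics come from the specification.

```python
import numpy as np

def interference_mask(frame, reference, exposure_ratio, threshold=0.1, sat=0.95):
    """Flag pixels that differ between two aligned, differently exposed frames.

    frame, reference : linear luma arrays in [0, 1]
    exposure_ratio   : exposure of `frame` divided by exposure of `reference`
    """
    # Map the reference to the brightness range of `frame`.
    matched = np.clip(reference * exposure_ratio, 0.0, 1.0)
    # Saturated pixels carry no reliable difference information.
    valid = (frame < sat) & (matched < sat)
    return valid & (np.abs(frame - matched) > threshold)

# Static scene at 4x exposure, with one pixel changed by a moving object.
reference = np.full((4, 4), 0.1)
frame = np.full((4, 4), 0.4)
frame[1, 2] = 0.8
mask = interference_mask(frame, reference, exposure_ratio=4.0)
```

Only the pixel that changed beyond what the exposure difference explains is marked as an interference region.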
[0068] In one feasible implementation, the calculation module 220 is further configured to:
[0069] Calculate the fusion weight of each frame of the image based on brightness, and assign the weight of the high-exposure image in the light source area to the low-exposure image.
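The weight transfer can be sketched for two frames as follows. The Gaussian well-exposedness weight, the saturation threshold `sat`, and the dilate-then-blur construction of the diffused light source mask are illustrative assumptions, not the formulas of the specification.

```python
import numpy as np

def _windows(img, radius):
    """Stack all shifted views of a zero-padded image in a square window."""
    padded = np.pad(img, radius, mode="constant")
    h, w = img.shape
    k = 2 * radius + 1
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(k) for dx in range(k)])

def light_source_mask(y_long, sat=0.95, radius=2):
    """Soft mask around saturated pixels (assumed model of light diffusion)."""
    core = (y_long >= sat).astype(np.float64)
    grown = _windows(core, radius).max(axis=0)    # dilate the light source core
    return _windows(grown, radius).mean(axis=0)   # soften the mask edge

def well_exposedness(y, sigma=0.2):
    """Brightness-based weight that peaks at mid-gray."""
    return np.exp(-0.5 * ((y - 0.5) / sigma) ** 2) + 1e-6

def fusion_weights(y_long, y_short):
    w_long = well_exposedness(y_long)
    w_short = well_exposedness(y_short)
    m = light_source_mask(y_long)
    # Hand the long exposure's weight to the short exposure near light sources.
    w_short = w_short + m * w_long
    w_long = (1.0 - m) * w_long
    total = w_long + w_short
    return w_long / total, w_short / total

# Long exposure with one saturated light source; short exposure is darker.
y_long = np.full((11, 11), 0.5)
y_long[5, 5] = 1.0
y_short = np.full((11, 11), 0.1)
y_short[5, 5] = 0.5
wl, ws = fusion_weights(y_long, y_short)
```

Next to the light source the short exposure dominates even though the long exposure is well exposed there, which is what keeps the halo around the source from being copied into the fused image. Far from the source the long exposure keeps its weight.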
[0070] In one feasible implementation, the fusion module 230 is further configured to:
[0071] The Laplacian pyramid fusion method is used to fuse images with different exposures in each frame.
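Laplacian pyramid fusion blends each frequency band of the input frames with a correspondingly smoothed weight map and then collapses the blended pyramid. The sketch below uses 2x2 averaging and nearest-neighbor upsampling in place of Gaussian filtering, and requires image sides divisible by 2 to the power (levels - 1); both are simplifications assumed for brevity.

```python
import numpy as np

def down(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    """Double the resolution by pixel repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(down(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    g = gaussian_pyramid(img, levels)
    # Each band is the detail lost when moving to the next coarser level.
    return [g[i] - up(g[i + 1]) for i in range(levels - 1)] + [g[-1]]

def pyramid_fuse(images, weights, levels=3):
    """Blend frames band by band; weights must sum to 1 at every pixel."""
    fused = None
    for img, w in zip(images, weights):
        bands = [l * g for l, g in zip(laplacian_pyramid(img, levels),
                                       gaussian_pyramid(w, levels))]
        fused = bands if fused is None else [f + b for f, b in zip(fused, bands)]
    out = fused[-1]
    for band in reversed(fused[:-1]):   # collapse from coarse to fine
        out = up(out) + band
    return out

rng = np.random.default_rng(0)
a, b, w = rng.random((8, 8)), rng.random((8, 8)), rng.random((8, 8))
same = pyramid_fuse([a, a], [w, 1.0 - w])
only_a = pyramid_fuse([a, b], [np.ones((8, 8)), np.zeros((8, 8))])
```

Two sanity checks follow from the construction: fusing a frame with itself returns that frame, and giving one frame a weight of 1 everywhere returns that frame unchanged.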
[0072] In one feasible implementation, the mapping module 240 is further configured to:
[0073] Image enhancement methods are used to make the local brightness of different areas in each frame of the image at different exposures similar to each other.
[0074] In one feasible implementation, the image multi-frame fusion apparatus further includes:
[0075] The monitoring module is used to detect the presence of a human image and / or the area where the human image is located. If a human image and / or the area where the human image is located are found, the human image is protected so that the brightness of the human image is kept within an appropriate range.
[0076] In one feasible implementation, the monitoring module is also used for:
[0077] By using artificial intelligence face detection to detect human images and / or areas where human images exist, and by calculating the gamma mapping curve based on the brightness changes of the human image portion before and after fusion, the brightness of the human image is kept within an appropriate range.
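One way to derive a gamma curve from the brightness change of the portrait region is sketched below. The face box is assumed to come from a separate detector that is not shown, and the rule that the mean face brightness after fusion should map back to its value before fusion, as well as the clamping limits `lo` and `hi`, are assumptions.

```python
import numpy as np

def face_gamma(y_before, y_after, box, lo=0.5, hi=2.0, eps=1e-3):
    """Gamma that maps the fused face brightness back to its pre-fusion level.

    box is (top, left, bottom, right) from a face detector (hypothetical input).
    """
    top, left, bottom, right = box
    before = np.clip(y_before[top:bottom, left:right].mean(), eps, 1.0 - eps)
    after = np.clip(y_after[top:bottom, left:right].mean(), eps, 1.0 - eps)
    # Solve after ** gamma == before for gamma, then limit the correction.
    gamma = np.log(before) / np.log(after)
    return float(np.clip(gamma, lo, hi))

def apply_gamma(y, gamma):
    return np.clip(y, 0.0, 1.0) ** gamma

# Face region darkened by fusion from 0.5 to 0.25.
y_before = np.full((6, 6), 0.5)
y_after = np.full((6, 6), 0.25)
box = (1, 1, 5, 5)
gamma = face_gamma(y_before, y_after, box)
restored = apply_gamma(y_after, gamma)
```

For a uniform region the correction is exact; for a real face the mean is only restored approximately, because the gamma curve is nonlinear.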
[0078] This specification also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the image multi-frame fusion method described in any of the above embodiments.
[0079] This specification also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the image multi-frame fusion method described in any of the above embodiments.
[0080] It is understood that the specific examples in this document are only intended to help those skilled in the art better understand the embodiments described herein, and are not intended to limit the scope of the invention.
[0081] It is understood that in the various embodiments described in this specification, the sequence number of each process does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments described in this specification.
[0082] It is understood that the various implementation methods described in this specification can be implemented individually or in combination, and the implementation methods in this specification are not limited in this respect.
[0083] Unless otherwise stated, all technical and scientific terms used in the embodiments of this specification have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the scope of this specification. The term "and / or" as used in this specification includes any and all combinations of one or more of the associated listed items. The singular forms "a," "an," and "the" as used in the embodiments of this specification and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise.
[0084] It is understood that the processor in the embodiments of this specification can be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method embodiments can be completed by integrated logic circuits in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this specification. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this specification can be directly implemented by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory; the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above methods.
[0085] It is understood that the memory in the embodiments of this specification may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. Non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. Volatile memory may be random access memory (RAM). It should be noted that the memory in the apparatus and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
[0086] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this specification.
[0087] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the aforementioned method implementations, which will not be repeated here.
[0088] In the several embodiments provided in this specification, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.
[0089] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment, depending on actual needs.
[0090] In addition, the functional units in the various embodiments of this specification can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0091] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this specification, in essence, or the parts that contribute to the prior art, or parts of the technical solutions, can be embodied in the form of software products. These computer software products are stored in a storage medium and include several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this specification. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0092] The above description is merely a specific embodiment of this specification, but the scope of protection of this invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this specification should be included within the scope of protection of this specification. Therefore, the scope of protection of this invention should be determined by the scope of the claims.
Claims
1. A method for image multi-frame fusion, characterized in that it comprises: performing image alignment based on the frame with the highest exposure sharpness in the image, wherein the image alignment includes extracting feature points from each frame of the image so that each frame of the image has the same spatial layout; calculating the fusion region of each frame of the image based on the local light source diffusion weight information in the spatial domain, including: calculating the fusion weight of each frame of the image based on brightness, and assigning the weight of the image with high exposure in the light source region to the image with low exposure to suppress the appearance of halo; fusing, based on the fusion region, the images with different exposures in each frame, wherein the image fusion uses a preset algorithm to fuse multiple frames of images with different exposures into a fused image; and compressing the fused image to a preset dynamic range.
2. The method of image multi-frame fusion according to claim 1, characterized in that, after performing image alignment based on the frame with the highest exposure sharpness in the image, the method further includes: detecting moving objects between the images with different exposures in each frame after image alignment; mapping each image with different exposures to the same brightness range; and calculating the differences between different frames to find the interference regions in each image with different exposures.
3. The method of image multi-frame fusion according to claim 1, characterized in that the step of fusing images with different exposures in each frame based on the fusion region includes: using the Laplacian pyramid fusion method to fuse the images with different exposures in each frame.
4. The method of image multi-frame fusion according to claim 1, characterized in that the step of fusing images with different exposures in each frame based on the fusion region further includes: detecting the presence of a human image and / or the area containing the human image, and, if the human image and / or the area containing the human image is present, protecting the human image to maintain its brightness within an appropriate range.
5. The method of image multi-frame fusion according to claim 4, characterized in that the step of detecting the presence of a human image and / or the area containing the human image, and, if the human image and / or the area containing the human image is present, protecting the human image to maintain its brightness within an appropriate range, includes: using artificial intelligence face detection to detect the human image and / or the area where the human image exists, and calculating the gamma mapping curve from the brightness change of the human image portion before and after fusion, so that the brightness of the human image is kept within an appropriate range.
6. The method of image multi-frame fusion according to claim 1, characterized in that, after compressing the fused image to a preset dynamic range, the method further includes: using an image enhancement method to make the local brightness of different areas in each frame of the image at different exposures similar to each other.
7. An apparatus for multi-frame image fusion, characterized in that it comprises: an acquisition module, used to perform image alignment based on the frame with the highest exposure sharpness in the image, wherein the image alignment includes extracting feature points from each frame of the image so that each frame of the image has the same spatial layout; a calculation module, used to calculate the fusion region of each frame of the image based on the local light source diffusion weight information in the spatial domain, including: calculating the fusion weight of each frame of the image based on brightness, and assigning the weight of the image with high exposure in the light source region to the image with low exposure to suppress the appearance of halo; a fusion module, used to fuse the images with different exposures in each frame based on the fusion region, wherein the image fusion uses a preset algorithm to fuse multiple frames of images with different exposures into a fused image; and a mapping module, used to compress the fused image to a preset dynamic range.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method as claimed in any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 6.