Image fusion methods and apparatuses, storage media and electronic devices

By calculating the first registration displacement of the non-reference image and using semantic segmentation to repair abnormal pixel values, the problem of poor registration in multi-camera image fusion is solved, and the quality of the fused image is improved.

CN115526813B · Active · Publication Date: 2026-06-30 · BEIJING JIGAN TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING JIGAN TECH CO LTD
Filing Date: 2021-06-25
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In existing multi-camera image fusion technologies, the limitations of parallax and registration algorithms lead to poor image registration, which prevents effective fusion and affects the final image quality.

Method used

The method acquires the images to be fused, calculates the first registration displacement of the pixels in the non-reference image relative to the reference image, performs semantic segmentation to obtain a segmentation mask, repairs abnormal pixel values in the displacement image using normal pixel values, registers the non-reference image with the repaired displacement image, and finally performs image fusion.

Benefits of technology

It improves image registration results, enhances the quality of fused images, and increases the area that can be effectively registered, thus meeting user needs.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to the field of image processing technology, and provides an image fusion method, apparatus, storage medium, and electronic device. The image fusion method includes: acquiring an image to be fused, which includes a reference image and a non-reference image; calculating a first registration displacement of the pixels to be fused in the non-reference image relative to their corresponding pixels in the reference image, obtaining a displacement image; performing semantic segmentation on the reference image to obtain a segmentation mask; identifying abnormal pixel values in the displacement image, and repairing the abnormal pixel values within the segmented regions of the displacement image using normal pixel values, obtaining a repaired displacement image; registering the non-reference image based on the repaired displacement image, obtaining a registered non-reference image; and fusing the reference image and the registered non-reference image to obtain a fused image. This method helps improve the image registration effect, thereby improving the quality of the final fused image.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and more specifically, to an image fusion method and apparatus, a storage medium, and an electronic device.

Background Technology

[0002] Modern smartphones are generally equipped with multiple cameras, and when taking photos, images captured by multiple cameras can be merged to improve the quality of the photos.

[0003] In multi-camera fusion algorithms, it is often necessary to first register the images to be fused, and then fuse the registered images. However, there is parallax between images captured by different cameras, and effective registration cannot be performed in the parallax region. In addition, due to the limitations of the registration algorithm itself, there are also some regions in the image that cannot be effectively registered. The existence of these regions results in poor registration quality of the images to be fused, which in turn leads to poor quality of the fused image, making it difficult to meet user requirements.

Summary of the Invention

[0004] The purpose of this application is to provide an image fusion method, apparatus, storage medium, and electronic device to alleviate the above-mentioned technical problems.

[0005] To achieve the above objectives, this application provides the following technical solution:

[0006] In a first aspect, embodiments of this application provide an image fusion method, comprising: acquiring an image to be fused, the image to be fused including a reference image and a non-reference image; calculating a first registration displacement of the pixels to be fused in the non-reference image relative to corresponding pixels in the reference image, to obtain a displacement image; performing semantic segmentation on the reference image to obtain a segmentation mask, the segmentation mask including position information of multiple segmentation regions; determining abnormal pixel values in the displacement image, and repairing the abnormal pixel values within the segmentation regions of the displacement image using normal pixel values, to obtain a repaired displacement image; registering the non-reference image according to the repaired displacement image to obtain a registered non-reference image; and fusing the reference image and the registered non-reference image to obtain a fused image.

[0007] In the above method, the non-reference image is not directly registered using the displacement image. Instead, abnormal pixel values in the displacement image are first identified, and then repaired using normal pixel values from the displacement image. The non-reference image is then registered based on the repaired displacement image. Since the abnormal pixel values in the displacement image constitute the regions in the non-reference image where effective registration is impossible, repairing these abnormal pixel values is equivalent to increasing the area where effective registration is possible. This improves the image registration effect and ultimately enhances the quality of the final fused image.

[0008] Furthermore, when repairing abnormal pixel values in a displacement image, the above method utilizes a segmentation mask obtained by semantic segmentation of the reference image. The repair of abnormal pixel values is performed within each segmentation region contained in the segmentation mask, rather than across segmentation regions. This is because each segmentation region in the segmentation mask represents the same or similar object, and can be approximated as a planar region. Therefore, pixel values within the same segmentation region of the displacement image are likely to be smooth, meaning that the values generally do not exhibit abrupt changes. Thus, region-based pixel value repair yields more reasonable repair values, resulting in better repair performance.
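The order of the steps in the first aspect can be sketched as a small driver function. This is only an illustrative skeleton under stated assumptions, not the application's implementation: the function name and all six callables (`dense_register`, `segment`, `find_abnormal`, `repair`, `warp`, `blend`) are hypothetical placeholders supplied by the caller, and their signatures are not taken from the source.

```python
def fuse_images(reference, non_reference, *, dense_register, segment,
                find_abnormal, repair, warp, blend):
    """Run the steps of the first aspect in order.

    Every callable is a hypothetical placeholder; the source does not
    prescribe concrete algorithms or signatures for them.
    """
    # Displacement image: one first registration displacement per pixel.
    displacement = dense_register(non_reference, reference)
    # The segmentation mask is computed on the reference image only.
    mask = segment(reference)
    # Flag abnormal values, then repair them region by region.
    abnormal = find_abnormal(displacement, reference, non_reference)
    repaired = repair(displacement, abnormal, mask)
    # Register with the repaired displacement image, then fuse.
    registered = warp(non_reference, repaired)
    return blend(reference, registered)
```

Passing the steps in as callables keeps the sketch independent of any particular registration, segmentation, or blending algorithm.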

[0009] In one implementation of the first aspect, repairing abnormal pixel values using normal pixel values within a segmentation region of the displacement image includes: filtering the displacement image within the segmentation region, so that abnormal pixel values are repaired using normal pixel values within the segmentation region. The repair includes: calculating a new pixel value based on the normal pixel values within the filtering window, and replacing the abnormal pixel value located at the center of the filtering window with the new pixel value.

[0010] In the above implementation, abnormal pixel values in the displacement image are repaired by filtering within the segmentation region. During filtering, the filtering window is moved across the region, and whenever the pixel value at the center of the window is abnormal, it is repaired. Since the filtering window is usually small, such as a 3×3 or 5×5 rectangle, the pixel values of the displacement image within the same filtering window can be considered smooth, which makes this repair method reasonable.

[0011] In one implementation of the first aspect, filtering the displacement image within the segmented region includes: filtering the displacement image multiple times within the segmented region until all abnormal pixel values in the displacement image are repaired.

[0012] In the above implementation, the filtering operation can be repeated to maximize the repair effect.
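The region-constrained, repeated filtering described above can be sketched as follows. This is a minimal illustration under assumptions, not the application's implementation: the source only says the new value is calculated from the normal values in the window, so the use of the median here is an assumption, as are the function name, the `max_passes` cap, and the representation of one displacement component as a 2D list (a two-component displacement would be processed once per component).

```python
from statistics import median


def repair_in_regions(disp, abnormal, labels, win=3, max_passes=10):
    """Repair abnormal values using normal values of the same region.

    disp     : 2D list, one component of the displacement image
    abnormal : 2D list of bools, True where the value is abnormal
    labels   : 2D list of ints, segmentation-region id per pixel
    Returns (repaired_disp, remaining_abnormal); inputs are not modified.
    """
    h, w = len(disp), len(disp[0])
    r = win // 2
    disp = [row[:] for row in disp]
    abnormal = [row[:] for row in abnormal]
    for _ in range(max_passes):
        fixes = []
        for y in range(h):
            for x in range(w):
                if not abnormal[y][x]:
                    continue
                # Normal values inside the window that share the region
                # of the center pixel; other regions are ignored.
                vals = [disp[j][i]
                        for j in range(max(0, y - r), min(h, y + r + 1))
                        for i in range(max(0, x - r), min(w, x + r + 1))
                        if not abnormal[j][i]
                        and labels[j][i] == labels[y][x]]
                if vals:
                    # Assumption: the median is the "new pixel value".
                    fixes.append((y, x, median(vals)))
        if not fixes:
            break  # nothing left that can be repaired
        # Apply after the pass so each pass only reads settled values.
        for y, x, v in fixes:
            disp[y][x] = v
            abnormal[y][x] = False
    return disp, abnormal
```

Because neighbors from other regions are ignored, an abnormal value next to a region boundary is filled only from its own side. Repeating the pass lets repaired pixels serve as normal values for abnormal pixels deeper inside a large abnormal area.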

[0013] In one implementation of the first aspect, determining abnormal pixel values in the displacement image includes: dividing the reference image and the non-reference image into regions respectively; calculating the homography matrix between each region in the non-reference image and the corresponding region in the reference image; for each pixel to be fused in the non-reference image, performing the following operations: calculating the second registration displacement of the pixel relative to the corresponding pixel in the reference image using the homography matrix of the region to which the pixel belongs; determining whether the difference between the first registration displacement and the second registration displacement of the pixel exceeds a first threshold; if it exceeds the first threshold, then determining the first registration displacement of the pixel as an abnormal pixel value in the displacement image.

[0014] The above implementation can effectively detect parallax regions (composed of the pixels whose values are detected as abnormal) in the displacement image.
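The per-pixel check against the homography can be sketched as follows. This is a hedged illustration: it assumes the homography `H` of the pixel's region maps non-reference coordinates to reference coordinates, and that the "difference" between the two displacements is measured as a Euclidean distance. Neither detail is stated in the source, and the function names are hypothetical.

```python
import math


def apply_homography(H, x, y):
    """Map (x, y) with a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    s = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / s, v / s


def is_abnormal(H, x, y, first_disp, first_threshold):
    """Compare the first displacement with the homography-based one.

    first_disp is the (dx, dy) value from the displacement image.
    Assumption: the difference is the Euclidean distance.
    """
    u, v = apply_homography(H, x, y)
    second_disp = (u - x, v - y)  # second registration displacement
    diff = math.hypot(first_disp[0] - second_disp[0],
                      first_disp[1] - second_disp[1])
    return diff > first_threshold
```

For example, with a region homography that is a pure translation by (2, 0), a first displacement of (2.5, 0) stays below a threshold of 1 pixel, while (10, 3) is flagged as abnormal.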

[0015] In one implementation of the first aspect, determining abnormal pixel values in the displacement image includes: statistically analyzing the pixel values in the displacement image by region, and determining pixel values with a frequency less than a second threshold as abnormal pixel values based on the statistical results of each region.

[0016] The above implementation can effectively detect distortion regions (composed of the pixels whose values are detected as abnormal) in the displacement image.
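The frequency-based detection can be sketched as a per-region histogram. The details below are assumptions made for illustration only: the source does not say how regions are chosen, how values are binned, or whether the frequency is absolute or relative. Here the regions are fixed rectangular blocks, values are quantized into bins of width `bin_size`, and `min_freq` (standing in for the second threshold) is a relative frequency.

```python
from collections import Counter


def rare_value_mask(disp, block=4, min_freq=0.05, bin_size=1.0):
    """Flag values whose bin is rare within their block.

    disp is a 2D list holding one component of the displacement image.
    Returns a 2D list of bools, True where the value is abnormal.
    """
    h, w = len(disp), len(disp[0])
    mask = [[False] * w for _ in range(h)]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            cells = [(y, x)
                     for y in range(y0, min(h, y0 + block))
                     for x in range(x0, min(w, x0 + block))]
            # Histogram of quantized values inside this block.
            bins = Counter(round(disp[y][x] / bin_size) for y, x in cells)
            for y, x in cells:
                count = bins[round(disp[y][x] / bin_size)]
                if count / len(cells) < min_freq:
                    mask[y][x] = True
    return mask
```

In a 4×4 block where fifteen values are 2.0 and one is 30.0, the outlier's bin has a relative frequency of 1/16, so it is flagged when `min_freq` is 0.1.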

[0017] In one implementation of the first aspect, calculating the first registration displacement of the pixel to be fused in the non-reference image relative to the corresponding pixel in the reference image to obtain a displacement image includes: determining the common part in the reference image and the non-reference image as the region to be fused; calculating the first registration displacement of the pixel in the non-reference image relative to the corresponding pixel in the reference image in the region to be fused to obtain the displacement image.

[0018] For both reference and non-reference images, if they contain identical content, the entire image can be used for fusion; if they only partially share the same content, only the common portions can be fused. In short, the selection of the region to be fused offers a high degree of flexibility.

[0019] In one implementation of the first aspect, the step of acquiring the image to be fused includes: acquiring an original image to be fused, the original image to be fused including an original reference image and an original non-reference image, the original reference image and the original non-reference image being images captured by cameras at different zoom ratios; normalizing the original reference image and the original non-reference image to the same zoom ratio to obtain the reference image and the non-reference image.

[0020] In images acquired at different zoom levels, the same object (such as a person or a physical object) may have different sizes (referring to the size in the image), making direct registration difficult. It is necessary to normalize the zoom level of the image first.

[0021] In one implementation of the first aspect, the reference image is an image captured by a wide-angle camera and the non-reference image is an image captured by a telephoto camera; or, the reference image is an image captured by a color camera and the non-reference image is an image captured by a monochrome camera.

[0022] The above implementation provides two specific image fusion scenarios. It will be understood that image fusion is not limited to these two application scenarios.

[0023] Wide-angle cameras capture images with a larger field of view, but the objects within them are smaller, resulting in relatively lower clarity. Telephoto cameras capture images with a smaller field of view, but the objects within them are larger, resulting in relatively higher clarity. Therefore, merging the images captured by the telephoto camera into the images captured by the wide-angle camera can improve image quality while maintaining the shooting range.

[0024] Color cameras capture images with color, which can meet most users' needs, but the image clarity is relatively low. Black and white cameras capture images without color, which cannot meet most users' needs, but the image clarity is relatively high. Therefore, fusing images captured by black and white cameras into images captured by color cameras can improve image quality while meeting the general needs of users.

[0025] In a second aspect, embodiments of this application provide an image fusion apparatus, comprising: an image acquisition module for acquiring an image to be fused, wherein the image to be fused includes a reference image and a non-reference image; a displacement image acquisition module for calculating a first registration displacement of a pixel to be fused in the non-reference image relative to a corresponding pixel in the reference image, thereby obtaining a displacement image; an image segmentation module for performing semantic segmentation on the reference image to obtain a segmentation mask, wherein the segmentation mask includes position information of multiple segmentation regions; a pixel value repair module for determining abnormal pixel values in the displacement image and repairing the abnormal pixel values within the segmentation regions of the displacement image using normal pixel values, thereby obtaining a repaired displacement image; an image registration module for registering the non-reference image according to the repaired displacement image, thereby obtaining a registered non-reference image; and an image fusion module for fusing the reference image and the registered non-reference image, thereby obtaining a fused image.

[0026] In a third aspect, embodiments of this application provide a computer-readable storage medium storing computer program instructions, which, when read and executed by a processor, perform the method provided in the first aspect or any possible implementation thereof.

[0027] In a fourth aspect, embodiments of this application provide an electronic device, including: a memory and a processor, wherein the memory stores computer program instructions, and the computer program instructions are read and executed by the processor to perform the method provided in the first aspect or any possible implementation of the first aspect.

Attached Figure Description

[0028] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0029] Figure 1 shows the flowchart of the image fusion method provided in the embodiments of this application;

[0030] Figure 2 illustrates an application scenario of the image fusion method provided in an embodiment of this application;

[0031] Figure 3 shows the segmentation mask in an embodiment of this application;

[0032] Figure 4 illustrates the filtering principle of the displacement image in the embodiments of this application;

[0033] Figure 5 illustrates the difference between filtering the displacement image with and without a segmentation mask in the embodiments of this application;

[0034] Figure 6 shows the structure of the image fusion apparatus provided in the embodiments of this application;

[0035] Figure 7 shows the structure of the electronic device provided in the embodiments of this application.

Detailed Implementation

[0036] The technical solutions of the embodiments of this application will now be described with reference to the accompanying drawings. It should be noted that similar reference numerals and letters in the following drawings indicate similar items; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings.

[0037] The terms “comprising,” “including,” or any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase “comprising one…” does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0038] The terms “first,” “second,” etc., are used only to distinguish one entity or operation from another, and should not be construed as indicating or implying relative importance, nor as requiring or implying any such actual relationship or order between these entities or operations.

[0039] Figure 1 shows the flowchart of the image fusion method provided in the embodiments of this application. This method can be, but is not limited to being, performed by the electronic device shown in Figure 7; for the structure of this electronic device, refer to the explanation of Figure 7 below. Referring to Figure 1, the method includes:

[0040] Step S110: Obtain the image to be fused.

[0041] The images to be fused include multiple frames, one of which is the reference image, and the rest are non-reference images. For simplicity, the case of a single non-reference image is considered first. The reference image is the image used as the basis for image fusion: the non-reference image is registered to the coordinate system of the reference image before being fused with it, while the reference image itself does not need to be registered.

[0042] Reference images and non-reference images can originate from images captured by different cameras, such as images captured by multiple cameras on a mobile phone, or they can originate from images captured by the same camera, such as images captured by a single camera on a mobile phone at multiple consecutive moments. In the following text, we will mainly take the case of images originating from different cameras as an example.

[0043] For example, the reference image can be an image captured by a wide-angle camera, while the non-reference image can be an image captured by a telephoto camera. The wide-angle camera captures a larger image area, but the objects within it, such as people and other objects, are smaller in size (meaning they occupy fewer pixels in the image), resulting in relatively lower sharpness. Conversely, the telephoto camera captures a smaller image area, but the objects within it are larger in size, resulting in relatively higher sharpness. Therefore, fusing the image from the telephoto camera into the image from the wide-angle camera can improve image quality while maintaining the shooting range.

[0044] It should be noted that the phrase "the reference image is an image captured by a wide-angle camera" means that the reference image originates from an image captured by a wide-angle camera, but it does not mean that the reference image is necessarily the original image captured by the wide-angle camera. It may also be an image obtained by processing the original image captured by the wide-angle camera. The same applies to non-reference images.

[0045] For example, the reference image can be an image captured by a color camera, while the non-reference image can be an image captured by a monochrome camera. The color camera image has color, which meets most user needs (e.g., most users take color photos), but its image clarity is relatively low. The monochrome camera image lacks color and cannot meet most user needs, but its image clarity is relatively high. Therefore, fusing the monochrome camera image into the color camera image can improve image quality while meeting general user needs.

[0046] Furthermore, different cameras may have different zoom ratios. For example, a mobile phone's wide-angle camera might have a 1x zoom ratio, while its telephoto camera might have a 1.5x zoom ratio (note that wide-angle and telephoto here should be understood in a relative sense, not strictly wide-angle and telephoto in a photographic sense). This results in the same object appearing at different sizes in the raw images captured by different cameras, making it difficult to directly register these raw images and thus difficult to fuse them. Similarly, for the same zoomable camera, the same problem exists if you want to fuse the raw images captured at different zoom ratios.

[0047] For example, the first row of Figure 2 contains two images, which are original images captured by cameras with different zoom levels. The image on the left was captured by a wide-angle camera, and the image on the right was captured by a telephoto camera; these are referred to as the original reference image and the original non-reference image, respectively. It can be observed that the same woman in white appears smaller in the original reference image but larger in the original non-reference image, making it difficult to directly fuse the original reference image and the original non-reference image.

[0048] In such cases, the original reference image and the original non-reference image can be normalized to the same zoom level to obtain the reference image and the non-reference image, and then the subsequent image registration and fusion steps can be performed.

[0049] For example, in Figure 2, because a wide-angle camera captures a larger image range than a telephoto camera, the content of the original non-reference image is actually already included in the original reference image (although the shooting angle may be slightly different). Therefore, the normalization method can be to keep the original reference image unchanged (i.e., directly use the original reference image as the reference image), while reducing the size of the original non-reference image to obtain the non-reference image, as shown in the second row of Figure 2. The reduction factor of the non-reference image depends on the zoom factors of the original reference image and the original non-reference image. For example, if the zoom factor of the original reference image is 1x and the zoom factor of the original non-reference image is 1.5x, then the reduction factor is 1.5 (the side length is reduced to 2/3 of the original).

[0050] Continuing to refer to Figure 2, the reference image also contains a dashed box, the content of which is the same as that of the non-reference image (although the shooting angle may be slightly different). That is, the dashed box represents the common part of the reference image and the non-reference image. This part can be called the region to be fused. Pixels of the reference image and the non-reference image within the region to be fused can be fused, as detailed in the following steps. Pixels outside the region to be fused cannot be fused.

[0051] The location of the region to be fused in the reference and non-reference images can be calculated. For Figure 2, by simply keeping the center of the reference image's border unchanged and reducing its side length to 2/3 of its original value, the position of the region to be fused in the reference image can be obtained. The position of the region to be fused in the non-reference image is simply the border of the non-reference image. In other words, when performing image fusion, the non-reference image only needs to be fused into the portion of the reference image within the dashed box, while the portion of the reference image outside the dashed box can keep its original pixel values.
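The position of the region to be fused in the reference image can be computed with a few lines of arithmetic. The sketch below assumes, as in the Figure 2 example, that the region is centered in the reference image and that its side lengths scale with the ratio of the two zoom factors; the function name and the pixel rounding are illustrative choices, not taken from the source.

```python
def fusion_region(width, height, ref_zoom, nonref_zoom):
    """Return (left, top, right, bottom) of the region to be fused.

    width, height : size of the reference image in pixels
    ref_zoom      : zoom factor of the reference image
    nonref_zoom   : zoom factor of the original non-reference image
    The box keeps the image center and scales both side lengths by
    ref_zoom / nonref_zoom (assumes nonref_zoom >= ref_zoom).
    """
    scale = ref_zoom / nonref_zoom
    region_w = round(width * scale)
    region_h = round(height * scale)
    left = (width - region_w) // 2
    top = (height - region_h) // 2
    return left, top, left + region_w, top + region_h
```

For a 1200×900 reference image at 1x and a non-reference image captured at 1.5x, the side lengths shrink to 2/3, giving the box (200, 150, 1000, 750).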

[0052] In the example shown in Figure 2, the original reference image is not processed during normalization, but there are some cases where processing of the original reference image is necessary. For example, a mobile phone's wide-angle camera has a zoom ratio of 1x, and its telephoto camera has a zoom ratio of 2x, both with fixed focal lengths. However, the user wants to see a 1.5x zoom image on the phone screen. In this case, the target image can be obtained by fusing the images captured by the wide-angle camera and the telephoto camera. One possible approach is as follows:

[0053] Because 1x < 1.5x, part of the content of the original reference image at 1x zoom, specifically the portion near the image border that accounts for 1/3 (1 - 1/1.5) of the image, is not visible in a 1.5x zoom image. This portion can be cropped from the original reference image, and the cropped image can then be enlarged appropriately (the resolution of a mobile phone camera is fixed and cropping reduces the resolution, so enlarging restores the original resolution) to obtain the reference image at 1.5x zoom. Since 2x > 1.5x, the content of the original non-reference image at 2x zoom is already contained in the 1.5x zoom image. Therefore, the original non-reference image only needs to be reduced to 3/4 (1.5/2) of its original size to obtain the non-reference image at 1.5x zoom.

[0054] After obtaining the reference and non-reference images, the position of the region to be fused can also be calculated. Specifically, by keeping the center of the reference image's border unchanged and reducing its side length to 3 / 4 of its original value, the position of the region to be fused in the reference image can be obtained. The position of the region to be fused in the non-reference image is simply the border of the non-reference image. Once the region to be fused is determined, the specific fusion steps will be described later.
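The scale factors in the two normalization examples follow from simple ratios of zoom factors. The sketch below collects them in one hypothetical helper (the function and key names are illustrative); it works on side-length fractions and uses exact fractions to avoid floating-point noise.

```python
from fractions import Fraction


def normalization_plan(ref_zoom, nonref_zoom, target_zoom):
    """Side-length ratios for normalizing two images to target_zoom.

    Assumes ref_zoom <= target_zoom <= nonref_zoom.
    """
    ref, nonref, target = (Fraction(str(z))
                           for z in (ref_zoom, nonref_zoom, target_zoom))
    keep = ref / target  # share of the original reference side that is kept
    return {
        "reference_keep": keep,
        "reference_cropped": 1 - keep,      # share removed near the border
        "nonreference_scale": target / nonref,
        # Side of the region to be fused, relative to the reference image.
        "fusion_region_side": target / nonref,
    }
```

For the 1x wide-angle and 2x telephoto cameras with a 1.5x target, this gives a kept share of 2/3 (so 1 - 1/1.5 = 1/3 is cropped), a non-reference scale of 3/4, and a region to be fused whose side is 3/4 of the reference image. With a 1x reference, a 1.5x non-reference, and a 1x target, nothing is cropped and the non-reference image is reduced to 2/3, which matches the Figure 2 example.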

[0055] It will be understood that if the original reference image and the original non-reference image already have the same zoom ratio, they can be used directly as the reference and non-reference images without any scaling. In this case, there is no need to calculate the region to be fused, as the entire reference and non-reference images can be used for fusion. Of course, the region to be fused can also be freely specified according to user needs. For example, even if the entire reference and non-reference images can be used for fusion, it is possible to select only the central region of the two images as the region to be fused. For simplicity, the following text assumes that the region to be fused always needs to be calculated; the case where it does not need to be calculated can be considered a special case in which the region to be fused is the entire image area.

[0056] Step S120: Calculate the first registration displacement of the pixel to be fused in the non-reference image relative to the corresponding pixel in the reference image, and obtain the displacement image.

[0057] "Pixels to be fused in the non-reference image" refers to pixels in the non-reference image located within the region to be fused. Ideally, these pixels have corresponding pixels in the region to be fused in the reference image. Each pair of corresponding pixels corresponds to the same point on the actual object. However, due to parallax and other reasons, the coordinates of these two pixels in the region to be fused may be different. The difference between their coordinates is the first registration displacement. The first registration displacement can be regarded as a vector. Knowing the coordinates of a pixel in the non-reference image and its corresponding first registration displacement, the pixel can be moved to the position of the corresponding pixel in the reference image (i.e., pixel registration), and then pixel fusion can be performed on this basis.
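Moving each pixel by its first registration displacement can be sketched as a forward mapping. This is a simplified illustration under assumptions: the displacement is taken as reference coordinates minus non-reference coordinates (the source does not fix the sign convention), displacements are rounded to whole pixels, and positions that receive no pixel are left as `fill`. Practical implementations often use interpolation instead; the function name is hypothetical.

```python
def register_pixels(nonref, disp, fill=None):
    """Move each non-reference pixel to its position in the reference frame.

    nonref : 2D list of pixel values (the region to be fused)
    disp   : 2D list of (dx, dy) first registration displacements,
             assumed to be reference coords minus non-reference coords
    """
    h, w = len(nonref), len(nonref[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = disp[y][x]
            tx, ty = x + round(dx), y + round(dy)
            # Pixels that would land outside the region are dropped.
            if 0 <= tx < w and 0 <= ty < h:
                out[ty][tx] = nonref[y][x]
    return out
```

With a one-row image `[[1, 2, 3]]` and a displacement of (1, 0) everywhere, each pixel shifts one column to the right; the first column receives no value and the last pixel falls outside the region.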

[0058] For each pixel to be fused in the non-reference image, a first registration displacement can be calculated according to a specific dense registration algorithm. These first registration displacements also form an image based on their corresponding pixel coordinates in the non-reference image. The size of this image is the same as the region to be fused, and it is called a displacement image. That is, the value of each pixel in the displacement image is a first registration displacement. The aforementioned dense registration algorithm generally refers to an algorithm that performs registration pixel by pixel. In contrast, there is also a sparse registration algorithm, which performs registration only based on certain specific points. In step S120, a dense registration algorithm is used, such as optical flow. The optical flow field output by the optical flow method is the displacement image in step S120.

[0059] Note that although a first registration displacement can be calculated for each pixel to be fused in the non-reference image, for various reasons, the value of this displacement may not register the pixel to be fused in the non-reference image to its actual corresponding pixel in the reference image. For example, a pixel representing an eye in the non-reference image may be incorrectly registered to a pixel representing a nose in the reference image by an inaccurate first registration displacement. Such a first registration displacement is called abnormal; it appears in the displacement image as an abnormal pixel value and prevents effective registration. These abnormal pixel values in the displacement image are repaired in subsequent steps.

[0060] Step S130: Perform semantic segmentation on the reference image to obtain a segmentation mask.

[0061] Semantic segmentation is a class of image segmentation algorithms that divide an image into multiple regions with certain semantic meanings, which we can call segmented regions. For example, each segmented region can represent one object or a class of objects: for instance, dividing people in an image into one class of regions (a class of regions can contain multiple disconnected segmented regions), and the background into another class of regions; or dividing each person in an image into one region, and the background into another region; or dividing each person in an image into one region, and each object (such as a vehicle, road, tree, or building) into another region, and so on.
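Since one class of regions can contain several disconnected segmentation regions, a per-pixel class map can be split into individual regions by connected-component labeling. The sketch below is only one way to do this and is not taken from the source: it assumes 4-connectivity and a class map given as a 2D list, and the function name is hypothetical.

```python
from collections import deque


def label_regions(class_map):
    """Split a per-pixel class map into connected segmentation regions.

    Returns a 2D list of region ids (0, 1, 2, ...) of the same shape.
    Assumes 4-connectivity, which the source does not specify.
    """
    h, w = len(class_map), len(class_map[0])
    labels = [[-1] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != -1:
                continue
            # Flood-fill all same-class pixels reachable from (y, x).
            labels[y][x] = next_id
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] == -1
                            and class_map[ny][nx] == class_map[cy][cx]):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return labels
```

For a class map with two separate columns of the same class divided by a background column, the two columns receive different region ids even though they share a class.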

[0062] The result of semantic segmentation can be represented by a segmentation mask, which is an image in which pixel values record the location information of each segmented region. For example, in the simplest case, if only two types of regions are divided, the result of semantic segmentation can be a binary mask, where pixel values of 255 represent the regions containing people, and pixel values of 0 represent the regions containing the background. Figure 3 shows an example of such a binary mask, which corresponds to the semantic segmentation result of the reference image in Figure 2. It should be noted that, since only the part of the reference image in Figure 2 that lies within the region to be fused is used for fusion, Figure 3 shows only the semantic segmentation result for this part, not the semantic segmentation result for the entire reference image. Optionally, when performing semantic segmentation, the entire reference image can be segmented, but only the part of the obtained segmentation mask located in the region to be fused is used. Alternatively, only the part of the reference image located in the region to be fused can be segmented from the beginning to obtain the segmentation mask.

[0063] The solution in this application does not limit the specific semantic segmentation algorithm; for example, to perform the portrait (person) segmentation shown in Figure 3, segmentation algorithms based on neural network models such as U-Net and FCN can be used.

[0064] Step S140: Determine the abnormal pixel values in the displacement image, and within the segmented region of the displacement image, use the normal pixel values to repair the abnormal pixel values to obtain the repaired displacement image.

[0065] Abnormal pixel values ​​in a displacement image can form anomalous regions. The presence of these anomalous regions prevents effective registration directly from the displacement image, severely impacting subsequent image fusion results. Based on the inventors' research, the possible anomalous regions include the following:

[0066] (1) Parallax region

[0067] When the reference and non-reference images are captured by different cameras, parallax exists between them due to the different shooting angles. This means that some content present in the non-reference image does not exist in the reference image. Consequently, some pixels to be fused in the non-reference image may not have corresponding pixels in the reference image, leading to an incorrect first registration displacement. This abnormal first registration displacement constitutes the parallax region in the displacement image.

[0068] (2) Distortion region

[0069] When performing image registration using optical flow, the first registration displacement (i.e., the optical flow) calculated in the boundary region between the foreground and background (e.g., at the edge of the person in Figure 2) is prone to anomalies. If image registration and fusion are performed according to these abnormal first registration displacements, the fused image will be distorted in the boundary region between the foreground and background. Therefore, the region in the displacement image composed of such abnormal first registration displacements can be called the distortion region.

[0070] (3) Other abnormal areas

[0071] Many other factors can cause the first registration displacement to be calculated incorrectly, and these abnormal first registration displacements constitute other abnormal regions in the displacement image. For example, there may be large areas of no texture or weak texture in the image to be fused. Because these areas lack features, the correspondence between pixels is unclear, resulting in inaccurate calculations of the first registration displacement.

[0072] In summary, the parallax region is caused by the camera's inherent configuration and is an inherent anomaly in multi-camera fusion scenarios. The other two types of anomalies are caused by the limitations of the registration algorithm itself and are difficult to completely avoid.

[0073] The following section uses parallax and distortion regions as examples to illustrate how to detect abnormal pixel values belonging to these regions in a displacement image.

[0074] A. Parallax Region Detection

[0075] Step a1: Divide the reference image and the non-reference image into regions respectively.

[0076] For example, the reference image and non-reference image can be divided into several rectangular regions according to a predetermined method (e.g., uniform division), and the size of the regions can be set as needed. Optionally, there can be overlapping parts between the regions, that is, a pixel in the image can belong to multiple regions simultaneously. Understandably, it is also possible to further divide only the portions of the reference image and non-reference image within the region to be registered.

[0077] Step a2: Calculate the homography matrix between each region in the non-reference image and the corresponding region in the reference image.

[0078] The correspondence between regions can be determined by their location, the similarity of their image content, or other methods. In short, for every region in the non-reference image, a corresponding region can be found in the reference image. The homography matrix represents the transformation from a region in the non-reference image to its corresponding region in the reference image. This transformation maps the former region to the plane containing the latter region. In other words, the homography matrix can be used to register a region in the non-reference image to the coordinate system of its corresponding region in the reference image. This operation can be called region registration. Note that, strictly speaking, registration targets the image content contained within the region; "registering regions" is used here as shorthand.

[0079] Step a3: For each pixel to be fused in the non-reference image, perform the following operations: First, calculate the second registration displacement of the pixel relative to the corresponding pixel in the reference image using the homography matrix of the region to which the pixel belongs; then, determine whether the difference between the first registration displacement and the second registration displacement of the pixel exceeds a first threshold; if it exceeds the first threshold, then determine the first registration displacement of the pixel as an abnormal pixel value in the displacement image.

[0080] According to the explanation in step a2, region registration using the homography matrix registers the region as a whole. That is, all pixels in a region of the non-reference image are registered through the same transformation. This differs from the dense registration algorithm in step S120, which calculates a first registration displacement for each pixel to be registered in the non-reference image. The first registration displacements of different pixels are independent to a certain extent, so those of pixels in the same region do not necessarily follow the same transformation.

[0081] Furthermore, for a pixel to be fused in a non-reference image, given its coordinates, the coordinates of its corresponding pixel in the reference image can be obtained by transforming it using the homography matrix corresponding to the region to which the pixel belongs. The difference between these two coordinates is the second registration displacement.

[0082] As described above, the second registration displacement and the first registration displacement are calculated differently, and their meanings also differ. The first registration displacement primarily characterizes the registration rule followed by an individual pixel in the non-reference image, while the second registration displacement primarily characterizes the registration rule followed by the region to which that pixel belongs. Ideally, the two should be relatively consistent; that is, the behavior of an individual pixel should follow the behavior of its region as a whole. Therefore, if the difference (which can be taken as an absolute value) between the first and second registration displacements of a pixel to be fused in the non-reference image is large enough to exceed a set first threshold, the first registration displacement of that pixel can be considered abnormal, meaning that an abnormal pixel value has been detected in the displacement image.
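Steps a2 and a3 can be sketched as follows for a single region, assuming the region's homography matrix is already available (for example from a feature-based estimator, which is not shown). The function names, the (dx, dy) channel order, and the use of the Euclidean norm as the "difference" are assumptions of this sketch; the text only requires some difference measure compared against the first threshold.

```python
import numpy as np

def second_displacement(H, xs, ys):
    """Displacement implied by the 3x3 homography H for pixels at image coordinates (xs, ys)."""
    xs = xs.astype(np.float64)
    ys = ys.astype(np.float64)
    pts = np.stack([xs, ys, np.ones_like(xs)])       # homogeneous coordinates, 3 x N
    mapped = H @ pts
    mapped_x = mapped[0] / mapped[2]                 # back to Cartesian coordinates
    mapped_y = mapped[1] / mapped[2]
    return np.stack([mapped_x - xs, mapped_y - ys], axis=-1)   # N x 2, as (dx, dy)

def detect_parallax_anomalies(first_disp, H, x0, y0, threshold):
    """Flag pixels of one region whose first displacement disagrees with the region homography.

    first_disp: (h, w, 2) array of (dx, dy) for the region whose top-left pixel is (x0, y0).
    Returns an (h, w) boolean array, True where the first displacement is judged abnormal.
    """
    h, w, _ = first_disp.shape
    ys, xs = np.mgrid[y0:y0 + h, x0:x0 + w]
    second = second_displacement(H, xs.ravel(), ys.ravel()).reshape(h, w, 2)
    diff = np.linalg.norm(first_disp - second, axis=-1)   # assumed difference measure
    return diff > threshold

# Hypothetical region: the homography is a pure shift of 2 pixels along x,
# and one pixel carries an inconsistent first displacement.
H = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
first_disp = np.zeros((4, 4, 2))
first_disp[..., 0] = 2.0
first_disp[1, 2] = (9.0, 0.0)
abnormal = detect_parallax_anomalies(first_disp, H, x0=10, y0=20, threshold=1.0)
```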

[0083] The inventors discovered that the abnormal pixel values detected by the above method can at least cover the parallax region in the displacement image.

[0084] In an optional approach, to improve the reliability of abnormal pixel value detection, a suitable region division method can be designed so that each pixel to be fused in the non-reference image belongs to multiple regions. This allows multiple second registration displacements to be calculated for a single pixel in the non-reference image. The difference between the pixel's first registration displacement and each of its second registration displacements can then be calculated, and each difference can be compared with the first threshold to obtain multiple comparison results. The first registration displacement of the pixel is determined to be abnormal only if the comparison results meet a certain condition, for example: all of the differences are greater than the first threshold, or the majority of the differences are greater than the first threshold.
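The two example conditions ("all" and "majority") can be sketched for a single pixel as below. The function name `vote_abnormal`, the `rule` parameter, and the Euclidean norm used as the difference are illustrative assumptions, not part of the described method.

```python
import numpy as np

def vote_abnormal(first_disp, second_disps, threshold, rule="all"):
    """Decide whether one pixel's first displacement is abnormal from several second displacements.

    first_disp: length-2 (dx, dy); second_disps: (K, 2), one row per overlapping region.
    """
    first = np.asarray(first_disp, dtype=np.float64)
    seconds = np.asarray(second_disps, dtype=np.float64)
    exceeds = np.linalg.norm(seconds - first, axis=1) > threshold
    if rule == "all":
        return bool(exceeds.all())
    if rule == "majority":
        return bool(exceeds.sum() * 2 > exceeds.size)
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical pixel covered by three overlapping regions.
seconds = [(1.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
strict = vote_abnormal((5.0, 0.0), seconds, threshold=1.0, rule="all")
lenient = vote_abnormal((5.0, 0.0), seconds, threshold=1.0, rule="majority")
```

With these values, two of the three differences exceed the threshold, so the pixel is flagged under the majority rule but not under the stricter "all" rule.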

[0085] B. Distortion Region Detection

[0086] The following method can be used to detect pixel values located in distortion regions: First, divide the displacement image into several regions, such as rectangular regions, according to a predetermined method (e.g., uniform division). Then, compute statistics on the pixel values in each region. If the statistics show that the frequency of a certain pixel value is less than a predetermined second threshold, that value is identified as an abnormal pixel value. Here, frequency refers to how often a pixel value appears, measured for example as the number of occurrences of the pixel value or as the proportion of the region's pixels that have that value.

[0087] For example, suppose a rectangular region in the displacement image has a size of 10×10 and therefore contains 100 pixel values, the frequency is defined as the number of times a pixel value appears, and the second threshold is set to 10. If the 100 pixel values consist of 50 occurrences of (4,4), 45 occurrences of (3,0), 3 occurrences of (10,5), and 2 occurrences of (20,3), then the 3 pixel values of (10,5) and the 2 pixel values of (20,3), 5 in total, can be judged as abnormal pixel values.
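The 10×10 example can be reproduced with a short NumPy sketch. It assumes the displacement values are already discrete (real-valued optical flow would first need to be quantized, which is not shown), and the function name `detect_rare_values` is hypothetical.

```python
import numpy as np

def detect_rare_values(region_disp, min_count):
    """Flag pixels whose displacement value occurs fewer than min_count times in the region.

    region_disp: (h, w, 2) array of discrete (dx, dy) values for one region.
    """
    h, w, c = region_disp.shape
    flat = region_disp.reshape(-1, c)
    _, inverse, counts = np.unique(flat, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)            # guard against shape differences across NumPy versions
    return (counts[inverse] < min_count).reshape(h, w)

# The region from the example: 50 x (4,4), 45 x (3,0), 3 x (10,5), 2 x (20,3).
values = np.array([(4, 4)] * 50 + [(3, 0)] * 45 + [(10, 5)] * 3 + [(20, 3)] * 2)
region = values.reshape(10, 10, 2)
abnormal = detect_rare_values(region, min_count=10)   # second threshold = 10
```

Here `abnormal` marks exactly the five pixels holding (10,5) or (20,3).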

[0088] The inventors discovered that the abnormal pixel values ​​detected by the above method can at least cover the distorted region in the displacement image. The general principle of this detection method is as follows: the distorted region is generally distributed at the boundary between the foreground and background in the displacement image, and the distribution range of the distorted region is usually relatively narrow. Therefore, if the size of the divided region is relatively large, the pixels belonging to the distorted region are likely to account for only a small proportion (i.e., the frequency of the statistically analyzed pixel values ​​is low, such as (10,5) above), while most of them are pixels belonging to the foreground or background. The pixel value changes of these pixels are relatively small (i.e., the frequency of the statistically analyzed pixel values ​​is high, such as (4,4) above). Therefore, abnormal pixel values ​​can be detected by using the frequency of pixel value occurrence.

[0089] In an alternative, the aforementioned regional pixel value statistics can be replaced by algorithms such as regional pixel value clustering that can find isolated pixel values.

[0090] Other abnormal regions in the displacement image can also be detected using corresponding methods, which will not be described in detail here.

[0091] For pixel values ​​detected as abnormal, their pixel positions are marked. For example, a binary mask can be created, where pixels with values ​​of 0 correspond to pixels with abnormal values ​​in the displacement image, and pixels with values ​​of 255 correspond to pixels with normal values ​​in the displacement image. Initially, the pixel values ​​in this binary mask can be set to all 255. Each time an abnormal pixel value is detected in the displacement image, the pixel value at the corresponding position in the binary mask is set to 0. It should be noted that although there may be multiple reasons for abnormal pixel values ​​in the displacement image, the positions of abnormal pixel values ​​do not need to be distinguished when marking them. That is, the solution of this application uses a uniform method to repair pixel value abnormalities caused by different reasons.
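A minimal sketch of this marking scheme follows: the mask starts at 255 everywhere and is set to 0 wherever any detector reports an abnormal value, without recording which detector did so. The function name `build_anomaly_mask` and the two sample detection maps are assumptions for illustration.

```python
import numpy as np

def build_anomaly_mask(shape, *abnormal_maps):
    """Combine boolean detection maps into one binary mask (0 = abnormal, 255 = normal)."""
    mask = np.full(shape, 255, dtype=np.uint8)
    for abnormal in abnormal_maps:
        mask[abnormal] = 0        # the cause of the anomaly is deliberately not recorded
    return mask

# Hypothetical outputs of a parallax detector and a distortion detector.
parallax = np.zeros((3, 4), dtype=bool)
parallax[0, 0] = True
distortion = np.zeros((3, 4), dtype=bool)
distortion[0, 0] = True
distortion[2, 3] = True
mask = build_anomaly_mask((3, 4), parallax, distortion)
```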

[0092] In step S140, repairing abnormal pixel values using normal pixel values means calculating a new pixel value from the normal pixel values and then replacing the abnormal pixel value in the displacement image with this new pixel value to obtain the repaired displacement image. Note that, depending on the repair algorithm, the repaired displacement image and the original displacement image may not be the same image. Therefore, the "replacement" mentioned above only indicates that the abnormal pixel values have been replaced in the repaired displacement image; the abnormal pixel values in the original displacement image are not necessarily overwritten.

[0093] The key to this application's solution is that pixel value repair is performed on a segmented region basis. That is, for each segmented region in the displacement image, the abnormal pixel values ​​within that region are repaired using the normal pixel values ​​within that region. Pixel value repair across segmented regions is not performed. The location information of the segmented regions can be obtained from the segmentation mask in step S130. It should be understood that if there are no abnormal pixel values ​​within a certain segmented region of the displacement image, repair processing is not required.

[0094] According to the definition of semantic segmentation of images, each segmentation region in the segmentation mask represents the same or the same type of object. Therefore, the pixel values ​​of the displacement image within the same segmentation region are likely to be smooth, that is, the values ​​generally do not jump, but only remain constant or gradually change. Therefore, pixel value repair by region can maintain this smoothness and avoid producing some unreasonable repair values, thus achieving a better repair effect.

[0095] In some implementations, abnormal pixel values in a displacement image can be repaired by performing filtering operations within a segmented region. During the filtering process, the position of the filtering window is continuously moved. At each new position, a new pixel value is calculated based on the normal pixel values within the current filtering window, and this new pixel value replaces the abnormal pixel value located at the center of the filtering window. Of course, if the pixel value at the center of the filtering window is normal, repair is not required, and filtering can proceed directly to the next position. If the filtering window is located at the boundary of the segmented region, and some pixels within the window are outside the segmented region, then only those pixels within the segmented region need to be considered during filtering.

[0096] There are different ways to calculate new pixel values using the normal pixel values within the filtering window. For example, one can calculate the average, weighted average, median, or extreme values of the normal pixel values within the window, depending on the specific filtering algorithm. Furthermore, it is not necessary to use all the normal pixel values in the filtering window when calculating new pixel values; a portion of them can be used.

[0097] Figure 4 shows a 5×5 filtering window in the displacement image. Squares represent pixels, numbers represent pixel numbers, black squares represent pixels with abnormal values, and white squares represent pixels with normal values. Since the value of the center pixel 12 is abnormal, a weighted average of the values of pixels 0, 1, 2, 3, 5, 6, 7, 10, 11, 15, and 20 can be calculated as the value of pixel 12 in the repaired displacement image. The weights can be calculated based on factors such as the grayscale value (or color value) of the corresponding pixel in the reference image and the distance between the pixel and the center of the window.
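The weighted average for one window can be sketched as below. The Gaussian form of the distance and grayscale weights, the parameters `sigma_s` and `sigma_r`, and the function name `repair_center` are assumptions; the text only says that the weights may depend on reference grayscale values and on distance to the window center. The example also assumes that the pixels in Figure 4 are numbered row by row from 0 to 24.

```python
import numpy as np

def repair_center(disp_win, normal_win, same_segment_win, ref_win, sigma_s=2.0, sigma_r=10.0):
    """Weighted average of the usable displacements in one k x k window, for the center pixel.

    disp_win: (k, k, 2) displacements; normal_win, same_segment_win: (k, k) booleans;
    ref_win: (k, k) grayscale values of the reference image. Returns None if nothing is usable.
    """
    k = disp_win.shape[0]
    c = k // 2
    usable = normal_win & same_segment_win        # normal values inside the center's segment
    if not usable.any():
        return None
    yy, xx = np.mgrid[0:k, 0:k]
    dist2 = (yy - c) ** 2 + (xx - c) ** 2
    gray_diff = ref_win.astype(np.float64) - float(ref_win[c, c])
    weights = np.exp(-dist2 / (2 * sigma_s ** 2)) * np.exp(-gray_diff ** 2 / (2 * sigma_r ** 2))
    weights = weights * usable                    # zero weight for unusable pixels
    return (disp_win * weights[..., None]).sum(axis=(0, 1)) / weights.sum()

# Normal pixels as listed for Figure 4 (row-major numbering assumed).
normal = np.zeros(25, dtype=bool)
normal[[0, 1, 2, 3, 5, 6, 7, 10, 11, 15, 20]] = True
normal = normal.reshape(5, 5)
same_segment = np.ones((5, 5), dtype=bool)        # hypothetical: window lies in one segment
ref = np.full((5, 5), 128, dtype=np.uint8)        # hypothetical flat reference patch
disp = np.full((5, 5, 2), 50.0)                   # placeholder for the abnormal values
disp[normal] = (2.0, -1.0)                        # hypothetical normal displacement
new_value = repair_center(disp, normal, same_segment, ref)
```

Because every usable pixel in this toy window carries the same displacement, the repaired value equals that displacement regardless of the weights.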

[0098] For the above-mentioned method of pixel value restoration by filtering within a region, since the size of the filtering window is usually small, such as a 3×3 or 5×5 rectangle, the pixel values ​​of the displacement image within the same filtering window can be considered smooth, thus this pixel value restoration method is more reasonable.

[0099] Figure 5 illustrates the difference between using and not using a segmentation mask when filtering the displacement image. Referring to Figure 5, the first row shows the pixel values in the reference image, and the second row shows the pixel values in the displacement image, with gray indicating abnormal pixel values. If the displacement image is filtered directly based on the similarity of pixel values in the reference image, the filtering result is as shown in the third row, where the three abnormal pixel values are repaired to -1, 1, and 1, respectively, because the pixel values in the reference image at these three positions are 10, 50, and 50. However, according to the segmentation mask in the fourth row, the six pixel values on the left side of the reference image belong to the same segmentation region, while the three pixel values on the right side belong to another segmentation region. The repair result in the third row causes a jump in pixel values within the same segmentation region of the displacement image (from -1 to 1), so this result is very likely unreasonable. In the fifth row, the displacement image is filtered based on both the similarity of pixel values and the segmentation mask. The three abnormal pixel values are repaired to -1, -1, and -1, respectively, because the pixel values of the segmentation mask at these three positions are 0, 0, and 0. With this result, the pixel values of the displacement image within the same segmentation region do not jump (all are -1), so the repair result is likely to be reasonable.

[0100] Furthermore, step S140 does not specify whether all abnormal pixel values ​​in the displacement image should be repaired; only a portion may need to be repaired, depending on the specific pixel value repair algorithm. For example, if the displacement image is only filtered once, it may not be possible to repair all abnormal pixel values. For instance, if a certain filtering window contains only pixels with abnormal values, new pixel values ​​cannot be calculated. In such cases, the displacement image can be filtered multiple times according to the segmented regions until all abnormal pixel values ​​in the displacement image are repaired. Because each filtering operation repairs more abnormal pixel values, this implementation method can maximize the repair effect.
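The repeated filtering described here can be sketched as follows, using a plain mean over the normal pixels of the window that share the center pixel's segment. The function name `repair_displacement`, the `radius` and `max_passes` parameters, and the toy one-row data are assumptions; the data only loosely mimics the two-segment situation discussed above and is not taken from the figures.

```python
import numpy as np

def repair_displacement(disp, normal, labels, radius=1, max_passes=100):
    """Repeatedly filter within segments until no abnormal value can be repaired any more.

    disp: (H, W, 2) displacements; normal: (H, W) bool; labels: (H, W) integer segment ids.
    Returns the repaired displacement image and the updated normal map.
    """
    disp = disp.astype(np.float64).copy()
    normal = normal.copy()
    h, w = normal.shape
    for _ in range(max_passes):
        todo = np.argwhere(~normal)
        if todo.size == 0:
            break                                  # everything is repaired
        new_disp, new_normal = disp.copy(), normal.copy()
        for y, x in todo:
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Only normal pixels in the same segment as the center are used.
            usable = normal[y0:y1, x0:x1] & (labels[y0:y1, x0:x1] == labels[y, x])
            if usable.any():
                new_disp[y, x] = disp[y0:y1, x0:x1][usable].mean(axis=0)
                new_normal[y, x] = True
        if new_normal.sum() == normal.sum():
            break                                  # no progress, e.g. a fully abnormal segment
        disp, normal = new_disp, new_normal
    return disp, normal

# Hypothetical row of 9 pixels: six in segment 0 (dx = -1), three in segment 1 (dx = +1).
labels = np.array([[0, 0, 0, 0, 0, 0, 1, 1, 1]])
disp = np.zeros((1, 9, 2))
disp[0, :6, 0] = -1.0
disp[0, 6:, 0] = 1.0
normal = np.ones((1, 9), dtype=bool)
normal[0, 3:6] = False
disp[0, 3:6, 0] = 7.0                              # arbitrary abnormal values
repaired, done = repair_displacement(disp, normal, labels)
```

In this toy case a single pass repairs only the abnormal pixel next to a normal neighbor in its own segment; the remaining two are filled in by the second and third passes, and none of them picks up the +1 value from the other segment.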

[0101] Understandably, besides filtering, there are other pixel value repair algorithms. For example, for each abnormal region (a connected region composed of abnormal pixel values) in the displacement image, the pixel values ​​in it can be directly replaced with a fixed value. This fixed value can be the average of the normal pixel values ​​around the abnormal region, or the average of all normal pixel values ​​in the segmented region where the abnormal region is located, and so on.

[0102] Step S150: Register the non-reference image based on the repaired displacement image to obtain the registered non-reference image.

[0103] For pixels to be fused in the non-reference image, the registration can be completed by moving them according to the corresponding pixel values in the repaired displacement image.
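One common way to apply a dense displacement image is backward warping, sketched below. The sign convention (the output pixel at (x, y) is sampled from the non-reference image at (x + dx, y + dy)), nearest-neighbor sampling, and border clamping are all assumptions of this sketch; the text does not fix these details.

```python
import numpy as np

def register_by_displacement(non_ref, disp):
    """Warp non_ref so that output[y, x] = non_ref[y + dy, x + dx] (nearest neighbor, clamped).

    non_ref: (H, W) or (H, W, C) image; disp: (H, W, 2) displacements as (dx, dy).
    """
    h, w = non_ref.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + disp[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + disp[..., 1]).astype(int), 0, h - 1)
    return non_ref[src_y, src_x]

# Hypothetical 3x4 image shifted by one pixel along x.
non_ref = np.arange(12).reshape(3, 4)
disp = np.zeros((3, 4, 2))
disp[..., 0] = 1.0
registered = register_by_displacement(non_ref, disp)
```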

[0104] Step S160: Fuse the reference image and the registered non-reference image to obtain a fused image.

[0105] The registered images can be fused using methods such as weighted summation, which will not be elaborated on here.
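A minimal sketch of fusion by weighted summation is given below for 8-bit images. The single global weight `weight_ref` and the function name `fuse_weighted` are assumptions; a practical implementation might use per-pixel weights instead.

```python
import numpy as np

def fuse_weighted(reference, registered, weight_ref=0.5):
    """Fuse two aligned 8-bit images as weight_ref * reference + (1 - weight_ref) * registered."""
    ref = reference.astype(np.float64)
    reg = registered.astype(np.float64)
    fused = weight_ref * ref + (1.0 - weight_ref) * reg
    return np.clip(np.rint(fused), 0, 255).astype(np.uint8)

# Hypothetical flat images.
reference = np.full((2, 2), 100, dtype=np.uint8)
registered = np.full((2, 2), 200, dtype=np.uint8)
fused = fuse_weighted(reference, registered, weight_ref=0.25)
```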

[0106] In summary, the image fusion method provided in this application does not directly register a non-reference image using a displacement image. Instead, it first identifies abnormal pixel values ​​in the displacement image, repairs them using normal pixel values ​​from the displacement image, and then registers the non-reference image based on the repaired displacement image. Since the abnormal pixel values ​​in the displacement image constitute the regions in the non-reference image that cannot be effectively registered, repairing these abnormal pixel values ​​is equivalent to increasing the area where effective registration is possible, thereby improving the image registration effect and ultimately enhancing the quality of the final fused image.

[0107] Furthermore, when repairing abnormal pixel values ​​in the displacement image, the above method utilizes a segmentation mask obtained by semantic segmentation of the reference image. The repair of abnormal pixel values ​​is performed within each segmentation region contained in the segmentation mask. This is because each segmentation region in the segmentation mask represents the same or the same type of object, so the pixel values ​​in the displacement image within the same segmentation region are likely to be smooth. Therefore, performing pixel value repair by region can obtain more reasonable repair values, thus achieving a better repair effect and improving the quality of the subsequently obtained fused image.

[0108] The image fusion described above only considered the case of fusing two frames. Below is a brief explanation of how to handle situations where more frames need to be fused, using three frames I1, I2, and I3 as an example.

[0109] Fusion Method 1: First, register I2 to I1, then fuse I1 and I2 to obtain I12. At this point, I1 is the reference image, I2 is the non-reference image, and I12 is the fused image. Next, register I3 to I12, then fuse I12 and I3 to obtain I123. At this point, I12 is the reference image, I3 is the non-reference image, and I123 is the fused image. Fusion Method 1 applies the fusion method of Figure 1 twice.

[0110] Fusion Method 2: First, register I3 to I2, then fuse I2 and I3 to obtain I23. At this point, I2 is the reference image, I3 is the non-reference image, and I23 is the fused image. Next, register I23 to I1, then fuse I1 and I23 to obtain I123. At this point, I1 is the reference image, I23 is the non-reference image, and I123 is the fused image. Fusion Method 2 applies the fusion method of Figure 1 twice.

[0111] Fusion Method 3: First, register I2 to I1 and also register I3 to I1. Then, fuse I1, I2, and I3 together to obtain I123. At this point, I1 is the reference image, I2 and I3 are non-reference images, and I123 is the fused image. Fusion Method 3 applies the fusion method of Figure 1 only once.

[0112] Of course, there may be other ways of fusion, which can be deduced by analogy, but will not be explained in detail here.
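The difference between sequential fusion (Fusion Method 1) and joint fusion (Fusion Method 3) can be sketched with placeholder registration and fusion callables. The names `fuse_sequential`, `fuse_joint`, `identity_register`, and `mean_fuse` are hypothetical stand-ins, not the registration and fusion steps of this application; Fusion Method 2 would follow the same pattern with a different order.

```python
import numpy as np

def fuse_sequential(frames, register, fuse):
    """Fusion Method 1: fuse each new frame into the running fused image."""
    fused = frames[0]
    for frame in frames[1:]:
        fused = fuse([fused, register(frame, fused)])   # the running result is the reference
    return fused

def fuse_joint(frames, register, fuse):
    """Fusion Method 3: register every other frame to frames[0], then fuse all at once."""
    ref = frames[0]
    return fuse([ref] + [register(frame, ref) for frame in frames[1:]])

# Stand-ins so the sketch runs: no real registration, equal-weight averaging.
def identity_register(non_ref, ref):
    return non_ref

def mean_fuse(images):
    return np.mean(images, axis=0)

i1, i2, i3 = (np.full((2, 2), v, dtype=np.float64) for v in (0.0, 6.0, 12.0))
sequential = fuse_sequential([i1, i2, i3], identity_register, mean_fuse)
joint = fuse_joint([i1, i2, i3], identity_register, mean_fuse)
```

With equal-weight averaging the two orders give different results (7.5 versus 6.0 here), which illustrates that the fusion weights may need to be chosen with the fusion order in mind.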

[0113] Figure 6 shows a functional block diagram of an image fusion apparatus 200 provided in an embodiment of this application. Referring to Figure 6, the image fusion apparatus 200 includes:

[0114] The image to be fused acquisition module 210 is used to acquire the image to be fused, which includes a reference image and a non-reference image;

[0115] The displacement image acquisition module 220 is used to calculate the first registration displacement of the pixel to be fused in the non-reference image relative to the corresponding pixel in the reference image, and obtain the displacement image;

[0116] Image segmentation module 230 is used to perform semantic segmentation on the reference image to obtain a segmentation mask, wherein the segmentation mask includes the position information of multiple segmentation regions;

[0117] The pixel value repair module 240 is used to determine abnormal pixel values ​​in the displacement image and repair the abnormal pixel values ​​using normal pixel values ​​within the segmented area of ​​the displacement image to obtain a repaired displacement image.

[0118] Image registration module 250 is used to register the non-reference image according to the repaired displacement image to obtain the registered non-reference image;

[0119] The image fusion module 260 is used to fuse the reference image and the registered non-reference image to obtain a fused image.

[0120] In one implementation of the image fusion device 200, the pixel value repair module 240 repairs abnormal pixel values ​​using normal pixel values ​​within a segmented region of the displacement image. This includes filtering the displacement image within the segmented region to repair abnormal pixel values ​​using normal pixel values ​​within the segmented region. The repair method includes calculating new pixel values ​​based on normal pixel values ​​within the filtering window and replacing abnormal pixel values ​​located at the center of the filtering window with the new pixel values.

[0121] In one implementation of the image fusion device 200, the pixel value repair module 240 filters the displacement image within the segmented region, including: filtering the displacement image multiple times within the segmented region until all abnormal pixel values ​​in the displacement image are repaired.

[0122] In one implementation of the image fusion apparatus 200, the pixel value repair module 240 determines abnormal pixel values ​​in the displacement image by: dividing the reference image and the non-reference image into regions respectively; calculating the homography matrix between each region in the non-reference image and the corresponding region in the reference image; and for each pixel to be fused in the non-reference image, performing the following operations: calculating the second registration displacement of the pixel relative to the corresponding pixel in the reference image using the homography matrix of the region to which the pixel belongs; determining whether the difference between the first registration displacement and the second registration displacement of the pixel exceeds a first threshold; and if it exceeds the first threshold, determining the first registration displacement of the pixel as an abnormal pixel value in the displacement image.

[0123] In one implementation of the image fusion device 200, the pixel value repair module 240 determines abnormal pixel values ​​in the displacement image by: statistically analyzing the pixel values ​​in the displacement image by region, and determining pixel values ​​with a frequency less than a second threshold as abnormal pixel values ​​based on the statistical results of each region.

[0124] In one implementation of the image fusion device 200, the displacement image acquisition module 220 calculates the first registration displacement of the pixel to be fused in the non-reference image relative to the corresponding pixel in the reference image to obtain a displacement image, including: determining the common part in the reference image and the non-reference image as the region to be fused; calculating the first registration displacement of the pixel in the non-reference image relative to the corresponding pixel in the reference image in the region to be fused to obtain the displacement image.

[0125] In one implementation of the image fusion device 200, the image acquisition module 210 acquires the image to be fused, including: acquiring the original image to be fused, the original image to be fused including an original reference image and an original non-reference image, the original reference image and the original non-reference image being images captured by cameras at different zoom ratios; normalizing the original reference image and the original non-reference image to the same zoom ratio to obtain the reference image and the non-reference image.

[0126] In one implementation of the image fusion device 200, the reference image is an image captured by a wide-angle camera and the non-reference image is an image captured by a telephoto camera; or, the reference image is an image captured by a color camera and the non-reference image is an image captured by a monochrome camera.

[0127] The implementation principle and technical effects of the image fusion apparatus 200 provided in this embodiment have been described in the foregoing method embodiment. For the sake of brevity, for any parts not mentioned in the apparatus embodiment, reference can be made to the corresponding content in the method embodiment.

[0128] Figure 7 shows a possible structure of the electronic device 300 provided in an embodiment of this application. Referring to Figure 7, the electronic device 300 includes a processor 310, a memory 320, and a communication interface 330. These components are interconnected and communicate with each other via a communication bus 340 and/or other forms of connection mechanism (not shown).

[0129] The memory 320 includes one or more (only one is shown in the figure), which may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.

[0130] Processor 310 includes one or more (only one is shown in the figure), which can be an integrated circuit chip with signal processing capabilities. The processor 310 can be a general-purpose processor, including a Central Processing Unit (CPU), a Microcontroller Unit (MCU), a Network Processor (NP), or other conventional processors; it can also be a special-purpose processor, including a Neural-network Processing Unit (NPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. Furthermore, when there are multiple processors 310, some can be general-purpose processors and others can be special-purpose processors.

[0131] Processor 310 and other possible components can access memory 320, read and / or write data therein. Furthermore, memory 320 can store one or more computer program instructions, which processor 310 can read and execute to implement the image fusion method provided in the embodiments of this application.

[0132] Communication interface 330 includes one or more (only one is shown in the figure) that can be used to communicate directly or indirectly with other devices to exchange data. Communication interface 330 may include interfaces for wired and / or wireless communication.

[0133] It can be understood that the structure shown in Figure 7 is for illustrative purposes only; the electronic device 300 may also include more or fewer components than those shown in Figure 7, or have a configuration different from that shown in Figure 7. The components shown in Figure 7 can be implemented using hardware, software, or a combination thereof. Electronic device 300 may be a physical device, such as a mobile phone, camera, camcorder, wearable device, tablet computer, PC, laptop computer, server, etc., or it may be a virtual device, such as a virtual machine, virtualization container, etc. Furthermore, electronic device 300 is not limited to a single device; it can also be a combination of multiple devices or a cluster of a large number of devices.

[0134] This application also provides a computer-readable storage medium storing computer program instructions. When these instructions are read and executed by a computer's processor, the image fusion method provided in this application is performed. For example, the computer-readable storage medium can be implemented as the memory 320 in the electronic device 300 of Figure 7.

[0135] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.

Claims

1. An image fusion method, characterized in that it comprises: acquiring images to be fused, which include a reference image and a non-reference image; calculating a first registration displacement of each pixel to be fused in the non-reference image relative to its corresponding pixel in the reference image, to obtain a displacement image; performing semantic segmentation on the reference image to obtain a segmentation mask, the segmentation mask including position information of multiple segmentation regions; determining abnormal pixel values in the displacement image and, within the segmentation regions of the displacement image, repairing the abnormal pixel values using normal pixel values, to obtain a repaired displacement image; registering the non-reference image based on the repaired displacement image to obtain a registered non-reference image; and fusing the reference image and the registered non-reference image to obtain a fused image; wherein determining abnormal pixel values in the displacement image includes: dividing the reference image and the non-reference image into regions respectively; calculating a homography matrix between each region in the non-reference image and the corresponding region in the reference image; and, for each pixel to be fused in the non-reference image, performing the following operations: calculating a second registration displacement of the pixel relative to its corresponding pixel in the reference image using the homography matrix of the region to which the pixel belongs; determining whether the difference between the first registration displacement and the second registration displacement of the pixel exceeds a first threshold; and, if the first threshold is exceeded, determining the first registration displacement of the pixel to be an abnormal pixel value in the displacement image; and wherein determining abnormal pixel values in the displacement image further includes: performing statistics on the pixel values in the displacement image region by region and, based on the statistical results for each region, determining pixel values whose frequency is less than a second threshold to be abnormal pixel values.
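
The two abnormal-value tests recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. All function and parameter names are hypothetical. Measuring the "difference" between the first and second registration displacements as a Euclidean distance, rounding displacements to integers before counting, and treating "frequency" as a relative frequency within a region are assumptions made only for illustration.

```python
from collections import Counter
from math import hypot


def homography_displacement(H, x, y):
    """Displacement (dx, dy) of pixel (x, y) under the 3x3 homography H."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    xp = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    yp = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return xp - x, yp - y


def flag_by_homography(disp, region_of, homographies, first_threshold):
    """Flag pixels whose first displacement differs from the homography-based
    (second) displacement by more than first_threshold.

    disp[y][x] is a (dx, dy) tuple; region_of(x, y) returns a region id
    (hypothetical helper); homographies maps region id -> 3x3 matrix.
    """
    h, w = len(disp), len(disp[0])
    abnormal = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx2, dy2 = homography_displacement(homographies[region_of(x, y)], x, y)
            dx1, dy1 = disp[y][x]
            # Assumption: the "difference" is the Euclidean distance.
            if hypot(dx1 - dx2, dy1 - dy2) > first_threshold:
                abnormal[y][x] = True
    return abnormal


def flag_by_frequency(disp, region_of, second_threshold):
    """Flag pixels whose rounded displacement is rare within its region.

    Assumption: "frequency" is the fraction of the region's pixels that
    share the same rounded (dx, dy) value.
    """
    h, w = len(disp), len(disp[0])
    counts, totals = {}, Counter()
    for y in range(h):
        for x in range(w):
            r = region_of(x, y)
            key = (round(disp[y][x][0]), round(disp[y][x][1]))
            counts.setdefault(r, Counter())[key] += 1
            totals[r] += 1
    abnormal = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            r = region_of(x, y)
            key = (round(disp[y][x][0]), round(disp[y][x][1]))
            if counts[r][key] / totals[r] < second_threshold:
                abnormal[y][x] = True
    return abnormal
```

In this sketch, a pixel would be treated as abnormal if either test flags it. The claim recites both tests but does not state how their results are combined, so that combination is also an assumption.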

2. The image fusion method according to claim 1, characterized in that repairing the abnormal pixel values using normal pixel values within the segmentation regions of the displacement image includes: filtering the displacement image within the segmentation regions so as to repair abnormal pixel values using normal pixel values within the same segmentation region, wherein the repair includes: calculating a new pixel value based on the normal pixel values within a filtering window, and replacing the abnormal pixel value located at the center of the filtering window with the new pixel value.

3. The image fusion method according to claim 2, characterized in that filtering the displacement image within the segmentation regions includes: filtering the displacement image within the segmentation regions multiple times until all abnormal pixel values in the displacement image have been repaired.
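
The region-constrained, repeated filtering of claims 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The names `repair_once` and `repair` are hypothetical. Each displacement is treated as a single scalar per pixel, and the new value is the mean of the normal values in the window. The claims only require that it be calculated "based on normal pixel values within the filtering window". The pass limit and the early exit when no progress is made are safeguards added for the sketch.

```python
def repair_once(disp, abnormal, mask, radius=1):
    """One filtering pass over the displacement image.

    Each abnormal value at a window center is replaced by the mean of the
    normal values inside the window that carry the same segmentation label
    as the center pixel. Returns updated copies of disp and abnormal.
    """
    h, w = len(disp), len(disp[0])
    new_disp = [row[:] for row in disp]
    new_abnormal = [row[:] for row in abnormal]
    for y in range(h):
        for x in range(w):
            if not abnormal[y][x]:
                continue
            vals = [
                disp[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
                if not abnormal[j][i] and mask[j][i] == mask[y][x]
            ]
            if vals:  # otherwise leave the pixel for a later pass
                new_disp[y][x] = sum(vals) / len(vals)
                new_abnormal[y][x] = False
    return new_disp, new_abnormal


def repair(disp, abnormal, mask, radius=1, max_passes=100):
    """Repeat the filtering pass until no abnormal values remain."""
    for _ in range(max_passes):
        if not any(any(row) for row in abnormal):
            break
        disp, abnormal_next = repair_once(disp, abnormal, mask, radius)
        if abnormal_next == abnormal:
            # No progress: some region contains no normal values at all.
            break
        abnormal = abnormal_next
    return disp, abnormal


# Example: two segmentation regions (labels 0 and 1) in a single row.
mask = [[0, 0, 0, 1, 1]]
disp = [[1.0, 50.0, 60.0, 5.0, 5.0]]
abnormal = [[False, True, True, False, False]]
repaired, remaining = repair(disp, abnormal, mask)
```

In the example, the value at index 2 cannot be repaired in the first pass because its only normal neighbor lies in a different segmentation region. It is repaired in the second pass, once its neighbor at index 1 has become normal. This is why several passes may be needed.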

4. The image fusion method according to any one of claims 1-3, characterized in that calculating the first registration displacement of each pixel to be fused in the non-reference image relative to its corresponding pixel in the reference image, to obtain the displacement image, includes: determining the common portion of the reference image and the non-reference image as a region to be fused; and, within the region to be fused, calculating the first registration displacement of the pixels in the non-reference image relative to their corresponding pixels in the reference image, to obtain the displacement image.

5. The image fusion method according to claim 4, characterized in that acquiring the images to be fused includes: acquiring original images to be fused, which include an original reference image and an original non-reference image, the original reference image and the original non-reference image being images captured by cameras at different zoom levels; and normalizing the original reference image and the original non-reference image to the same zoom level, to obtain the reference image and the non-reference image.
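
One conventional way to bring two images to the same zoom level is to center-crop the lower-zoom image and then resample the crop. The claim does not specify the normalization procedure, so the following Python sketch is only an assumption-laden illustration. The function name `center_crop_box` and its parameters are hypothetical. It assumes both cameras share an optical center, and it omits the resampling step.

```python
def center_crop_box(width, height, zoom_from, zoom_to):
    """Return (left, top, right, bottom) of the centered crop that turns an
    image captured at zoom_from into the field of view of zoom_to.

    Assumes zoom_to >= zoom_from and a shared optical center; the cropped
    region would then be resampled to the target resolution.
    """
    if zoom_to < zoom_from:
        raise ValueError("cropping can only increase the zoom level")
    scale = zoom_from / zoom_to
    crop_w = round(width * scale)
    crop_h = round(height * scale)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# Example: a 4000x3000 image at 1x cropped to the field of view of 2x.
box = center_crop_box(4000, 3000, 1.0, 2.0)
```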

6. The image fusion method according to any one of claims 1-3, characterized in that the reference image is an image captured by a wide-angle camera and the non-reference image is an image captured by a telephoto camera; or the reference image is an image captured by a color camera and the non-reference image is an image captured by a monochrome camera.

7. An image fusion apparatus, characterized in that it comprises: a to-be-fused image acquisition module, used to acquire images to be fused, which include a reference image and a non-reference image; a displacement image acquisition module, used to calculate a first registration displacement of each pixel to be fused in the non-reference image relative to its corresponding pixel in the reference image, to obtain a displacement image; an image segmentation module, used to perform semantic segmentation on the reference image to obtain a segmentation mask, wherein the segmentation mask includes position information of multiple segmentation regions; a pixel value repair module, used to determine abnormal pixel values in the displacement image and, within the segmentation regions of the displacement image, repair the abnormal pixel values using normal pixel values, to obtain a repaired displacement image; an image registration module, used to register the non-reference image based on the repaired displacement image, to obtain a registered non-reference image; and an image fusion module, used to fuse the reference image and the registered non-reference image to obtain a fused image; wherein the pixel value repair module is specifically used to: divide the reference image and the non-reference image into regions respectively; calculate a homography matrix between each region in the non-reference image and the corresponding region in the reference image; and, for each pixel to be fused in the non-reference image, perform the following operations: calculate a second registration displacement of the pixel relative to its corresponding pixel in the reference image using the homography matrix of the region to which the pixel belongs; determine whether the difference between the first registration displacement and the second registration displacement of the pixel exceeds a first threshold; and, if the first threshold is exceeded, determine the first registration displacement of the pixel to be an abnormal pixel value in the displacement image; and wherein the pixel value repair module is further specifically used to perform statistics on the pixel values in the displacement image region by region and, based on the statistical results for each region, determine pixel values whose frequency is less than a second threshold to be abnormal pixel values.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program instructions which, when read and executed by a processor, perform the method according to any one of claims 1-6.

9. An electronic device, characterized in that it comprises a memory and a processor, wherein the memory stores computer program instructions which, when read and executed by the processor, perform the method according to any one of claims 1-6.