Method for camera control, image signal processor and device

By acquiring image frame streams in the camera system and selecting reference frames using SLAM data and depth information, the AWB, AEC, and TM algorithms were optimized, solving the problem of inconsistent color and brightness reproduction between different frames and improving image and video quality.

CN115714919B · Active · Publication Date: 2026-06-30 · BEIJING XIAOMI MOBILE SOFTWARE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING XIAOMI MOBILE SOFTWARE CO LTD
Filing Date
2022-07-25
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing camera systems struggle to guarantee consistent color and brightness reproduction across different frames in their automatic white balance (AWB), automatic exposure control (AEC), and tone mapping (TM) algorithms, especially when the scene has a single or limited color palette, resulting in poor image and video quality.

Method used

Image frame streams are acquired by an image sensor, scene information of the target frame is identified, a reference frame is selected from the image frame stream, the acquisition parameters of the reference frame are used to determine the final image, and AWB, AEC and TM algorithms are optimized by combining SLAM data and depth information to improve consistency and accuracy.

Benefits of technology

It improves the consistency and accuracy of color and brightness reproduction in images and videos, reduces the need for high-quality training data, and lowers computational and storage costs.

✦ Generated by Eureka AI based on patent content.

Figure CN115714919B_ABST

Abstract

A method and apparatus for camera control to acquire images. The method includes: acquiring an image frame stream comprising at least one frame by an image sensor; acquiring a target frame by the image sensor; determining scene information in the target frame; selecting a reference frame from the image frame stream by identifying the scene information of the target frame in the reference frame; determining at least one acquisition parameter of the reference frame; and acquiring a final image from the target frame using the acquisition parameter.

Description

Technical Field

[0001] This invention relates to electronic devices and methods for controlling such electronic devices. More particularly, it relates to a method for camera control to acquire images and to an image signal processor (ISP) for implementing the method. Furthermore, this invention relates to an apparatus for implementing this method.

Background Technology

[0002] In current camera systems, framing certain scenes is challenging, and the algorithms used for implementing Automatic White Balance (AWB), Automatic Exposure Control (AEC), and Tone Mapping (TM) can produce unsatisfactory results. In particular, if only one color or a limited number of different colors are visible in a frame, AWB may fail to achieve accurate lighting estimation, and AEC / TM may fail to accurately estimate the true brightness of objects. Therefore, color and brightness reproduction may be inconsistent between different frames of the same scene, resulting in poor image and video quality and a subpar user experience.

[0003] The problem of the same scene being reproduced with different colors and/or brightness in different shots persists in digital camera devices generally. The most common approach to temporal stability still relies on directly filtering the acquisition parameters of the AWB/AEC/TM algorithms over time, for example with a trimmed mean or a similar filter applied to the results of several frames. This ensures a smooth transition between the acquisition parameters of subsequent frames, but cannot guarantee consistent reproduction of the same object under the same lighting conditions.

[0004] To address this issue, more information about the scene should be used than just the current camera frame. One possibility is temporal filtering of successive AWB and/or AEC/TM results. This yields a smooth transition between subsequent frames but does not prevent convergence to incorrect parameters, and therefore does not solve the problem in question.
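The temporal filtering described in [0003] and [0004] can be illustrated with a minimal sketch, assuming one scalar AWB gain per frame and a trimmed mean over a sliding window; the window length and trim fraction are arbitrary illustrative choices. The sketch shows the limitation named in the text: the filter smooths the transition, but once the per-frame estimates drift to a wrong value, the filtered output converges to that same wrong value.

```python
from collections import deque

def trimmed_mean(values, trim_fraction=0.2):
    """Mean of `values` after discarding the lowest and highest trim_fraction of samples."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] or ordered  # never trim everything away
    return sum(kept) / len(kept)

class TemporalParamFilter:
    """Smooths one scalar acquisition parameter (e.g. an AWB R/G gain) over the last N frames."""

    def __init__(self, window=5, trim_fraction=0.2):
        self.history = deque(maxlen=window)
        self.trim_fraction = trim_fraction

    def update(self, raw_estimate):
        self.history.append(raw_estimate)
        return trimmed_mean(self.history, self.trim_fraction)

f = TemporalParamFilter(window=5, trim_fraction=0.2)
# Per-frame AWB gain estimates: correct (2.0) at first, then wrong (1.4) once
# only a single-colored object fills the frame.
filtered = [f.update(g) for g in [2.0, 2.0, 1.4, 1.4, 1.4, 1.4, 1.4]]
```

The filtered sequence decreases smoothly from 2.0 and settles at 1.4: the transition is stable, but the final parameter is just as wrong as the unfiltered estimate.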

[0005] Therefore, the purpose of this invention is to improve the consistency and accuracy of color and brightness reproduction in images and videos produced with automatic white balance (AWB), automatic exposure control (AEC), and tone mapping (TM) algorithms.

Summary of the Invention

[0006] This invention provides a method for camera control to acquire images, and also provides a camera device.

[0007] In a first aspect of the invention, a method for camera control to acquire images is provided. The method includes the following steps:

[0008] Acquire an image frame stream comprising at least one frame using an image sensor;

[0009] Acquire a target frame using the image sensor;

[0010] Determine scene information of the target frame;

[0011] Select at least one reference frame from the image frame stream by identifying the scene information of the target frame in the reference frame;

[0012] Determine at least one acquisition parameter of the reference frame; and

[0013] Determine the final image from the target frame using the at least one acquisition parameter.
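The six steps can be summarized as a Python sketch. The function and parameter names are hypothetical, and the four callables stand in for the scene-information, matching, parameter-estimation, and rendering stages, which the claim leaves open. Scanning the stream from the most recent frame backwards, and falling back to the target frame when no reference frame is found, are assumptions; the fallback follows the behavior stated in [0034].

```python
from typing import Callable, Optional, Sequence, TypeVar

F = TypeVar("F")  # frame type (raw data plus metadata); hypothetical
P = TypeVar("P")  # acquisition-parameter type, e.g. AWB gains

def acquire_final_image(
    stream: Sequence[F],                        # S01: image frame stream
    target: F,                                  # S02: target frame
    scene_info: Callable[[F], object],          # S03: e.g. SLAM data of a frame
    matches: Callable[[object, object], bool],  # S04: is the target's scene in this frame?
    estimate: Callable[[F], P],                 # S05: AWB/AEC/TM parameter estimation
    render: Callable[[F, P], object],           # S06: apply parameters to the target's raw data
) -> object:
    target_info = scene_info(target)
    reference: Optional[F] = None
    for frame in reversed(stream):              # most recent frame first (assumption)
        if matches(target_info, scene_info(frame)):
            reference = frame
            break
    # Fall back to the target frame itself if no reference frame was found.
    source = target if reference is None else reference
    return render(target, estimate(source))

# Usage with toy frames: scene information is a set of landmark IDs.
stream = [{"ids": {1, 2, 3}, "gain": 2.0}, {"ids": {7, 8}, "gain": 1.1}]
target = {"ids": {2, 3}, "gain": 1.4}
result = acquire_final_image(
    stream, target,
    scene_info=lambda f: f["ids"],
    matches=lambda a, b: bool(a & b),
    estimate=lambda f: f["gain"],
    render=lambda t, p: p,  # toy renderer: just report the parameter used
)
```

Here the most recent frame shares no landmarks with the target, so the older frame with IDs {1, 2, 3} becomes the reference and its gain of 2.0 is used instead of the target's own 1.4.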

[0014] Therefore, according to the present invention, an image frame stream comprising at least one frame and preferably multiple subsequent frames is acquired by the image sensor of a camera. In particular, the image frame stream can be used as a preview of the camera, or it can be part of a video stream.

[0015] Subsequently, the image sensor acquires the target frame. The target frame can be selected through user interaction, such as pressing a trigger button to start recording a video or to capture an image, or it can be the next image in the video stream or the next frame of the preview. The target frame is thus the raw data of the image the user wants to capture or display in the preview.

[0016] Subsequently, scene information for the target frame is determined. This scene information can relate to the entire target frame or to any real-world object within it. Objects include shapes, surfaces, and structures that can be identified in the image frame stream; the target frame may contain several complete objects together with some partially visible ones, or only a portion of a single object. Scene information can be determined for part or all of the target frame. Similarly, to find matching scene information in an image frame of the image frame stream, scene information can be determined for part or all of that image frame.

[0017] Next, at least one reference frame is selected from the image frame stream by identifying the scene information of the target frame in the reference frames. Each frame in the image frame stream is checked to see whether its scene information at least partially matches that of the target frame, i.e., whether it shares consistent scene information with the target. Specifically, the target frame content can be compared as a whole with earlier frames using scene information, to see how much of the current frame content was visible in earlier frames, without having to segment the target frame content into objects and compare them one by one. If the scene information can be identified in one of the frames of the image frame stream, that frame is selected as the reference frame. Preferably, the method continuously checks the image frames in the image frame stream to identify the corresponding scene information and select reference frames. Alternatively, only those image frames that may improve acquisition accuracy and consistency are checked.
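One way to compare frame content as a whole, without segmenting it into objects, is sketched below. It assumes that the SLAM module reports, for each frame, the set of IDs of the map points observed in that frame; this interface and the overlap threshold are hypothetical and are not specified in the patent.

```python
def landmark_overlap(target_ids, frame_ids):
    """Fraction of the target frame's SLAM map points that were also observed in an earlier frame.

    Both arguments are assumed to be collections of map-point IDs from a SLAM module
    (hypothetical interface).
    """
    target_ids = set(target_ids)
    if not target_ids:
        return 0.0
    return len(target_ids & set(frame_ids)) / len(target_ids)

def candidate_frames(target_ids, stream_ids, min_overlap=0.5):
    """Indices of stream frames that show at least `min_overlap` of the target's content."""
    return [i for i, ids in enumerate(stream_ids)
            if landmark_overlap(target_ids, ids) >= min_overlap]

# Usage: the target sees map points 1-4; only the second stream frame sees most of them.
candidates = candidate_frames({1, 2, 3, 4}, [{9}, {1, 2, 3}, {4}], min_overlap=0.5)
```

Because the comparison works on map points rather than recognized objects, any structure, surface, or shape tracked by SLAM contributes to the overlap.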

[0018] One or more acquisition parameters are determined from the reference frame, and the final image is determined from the target frame using the determined acquisition parameters. These acquisition parameters may be automatic white balance (AWB), automatic exposure control (AEC), and/or tone mapping (TM) parameters.

[0019] Therefore, this invention uses acquisition parameters from image frames obtained before capturing the target frame to improve the consistency and accuracy of color and brightness reproduction in images and videos. Thus, this invention utilizes previously acquired image frames to obtain more information about the scene in which the camera is operating.

[0020] Preferably, the scene information may include localization information for the image frames of the image frame stream and for the target frame, such as Simultaneous Localization and Mapping (SLAM) data. By utilizing SLAM data, the camera can easily determine whether scene information matches from the overlap of the SLAM data. For example, SLAM data can be used to determine that an object present in the target frame also exists in an image frame of the image frame stream. Reference frame selection can therefore be performed based on the acquired SLAM data. SLAM data can be acquired for part or all of the target frame and, similarly, for part or all of each image frame. Using SLAM data reduces the cost of accumulating high-quality training data, since no object recognition needs to be trained on large amounts of annotated ground-truth data. Furthermore, by using SLAM data, the invention is not limited to recognizing specific, previously trained objects: the method is independent of the particular objects, which can be any real-world object, structure, surface, or shape that is localized and mapped by the SLAM process. In addition, most modern terminals, such as smartphones and tablets, already implement SLAM modules, so the information they provide can be used to identify the scene information of target frames in this invention.

[0021] Preferably, the scene information includes depth or odometry information of the image frames and/or the target frame. Alternatively or additionally, the scene information includes the pose of the image sensor, i.e., of the camera. For this purpose, the camera preferably includes one or more inertial measurement units (IMUs), such as accelerometers and gyroscopes, to enable acquisition of the camera's pose. Depth information of objects can be provided by stereo camera measurements, LiDAR, etc. The pose and depth/odometry information can also be included in the SLAM data.

[0022] Preferably, selecting a reference frame from the image frame stream by identifying the scene information of the target frame in the reference frame includes determining, from the scene information, that an image frame from the image frame stream and the target frame at least partially overlap. Matching the scene information of the target frame and of the respective image frame establishes a partial overlap of their scene content, which ensures that it is appropriate to determine the final image using at least one acquisition parameter of the selected reference frame. Thus, when the scene information of the target frame matches that of an image frame in the image frame stream, objects present and visible in the target frame are also at least partially present and visible in that image frame.

[0023] Preferably, the scene information includes the coordinates of the scene and, more preferably, of objects within the scene. Selecting a reference frame from the image frame stream by identifying the scene information of the target frame then includes calculating these coordinates and determining their overlap with the coordinates of the respective image frame in the image frame stream. If the calculated coordinates show sufficient overlap between the scene of an image frame and that of the target frame, that image frame can be selected as the reference frame. If object coordinates are used, the object can be any real-world object, such as a shape, structure, or surface; it can also be several real-world objects or portions thereof, or a single real-world object or a portion thereof. Preferably, SLAM data and/or depth information and/or the pose of the image sensor are used to calculate the coordinates of the scene or of objects within the scene. Preferably, the coordinates are calculated in a world coordinate system, so that frames can be compared with each other even when the camera is moving or its pose changes.

[0024] Preferably, calculating the coordinates of the scene or objects within the scene includes:

[0025] Obtain the depth information d of pixels (u, v) in the corresponding image frame and / or target frame;

[0026] Preferably, the coordinates $(X_{cam}, Y_{cam}, d, 1)$ in the camera coordinate system are determined by the following formulas:

[0027] $X_{cam} = \dfrac{(4u - p_x)\, d}{c_x}$ and

[0028] $Y_{cam} = \dfrac{(4v + 60 - p_y)\, d}{c_y}$

[0029] where $(p_x, p_y)$ is the principal point of the image sensor and $(c_x, c_y)$ are the focal lengths, preferably with $c_x = c_y$; and

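[0030] Preferably, the coordinates are then transformed into the world coordinate system using the following formula:

[0031] $\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \begin{pmatrix} R & t \\ \mathbf{0}^T & 1 \end{pmatrix} \begin{pmatrix} X_{cam} \\ Y_{cam} \\ d \\ 1 \end{pmatrix}$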

[0032] Where (X,Y,Z,1) are the coordinates in the world coordinate system, and (R|t) is the pose of the image sensor.
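The camera-coordinate formulas of [0027] and [0028], followed by the transformation into world coordinates, can be written as a short sketch. The factor 4 and the offset 60 are copied from the source formulas as given; their meaning is not stated (one possible reading is a mapping from a subsampled or cropped depth map onto sensor pixels, but that is an assumption). The pose (R, t) is assumed to map camera coordinates into world coordinates.

```python
def pixel_to_camera(u, v, d, px, py, cx, cy):
    """Homogeneous camera coordinates (X_cam, Y_cam, d, 1) of pixel (u, v) with depth d, per [0027]-[0028]."""
    x_cam = (u * 4 - px) * d / cx        # factor 4 copied from the source formula
    y_cam = (v * 4 + 60 - py) * d / cy   # factor 4 and offset 60 copied from the source formula
    return (x_cam, y_cam, d, 1.0)

def camera_to_world(p_cam, R, t):
    """Apply the pose (R, t), assumed camera-to-world: X_world = R * X_cam + t."""
    x, y, z, _ = p_cam
    world = tuple(R[i][0] * x + R[i][1] * y + R[i][2] * z + t[i] for i in range(3))
    return world + (1.0,)

# Usage: identity rotation, camera translated by (1, 2, 3).
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
p_cam = pixel_to_camera(u=100, v=50, d=2.0, px=200, py=260, cx=500, cy=500)
p_world = camera_to_world(p_cam, IDENTITY, (1, 2, 3))
```

With these illustrative values the pixel maps to camera coordinates (0.8, 0.0, 2.0, 1) and to world coordinates (1.8, 2.0, 5.0, 1).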

[0033] Preferably, the coordinates of the provided target frame in the world coordinate system are then compared with the coordinates of each image frame in the image frame stream, which are also in the world coordinate system, to determine the partial overlap with the target frame.
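A minimal sketch of the comparison in [0033], assuming both frames have already been converted to sets of world-coordinate points: the points are quantized into voxels, and the overlap is the fraction of the target's voxels that the image frame also occupies. Voxel quantization and the 5 cm voxel size are illustrative choices, not taken from the source.

```python
def voxel_keys(points, voxel_size=0.05):
    """Quantize world-coordinate points (X, Y, Z[, 1]) to integer voxel indices."""
    return {tuple(int(c // voxel_size) for c in p[:3]) for p in points}

def world_overlap(target_points, frame_points, voxel_size=0.05):
    """Fraction of the target's occupied voxels that the image frame also occupies."""
    target = voxel_keys(target_points, voxel_size)
    if not target:
        return 0.0
    return len(target & voxel_keys(frame_points, voxel_size)) / len(target)

# Usage: the image frame covers one of the target's two occupied voxels.
ov = world_overlap([(0.01, 0.01, 0.01), (1.0, 1.0, 1.0)], [(0.02, 0.03, 0.04)])
```

The voxel size trades robustness to depth noise against spatial precision; larger voxels make nearby but not identical points count as overlapping.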

[0034] Preferably, selecting a reference frame involves determining a confidence level of the respective image frame with respect to the acquisition parameters, and selecting it as the reference frame if the confidence level is higher than a preset threshold. The confidence level thus indicates whether the acquisition parameters determined for that image frame are suitable for determining the final image. An image frame in the image frame stream is selected as a reference frame only when its confidence level is sufficiently high, i.e., higher than the preset threshold. In particular, the confidence level of an image frame selected as a reference frame needs to be higher than that of the target frame in order to improve the consistency and accuracy of color and brightness reproduction. If no image frame with a confidence level higher than the preset threshold is found in the image frame stream, the acquisition parameters are determined from the target frame itself.

[0035] Preferably, the reference frame is selected based on both the maximum overlap between the target frame and the respective image frame in the image frame stream and the confidence level of that image frame. In this way, both the consistency and the accuracy of color and brightness reproduction can be optimized.
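Paragraphs [0034] and [0035] together suggest a selection rule that can be sketched as follows. The threshold values are hypothetical, and ranking eligible frames first by overlap and then by confidence is only one way of combining the two criteria; a product or weighted sum of overlap and confidence would be an equally plausible reading.

```python
from typing import Optional, Sequence

def select_reference(overlaps: Sequence[float], confidences: Sequence[float],
                     target_confidence: float, threshold: float = 0.7,
                     min_overlap: float = 0.2) -> Optional[int]:
    """Index of the reference frame, or None to fall back to the target frame itself.

    A frame is eligible if it overlaps the target by at least `min_overlap` and its
    confidence exceeds both the preset threshold and the target's own confidence.
    Among eligible frames, the largest overlap wins; confidence breaks ties.
    """
    eligible = [i for i, (ov, conf) in enumerate(zip(overlaps, confidences))
                if ov >= min_overlap and conf > threshold and conf > target_confidence]
    if not eligible:
        return None
    return max(eligible, key=lambda i: (overlaps[i], confidences[i]))

# Usage: frame 0 overlaps well but has low confidence; frame 2 wins on overlap.
ref = select_reference([0.9, 0.6, 0.95], [0.5, 0.9, 0.8], target_confidence=0.4)
```

A return value of None corresponds to the fallback in [0034]: the acquisition parameters are then determined from the target frame itself.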

[0036] Preferably, the confidence value is determined by one or more of the following: the color gamut for AWB, the brightness gamut for AEC and/or TM, the hull of the 2D chromaticity for AWB, the 1D brightness range for AEC and/or TM, or a 3D color histogram for AWB and/or AEC and/or TM. If SLAM data is used to create a coarse model of the scene in which the camera operates, acquisition parameters that give the target frame a lower confidence level can be corrected using AWB/AEC/TM parameters from image frames with higher confidence levels, thereby improving the consistency and accuracy of color and brightness reproduction.
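As an example of the 2D chromaticity hull measure in [0036], the sketch below computes the area of the convex hull of the (R/G, B/G) chromaticities of statistics cells and normalizes it to a confidence in [0, 1]. The chromaticity space, the monotone-chain hull algorithm, and the normalization constant are assumptions for illustration. A scene dominated by one color yields a near-zero hull area and hence low confidence, which is the failure case described in [0002].

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula; degenerate polygons have zero area."""
    n = len(poly)
    if n < 3:
        return 0.0
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0

def awb_confidence(rgb_cells, full_gamut_area=1.0):
    """Hypothetical AWB confidence: chromaticity hull area, normalized and clipped to [0, 1].

    rgb_cells: iterable of linear (R, G, B) cell means, e.g. from a 3A statistics grid.
    """
    chroma = [(r / g, b / g) for r, g, b in rgb_cells if g > 0]
    area = polygon_area(convex_hull(chroma))
    return min(area / full_gamut_area, 1.0)

colorful = [(1, 1, 1), (2, 1, 1), (1, 1, 2), (2, 1, 2)]   # chromaticities span a unit square
single_color = [(2, 1, 1)] * 5                           # one chromaticity only
```

The colorful cells give a confidence of 1.0, while the single-colored cells collapse to one chromaticity point and give 0.0.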

[0037] Preferably, the image frames from the image frame stream include low-resolution images having a resolution lower than that of the final image, particularly less than 640x480 pixels, more preferably less than 320x240 pixels, and even more preferably less than 64x48 pixels. Therefore, image frames from the image frame stream can be easily stored and processed without increasing the computational demands on the device.

[0038] Preferably, image frames from the image frame stream are stored in the camera's memory for later use in determining acquisition parameters. Specifically, if image frames from the image frame stream provide low-resolution images, they can be easily stored without consuming excessive memory. In particular, only image frames from the image frame stream with a confidence level above a preset threshold can be stored. Therefore, storing only those image frames that can be used as reference images, while ignoring other image frames in the image frame stream, further reduces memory requirements.

[0039] Preferably, the camera pose is stored together with the image frames in the image frame stream. Therefore, the coordinates of objects in the corresponding image frame can be calculated from the pose. Further information, such as focal length, principal point, and depth information, can be stored together with the image frames in the image frame stream.
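The storage policy of [0038] and [0039] can be sketched as a bounded buffer that accepts only frames above the confidence threshold and keeps the pose and camera intrinsics alongside the image data. The record layout, capacity, and eviction order are hypothetical. For scale, a 64×48 grid with three float32 values per cell occupies 64 × 48 × 3 × 4 = 36,864 bytes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

@dataclass
class StoredFrame:
    """Hypothetical record for one low-resolution frame and its metadata ([0038]-[0039])."""
    stats: Sequence                  # low-resolution image or 3A statistics grid
    confidence: float                # confidence of the frame's acquisition parameters
    pose: Tuple[Sequence, Sequence]  # (R, t) of the image sensor at acquisition time
    focal: Tuple[float, float]       # (cx, cy) as in [0029]
    principal: Tuple[float, float]   # (px, py) as in [0029]
    depth: Optional[Sequence] = None # optional depth map

class ReferenceFrameStore:
    """Stores only frames whose confidence exceeds a preset threshold, up to `capacity` frames."""

    def __init__(self, threshold: float = 0.7, capacity: int = 64):
        self.threshold = threshold
        self.frames = deque(maxlen=capacity)  # the oldest frame is evicted when full (assumption)

    def offer(self, frame: StoredFrame) -> bool:
        """Keep the frame if it could later serve as a reference frame; report whether it was kept."""
        if frame.confidence > self.threshold:
            self.frames.append(frame)
            return True
        return False
```

Frames that could never qualify as reference frames are discarded at acquisition time, so memory use is bounded by the capacity rather than by the length of the stream.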

[0040] Preferably, the method further includes:

[0041] Detect illumination changes between the reference frame and the target frame, and adapt the reference frame to the changing illumination before determining the acquisition parameters.

[0042] Preferably, more than one reference frame is selected, wherein at least one acquisition parameter is determined from the more than one reference frame, for example, by averaging. In particular, a weighted average can be used, wherein the acquisition parameters of the more than one reference frame are weighted by their respective confidence values.

[0043] Preferably, the steps of this method are repeated iteratively for each new target frame of the video stream or preview image stream.

[0044] In this invention, an image signal processor (ISP) is provided. The ISP is configured to perform the steps of the method described above. Preferably, the ISP can be connected to an image sensor to receive image data or image frames. Further, the ISP can be connected to the SLAM module of a device implementing the ISP, such as a terminal or the like.

[0045] In one aspect of the invention, a camera device is provided, preferably implemented in a mobile terminal. The camera device includes an image sensor, a processor, and a memory storing instructions that, when executed by the processor, perform the steps of the method described above.

[0046] Preferably, the camera device includes a SLAM module for acquiring SLAM data to identify reference frames.

Attached Figure Description

[0047] The invention will be described in more detail with reference to the accompanying drawings.

[0048] The attached drawings show:

[0049] Figure 1 is a flowchart of the method according to the present invention.

[0050] Figure 2 shows example images for the steps of the method according to the present invention.

[0051] Figure 3 shows the steps of the method according to the present invention in detail.

[0052] Figure 4 is a schematic diagram illustrating another embodiment of the present invention and...

[0053] Figure 5 shows a camera device according to the present invention.

Detailed Implementation

[0054] This invention relates to camera control to improve the consistency and accuracy of color and brightness reproduction in images and videos, particularly during automatic white balance (AWB), automatic exposure control (AEC), and tone mapping (TM) algorithms.

[0055] Preferably, the method according to the invention is implemented in a camera module of a terminal, such as a smartphone or tablet computer. Preferably, the camera module is connected to a processing module for performing the steps of the invention. The processing module may include an image signal processor (ISP), etc. However, the invention is not limited to a particular terminal or any specific embodiment.

[0056] Figure 1 illustrates a method for camera control to acquire images.

[0057] In step S01, an image frame stream is acquired by an image sensor, wherein the image frame stream includes at least one frame.

[0058] Therefore, an image frame stream comprising at least one frame, and preferably multiple subsequent frames, is acquired by the camera's image sensor. Specifically, the image frame stream can be used as a preview of the camera or as part of a captured video stream. In particular, the image frames in the image frame stream have a low resolution, preferably below 640x480 pixels, more preferably below 320x240 pixels, and even more preferably below 64x48 pixels. Alternatively, to reduce memory consumption, the stored image frames can be 3A statistics rather than the original raw frames, for example a 2D RGB grid representing a linearized raw camera RGB image frame.
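The 2D RGB grid mentioned here can be approximated by block-averaging a linear RGB image into a small grid. The sketch assumes the input is an already linearized RGB image stored as nested lists of (R, G, B) tuples and that its size is divisible by the grid size; the default 64×48 grid matches the smallest resolution named above, but the function itself is illustrative.

```python
def rgb_stats_grid(raw_rgb, grid_w=64, grid_h=48):
    """Block-average a linear RGB image into a grid_h x grid_w grid of mean (R, G, B) values.

    raw_rgb: list of rows, each a list of (R, G, B) tuples; the image size is assumed
    to be divisible by the grid size.
    """
    h, w = len(raw_rgb), len(raw_rgb[0])
    bh, bw = h // grid_h, w // grid_w
    grid = []
    for gy in range(grid_h):
        row = []
        for gx in range(grid_w):
            block = [raw_rgb[y][x]
                     for y in range(gy * bh, (gy + 1) * bh)
                     for x in range(gx * bw, (gx + 1) * bw)]
            n = len(block)
            row.append(tuple(sum(px[c] for px in block) / n for c in range(3)))
        grid.append(row)
    return grid

# Usage: a 2x4 image reduced to a 1x2 grid.
image = [[(0, 0, 0), (2, 2, 2), (4, 4, 4), (6, 6, 6)],
         [(0, 0, 0), (2, 2, 2), (4, 4, 4), (6, 6, 6)]]
stats = rgb_stats_grid(image, grid_w=2, grid_h=1)
```

Each grid cell keeps the mean linear color of its block, which is the information AWB and AEC need, at a fraction of the raw frame's memory.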

[0059] In step S02, the target frame is acquired by the image sensor.

[0060] The target frame can be selected through user interaction, such as pressing a trigger button to start recording a video or to capture an image. Alternatively, the target frame can be the next frame of the video stream to be captured or the next frame of the preview. The target frame is thus the raw data of the image the user wants to capture.

[0061] In step S03, the scene information of the target frame is preferably determined by the processing module or the ISP.

[0062] Scene information includes any information about the scene of the target frame. Scene information can be determined for a portion or the entire target frame. Similarly, to identify scene information in a corresponding image frame of an image frame stream, scene information for a portion or the entire image frame can be determined to identify a match of scene information.

[0063] In step S04, preferably, the processing module or ISP selects the reference frame from the image frame stream by recognizing the scene information of the target frame in the reference frame.

[0064] Each frame in the image frame stream is examined to determine if there is at least partial overlap between the scene information of the corresponding image frame and the target frame, and whether the scene content of the target frame is partially or completely present in the corresponding image frame. Alternatively, only those image frames that may improve acquisition accuracy and consistency are examined. If scene information can be identified in one frame of the image frame stream, that frame in the image frame stream is selected as a reference frame. Preferably, this method continuously examines image frames in the image frame stream to identify the corresponding scene information and select reference frames. Thus, the overlap between the target frame and the corresponding image frame in the image frame stream is determined by the scene information to identify possible reference frames to be selected when sufficient overlap is determined.

[0065] In step S05, at least one acquisition parameter of the reference frame is preferably determined by the processing module or the ISP. The at least one acquisition parameter may be an automatic white balance (AWB), automatic exposure control (AEC), and / or tone mapping (TM) parameter determined from the reference frame.

[0066] Preferably, more than one reference frame is selected, and the at least one acquisition parameter is determined from these reference frames, for example by averaging. In particular, all reference frames with a matching score above a certain level can be selected. A weighted average can be used, in which the acquisition parameters of the reference frames are weighted by their respective confidence values. In this way, more information from previous frames is used to determine the acquisition parameters of the target frame, which gives more reliable results.
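The confidence-weighted averaging described here can be sketched as follows, assuming each candidate frame carries a match score, a confidence value, and a parameter vector such as AWB gains. The matching-score level is a hypothetical parameter. Averaging gains linearly is the simplest option; averaging in another space (for example chromaticity or log-gain) would be a design alternative the source does not address.

```python
def fuse_parameters(candidates, min_score=0.5):
    """Confidence-weighted average of per-frame parameter vectors.

    candidates: iterable of (match_score, confidence, params), where params is a tuple
    of floats such as AWB gains. Frames with match_score below `min_score` are ignored.
    Returns None when no frame qualifies.
    """
    selected = [(conf, p) for score, conf, p in candidates if score >= min_score and conf > 0]
    if not selected:
        return None
    total = sum(conf for conf, _ in selected)
    dim = len(selected[0][1])
    return tuple(sum(conf * p[k] for conf, p in selected) / total for k in range(dim))

# Usage: the third frame matches poorly and is excluded despite its high confidence.
fused = fuse_parameters([
    (0.9, 1.0, (2.0, 1.0)),
    (0.8, 3.0, (1.0, 1.0)),
    (0.1, 10.0, (9.0, 9.0)),
])
```

The two qualifying frames are weighted 1:3, giving fused parameters of (1.25, 1.0).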

[0067] In step S06, the final image is preferably determined from the target frame by the processing module or ISP through at least one acquisition parameter.

[0068] The target frame contains the raw data. Once the corresponding acquisition parameters are determined, the final image is obtained from the raw data of the target frame using one or more acquisition parameters from the reference frame.
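For AWB, using the reference frame's parameters on the target's raw data amounts to scaling each channel by the reference gains. The sketch assumes linear (R, G, B) pixel tuples and per-channel gains, and omits the rest of the ISP pipeline (demosaicing, color correction, tone mapping). Reusing gains this way is only valid if the illumination has not changed, which is why [0041] provides for detecting illumination changes.

```python
def apply_awb(raw_rgb, gains):
    """Scale each linear (R, G, B) pixel of the target frame by per-channel gains from the reference frame."""
    gr, gg, gb = gains
    return [[(r * gr, g * gg, b * gb) for r, g, b in row] for row in raw_rgb]

# Usage: one pixel, reference gains (2.0, 1.0, 4.0).
balanced = apply_awb([[(1.0, 2.0, 0.5)]], (2.0, 1.0, 4.0))
```

The example pixel becomes neutral (2.0, 2.0, 2.0), as it would if it showed a gray surface under the reference illumination.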

[0069] Therefore, through this invention, the acquisition parameters of image frames obtained before capturing the target frame are used to increase the consistency and accuracy of color and brightness reproduction in images and videos. Thus, through this invention, more information about the scene in which the camera operates is used from previously acquired image frames.

[0070] In step S04, localization information, and more preferably SLAM data, can be used as scene information to create a coarse model of the scene in which the camera operates and to determine a reference frame whose scene content is at least partially identical to that of the target frame. AWB/AEC/TM parameters from frames with higher confidence levels can then be used to correct the parameters of target frames with lower confidence levels, thereby improving the consistency and accuracy of color and brightness reproduction. By utilizing SLAM data, the camera can easily determine whether the scene information of the target frame is also present in one of the image frames in the image frame stream, i.e., whether the scene content of that image frame and the target frame at least partially overlaps. The selection of a reference frame can therefore be based on the acquired SLAM data. In particular, by using SLAM data as scene information, the method is independent of any specific object to be identified and can use any real-world object, such as a structure, surface, or shape located and mapped by the SLAM process, to determine the overlap between the target frame and the respective image frame. Furthermore, most modern terminals, such as smartphones and tablets, already implement SLAM modules, so the information they provide can be used to identify target frame content in this invention.

[0071] The method can be applied iteratively and repeated for each new target frame, whether it is a frame of the video stream or of the preview, thereby continuously improving image reproduction.

[0072] Figure 2 shows the steps for obtaining the final image. The example of Figure 2 relates to the implementation of the AWB algorithm; however, the method can alternatively or additionally be implemented in the AEC or TM algorithms described above.

[0073] An initial image, image A, is acquired. An automatic white balance (AWB) algorithm is used to determine the AWB-related acquisition parameters for the initial image, and these parameters are then applied to image B to obtain a correctly adjusted image. At the same time, a SLAM algorithm performs localization and mapping of the content of image B, and point clouds are determined as scene information for the scene in the respective image frame. This includes... As shown in Figure 2, these steps are repeated for each image frame in the image frame stream from image A to image E.

[0074] Image C shows a closer view of the corresponding object in the scene by moving the camera closer to object 14 or by zooming in. Object 14 exists in image frames B and C, and is marked by point 14 in the point cloud. Similarly, other objects are detected by other points 10 in the point cloud.

[0075] Image D shows object 14 even closer, which reduces the color gamut of the image. Image E contains only object 14, so almost all of its color information comes directly from object 14 itself, and only a small color gamut is available for determining the AWB parameters of image E. As the comparison between images B to D and image E shows, and as shown in detail in images F and G, the AWB algorithm may then fail, leading to color errors in object 14, as shown in image F.

[0076] In Figure 2, image B has a high color gamut, which yields a high confidence level for the AWB-related acquisition parameters. Furthermore, the target frame shown in image E completely overlaps with the content of image B, as both depict object 14.

[0077] Using the method of the present invention, scene information including object 14 is therefore searched for in images D, C, and B, in reverse order of acquisition, until an image is found that has a high confidence level for the AWB parameters and still overlaps in scene content with the target frame, i.e., an image showing object 14. The image frame does not need to contain object 14 completely; a partial overlap of scene content between the target frame of image E and a possible reference frame may be sufficient to improve color reproduction. Furthermore, the method is not limited to a specific object: any scene content, such as surfaces, shapes, or structures, can serve as scene information. Although object 14 is used as the example in Figure 2, other objects or parts of objects are also possible. This comparison and identification of overlapping scene information between image frames B to D and target frame E is preferably performed by acquiring SLAM data as scene information for each of images B to E. The SLAM data of object 14 can thus be identified in other frames through the world coordinates of object 14 determined by the SLAM algorithm, in order to determine the overlap. In the example of Figure 2, image C is used as the reference frame, and the AWB parameters determined for image C are also used for the AWB of image E. Image E therefore receives a corrected AWB and correct colors, which improves the color consistency and accuracy of object 14. The corrected AWB produces the result shown in image G of Figure 2, which has the correct colors and is not affected by the reduced color information provided by image E itself.
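The reverse-order search of the Figure 2 example can be sketched as follows, assuming per-frame overlap values and AWB confidence values are already available for the earlier frames, listed oldest first. The threshold is hypothetical, and skipping frames without overlap (rather than ending the search at the first such frame) is one reading of the text. With the illustrative values below, the search passes over image D and returns image C, mirroring the example.

```python
from typing import Optional, Sequence

def find_reference_backwards(overlaps: Sequence[float], confidences: Sequence[float],
                             threshold: float = 0.8) -> Optional[int]:
    """Search earlier frames (given oldest first) from the most recent one backwards.

    Returns the index of the first frame that still overlaps the target (overlap > 0)
    and whose AWB confidence reaches `threshold`, or None if there is no such frame.
    """
    for i in range(len(overlaps) - 1, -1, -1):
        if overlaps[i] > 0 and confidences[i] >= threshold:
            return i
    return None

# Frames B, C, D (indices 0, 1, 2) precede target E; illustrative values only.
ref = find_reference_backwards(overlaps=[1.0, 0.8, 0.9], confidences=[0.95, 0.85, 0.4])
```

Stopping at the most recent qualifying frame favors frames acquired under conditions closest to the target's, even when an older frame such as B has a higher confidence.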

[0078] Figure 3 shows the steps for determining the coordinates of a scene, or of objects within a scene, for a target frame and the corresponding image frame. Figure 3 shows world coordinate system 22. In the first step, when frame 20 of the image frame stream, which can be used as a reference frame, is acquired, the coordinates of object 14 in image frame 20 can be determined in camera coordinate system 26 of the camera in the first state/position, denoted "cam1", using the acquired depth or odometry information. The coordinates of object 14 in world coordinate system 22 can then be determined from the pose (R1, t1) of camera "cam1" and the coordinates of object 14 in camera coordinate system 26 of "cam1". The object shown in world coordinate system 22 in Figure 3 does not need to be a real-world object. Instead, any object, surface, shape, or structure can be used, and its coordinates can be determined to establish the overlap between the target frame and the corresponding image frame. Furthermore, the coordinates of multiple objects in the scene, portions of multiple objects, or portions of only one object can be used to determine this overlap.

[0079] Similarly, for target frame 32, the coordinates of object 14 can be determined in camera coordinate system 30 of the camera in the state denoted "cam2", based on the depth information provided by 3D point cloud 34. Using the pose (R2, t2) of camera "cam2" and the coordinates of object 14 in camera coordinate system 30, the coordinates of object 14 in world coordinate system 22 can be determined, and thus the overlap between target frame 32 and frame 20. In the example of Figure 3, the overlap is determined by the set of 3D points of 3D point cloud 34 in the world coordinate system that are visible in both the target frame and the reference frame, regardless of which object the points belong to. The 3D point cloud can be determined from depth information, camera position, and/or camera orientation information (camera pose), as illustrated in more detail below.

[0080] Alternatively, the coordinates of object 14 may be determined in the world coordinate system only for target frame 32 ("cam2"). The 3D point cloud 34 of target frame 32 is available in the world coordinate system; it is constructed from the depth information/depth map, camera position, and/or camera pose of target frame 32. For image frame 20, the distances between the camera in camera state "cam1" and those 3D points are determined from the camera pose and/or camera position of image frame 20, in order to determine which region of image frame 20 covers the 3D points of 3D point cloud 34. Depth information for image frame 20 is therefore not required, and the overlap between the scene or object of target frame 32 and image frame 20 can be determined without calculating world coordinates for the whole of image frame 20.
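The alternative in [0080] can be sketched as follows: the target frame's world-space points are projected into image frame 20 using only that frame's pose and intrinsics, and the bounding box of the points that land inside the image approximates the covered region. This is a minimal sketch under stated assumptions: the pose is a 4x4 camera-to-world matrix [R t; 0 1], projection is a standard pinhole model at full image resolution, and (cx, cy) are focal lengths and (px, py) the principal point, following the document's notation. The function names `world_to_camera` and `covered_region` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def world_to_camera(p: Vec3, pose: Sequence[Sequence[float]]) -> Vec3:
    """Map a world point into camera coordinates for a 4x4 camera-to-world
    pose [R t; 0 1] by inverting it: p_cam = R^T (p_world - t)."""
    diff = [p[k] - pose[k][3] for k in range(3)]
    x, y, z = (sum(pose[r][c] * diff[r] for r in range(3)) for c in range(3))
    return (x, y, z)


def covered_region(
    points_world: List[Vec3],
    pose: Sequence[Sequence[float]],
    cx: float, cy: float,  # focal lengths (document notation)
    px: float, py: float,  # principal point
    width: int, height: int,
) -> Optional[Tuple[float, float, float, float]]:
    """Bounding box (u_min, v_min, u_max, v_max) of the image region covering
    the given world points, or None if none of them projects into the image.
    Only this frame's pose and intrinsics are used; no depth map is needed."""
    us: List[float] = []
    vs: List[float] = []
    for p in points_world:
        x, y, z = world_to_camera(p, pose)
        if z <= 0:  # point lies behind the camera
            continue
        u = cx * x / z + px  # pinhole projection
        v = cy * y / z + py
        if 0 <= u < width and 0 <= v < height:
            us.append(u)
            vs.append(v)
    if not us:
        return None
    return (min(us), min(vs), max(us), max(vs))
```

A bounding box is the simplest region description; a per-pixel mask of projected points would follow the overlap more tightly at a higher memory cost.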

[0081] Specifically, the coordinates of each pixel in the target frame can be transformed into world coordinate system 22; alternatively, the coordinates of only selected points in the target frame can be determined. Likewise, for a corresponding image frame from the image frame stream, world coordinates are determined either for every pixel or for a selection of pixels, which are transformed into world coordinate system 22 in order to identify the overlap between the target frame (or objects in the target frame) and the corresponding image frame.

[0082] Since the SLAM data acquired for the image frames includes at least depth or odometry information, the coordinates of the scene or object 14 in target frame 32, as well as in the frames of the image frame stream, can be transformed into world coordinate system 22. These coordinates can then be compared with the world coordinates of the scene or object 14 in reference frame 20 to determine whether object 14 is present in both target frame 32 and reference frame 20. A frame is considered a reference frame only if there is overlap, i.e., if object 14 is at least partially visible in the corresponding image frame. The acquisition parameters of the reference frame so determined are used to generate the final image. Specifically, each frame is checked for at least partial scene overlap with earlier frames; if there is overlap, it is checked whether the earlier frame has a higher confidence level for the respective acquisition parameters (for AWB, AEC, and TM separately).
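The per-algorithm check at the end of [0082] can be illustrated with a small sketch: an earlier frame qualifies only if it overlaps the current frame at all, and then only for those of AWB, AEC, and TM for which it is more confident. The function name `algorithms_to_borrow`, the dictionary layout, and the use of an overlap fraction as the overlap test are hypothetical choices for illustration.

```python
from typing import Dict, List

ALGORITHMS = ("AWB", "AEC", "TM")


def algorithms_to_borrow(
    overlap_fraction: float,
    target_confidence: Dict[str, float],
    earlier_confidence: Dict[str, float],
) -> List[str]:
    """Return the algorithms for which an earlier frame qualifies as a reference:
    the scenes must at least partially overlap, and the earlier frame must be
    more confident than the target frame for that particular algorithm."""
    if overlap_fraction <= 0.0:
        return []  # no shared scene content, so not a reference frame
    return [
        alg for alg in ALGORITHMS
        if earlier_confidence[alg] > target_confidence[alg]
    ]
```

Keeping the decision per algorithm means one earlier frame can supply, for example, AWB gains without also overriding the target frame's own exposure.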

[0083] Referring to Figure 4, the system comprises three parts. In the first part, SLAM 48 runs on the device, using SLAM input data 46 from images, the IMU, and depth data to perform camera pose estimation and scene modeling 50 and to acquire depth maps or depth information. During this process, a sequence of image frames is captured and stored 40. To reduce memory consumption, the stored frames may be low-resolution 3A statistics instead of the original raw frames, for example a 2D RGB grid representing a linearized raw camera RGB image. In addition, each frame is stored with its camera pose (a 4×4 matrix), other image metadata such as the camera's focal lengths (cx, cy) and principal point (px, py), and uncorrected algorithm parameters 42 such as AWB gains. Depth data or odometry data is acquired at the same time.
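The per-frame storage described in [0083] might look like the sketch below: a raw linear RGB image is reduced to a coarse grid of tile averages (one plausible form of "low-resolution 3A statistics"), and the grid is kept together with the pose, intrinsics, and uncorrected AWB gains. The tile-averaging scheme, the block size, and the `FrameRecord` field layout are assumptions for illustration, not the patent's format.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[float, float, float]


def stats_grid(image: List[List[RGB]], block: int) -> List[List[RGB]]:
    """Average non-overlapping block x block tiles of a linearized raw RGB image
    into a coarse grid; incomplete tiles at the right/bottom edge are dropped."""
    h, w = len(image), len(image[0])
    n = block * block
    grid: List[List[RGB]] = []
    for y0 in range(0, h - h % block, block):
        row: List[RGB] = []
        for x0 in range(0, w - w % block, block):
            acc = [0.0, 0.0, 0.0]
            for y in range(y0, y0 + block):
                for x in range(x0, x0 + block):
                    for k in range(3):
                        acc[k] += image[y][x][k]
            row.append((acc[0] / n, acc[1] / n, acc[2] / n))
        grid.append(row)
    return grid


@dataclass
class FrameRecord:
    """Per-frame data kept instead of the full raw frame (hypothetical layout)."""
    grid: List[List[RGB]]                 # low-resolution 3A-style RGB statistics
    pose: List[List[float]]               # 4x4 camera pose
    focal: Tuple[float, float]            # (cx, cy) in the document's notation
    principal_point: Tuple[float, float]  # (px, py)
    awb_gains: RGB                        # uncorrected algorithm parameters
```

With, say, 16x16 tiles, a 4000x3000 frame shrinks to a 250x187 grid, which is why storing statistics rather than raw frames keeps memory use modest.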

[0084] For each frame, an algorithm confidence value 44 is calculated. For example, the color gamut, the convex hull of the 2D chromaticities, or a 3D color histogram can be used as a confidence metric for AWB/AEC/TM: the more colors are visible within the FOV, the easier it generally is to estimate the scene illumination correctly for AWB, and likewise the correct brightness of an object relative to other objects in the scene for AEC and TM. The convex hull should be calculated from image data in a device-independent color space so that the same high/low confidence threshold can be used across all devices. Frames with higher confidence serve as potential reference frames that can be used to correct frames with lower confidence.
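One of the confidence metrics named in [0084], the area of the convex hull of the 2D chromaticities, can be sketched as follows. The sketch assumes the input RGB values are already in a device-independent linear color space and uses (r, g) = (R, G) / (R + G + B) as the 2D chromaticity; both the chromaticity definition and the function names are assumptions. The hull is computed with Andrew's monotone chain algorithm and its area with the shoelace formula.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def chromaticities(rgb_pixels: Iterable[Tuple[float, float, float]]) -> List[Point]:
    """Map RGB to 2D (r, g) chromaticity; black pixels carry no chroma and are skipped."""
    pts: List[Point] = []
    for r, g, b in rgb_pixels:
        s = r + g + b
        if s > 0:
            pts.append((r / s, g / s))
    return pts


def _cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; degenerate hulls (fewer than 3 vertices) have zero area."""
    if len(poly) < 3:
        return 0.0
    s = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def chroma_gamut_confidence(rgb_pixels: Iterable[Tuple[float, float, float]]) -> float:
    """Larger hull area means more distinct colors in view, i.e. higher confidence."""
    return polygon_area(convex_hull(chromaticities(rgb_pixels)))
```

A single-colored scene collapses to one chromaticity point and scores zero, while a scene containing saturated red, green, and blue spans the full (r, g) triangle with area 0.5; a fixed threshold between these extremes would then separate low- from high-confidence frames.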

[0085] A decision is then made as to whether the corresponding image frame has a high confidence level. If it does, the image frame is stored for later use as a reference frame for video streaming, previews, or still pictures, and the final image for this high-confidence frame is generated using the uncorrected AWB/AEC/TM parameters.
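The decision in [0085] amounts to a threshold test with a side effect: high-confidence frames are kept as potential reference frames and processed with their own parameters. The sketch below is a minimal illustration; the class name `ReferenceStore`, the strict `>` comparison, and the bounded capacity (oldest frames are evicted first) are assumptions, since the document does not specify how many reference frames are retained.

```python
from collections import deque
from typing import Any, Deque


class ReferenceStore:
    """Keeps high-confidence frames as potential reference frames.
    The bounded capacity is an assumption made to cap memory use."""

    def __init__(self, threshold: float, capacity: int = 64) -> None:
        self.threshold = threshold
        self.frames: Deque[Any] = deque(maxlen=capacity)

    def process(self, frame: Any, confidence: float) -> bool:
        """Return True if the frame is high-confidence: it is stored and its
        uncorrected AWB/AEC/TM parameters are used as-is. Return False if it is
        low-confidence and needs a reference frame for correction."""
        if confidence > self.threshold:
            self.frames.append(frame)
            return True
        return False
```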

[0086] If the decision finds that the image frame has a low confidence level for the AWB/AEC/TM parameters, the system retrieves the depth data and constructs a depth map or 3D point cloud as scene information. To construct the 3D point cloud, each pixel (u, v) of the depth map is first transformed into the camera coordinate system using the camera's intrinsic parameters, as follows:

[0087] $X_{cam} = \frac{(4u - p_x)\, d}{c_x}$

[0088] $Y_{cam} = \frac{(4v + 60 - p_y)\, d}{c_y}$

[0089] Where d is the actual depth value in the depth map. Then, the 3D point can be obtained using the following equation:

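[0090] $\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \begin{pmatrix} R & t \\ \mathbf{0}^{\top} & 1 \end{pmatrix} \begin{pmatrix} X_{cam} \\ Y_{cam} \\ d \\ 1 \end{pmatrix}$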

[0091] Where (R|t) is the estimated camera pose.
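Paragraphs [0087] to [0090] can be sketched as a loop that back-projects every valid depth sample into camera coordinates and then applies the camera pose. This sketch assumes the pose is a 4x4 camera-to-world matrix, consistent with its use here to obtain world coordinates, and that a depth value of 0 marks a missing sample. It reads the constants 4 and 60 from [0087] and [0088] as a scale factor and a row offset mapping depth-map pixels to image pixels; that interpretation is an assumption, so both are exposed as parameters. The function name `depth_map_to_world` is hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def depth_map_to_world(
    depth: Sequence[Sequence[float]],  # depth[v][u] = d; 0 marks a missing sample
    pose: Sequence[Sequence[float]],   # 4x4 camera-to-world pose (R|t)
    cx: float, cy: float,              # focal lengths (document notation)
    px: float, py: float,              # principal point
    scale: int = 4,                    # the "u x 4" / "v x 4" factor
    v_offset: int = 60,                # the "+ 60" row offset
) -> List[Vec3]:
    points: List[Vec3] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # skip pixels without valid depth
            x_cam = (u * scale - px) * d / cx
            y_cam = (v * scale + v_offset - py) * d / cy
            cam = (x_cam, y_cam, d, 1.0)  # homogeneous camera coordinates
            # (X, Y, Z) = first three rows of the 4x4 pose times (X_cam, Y_cam, d, 1)
            x, y, z = (sum(pose[r][k] * cam[k] for k in range(4)) for r in range(3))
            points.append((x, y, z))
    return points
```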

[0092] The next step is to check, using data 62 of all acquired potential reference frames (or of any high-confidence frames identified as belonging to the same physical space in which the camera is currently operating), whether the content of target frame i has already appeared in the most recent potential reference frame 60. The previously determined 3D points of the target frame are projected back into potential reference frame j by reversing the steps above, with (R|t) replaced by the camera pose of the potential reference frame. Frame j is selected as the reference frame so as to maximize both the proportion of low-confidence frame i that is visible in reference frame j (c_common_area(i,j)) and the confidence level of reference frame j (c_confidence(j)). According to one embodiment of the invention, the quantity maximized is the product c_common_area(i,j)*c_confidence(j), but other implementations are also possible.
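The reference-frame selection of [0092] can be sketched as follows: each 3D point of target frame i is projected into candidate frame j by inverting the back-projection formulas with frame j's pose, c_common_area(i,j) is taken as the fraction of points that land inside frame j's depth-map grid, and the candidate maximizing c_common_area(i,j)*c_confidence(j) wins. Assumptions: poses are 4x4 camera-to-world matrices, the scale 4 and row offset 60 carry over from [0087] and [0088], and occlusion is ignored (a point hidden behind another surface still counts as visible). All function names and the candidate tuple layout are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (cx, cy, px, py)


def project_to_depth_grid(p: Vec3, pose: Sequence[Sequence[float]], intr: Intrinsics,
                          scale: int = 4, v_offset: int = 60) -> Optional[Tuple[float, float]]:
    """Invert the back-projection: world point -> (u, v) in frame j's depth-map grid."""
    cx, cy, px, py = intr
    diff = [p[k] - pose[k][3] for k in range(3)]  # p_world - t
    x, y, z = (sum(pose[r][c] * diff[r] for r in range(3)) for c in range(3))  # R^T (...)
    if z <= 0:
        return None  # behind the camera of frame j
    u = (x * cx / z + px) / scale
    v = (y * cy / z + py - v_offset) / scale
    return (u, v)


def common_area(points: List[Vec3], pose, intr: Intrinsics, width: int, height: int) -> float:
    """c_common_area(i, j): fraction of frame i's 3D points that land inside frame j."""
    if not points:
        return 0.0
    visible = 0
    for p in points:
        uv = project_to_depth_grid(p, pose, intr)
        if uv is not None and 0 <= uv[0] < width and 0 <= uv[1] < height:
            visible += 1
    return visible / len(points)


def select_reference(points: List[Vec3], candidates, width: int, height: int):
    """candidates: (frame_id, pose, intrinsics, confidence) tuples.
    Returns (frame_id, score) maximizing c_common_area * c_confidence."""
    best_id, best_score = None, 0.0
    for frame_id, pose, intr, confidence in candidates:
        score = common_area(points, pose, intr, width, height) * confidence
        if score > best_score:
            best_id, best_score = frame_id, score
    return best_id, best_score
```

Returning `(None, 0.0)` when no candidate overlaps leaves the caller free to fall back to the target frame's own uncorrected parameters.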

[0093] Once a reference frame j has been selected, the system proceeds to the third part. AWB is used here as the example algorithm. Automatic white balance (AWB) is a camera control algorithm that estimates the chromaticity of the illumination and calculates the white balance (WB) RGB gains. Whatever the dominant illumination, the WB RGB gains enable object colors to be reproduced correctly and consistently, thus achieving color constancy. For example, white objects are reproduced as white regardless of the color of the illumination (ignoring any chromatic adaptation processing). The effect of WB on the RGB pixels of an image can be expressed as:

[0094] $x' = C \cdot G \cdot x,$

[0095] Where x is a 3x1 vector corresponding to the linearized raw camera RGB values, G is a diagonal 3x3 WB RGB gain matrix (the diagonal values are the WB RGB gains), and C is a 3x3 color space transformation matrix from linearized raw camera RGB to device-independent linear RGB.
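The formula x′ = C·G·x can be made concrete with a short sketch: the diagonal gain matrix G scales each raw channel, and the 3x3 matrix C then mixes the channels into device-independent linear RGB. The function name and the example values are hypothetical; in practice C comes from a per-sensor calibration.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def apply_white_balance(x: RGB, gains: RGB, C: Sequence[Sequence[float]]) -> RGB:
    """x' = C . G . x, where G = diag(gains) and C is a 3x3 color space transform."""
    gx = [gains[k] * x[k] for k in range(3)]  # G . x (G is diagonal)
    r, g, b = (sum(C[row][k] * gx[k] for k in range(3)) for row in range(3))
    return (r, g, b)
```

For instance, a gray surface whose raw response under a warm light is (0.5, 0.5, 0.25) becomes neutral (0.5, 0.5, 0.5) with gains (1.0, 1.0, 2.0) and an identity C.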

[0096] An illumination change between frames i and j is detected by comparing the linearized raw pixel RGB averages common_area_avg_rgb(i) and common_area_avg_rgb(j) (both 3x1 RGB vectors) of the same object surface, which is visible in both frames, after normalizing them to eliminate the effect of any exposure difference. Each point in the 3D point cloud 34 shown in Figure 3 has a corresponding RGB value in both the target frame and the reference frame; these are the points from which common_area_avg_rgb is calculated for each frame. If the Euclidean distance or another difference metric diff(common_area_avg_rgb(i), common_area_avg_rgb(j)) exceeds a threshold common_area_similarity_thr, an illumination change is considered detected; otherwise, the illumination is considered unchanged.
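The illumination test of [0096] can be sketched as: average the raw RGB values of the shared points in each frame, normalize away exposure, and compare the Euclidean distance with common_area_similarity_thr. The document does not specify the exposure normalization; dividing by the green channel is an assumption made here because a pure exposure change scales all three channels equally and therefore cancels out. The helper names are hypothetical.

```python
import math
from typing import Iterable, Tuple

RGB = Tuple[float, float, float]


def common_area_avg_rgb(pixels: Iterable[RGB]) -> RGB:
    """Mean linearized raw RGB over the pixels of the shared 3D points."""
    pts = list(pixels)
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n, sum(p[2] for p in pts) / n)


def normalize_exposure(rgb: RGB) -> RGB:
    """Divide by green so that a pure exposure change cancels out (assumed normalization)."""
    r, g, b = rgb
    return (r / g, 1.0, b / g)


def illumination_changed(avg_i: RGB, avg_j: RGB, common_area_similarity_thr: float) -> bool:
    """Euclidean distance between exposure-normalized averages versus the threshold."""
    a, b = normalize_exposure(avg_i), normalize_exposure(avg_j)
    return math.dist(a, b) > common_area_similarity_thr
```

With this normalization, (0.2, 0.4, 0.2) and (0.4, 0.8, 0.4) count as the same illumination at different exposures, while a shift toward blue such as (0.2, 0.4, 0.4) is flagged as a change.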

[0097] A decision 66 is made as to whether an illumination change is detected.

[0098] 1. If no illumination change is detected between target frame i and the higher-confidence reference frame j, the WB gains of frame j can be used for frame i (68), with only regular temporal filtering applied on top to ensure smooth parameter changes between frames.

[0099] 2. If an illumination change is detected, the WB RGB gains of the higher-confidence reference frame j must be corrected for the illumination change before being applied to target frame i. To this end, the correction factor (a 3x1 vector) correction_factor = common_area_avg_rgb(j) / common_area_avg_rgb(i), computed element-wise, is used as a multiplier for the WB RGB gains of frame j.
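The two cases of [0098] and [0099] can be sketched as a single gain-transfer function plus a smoothing helper. The averages are assumed to be exposure-normalized, as [0096] requires. The exponential moving average and its `alpha` value are assumptions standing in for the unspecified "regular temporal filtering", and the function names are hypothetical.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def target_wb_gains(gains_j: RGB, avg_i: RGB, avg_j: RGB, illumination_changed: bool) -> RGB:
    """WB RGB gains for target frame i, taken from reference frame j."""
    if not illumination_changed:
        return gains_j  # case 1: reuse frame j's gains directly
    # case 2: correction_factor = common_area_avg_rgb(j) / common_area_avg_rgb(i), element-wise
    correction = (avg_j[0] / avg_i[0], avg_j[1] / avg_i[1], avg_j[2] / avg_i[2])
    return (gains_j[0] * correction[0], gains_j[1] * correction[1], gains_j[2] * correction[2])


def temporal_filter(previous: RGB, current: RGB, alpha: float = 0.25) -> Tuple[float, ...]:
    """Exponential moving average; an assumed form of regular temporal filtering."""
    return tuple(p + alpha * (c - p) for p, c in zip(previous, current))
```

As a check on the direction of the correction: if the shared surface reads twice as blue in frame i as in frame j, the blue gain is halved, so the surface is reproduced in frame i with the same color as in frame j.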

[0100] The description of AWB here can also be applied to AEC or TM. The corrected AWB / AEC / TM parameters determined for the corresponding reference frame j are applied to the target frame to achieve high color accuracy and consistency.

[0101] Referring to Figure 5, a camera device 100 implemented in a terminal such as a smartphone or tablet is illustrated. The camera device 100 includes a processor 102 and a memory 104. The memory 104 stores instructions that, when executed by the processor 102, perform the steps of the method described above. The camera device 100 may further include, or be connected to, an image sensor that acquires image data for use in the method of the present invention. The camera device may likewise include, or be connected to, a SLAM module: the camera device 100 may have its own SLAM module, or the SLAM module may be implemented in the terminal device that uses the camera device 100. In the illustration of Figure 5, the camera device 100, together with image sensor 106 and SLAM module 108, is shown as an integrated component of the terminal.

[0102] Therefore, by using SLAM data / depth information provided by the SLAM module of the terminal or camera, more information about the corresponding scene can be used to identify scene information in different frames, thereby improving the consistency and accuracy of color reproduction by using the acquisition parameters of frames with a higher confidence level.

Claims

1. A method for camera control to acquire images, including: acquiring an image frame stream including at least one frame using an image sensor (S01); acquiring a target frame using the image sensor (S02); determining scene information of the target frame (S03); using the scene information to determine, from the image frame stream, image frames that at least partially overlap with the target frame, and selecting at least one reference frame from the image frames based on the maximum overlap between the image frame and the target frame and on the confidence value of the image frame (S04), wherein the confidence value of the image frame corresponding to the selected at least one reference frame is higher than a preset threshold; determining at least one acquisition parameter of the reference frame (S05), wherein the at least one acquisition parameter is related to automatic white balance, automatic exposure control, and/or tone mapping parameters; and processing the target frame using the at least one acquisition parameter to determine the final image (S06); wherein the scene information includes simultaneous localization and mapping (SLAM) data of the image frames and the target frame, and the SLAM data includes at least localization information.

2. The method of claim 1, wherein the scene information includes the depth information of the image frame and/or the target frame and/or the pose of the image sensor.

3. The method of claim 1, wherein the scene information includes the coordinates of the scene, and wherein selecting a reference frame from the image stream by recognizing the scene information of the target frame includes calculating the coordinates of the target frame and determining at least a partial overlap with the coordinates in the corresponding image frame of the image frame stream.

4. The method according to claim 3, wherein the coordinates of the scene are calculated as follows: obtaining the depth information d of pixel (u, v) of the frame; determining the coordinates $(X_{cam}, Y_{cam}, d, 1)$ in the camera coordinate system from (u, v) and d, where (px, py) is the principal point of the image sensor and (cx, cy) are its focal lengths; and transforming the coordinates to the world coordinate system according to $(X, Y, Z, 1)^{\top} = (R|t)\,(X_{cam}, Y_{cam}, d, 1)^{\top}$, where (X, Y, Z, 1) are coordinates in the world coordinate system and (R|t) is the pose of the image sensor.

5. The method of claim 4, further comprising comparing the world coordinates of an object in the target frame with each image frame from the stream to determine the at least partial overlap.

6. The method according to claim 1, wherein the confidence value is provided by one or more of the following: color gamut, luminance gamut, convex hull of 2D chromaticity, 1D luminance range, or 3D color histogram.

7. The method of claim 1, wherein the image frames from the image frame stream comprise low-resolution images having a resolution lower than that of the final image, or 3A statistics of the original image frames.

8. The method according to claim 1, wherein image frames of the image frame stream that have a confidence value higher than a preset threshold are stored.

9. The method according to claim 8, wherein the camera pose is stored together with each stored image frame of the image frame stream.

10. The method according to claim 1, further including: detecting an illumination change between the reference frame and the target frame, and adapting the reference frame to the changed illumination before the acquisition parameters are determined.

11. The method according to any one of claims 1 to 10, wherein the steps of the method are repeated for each new target frame of a video stream or preview image stream.

12. An image signal processor (ISP) configured to perform the steps of the method according to any one of claims 1 to 11.

13. A camera device comprising a processor and a memory storing instructions that, when executed by the processor, perform the steps of the method according to any one of claims 1 to 11.