Binocular image correction method, apparatus and device

By performing spatial and temporal image merging correction on binocular camera images in augmented reality devices, the problem of spatiotemporal inconsistency in binocular images is solved, reducing user dizziness and ghosting, and improving visual comfort and immersion.

CN122243830A · Pending · Publication Date: 2026-06-19 · VIVO MOBILE COMM CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
VIVO MOBILE COMM CO LTD
Filing Date
2026-03-10
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In augmented reality devices, spatiotemporal inconsistency between the left and right images captured by the binocular video see-through (VST) camera can cause discomfort and dizziness in users.

Method used

Spatial correction is achieved by mapping the original images captured by the binocular camera to the pixel coordinate systems of the corresponding virtual cameras. Temporal correction is achieved by determining the row-direction positional deviation between the corrected images and adjusting the camera exposure start time accordingly.

Benefits of technology

While maintaining the system's real-time performance, it accurately compensates for spatial misalignment and temporal asynchrony caused by camera assembly tolerances and rolling shutter characteristics, significantly reducing image discrepancies and ghosting, and improving users' visual comfort and immersive experience.

✦ Generated by Eureka AI based on patent content.


Abstract

This application discloses a binocular image merging correction method, apparatus, and device, belonging to the field of mixed reality display technology. The method includes: mapping a first original image captured by the left camera and a second original image captured by the right camera in a binocular camera system to the pixel coordinate systems of the corresponding left and right virtual cameras, respectively, to obtain a spatially corrected first virtual camera image and a second virtual camera image; wherein the left and right virtual cameras have a predetermined ideal relative pose; determining the positional deviation between the first and second virtual camera images in the image row direction; determining the time delay offset between the exposure start times of the left and right cameras based on the positional deviation; and adjusting the exposure start time of at least one of the left and right cameras based on the time delay offset.

Description

Technical Field

[0001] This application belongs to the field of mixed reality display technology, specifically relating to a binocular fusion correction method, device, and equipment.

Background Technology

[0002] In mixed reality (MR) devices, binocular video see-through (VST) cameras are typically used to capture images of the external environment, which are then composited and displayed on the screen to achieve a blending effect between the virtual and real worlds.

[0003] However, during use, the left and right images captured by the binocular VST camera may in some cases lack spatiotemporal consistency, which may cause discomfort or even dizziness in users.

[0004] Therefore, there is an urgent need for a solution that can improve the spatiotemporal consistency of binocular images.

Summary of the Invention

[0005] The purpose of this application is to provide a binocular fusion correction method, apparatus, and device that can improve the problem of spatiotemporal inconsistency in binocular images.

[0006] In a first aspect, embodiments of this application provide a binocular fusion correction method, including: mapping the first original image captured by the left camera and the second original image captured by the right camera in the binocular camera to the pixel coordinate systems of the corresponding left virtual camera and right virtual camera, respectively, to obtain a spatially corrected first virtual camera image and second virtual camera image, wherein the left virtual camera and the right virtual camera have a predetermined ideal relative pose; determining the positional deviation between the first virtual camera image and the second virtual camera image in the image row direction; determining, based on the positional deviation, the time delay offset between the exposure start times of the left and right cameras; and adjusting, based on the time delay offset, the exposure start time of at least one of the left and right cameras.

[0007] In a second aspect, embodiments of this application provide a binocular fusion correction device, comprising: a spatial correction module, used to map the first original image captured by the left camera and the second original image captured by the right camera in the binocular camera to the pixel coordinate systems of the corresponding left virtual camera and right virtual camera, respectively, to obtain a spatially corrected first virtual camera image and second virtual camera image, wherein the left virtual camera and the right virtual camera have a predetermined ideal relative pose; and a time correction module, used to determine the positional deviation between the first virtual camera image and the second virtual camera image in the image row direction, determine, based on the positional deviation, the time delay offset between the exposure start times of the left camera and the right camera, and adjust, based on the time delay offset, the exposure start time of at least one of the left camera and the right camera.

[0008] In a third aspect, embodiments of this application provide an augmented reality device, including a binocular camera, an IMU, and a binocular fusion correction device as described in the second aspect, wherein the binocular camera includes a left camera and a right camera.

[0009] In a fourth aspect, embodiments of this application provide an electronic device including a processor and a memory, the memory storing programs or instructions executable on the processor, the programs or instructions, when executed by the processor, implementing the steps of the method described in the first aspect.

[0010] In a fifth aspect, embodiments of this application provide a readable storage medium on which a program or instructions are stored, which, when executed by a processor, implement the steps of the method described in the first aspect.

[0011] In a sixth aspect, embodiments of this application provide a chip, the chip including a processor and a communication interface, the communication interface being coupled to the processor, the processor being used to run programs or instructions to implement the method as described in the first aspect.

[0012] In a seventh aspect, embodiments of this application provide a computer program product stored in a storage medium, which is executed by at least one processor to implement the method described in the first aspect.

[0013] In this embodiment, geometric correction is performed on the binocular images based on an ideal virtual camera, thereby achieving real-time correction of spatial image alignment. By analyzing the images, the exposure time difference in the row direction of the final displayed image is calculated, and the camera's exposure timing is adjusted in real time based on this difference, thereby achieving temporal image alignment synchronization correction. According to this embodiment, while maintaining system real-time performance, it is possible to accurately compensate for spatial misalignment and temporal asynchrony caused by camera assembly tolerances and rolling shutter characteristics. By achieving synchronization and consistency of binocular images in geometric space and exposure timing, the image misalignment and ghosting phenomena in VST displays can be significantly reduced, thereby effectively alleviating user dizziness and improving visual comfort and immersive experience.

Attached Figure Description

[0014] Figure 1 is a diagram of 6DoF;
Figure 2 is a diagram of the field of view (FOV);
Figure 3 is a schematic diagram illustrating the 6DoF differences between binocular cameras;
Figure 4 is a schematic diagram illustrating the inconsistency between binocular images in a static scene;
Figure 5 is a comparison chart of the display screen's FOV and the camera's FOV;
Figure 6 is a schematic diagram illustrating the inconsistency between binocular images in a moving scene;
Figure 7 is a flowchart illustrating a binocular fusion correction method provided in some embodiments of this application;
Figure 8 is a schematic diagram illustrating exposure timing adjustment provided in some embodiments of this application;
Figure 9 is a schematic diagram of a binocular fusion correction device provided in some embodiments of this application;
Figure 10 is a block diagram illustrating an electronic device provided in some embodiments of this application;
Figure 11 is a schematic diagram of the structure of an electronic device provided in some embodiments of this application.

Detailed Implementation

[0015] The technical solutions of the embodiments of this application will be clearly described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application are within the scope of protection of this application.

[0016] The terms "first," "second," etc., used in this application's specification are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of this application can be implemented in orders other than those illustrated or described herein, and the objects distinguished by "first," "second," etc., are generally of the same class, without limiting the number of objects; for example, a first object can be one or more. Furthermore, in the specification, "and / or" indicates at least one of the connected objects, and the character " / " generally indicates that the preceding and following objects have an "or" relationship.

[0017] Before providing a further detailed description of the embodiments of this application, the nouns and terms involved in the embodiments are explained below. The terminology is used only to explain the specific embodiments of this application and is not intended to limit this application.

[0018] MR (Mixed Reality): MR is a new type of human-computer interaction display technology that lies between Virtual Reality (VR) and Augmented Reality (AR). By combining real-world environmental information with computer-generated virtual content, MR allows users to simultaneously perceive and interact with real and virtual objects within the same visual scene.

[0019] Binocular Cameras: In MR devices, binocular cameras are a set of core sensors used to achieve environmental depth perception and 3D spatial understanding. These include a left camera and a right camera, which, by simulating human stereoscopic vision, capture images of the same scene from different angles using the left and right cameras. This transforms the captured 2D images into a 3D understanding of the physical world, providing an essential data foundation for the accurate, stable, and interactive fusion of virtual content with the real world.

[0020] Binocular image merging: refers to the process of using a pair of left and right images captured by a binocular camera to generate an image or data set that can accurately describe the distance (depth) of each pixel in the scene through calculation.

[0021] VST (Video See-Through) is an augmented reality (AR) display method that uses a camera to capture images of the real world and then merges them with virtual information to display them on a screen or head-mounted display.

[0022] Rolling shutter: This refers to an imaging method where the image sensor exposes and reads images sequentially in a line-scanning manner. Unlike the global shutter, which exposes all pixels at the same time, the rolling shutter introduces a time difference in the start time of exposure for pixels in different rows within the same frame. Due to this time difference, when photographing moving objects or devices whose posture changes, geometric distortion or temporal asynchrony may occur in the image, resulting in reduced temporal consistency of the imaging.
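The row-wise time difference described above can be illustrated with a minimal sketch. It assumes a simple model in which consecutive rows start exposure a fixed interval apart; the function name and the 10 microsecond line time are hypothetical values chosen for illustration, not figures from this application.

```python
def row_exposure_start(frame_start_s: float, row: int, line_time_s: float) -> float:
    """Exposure start time of one sensor row under a constant line-time model (assumed)."""
    if row < 0:
        raise ValueError("row must be non-negative")
    return frame_start_s + row * line_time_s


# Hypothetical sensor: 10 microseconds between the exposure starts of adjacent rows.
line_time = 10e-6
t_top = row_exposure_start(0.0, 0, line_time)
t_mid = row_exposure_start(0.0, 500, line_time)

# Time skew between row 0 and row 500 within the same frame.
skew = t_mid - t_top
print(f"row 500 starts {skew * 1e3:.1f} ms after row 0")
```

Under this model, any two rows that are far apart in the image are exposed at noticeably different moments, which is why moving content can appear displaced between rows.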

[0023] IMU (Inertial Measurement Unit): A sensing component used to measure the motion state information of a device. An IMU typically consists of an accelerometer and a gyroscope, used to measure the linear acceleration and angular velocity of the device along three axes, respectively. By integrating and filtering the IMU output data, information about the device's attitude changes, angular displacement, and motion trajectory can be obtained.

[0024] Time of Flight (ToF) is a depth sensing technology that measures the distance to a target object based on the propagation time of a light pulse or modulated light signal. ToF sensors calculate the distance between the object and the sensor by emitting infrared light or other modulated light signals into the scene and detecting the time difference between the emission, reflection, and return of the light signal.
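As a small worked example of the ToF principle, the sketch below converts a measured round-trip time into a distance. It assumes a direct pulse measurement in which the light travels to the object and back, so the one-way distance is half the path. The function name is hypothetical.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def tof_distance_m(round_trip_time_s: float) -> float:
    """One-way distance for a light pulse that returns after round_trip_time_s."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_S * round_trip_time_s / 2.0


# A pulse that returns after 20 nanoseconds corresponds to roughly 3 meters.
distance = tof_distance_m(20e-9)
print(f"{distance:.3f} m")
```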

[0025] 6DoF (Six Degrees of Freedom): refers to the six independent degrees of freedom that an object possesses in three-dimensional space, including three translational degrees of freedom and three rotational degrees of freedom. See Figure 1. In 6DoF, the three translational degrees of freedom are linear displacements along the X, Y, and Z axes; the three rotational degrees of freedom are angular displacements about the X, Y, and Z axes, typically corresponding to roll, pitch, and yaw. In MR, AR, or VR devices, 6DoF is commonly used to describe the position and posture changes of the device or user's head in space, serving as a fundamental motion model for spatial localization, posture tracking, and immersive interaction.

[0026] FOV (Field of View) refers to the range of spatial angles that an imaging system can cover or perceive under specific operating conditions. The size of the FOV determines the field of view of an optical instrument; a larger FOV results in a wider field of view but a lower optical magnification. This angular range is typically defined relative to the optical axis of the imaging system and bounds its imageable area in the horizontal, vertical, or diagonal direction. See Figure 2. FOV can be quantified in three directions: horizontal field of view (HFOV), vertical field of view (VFOV), and diagonal field of view (DFOV).

[0027] Depth information refers to the physical distance data of each object or point in the shooting scene from the camera lens.

[0028] Before providing a further detailed description of the embodiments of this application, related technologies will be introduced. As mentioned above, in related technologies, when using MR devices, inconsistencies may occur between the binocular images captured by the binocular cameras, which can easily cause discomfort or even dizziness in users. This inconsistency mainly involves two aspects: spatial image merging and temporal image merging.

[0029] Regarding spatial image merging: MR devices employ a display method where the left camera captures images and sends them to the left display screen, while the right camera captures images and sends them to the right display screen. Ideally, there should only be a horizontal translation difference between the binocular cameras to ensure the accuracy of binocular parallax. However, in actual assembly, due to limitations in process precision and assembly tolerances, a 6DoF difference often occurs between the binocular cameras, as shown in Figure 3. This results in inconsistent row direction positions of the same object in binocular images.

[0030] Differences in the 6DoF of binocular cameras can cause not only horizontal parallax, but also vertical and rotational parallax, when shooting static scenes, as shown in Figure 4. These multi-dimensional inconsistencies can cause the corresponding positions of the same real object in the left and right eye images to be misaligned, resulting in ghosting or image misalignment during display. This type of image misalignment error can easily cause visual discomfort or even dizziness when users watch or observe moving scenes for extended periods, severely impacting the user experience and immersive effect of MR devices.

[0031] Regarding temporal image merging: the field of view (FOV) of the on-screen area (the display area of the screen) is usually smaller than the FOV of the camera, as shown in Figure 5. Therefore, MR devices crop a portion of the camera image for display. Furthermore, spatial inconsistencies are typically eliminated during spatial image merging correction by cropping or warping the original image. As a result, the center points of the cropped left and right on-screen areas will be located on different rows of their respective camera images. Since each row in a rolling shutter camera starts exposure at a different time, this row difference produces a temporal consistency problem, that is, inconsistent exposure times in the final displayed images. When a user observes a moving scene, the difference in exposure times leads to inconsistencies in the binocular images, exacerbating the user's dizziness. For example, as shown in Figure 6, after spatial consistency correction, stationary objects in a moving scene only have horizontal parallax. However, due to the difference in exposure time, the moving calibration plate appears at inconsistent positions in the left and right eye images, leading to inconsistency in binocular image merging.

[0032] Therefore, it is evident that current MR devices still suffer from the inability to simultaneously maintain consistency in spatial and temporal image merging during video see-through display. This issue not only leads to potential spatial and temporal deviations in the binocular images but also easily causes ghosting and dizziness in users, making it difficult to guarantee a good sense of immersion and a comfortable experience.

[0033] In view of this, in order to improve the problem of binocular image inconsistency in MR devices during Video See-Through display, this application provides a binocular image merging correction method, device, MR device, electronic device, storage medium, program product, and chip, particularly relating to a method for real-time correction of spatial and temporal image merging in Video See-Through (VST) display mode. This technical solution can be applied to head-mounted display devices, AR / MR glasses, and other augmented reality systems based on video see-through. It aims to achieve spatial and temporal consistency correction of binocular images, thereby effectively reducing ghosting and dizziness experienced by users, and enhancing immersion and comfort.

[0034] The core concept of this application is: to perform geometric correction on binocular images based on a virtual camera under ideal conditions, thereby achieving real-time correction of spatial image merging; by analyzing the image cropping, distortion and other transformation areas, the exposure time difference of the final image on the screen in the row direction is calculated, and the exposure timing of the camera is adjusted in real time based on the difference, so as to achieve a method of time-based image merging synchronization correction.
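The temporal part of this concept can be sketched as follows. The sketch assumes that the row-direction deviation converts to time through a constant line time and that alignment is achieved by delaying whichever camera would otherwise expose its displayed center row earlier. Both choices, as well as all names and numbers, are illustrative assumptions and not the exact procedure of this application.

```python
def align_exposure_starts(start_left_s, start_right_s,
                          center_row_left, center_row_right, line_time_s):
    """Return (new_left_start, new_right_start, offset) so both center rows expose together.

    offset > 0 means the left center row was exposed later than the right one.
    """
    t_left = start_left_s + center_row_left * line_time_s
    t_right = start_right_s + center_row_right * line_time_s
    offset = t_left - t_right
    if offset > 0:
        start_right_s += offset   # delay the right camera
    else:
        start_left_s -= offset    # delay the left camera (offset <= 0)
    return start_left_s, start_right_s, offset


# Hypothetical case: displayed centers sit on rows 620 (left) and 600 (right),
# with 10 microseconds between adjacent row exposure starts.
left_start, right_start, delay = align_exposure_starts(0.0, 0.0, 620, 600, 10e-6)
print(f"time delay offset: {delay * 1e6:.0f} us")
```

After the adjustment, the center row of each displayed area starts exposure at the same instant, which is the condition the temporal correction aims for.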

[0035] The binocular fusion correction method provided in this application will be described in detail below with reference to the accompanying drawings, through specific embodiments and application scenarios.

[0036] The binocular fusion correction method provided in this application can be applied to augmented reality scenarios. It should be noted that the binocular fusion correction method provided in this application can be executed by a binocular fusion correction device, which can be integrated into head-mounted display devices, AR / MR glasses, and other video-based augmented reality systems. This application uses a binocular fusion correction device to execute the binocular fusion correction method as an example to illustrate the binocular fusion correction method provided in this application.

[0037] Referring to Figure 7, the binocular fusion correction method provided in this application includes the following steps 110-140, which will be described in detail below.

[0038] Step 110. Map the first original image captured by the left camera and the second original image captured by the right camera in the binocular camera to the pixel coordinate system of the corresponding left virtual camera and right virtual camera, respectively, to obtain the spatially corrected first virtual camera image and second virtual camera image.

[0039] In some embodiments of this application, images can be acquired using the binocular VST camera of a mixed reality (MR) device in an MR scenario, resulting in a first original image captured by the left camera and a second original image captured by the right camera. Throughout the MR device's VST display mode, whenever the left and right cameras acquire a new original image frame, a binocular merging correction process is immediately performed on that frame so that the corrected image can be sent to the corresponding display screen for display. The display screen may include a first display screen and a second display screen. For ease of distinction, the display screen used to display the image acquired by the left camera is referred to as the left display screen, and the display screen used to display the image acquired by the right camera is referred to as the right display screen.

[0040] As mentioned above, in actual assembly, due to limitations in process precision and assembly tolerances, the left and right cameras of a binocular camera often exhibit a 6DoF difference such as that shown in Figure 3. As a result, when the left and right cameras capture the same object, there is not only horizontal parallax but also vertical and rotational parallax. These multi-dimensional inconsistencies cause the corresponding positions of the same real object in the original images captured by the two physical cameras to be inaccurately aligned, resulting in ghosting or image merging deviations during display. Therefore, to overcome the spatial deviations in the original left and right images, spatial image merging correction is necessary.

[0041] If spatial alignment correction is achieved by directly aligning one of the left and right cameras with the other, it becomes difficult to implement because neither camera serves as a correct reference point. Therefore, this application introduces a virtual camera (also called a rendering camera) corresponding to the display screen. The virtual camera is an ideal viewpoint defined by the system software, corresponding to the display screen. It provides the geometric target for spatial correction, defines the field of view of the final displayed image, and serves as the core reference system for the entire correction algorithm. By setting up the virtual camera, a stable, accurate, and easily manageable ideal target or gold standard can be established, providing a common and unambiguous correction target for the correction of the two physical cameras (left and right). This solves the complexity and instability problems caused by directly correcting the left and right physical cameras.

[0042] In some embodiments of this application, the virtual camera includes a left virtual camera corresponding to the left display screen and a right virtual camera corresponding to the right display screen. The left virtual camera corresponds to the ideal viewpoint of the left display screen, and the right virtual camera corresponds to the ideal viewpoint of the right display screen. The left and right virtual cameras have a predetermined ideal relative pose, that is, the two virtual cameras themselves only have horizontal parallax, and there is no vertical parallax or rotational parallax.

[0043] In some embodiments of this application, the core principle of spatial image re-alignment correction based on virtual cameras is coordinate reprojection in three-dimensional space. This mainly involves geometrically transforming the original images (containing errors) captured by two physical cameras (left and right) and reprojecting them onto the imaging plane of the corresponding ideal virtual camera. Based on this, after obtaining the first original image captured by the left camera and the second original image captured by the right camera, the first original image is mapped to the pixel coordinate system of the corresponding left virtual camera to obtain the spatially corrected first virtual camera image, and the second original image is mapped to the pixel coordinate system of the corresponding right virtual camera to obtain the spatially corrected second virtual camera image.

[0044] In some embodiments of this application, a first coordinate mapping model between the pixel coordinate system of the left camera and the pixel coordinate system of the left virtual camera and a second coordinate mapping model between the pixel coordinate system of the right camera and the pixel coordinate system of the right virtual camera can be preset. Based on this, in step 110 above, spatial image correction can be performed through the following steps 1101-1103.

[0045] Step 1101. Determine the first depth information and the second depth information respectively.

[0046] The first depth information is the depth information when the left camera captures the first original image, and the second depth information is the depth information when the right camera captures the second original image.

[0047] In some embodiments of this application, for each physical camera in the left and right cameras, the corresponding depth information can be calculated in real time using methods such as binocular parallax, LiDAR, and ToF sensors.

[0048] For example, consider using binocular disparity to determine the first and second depth information. When calculating the first depth information, the first original image is used as a reference image, and the corresponding point of each pixel in the first original image is found in the second original image. The horizontal pixel coordinate difference between corresponding points in the two original images is calculated, which is the disparity. Finally, a disparity map of the same size as the original image is generated, where each pixel value represents the disparity magnitude of that point. According to the principle of triangulation, using the known baseline distance of the binocular cameras (the horizontal distance between the optical centers of the left and right cameras) and the focal length of the left camera, each disparity value in the disparity map is converted into a physical depth value using the following formula (1), thereby obtaining the first depth information.

$$Z = \frac{f \cdot B}{d} \tag{1}$$

[0049] In the above formula, $Z$ indicates the depth value, $f$ indicates the focal length of the left camera, $B$ indicates the baseline distance of the binocular cameras, and $d$ represents the disparity value.

[0050] Similarly, when calculating the second depth information, the second original image can be used as the reference image for the above calculation, which will not be elaborated here.
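The triangulation relation of formula (1), depth equals focal length times baseline divided by disparity, can be sketched as below. The sketch assumes the focal length and disparity are expressed in pixels and the baseline in meters, and it maps non-positive disparities to infinity as one possible convention. The function names and sample values are hypothetical.

```python
import math


def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in meters for one pixel; non-positive disparity is treated as infinitely far."""
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px


def depth_map(disparity_map, focal_px: float, baseline_m: float):
    """Apply the conversion to every entry of a 2D disparity map (list of rows)."""
    return [[disparity_to_depth(d, focal_px, baseline_m) for d in row]
            for row in disparity_map]


# Hypothetical values: 800 px focal length, 64 mm baseline.
depths = depth_map([[64.0, 32.0], [0.0, 16.0]], focal_px=800.0, baseline_m=0.064)
print(depths)
```

Because depth is inversely proportional to disparity, halving the disparity doubles the estimated distance, so small disparity errors matter most for far objects.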

[0051] Step 1102. Input the pixel coordinates and first depth information of the first original image into the first coordinate mapping model corresponding to the left camera to obtain the pixel coordinates of the first virtual camera image.

[0052] The first coordinate mapping model is used to map the first original image to the pixel coordinate system of the left virtual camera.

[0053] Step 1103. Input the pixel coordinates and second depth information of the second original image into the preset second coordinate mapping model to obtain the pixel coordinates of the second virtual camera image.

[0054] The second coordinate mapping model is used to map the second original image to the pixel coordinate system of the right virtual camera.
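One way to picture such a coordinate mapping model is the standard pinhole reprojection below: a pixel with known depth is back-projected into the physical camera frame, moved into the virtual camera frame by a rotation and translation, and projected with the virtual camera intrinsics. This is a minimal sketch that ignores lens distortion. The function names, the intrinsics tuple layout `(fx, fy, cx, cy)`, and all numbers are assumptions, not the model defined in this application.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def reproject_pixel(u, v, depth, k_phys, k_virt, rotation, translation):
    """Map pixel (u, v) with depth from a physical camera to virtual camera pixel coordinates."""
    fx, fy, cx, cy = k_phys
    # Back-project the pixel to a 3D point in the physical camera frame.
    point = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Express the point in the virtual camera frame.
    rotated = mat_vec(rotation, point)
    q = [r + t for r, t in zip(rotated, translation)]
    # Project with the ideal, distortion-free virtual intrinsics.
    fxv, fyv, cxv, cyv = k_virt
    return fxv * q[0] / q[2] + cxv, fyv * q[1] / q[2] + cyv


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (800.0, 800.0, 320.0, 240.0)

# A 10 mm vertical offset between the physical and virtual camera shifts the
# pixel by 4 rows at a depth of 2 m (800 * 0.01 / 2).
u_new, v_new = reproject_pixel(400.0, 300.0, 2.0, K, K, IDENTITY, [0.0, 0.01, 0.0])
print(u_new, v_new)
```

The example also shows why depth information is an input of the mapping: the same physical offset produces a smaller pixel shift for farther points.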

[0055] In some embodiments of this application, for each physical camera in the left and right cameras, its corresponding coordinate mapping model (first coordinate mapping model or second coordinate mapping model) can be preset based on the calibration parameters of the physical camera and the setting parameters of the virtual camera corresponding to the physical camera. Specifically, the coordinate mapping model of the physical camera can be preset through the following steps 210-260.

[0056] Step 210. Obtain the calibration parameters of the physical camera and of its corresponding virtual camera.

[0057] The physical camera can be either the left or right camera, and the virtual camera is the left virtual camera or the right virtual camera that corresponds to the physical camera.

[0058] The calibration parameters include the camera's intrinsic parameters and the extrinsic parameters between the camera and the inertial measurement unit (IMU).

[0059] The intrinsic parameters of a camera are parameters that describe the camera's internal imaging geometry and are independent of the camera's position and orientation in space. For example, intrinsic parameters may include, but are not limited to, focal length, center point coordinates (i.e., the coordinates of the image center point in the pixel coordinate system), distortion coefficients, etc.

[0060] The extrinsic parameters between the camera and the IMU describe the relative position and orientation relationship between the camera and the IMU built into the MR device. For example, the extrinsic parameters between the camera and the IMU may include, but are not limited to, rotation matrices and translation vectors. The rotation matrix indicates how to rotate the camera's coordinate system to align its coordinate axes with the IMU's coordinate system. The translation vector indicates the vector pointing from the center of the IMU sensor to the optical center of the camera in the IMU coordinate system.

[0061] In some embodiments of this application, the camera calibration parameters may further include extrinsic parameters between cameras. For a physical camera, its calibration parameters may further include extrinsic parameters between the physical camera and the other camera in the stereo camera system. For a virtual camera, its calibration parameters may further include extrinsic parameters between the virtual camera and another virtual camera. Exemplarily, the extrinsic parameters between cameras may include, but are not limited to, rotation matrices and translation vectors. The rotation matrix indicates how to rotate the camera coordinate system of one camera to align its coordinate axes with the camera coordinate system of the other camera. The translation vector indicates the vector pointing from the optical center of one camera to the optical center of the other camera in the camera coordinate system. For a physical camera, the magnitude of the translation vector in the corresponding extrinsic parameters between cameras is the baseline distance of the stereo system.
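The relationship between the camera-to-IMU extrinsics and the camera-to-camera extrinsics can be sketched as follows. The sketch assumes each camera's extrinsics satisfy p_imu = R * p_cam + t, with t being the camera optical center in the IMU frame, as described in the text. The helper names and the 64 mm spacing are hypothetical.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def relative_extrinsics(r_imu_left, t_imu_left, r_imu_right, t_imu_right):
    """Return (R, t) that map left-camera coordinates into right-camera coordinates."""
    r_right_inv = transpose(r_imu_right)  # the inverse of a rotation is its transpose
    rotation = matmul(r_right_inv, r_imu_left)
    diff = [a - b for a, b in zip(t_imu_left, t_imu_right)]
    translation = mat_vec(r_right_inv, diff)
    return rotation, translation


def baseline_m(translation):
    """Baseline distance is the magnitude of the inter-camera translation."""
    return math.sqrt(sum(c * c for c in translation))


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot, trans = relative_extrinsics(IDENTITY, [-0.032, 0.0, 0.0], IDENTITY, [0.032, 0.0, 0.0])
base = baseline_m(trans)
print(rot, trans, base)
```

For an ideal virtual camera pair, the same computation would yield an identity rotation and a purely horizontal translation, which is the "predetermined ideal relative pose" condition.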

[0062] In some embodiments of this application, the calibration parameters of the physical camera can be obtained by jointly calibrating the multi-sensor system in the MR device using a multi-sensor joint calibration method. This joint calibration method can employ vision-inertial joint calibration techniques, such as a joint optimization method based on Zhang Zhengyou's planar calibration method and IMU inertial measurement data, the Kalibr joint calibration method, or a vision-inertial joint calibration method based on the AprilTag calibration board. By acquiring image data containing calibration patterns and corresponding IMU acceleration and angular velocity data under different postures and motion states, the intrinsic parameters of the camera, the extrinsic parameters between cameras, and the extrinsic parameters between the camera and the IMU are jointly optimized and solved to obtain these parameters, providing fundamental data support for subsequent attitude calculation and spatial alignment correction.

[0063] In some embodiments of this application, the calibration parameters of the virtual camera can be determined during the design phase of the MR device and preset before leaving the factory. Specifically, the calibration parameters of the virtual camera are not obtained through measurement or calibration, but are predefined and configured in the software according to the display system specifications and predetermined optical design of the MR device. Their settings are based on an ideal, error-free display model. The intrinsic parameters of the virtual camera define the imaging geometry of the virtual camera, the core of which is to match the final image displayed to the user.

[0064] In some embodiments of this application, the focal length in the intrinsic parameters of the virtual camera can be calculated using a perspective projection model based on the physical dimensions (width and height), the resolution, and the designed display field of view of the display screen (left or right) corresponding to the virtual camera. The center point coordinates can be directly set to the center of the corresponding pixel coordinate system. Because the virtual camera is an ideal model, its distortion coefficients can be set to 0, i.e., no optical distortion. The intrinsic parameters of the virtual camera ensure that the image generated by the virtual camera has the same viewpoint, scale, and center as the image seen by the user on the display screen.
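As a minimal sketch of the perspective projection relation mentioned above, the focal length in pixels can be derived from the display resolution and the designed field of view. The function names, the per-axis field-of-view inputs, and the choice of width/2 and height/2 as the image center are assumptions made for illustration; they are not taken from this application.

```python
import math


def focal_length_px(num_pixels: int, fov_deg: float) -> float:
    """Focal length (in pixels) of an ideal pinhole camera whose field of
    view fov_deg spans num_pixels along one image axis."""
    return (num_pixels / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def virtual_intrinsics(width_px: int, height_px: int,
                       hfov_deg: float, vfov_deg: float):
    """Build a 3x3 intrinsic matrix for an ideal (distortion-free) virtual camera.

    Assumption: the center point is placed at (width/2, height/2)."""
    fx = focal_length_px(width_px, hfov_deg)
    fy = focal_length_px(height_px, vfov_deg)
    cx = width_px / 2.0
    cy = height_px / 2.0
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


# Hypothetical display: 2000 x 1000 pixels with a 90 degree FOV on each axis.
K_virtual = virtual_intrinsics(2000, 1000, 90.0, 90.0)
```

With a 90 degree field of view the focal length equals half the pixel extent, which makes the example easy to check by hand.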

[0065] In some embodiments of this application, the extrinsic parameters between the virtual camera and the IMU define the ideal pose of the virtual camera in the IMU coordinate system. The rotation matrix can be set as an identity matrix or its equivalent fixed small rotation. This means that the camera coordinate system axis of the virtual camera is defined as parallel to the IMU coordinate system. This is for ease of calculation, because the IMU directly measures the device pose, and having the virtual camera parallel to it facilitates the alignment of virtual content with the real world. The translation vector defines the ideal position of the virtual camera relative to the IMU and can be set according to the interpupillary distance of the human eye and the optical design of the MR device (such as the light path deflection of the screen and lenses).

[0066] In some embodiments of this application, the extrinsic parameters between the left and right virtual cameras can be set such that the rotation matrix is an identity matrix and the translation vector is only the displacement along the X-axis (horizontal direction), the magnitude of which is equal to the virtual binocular distance (usually related to the user's IPD). This ensures an ideal positional relationship between the left and right virtual cameras, where only horizontal parallax exists.
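The effect of this ideal relative pose can be illustrated with a small sketch: with an identity rotation and a translation along X only, a point projects to the same row in both virtual cameras and differs only in its column. The intrinsic values, the 63 mm baseline, and the helper names are hypothetical values chosen for illustration.

```python
def project(fx, fy, cx, cy, point):
    """Pinhole projection of a 3D point (X, Y, Z) to pixel coordinates (u, v)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics shared by both virtual cameras.
FX, FY, CX, CY = 1000.0, 1000.0, 960.0, 540.0
# Assumed virtual binocular distance in meters (illustrative only).
VIRTUAL_BASELINE_M = 0.063


def left_to_right(point, baseline=VIRTUAL_BASELINE_M):
    """Left-virtual to right-virtual camera coordinates:
    identity rotation, translation along the X-axis only."""
    x, y, z = point
    return (x - baseline, y, z)


point_left = (0.2, -0.1, 1.5)  # a point 1.5 m in front of the left virtual camera
uv_left = project(FX, FY, CX, CY, point_left)
uv_right = project(FX, FY, CX, CY, left_to_right(point_left))

row_difference = uv_left[1] - uv_right[1]    # vertical parallax
column_disparity = uv_left[0] - uv_right[0]  # horizontal parallax
```

Because Y and Z are unchanged by the transform, the row difference vanishes and only the horizontal disparity fx * baseline / Z remains.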

[0067] Step 220. Based on the intrinsic parameters of the physical camera, determine the first mapping function between the camera coordinate system and the pixel coordinate system of the physical camera.

[0068] In some embodiments of this application, the intrinsic parameters of the physical camera can be used to complete the mutual conversion between the pixel coordinates corresponding to the physical camera and the three-dimensional coordinates in the camera coordinate system. Based on this, a first mapping function between the camera coordinate system and the pixel coordinate system of the physical camera can be determined.

[0069] For example, taking the physical camera as the left camera, the first mapping function is shown in formula (2) below:

$$Z_l \begin{bmatrix} u_l \\ v_l \\ 1 \end{bmatrix} = K_l \begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} \tag{2}$$

In the above formula (2), $(X_l, Y_l, Z_l)$ represents the three-dimensional coordinates of a point in the camera coordinate system of the left camera; $Z_l$ is the coordinate along the Z-axis of the camera coordinate system and represents the distance between the camera lens and the object, i.e., the depth information of the left camera; $(u_l, v_l)$ represents the two-dimensional coordinates in the pixel coordinate system of the left camera; and $K_l$ represents the intrinsic parameters of the left camera.
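The mutual conversion between pixel coordinates and camera coordinates that underlies the first mapping function can be sketched as follows. This is an illustration under the assumption of an undistorted pinhole model with zero skew; the `Intrinsics` type and the function names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    """Pinhole intrinsics in pixels (lens distortion assumed already removed)."""
    fx: float
    fy: float
    cx: float
    cy: float


def camera_to_pixel(k: Intrinsics,
                    point: Tuple[float, float, float]) -> Tuple[float, float]:
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy)


def pixel_to_camera(k: Intrinsics, pixel: Tuple[float, float],
                    depth: float) -> Tuple[float, float, float]:
    """Back-project a pixel with known depth Z to 3D camera coordinates."""
    u, v = pixel
    return ((u - k.cx) * depth / k.fx, (v - k.cy) * depth / k.fy, depth)


# Hypothetical left-camera intrinsics; round-trip one pixel at 2 m depth.
left = Intrinsics(fx=800.0, fy=800.0, cx=640.0, cy=360.0)
point = pixel_to_camera(left, (700.0, 300.0), depth=2.0)
pixel = camera_to_pixel(left, point)
```

The round trip returns the starting pixel, which shows why the depth is the only extra quantity needed to go from a pixel to a 3D point.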

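[0070] Using $P_l = [X_l, Y_l, Z_l]^T$ to express the three-dimensional coordinates in the camera coordinate system of the left camera and $p_l = [u_l, v_l, 1]^T$ to express the homogeneous pixel coordinates of the left camera, with $K_l$ being the intrinsic parameters of the left camera, formula (2) above can be simplified to formula (3) as follows:

$$Z_l\, p_l = K_l\, P_l \tag{3}$$

Step 230. Based on the extrinsic parameters of the physical camera and the virtual camera, determine the transformation matrix between the camera coordinate system of the physical camera and the camera coordinate system of the virtual camera.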

[0071] In some embodiments of this application, since the camera coordinate system of the virtual camera is parallel to the IMU coordinate system, and the extrinsic parameters between the virtual camera and the IMU and between the physical camera and the IMU are known, the transformation matrix between the camera coordinate system of the physical camera and the camera coordinate system of the virtual camera can be obtained.

[0072] For example, taking the physical camera as the left camera and the virtual camera as the left virtual camera corresponding to the left camera, the extrinsic parameters between the left camera and the IMU are denoted as $T_{l}^{I}$, and the extrinsic parameters between the left virtual camera and the IMU are denoted as $T_{vl}^{I}$. Based on this, the transformation matrix between the camera coordinate system of the left camera and the camera coordinate system of the left virtual camera can be obtained, denoted as $T_{l}^{vl}$.
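One way to compose the two camera-IMU extrinsics into a physical-to-virtual transform is sketched below. It assumes the convention described earlier for the camera-IMU extrinsics, namely that a point in camera coordinates maps to IMU coordinates as P_imu = R * P_cam + t, where t points from the IMU center to the camera optical center. The function names and the numeric values are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def physical_to_virtual(r_cam_imu, t_cam_imu, r_virt_imu, t_virt_imu):
    """Return (R, t) such that P_virtual = R * P_physical + t.

    Derivation under the assumed convention:
      P_imu = R_cam * P_cam + t_cam  and  P_imu = R_virt * P_virt + t_virt
      =>  P_virt = R_virt^T * (R_cam * P_cam + t_cam - t_virt)
    """
    r_virt_inv = transpose(r_virt_imu)
    rotation = mat_mul(r_virt_inv, r_cam_imu)
    offset = [t_cam_imu[i] - t_virt_imu[i] for i in range(3)]
    translation = mat_vec(r_virt_inv, offset)
    return rotation, translation


# Hypothetical extrinsics: physical camera rotated 90 degrees about Z,
# virtual camera axes parallel to the IMU axes (identity rotation).
r_cam = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t_cam = [0.030, 0.010, 0.0]
r_virt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_virt = [0.032, 0.0, 0.0]

rotation, translation = physical_to_virtual(r_cam, t_cam, r_virt, t_virt)
p_cam = [1.0, 0.0, 2.0]
p_virt = [a + b for a, b in zip(mat_vec(rotation, p_cam), translation)]
```

If the calibration tool reports the extrinsics in the opposite direction (IMU to camera), the two transforms would need to be inverted before composing them.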

[0073] Step 240. Based on the intrinsic parameters of the virtual camera, determine the second mapping function between the camera coordinate system and the pixel coordinate system of the virtual camera.

[0074] Similar to step 220, based on the intrinsic parameters of the virtual camera, a second mapping function between the camera coordinate system and the pixel coordinate system of the virtual camera can be obtained.

[0075] For example, taking the virtual camera as the left virtual camera, the second mapping function is shown in formula (4) below:

$$Z_{vl}\, p_{vl} = K_{vl}\, P_{vl} \tag{4}$$

In the above formula (4), $P_{vl} = [X_{vl}, Y_{vl}, Z_{vl}]^T$ represents the three-dimensional coordinates of a point in the camera coordinate system of the left virtual camera; $Z_{vl}$ is the coordinate along the Z-axis of that camera coordinate system and represents the distance between the left virtual camera lens and the object, i.e., the depth information of the left virtual camera; $p_{vl} = [u_{vl}, v_{vl}, 1]^T$ represents the two-dimensional coordinates, in homogeneous form, in the pixel coordinate system of the left virtual camera; and $K_{vl}$ represents the intrinsic parameters of the left virtual camera.

[0076] Step 250. Based on the first mapping function, the transformation matrix, and the second mapping function, determine the fourth mapping function between the pixel coordinate system of the physical camera and the pixel coordinate system of the virtual camera.

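[0077] For example, taking the physical camera as the left camera and the virtual camera as the left virtual camera, the following formula (5) can be obtained based on the transformation matrix between the two:

$$P_{vl} = R_{l}^{vl}\, P_l + t_{l}^{vl} \tag{5}$$

where $P_l$ and $P_{vl}$ are the three-dimensional coordinates of the same point in the camera coordinate systems of the left camera and the left virtual camera, and $R_{l}^{vl}$ and $t_{l}^{vl}$ are the rotation and translation components of the transformation matrix from the former coordinate system to the latter. Based on the above formulas (3)-(5), the fourth mapping function shown in formula (6) can be obtained:

$$Z_{vl}\, p_{vl} = K_{vl}\left(Z_l\, R_{l}^{vl}\, K_l^{-1}\, p_l + t_{l}^{vl}\right) \tag{6}$$

In the above formula (6), $Z_l$ represents the depth information of the left camera, i.e., the Z-axis coordinate of the point in the camera coordinate system of the left camera; $p_l$ and $p_{vl}$ are the homogeneous pixel coordinates in the left camera and the left virtual camera; and $K_l$ and $K_{vl}$ are the corresponding intrinsic parameters.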

[0078] Step 260. Determine the fourth mapping function as the coordinate mapping model corresponding to the physical camera.

[0079] For example, the above formula (6) is determined as the first coordinate mapping model corresponding to the left camera.

[0080] Similarly, the second coordinate mapping model corresponding to the right camera can also be determined based on the above method, which will not be elaborated further here.

[0081] As can be seen from the above formula (6), only the depth information and the pixel coordinates of the original image captured by the physical camera are unknown on the right side of the formula. Therefore, when performing spatial merging correction on the first original image captured by the left camera, it is only necessary to obtain the first depth information corresponding to the first original image in real time to correct the left camera to the position of the left virtual camera, thereby realizing real-time spatial merging correction of the first original image.
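The per-pixel spatial correction described above can be sketched as a chain of three operations: back-project the original pixel with its depth, move the point into the virtual camera coordinate system, and project it with the virtual intrinsics. This is an illustrative sketch assuming an undistorted pinhole model; the function name, the intrinsics tuple layout (fx, fy, cx, cy), and the sample values are hypothetical.

```python
def map_pixel_to_virtual(pixel, depth, k_physical, k_virtual,
                         rotation, translation):
    """Map a physical-camera pixel with known depth to the virtual camera.

    rotation (3x3) and translation (3,) take physical-camera coordinates to
    virtual-camera coordinates: P_virtual = rotation * P_physical + translation.
    Returns ((u, v) in the virtual image, depth in the virtual camera).
    """
    u, v = pixel
    fx, fy, cx, cy = k_physical
    # 1) Back-project the pixel to a 3D point in the physical camera frame.
    point = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # 2) Transform the point into the virtual camera frame.
    moved = [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
             for i in range(3)]
    if moved[2] <= 0:
        raise ValueError("point falls behind the virtual camera")
    # 3) Project with the virtual camera intrinsics.
    fxv, fyv, cxv, cyv = k_virtual
    return (fxv * moved[0] / moved[2] + cxv,
            fyv * moved[1] / moved[2] + cyv), moved[2]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (1000.0, 1000.0, 960.0, 540.0)  # hypothetical, shared by both cameras

# A 10 mm vertical assembly offset seen at 2 m depth shifts the row by 5 pixels.
virtual_pixel, virtual_depth = map_pixel_to_virtual(
    (960.0, 540.0), 2.0, K, K, IDENTITY, (0.0, 0.01, 0.0))
```

In a real pipeline this mapping would be evaluated for every pixel (or used to build a remap table), with the depth supplied in real time as the text describes.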

[0082] Similarly, the right camera can also be corrected to the position of the right virtual camera. After both the left and right cameras have completed spatial image reconciliation correction, since the left and right virtual cameras only have horizontal translation parallax, the corrected first and second virtual camera images only have horizontal translation parallax. Vertical parallax and rotation parallax are corrected, and spatial consistency correction is completed.

[0083] Step 120. Determine the positional deviation of the first virtual camera image and the second virtual camera image in the image row direction.

[0084] In some embodiments of this application, spatial image reconciliation correction may result in differences in the exposure times of the left and right cameras, i.e., inconsistent timing between the two. Therefore, to achieve temporal consistency correction, after obtaining the first and second virtual camera images, the positional deviation (i.e., pixel row difference) between the first and second virtual camera images in the image row direction is determined. Based on this, temporal image reconciliation correction is performed on the left and right cameras.

[0085] In some embodiments of this application, step 120 may include steps 1201-1203.

[0086] Step 1201. Determine the first row coordinates of the center point of the first virtual camera image in the first original image.

[0087] The center point of the first virtual camera image refers to the point that coincides with the center point of the left display screen when the first virtual camera image is displayed on the left display screen. In other words, the center point of the first virtual camera image is the center point of the image area displayed on the left display screen in the first virtual camera image.

[0088] In some embodiments of this application, after determining the center point of the first virtual camera image, the pixel coordinates corresponding to the center point in the pixel coordinate system of the left camera can be determined based on the coordinates of the center point in the pixel coordinate system of the left virtual camera and the first coordinate mapping model shown in the above formula (6). The row coordinates of these pixel coordinates in the pixel coordinate system of the left camera are the first row coordinates of the center point in the first original image.

[0089] In some embodiments of this application, a third coordinate mapping model corresponding to the left virtual camera can be preset. The third coordinate mapping model is used to map the left virtual camera image to the pixel coordinate system of the left camera. Based on this, the first row coordinates corresponding to the center point in the first original image can be determined based on the coordinates of the center point of the first virtual camera image in the pixel coordinate system of the left virtual camera and the third coordinate mapping model.

[0090] The third coordinate mapping model can be derived as follows: based on the above formula (3), formula (7) is obtained; based on the above formulas (5) and (7), formula (8) is obtained; and based on the above formulas (2) and (8), formula (9) is obtained. Formula (9) is determined as the third coordinate mapping model.
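One possible way to write a mapping from the virtual camera back to the physical camera is sketched below. The notation is an assumption made for illustration and is not necessarily that of formulas (7) to (9): $p_l$ and $p_{vl}$ are homogeneous pixel coordinates $[u, v, 1]^T$ in the left camera and the left virtual camera, $K_l$ and $K_{vl}$ are their intrinsic matrices, $Z_l$ and $Z_{vl}$ are the depths along the respective Z-axes, and $(R, t)$ are the rotation and translation taking left camera coordinates to left virtual camera coordinates.

```latex
\begin{align*}
P_{vl} &= Z_{vl}\, K_{vl}^{-1}\, p_{vl}
  && \text{(back-project the virtual pixel)} \\
P_{l} &= R^{\top}\left(P_{vl} - t\right)
  && \text{(invert } P_{vl} = R\,P_{l} + t\text{)} \\
Z_{l}\, p_{l} &= K_{l}\, P_{l}
  = K_{l}\, R^{\top}\left(Z_{vl}\, K_{vl}^{-1}\, p_{vl} - t\right)
  && \text{(project into the left camera)}
\end{align*}
```

Dividing the right-hand side of the last line by its third component $Z_l$ yields $p_l$, whose second component is the row coordinate of the virtual image center in the original image. Under this assumed form, the depth $Z_{vl}$ at the virtual pixel is required as an input.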

[0091] Step 1202. Determine the second row coordinates of the center point of the second virtual camera image in the second original image.

[0092] The center point of the second virtual camera image refers to the point that coincides with the center point of the right display screen when the second virtual camera image is displayed on the right display screen. In other words, the center point of the second virtual camera image is the center point of the image area displayed on the right display screen in the second virtual camera image.

[0093] In some embodiments of this application, the second row coordinates corresponding to the center point of the second virtual camera image in the second original image can be determined in the same manner as step 1201 described above. For example, a fourth coordinate mapping model corresponding to the right virtual camera can be preset, which is used to map the right virtual camera image to the pixel coordinate system of the right camera. Based on this, the second row coordinates corresponding to the center point in the second original image can be determined based on the coordinates of the center point of the second virtual camera image in the pixel coordinate system of the right virtual camera and the fourth coordinate mapping model. To avoid repetition, further details are omitted here.

[0094] Step 1203. The difference between the first row coordinates and the second row coordinates is determined as the position deviation.

[0095] In some embodiments of this application, the first row coordinates are denoted as $v_1$ and the second row coordinates are denoted as $v_2$. The positional deviation $\Delta v$ can be calculated using the following formula (10):

$$\Delta v = v_1 - v_2 \tag{10}$$

Step 130. Based on the positional deviation, determine the time delay offset between the exposure start times of the left and right cameras.

[0096] In some embodiments of this application, the binocular camera is a Rolling Shutter camera. Based on this, in step 130 above, the time delay offset can be determined through the following steps 1301-1302.

[0097] Step 1301. Determine the time interval between the exposure of adjacent rows of the binocular camera.

[0098] In some embodiments of this application, the time interval t between the exposure of adjacent rows of the binocular camera can be obtained by consulting the specifications of the binocular camera.

[0099] Step 1302. The product of the position deviation and the time interval is determined as the time delay offset.

[0100] After obtaining the position deviation and the time interval, the time delay offset $\Delta t$ is calculated according to the following formula (11):

$$\Delta t = \Delta v \cdot t \tag{11}$$

where $\Delta v$ is the position deviation and $t$ is the time interval.

Step 140. Based on the time delay offset, adjust the exposure start time of at least one of the left and right cameras.

[0101] After obtaining the time delay offset, the exposure start time of at least one of the left and right cameras is adjusted based on the time delay offset to achieve exposure synchronization between the two, thereby ensuring that the exposure time of the center of the image in the left and right displays seen by the user is consistent, and realizing time-based image alignment correction.

[0102] In some embodiments of this application, step 140 may include the following two adjustment strategies: when the time delay offset is positive, the right camera is controlled to start exposure later than the left camera by the time delay offset; when the time delay offset is negative, the left camera is controlled to start exposure later than the right camera by the magnitude of the time delay offset.
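Steps 1203, 1302, and 140 can be sketched together as follows. This is an illustration only: the function name, the returned tuple layout, and the sample row coordinates and line time are hypothetical, and a real device would program the resulting offsets into the camera sensors or their trigger logic.

```python
def exposure_start_times(row_left: float, row_right: float,
                         line_time_s: float, base_start_s: float = 0.0):
    """Compute exposure start times (left, right) for a rolling-shutter pair.

    row_left / row_right: row coordinates of the display center point in the
    left and right original images (first and second row coordinates).
    line_time_s: time interval between the exposure of adjacent rows.
    """
    delta_rows = row_left - row_right   # positional deviation, formula (10)
    delay_s = delta_rows * line_time_s  # time delay offset, formula (11)
    if delay_s >= 0:
        # Positive offset: the right camera starts later than the left camera.
        return base_start_s, base_start_s + delay_s
    # Negative offset: the left camera starts later than the right camera.
    return base_start_s - delay_s, base_start_s


# Hypothetical values: center rows 548 and 540, 10 microseconds per row.
LINE_TIME_S = 10e-6
left_start, right_start = exposure_start_times(548, 540, LINE_TIME_S)

# With these start times, both center rows are exposed at the same instant.
center_time_left = left_start + 548 * LINE_TIME_S
center_time_right = right_start + 540 * LINE_TIME_S
```

The final two lines show the purpose of the offset: a row r of a rolling-shutter sensor is exposed at roughly start + r * line time, so delaying one camera by the offset aligns the exposure of the two display centers.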

[0103] For example, referring to Figure 8, when the time delay offset $\Delta t$ is positive, the left camera begins exposure at time $T$ and the right camera begins exposure at time $T + \Delta t$, so that the exposure times of the image centers are consistent, thus ensuring that the center of the area on the screen that the user sees is consistent in time.

[0104] In this embodiment, geometric correction is performed on the binocular images based on ideal virtual cameras, thereby achieving real-time spatial image alignment correction. The exposure time difference in the row direction of the final displayed images is then calculated from the images, and the exposure timing of the cameras is adjusted in real time based on this difference, thereby achieving temporal image alignment (synchronization) correction. According to this embodiment, while maintaining system real-time performance, it is possible to accurately compensate for spatial misalignment and temporal asynchrony caused by camera assembly tolerances and rolling shutter characteristics. By making the binocular images consistent in both geometric space and exposure timing, image misalignment and ghosting in video see-through (VST) displays can be significantly reduced, thereby effectively alleviating user dizziness and improving visual comfort and immersive experience.

[0105] In some embodiments of this application, the hardware may adopt a scheme in which the left and right cameras share a clock source by connecting the clock pins, and adopt a hard synchronization strategy to ensure the accuracy of synchronization.

[0106] In some embodiments of this application, the field of view (FOV) of the physical camera is fixed and relatively large. However, the FOV ultimately seen by the user on the display screen is determined by the optical design, screen size, and software rendering strategy, and is typically smaller; that is, the FOV of the physical camera is larger than the FOV of its corresponding virtual camera. To ensure that the final displayed image matches the FOV of the virtual camera, after obtaining the spatially corrected virtual camera image, the virtual camera image can be further cropped based on the virtual camera's FOV to obtain the final display image used on the display screen. Therefore, when the FOV of the left camera is larger than that of the left virtual camera, and the FOV of the right camera is larger than that of the right virtual camera, steps 310-320 can be performed before step 120 above.

[0107] Step 310. Based on the field of view of the left virtual camera, crop the first virtual camera image to obtain the first display image.

[0108] The first displayed image is the image to be displayed on the left display screen.

[0109] In some embodiments of this application, when cropping the first virtual camera image, the size (width and height) of the image to be displayed on the left display screen can be determined first based on the field of view of the left virtual camera. Then, the center point coordinates defined in the intrinsic parameters of the left virtual camera are used as the cropping center, and an image with the same size as the image displayed on the left display screen is cropped from the first virtual camera image as the first display image.
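The cropping described above can be sketched as follows; the same logic would apply to the right virtual camera image. The helper names, the use of a list of rows as the image representation, and the conversion from field of view to crop size via the pinhole relation are assumptions made for illustration.

```python
import math


def crop_size_px(focal_px: float, fov_deg: float) -> int:
    """Number of pixels spanned by fov_deg for a pinhole camera with focal_px."""
    return int(round(2.0 * focal_px * math.tan(math.radians(fov_deg) / 2.0)))


def center_crop(image, center_xy, size_wh):
    """Crop a (w, h) window centered on center_xy from an image stored as a
    list of rows. center_xy is the center point from the virtual camera
    intrinsics; size_wh is the size of the image to be displayed."""
    cx, cy = center_xy
    w, h = size_wh
    left = int(round(cx - w / 2.0))
    top = int(round(cy - h / 2.0))
    if left < 0 or top < 0 or top + h > len(image) or left + w > len(image[0]):
        raise ValueError("crop window exceeds the virtual camera image")
    return [row[left:left + w] for row in image[top:top + h]]


# Toy 8-row x 10-column "image" whose pixels record their own (row, column).
toy_image = [[(r, c) for c in range(10)] for r in range(8)]
display_image = center_crop(toy_image, center_xy=(5, 4), size_wh=(4, 2))

# Hypothetical sizing: focal length 1000 px and a 90 degree display FOV.
display_width = crop_size_px(1000.0, 90.0)
```

Raising an error when the window leaves the image is one possible design choice; it makes the precondition stated in the text (physical FOV larger than virtual FOV) explicit.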

[0110] Step 320. Based on the field of view of the right virtual camera, crop the image of the second virtual camera to obtain the second display image.

[0111] The second display image is the image to be displayed on the right display screen.

[0112] In some embodiments of this application, similar to step 310 above, when cropping the second virtual camera image, the size (width and height) of the image to be displayed on the right display screen can be determined first based on the field of view of the right virtual camera. Then, the center point coordinates defined in the intrinsic parameters of the right virtual camera are used as the cropping center, and an image with the same size as the image displayed on the right display screen is cropped from the second virtual camera image as the second display image.

[0113] Accordingly, if the first and second virtual camera images were cropped before step 120, then in step 120, the positional deviation of the first and second display images in the image row direction is determined instead. The method for determining the positional deviation of the first and second display images in the image row direction is the same as the method for determining the positional deviation of the first and second virtual camera images in the image row direction, and can be found in the relevant description in the above embodiments. To avoid repetition, it will not be elaborated further here.

[0114] The binocular fusion correction method provided in this application can be performed by a binocular fusion correction device. The binocular fusion correction device provided in this application is described below, taking as an example the device performing the binocular fusion correction method.

[0115] Referring to Figure 9, the binocular fusion correction device 900 provided in this application embodiment includes the following modules: a spatial correction module 901, used to map the first original image captured by the left camera and the second original image captured by the right camera in the binocular camera to the pixel coordinate systems of the corresponding left virtual camera and right virtual camera, respectively, to obtain the spatially corrected first virtual camera image and second virtual camera image, wherein the left virtual camera and the right virtual camera have a predetermined ideal relative pose; and a time correction module 902, used to determine the positional deviation between the first virtual camera image and the second virtual camera image in the image row direction, determine, based on the positional deviation, the time delay offset between the exposure start times of the left camera and the right camera, and adjust, based on the time delay offset, the exposure start time of at least one of the left camera and the right camera.

[0116] In some embodiments of this application, the spatial correction module 901 is specifically used for: First depth information and second depth information are determined respectively. The first depth information is the depth information when the left camera captures the first original image, and the second depth information is the depth information when the right camera captures the second original image. The pixel coordinates of the first original image and the first depth information are input into the first coordinate mapping model corresponding to the left camera to obtain the pixel coordinates of the first virtual camera image. The first coordinate mapping model is used to map the first original image to the pixel coordinate system of the left virtual camera. The pixel coordinates of the second original image and the second depth information are input into a preset second coordinate mapping model to obtain the pixel coordinates of the second virtual camera image. The second coordinate mapping model is used to map the second original image to the pixel coordinate system of the right virtual camera.

[0117] In some embodiments of this application, the device 900 further includes a model setting module, used for: obtaining the calibration parameters of a physical camera and of its corresponding virtual camera, where the physical camera is the left camera or the right camera, the virtual camera is the left virtual camera or the right virtual camera, and the calibration parameters include the intrinsic parameters of the camera and the extrinsic parameters between the camera and the inertial measurement unit (IMU); determining, based on the intrinsic parameters of the physical camera, a first mapping function between the camera coordinate system and the pixel coordinate system of the physical camera; determining, based on the extrinsic parameters of the physical camera and the extrinsic parameters of the virtual camera, the transformation matrix between the camera coordinate system of the physical camera and the camera coordinate system of the virtual camera; determining, based on the intrinsic parameters of the virtual camera, a second mapping function between the camera coordinate system and the pixel coordinate system of the virtual camera; determining, based on the first mapping function, the transformation matrix, and the second mapping function, a fourth mapping function between the pixel coordinate system of the physical camera and the pixel coordinate system of the virtual camera; and determining the fourth mapping function as the coordinate mapping model corresponding to the physical camera.

[0118] In some embodiments of this application, the time correction module 902 is specifically used for: determining the first row coordinates of the center point of the first virtual camera image in the first original image; determining the second row coordinates of the center point of the second virtual camera image in the second original image; and determining the difference between the first row coordinates and the second row coordinates as the position deviation.

[0119] In some embodiments of this application, the time correction module 902 is specifically used for: determining the time interval between the exposure of adjacent rows of the binocular camera; and determining the product of the position deviation and the time interval as the time delay offset.

[0120] In some embodiments of this application, the time correction module 902 is specifically used for: When the time delay offset is positive, the right camera is controlled to start exposure with a delay of the time delay offset relative to the left camera; When the time delay offset is negative, the left camera is controlled to delay the exposure by the time delay offset relative to the right camera.

[0121] In some embodiments of this application, the device 900 further includes a cropping module, used for: cropping the first virtual camera image based on the field of view of the left virtual camera to obtain the first display image; and cropping the second virtual camera image based on the field of view of the right virtual camera to obtain the second display image. The time correction module 902 is specifically used for determining the positional deviation between the first display image and the second display image in the image row direction.

[0122] The binocular fusion correction device in this application embodiment can be an electronic device or a component within an electronic device, such as an integrated circuit or a chip. The electronic device can be a terminal or a device other than a terminal. For example, the electronic device can be a mobile phone, tablet computer, laptop computer, in-vehicle electronic device, mobile internet device (MID), augmented reality (AR) / virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA), etc. It can also be a server, network attached storage (NAS), personal computer (PC), television (TV), ATM, or self-service machine, etc. This application embodiment does not specifically limit the device.

[0123] The binocular fusion correction device in this application embodiment can be a device with an operating system. This operating system can be Android, iOS, or other possible operating systems; this application embodiment does not specifically limit it.

[0124] The binocular fusion correction device provided in this application embodiment can implement the various processes implemented in the method embodiments of Figures 7 to 8. To avoid repetition, they will not be described again here.

[0125] This application also provides an augmented reality device, including a binocular camera, an IMU, and the binocular fusion correction device provided in the above embodiments. The binocular camera includes a left camera and a right camera.

[0126] In some embodiments of this application, the augmented reality device further includes a hardware synchronization circuit. The hardware synchronization circuit is connected to the clock pins of the left camera and the right camera respectively, so that the left camera and the right camera share a clock source.

[0127] Optionally, as shown in Figure 10, this application embodiment also provides an electronic device 1000, including a processor 1001 and a memory 1002. The memory 1002 stores a program or instructions that can run on the processor 1001. When executed by the processor 1001, the program or instructions implement the various steps of the above-described binocular fusion correction method embodiment and can achieve the same technical effect. To avoid repetition, they will not be described again here.

[0128] It should be noted that the electronic devices in the embodiments of this application include the mobile electronic devices and non-mobile electronic devices described above.

[0129] Figure 11 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of this application.

[0130] The electronic device 1100 includes, but is not limited to, components such as: radio frequency unit 1101, network module 1102, audio output unit 1103, input unit 1104, sensor 1105, display unit 1106, user input unit 1107, interface unit 1108, memory 1109, and processor 1110.

[0131] Those skilled in the art will understand that the electronic device 1100 may also include a power supply (such as a battery) for supplying power to various components. The power supply may be logically connected to the processor 1110 through a power management system, thereby enabling functions such as managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in Figure 11 does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown, or combine certain components, or have different component arrangements, which will not be elaborated here.

[0132] The processor 1110 is configured to map the first original image captured by the left camera and the second original image captured by the right camera in the binocular camera system to the pixel coordinate systems of the corresponding left and right virtual cameras, respectively, to obtain a spatially corrected first virtual camera image and a second virtual camera image; wherein the left virtual camera and the right virtual camera have a predetermined ideal relative pose; determine the positional deviation of the first virtual camera image and the second virtual camera image in the image row direction; determine the time delay offset of the exposure start time of the left camera and the right camera based on the positional deviation; and adjust the exposure start time of at least one of the left camera and the right camera based on the time delay offset.

[0133] In some embodiments of this application, the processor 1110 is specifically used for: First depth information and second depth information are determined respectively. The first depth information is the depth information when the left camera captures the first original image, and the second depth information is the depth information when the right camera captures the second original image. The pixel coordinates of the first original image and the first depth information are input into the first coordinate mapping model corresponding to the left camera to obtain the pixel coordinates of the first virtual camera image. The first coordinate mapping model is used to map the first original image to the pixel coordinate system of the left virtual camera. The pixel coordinates of the second original image and the second depth information are input into a preset second coordinate mapping model to obtain the pixel coordinates of the second virtual camera image. The second coordinate mapping model is used to map the second original image to the pixel coordinate system of the right virtual camera.

[0134] In some embodiments of this application, the processor 1110 is further configured to: obtain the calibration parameters of a physical camera and of its corresponding virtual camera, where the physical camera is the left camera or the right camera, the virtual camera is the left virtual camera or the right virtual camera, and the calibration parameters include the intrinsic parameters of the camera and the extrinsic parameters between the camera and the inertial measurement unit (IMU); based on the intrinsic parameters of the physical camera, determine a first mapping function between the camera coordinate system and the pixel coordinate system of the physical camera; based on the extrinsic parameters of the physical camera and the extrinsic parameters of the virtual camera, determine the transformation matrix between the camera coordinate system of the physical camera and the camera coordinate system of the virtual camera; based on the intrinsic parameters of the virtual camera, determine a second mapping function between the camera coordinate system and the pixel coordinate system of the virtual camera; based on the first mapping function, the transformation matrix, and the second mapping function, determine a fourth mapping function between the pixel coordinate system of the physical camera and the pixel coordinate system of the virtual camera; and determine the fourth mapping function as the coordinate mapping model corresponding to the physical camera.
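As a non-limiting illustration, the composition of the first mapping function, the transformation matrix, and the second mapping function might be sketched in Python as follows. The sketch assumes an undistorted pinhole model and assumes that a rotation R and translation t from the physical camera frame to the virtual camera frame have already been derived from the two sets of camera-to-IMU extrinsic parameters. The names make_coordinate_mapping, unproject, rigid_transform, and project are hypothetical and do not appear in this application.

```python
from typing import Callable, Sequence, Tuple

Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole
Point3 = Tuple[float, float, float]


def unproject(intr: Intrinsics, u: float, v: float, depth: float) -> Point3:
    """Pixel plus depth -> 3D point in that camera's coordinate system."""
    fx, fy, cx, cy = intr
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def rigid_transform(R: Sequence[Sequence[float]], t: Sequence[float], p: Point3) -> Point3:
    """Apply p' = R p + t (physical camera frame -> virtual camera frame)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(intr: Intrinsics, p: Point3) -> Tuple[float, float]:
    """3D point in a camera's coordinate system -> pixel coordinates."""
    fx, fy, cx, cy = intr
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)


def make_coordinate_mapping(
    phys_intr: Intrinsics,
    virt_intr: Intrinsics,
    R: Sequence[Sequence[float]],
    t: Sequence[float],
) -> Callable[[float, float, float], Tuple[float, float]]:
    """Compose the three steps into one physical-pixel -> virtual-pixel mapping."""
    def mapping(u: float, v: float, depth: float) -> Tuple[float, float]:
        p_phys = unproject(phys_intr, u, v, depth)   # first mapping function
        p_virt = rigid_transform(R, t, p_phys)       # transformation matrix
        return project(virt_intr, p_virt)            # second mapping function
    return mapping


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intr = (500.0, 500.0, 320.0, 240.0)  # illustrative values only

# With identical intrinsics and no relative pose, every pixel maps to itself.
same = make_coordinate_mapping(intr, intr, IDENTITY, [0.0, 0.0, 0.0])
# A 0.1 m sideways offset moves a point at 2 m depth by 500 * 0.1 / 2 = 25 columns.
shifted = make_coordinate_mapping(intr, intr, IDENTITY, [0.1, 0.0, 0.0])
```

Because the mapping takes depth as an input, the pixel shift caused by a translation between the physical and virtual cameras shrinks as depth grows, which is why the sketch cannot be reduced to a fixed per-pixel lookup unless a constant depth is assumed.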

[0135] In some embodiments of this application, the processor 1110 is specifically configured to: determine a first row coordinate, in the first original image, that corresponds to the center point of the first virtual camera image; determine a second row coordinate, in the second original image, that corresponds to the center point of the second virtual camera image; and determine the difference between the first row coordinate and the second row coordinate as the positional deviation.
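One possible way to obtain the two row coordinates is sketched below, purely as an illustration. It assumes an undistorted pinhole model, a single depth value for the center point, a known rotation and translation from the virtual camera frame back to the physical camera frame, and an image center defined as ((width - 1) / 2, (height - 1) / 2). The function names and all numeric values are hypothetical.

```python
from typing import Sequence, Tuple

Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole


def center_row_in_original(
    virt_intr: Intrinsics,
    phys_intr: Intrinsics,
    R_pv: Sequence[Sequence[float]],   # rotation, virtual frame -> physical frame
    t_pv: Sequence[float],             # translation, virtual frame -> physical frame
    virt_size: Tuple[int, int],        # (width, height) of the virtual camera image
    depth: float,                      # assumed depth of the center point
) -> float:
    """Row of the original image that corresponds to the virtual image center."""
    fxv, fyv, cxv, cyv = virt_intr
    _, fyp, _, cyp = phys_intr
    width, height = virt_size
    u, v = (width - 1) / 2.0, (height - 1) / 2.0
    # Back-project the virtual image center to a 3D point in the virtual frame.
    p = ((u - cxv) / fxv * depth, (v - cyv) / fyv * depth, depth)
    # Move the point into the physical camera frame.
    q = [sum(R_pv[i][j] * p[j] for j in range(3)) + t_pv[i] for i in range(3)]
    # Project with the physical intrinsics and keep only the row coordinate.
    return fyp * q[1] / q[2] + cyp


def row_deviation(first_row: float, second_row: float) -> float:
    """Positional deviation in the image row direction (left minus right)."""
    return first_row - second_row


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
virt = (500.0, 500.0, 319.5, 239.5)
left_phys = (500.0, 500.0, 319.5, 250.0)    # principal point 10.5 rows low
right_phys = (500.0, 500.0, 319.5, 244.0)   # principal point 4.5 rows low

first_row = center_row_in_original(virt, left_phys, IDENTITY, [0.0, 0.0, 0.0], (640, 480), 2.0)
second_row = center_row_in_original(virt, right_phys, IDENTITY, [0.0, 0.0, 0.0], (640, 480), 2.0)
deviation = row_deviation(first_row, second_row)
```

In this example the two physical cameras differ only in the vertical position of their principal points, so the deviation equals the difference between those two values.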

[0136] In some embodiments of this application, the processor 1110 is specifically configured to: determine the time interval between the exposure of successive rows of the binocular camera; and determine the product of the positional deviation and the time interval as the time delay offset.
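The product described above can be illustrated with a short Python sketch. Deriving the per-row time interval from a frame readout duration and a row count is an assumption made here for illustration; this application only states that the interval is determined. The function names and numeric values are hypothetical.

```python
def row_interval_us(readout_time_us: float, num_rows: int) -> float:
    """Assumed derivation: rolling-shutter readout time spread evenly over all rows."""
    return readout_time_us / num_rows


def time_delay_offset_us(position_deviation_rows: float, interval_us: float) -> float:
    """Time delay offset = positional deviation (rows) x per-row time interval."""
    return position_deviation_rows * interval_us


interval = row_interval_us(16000.0, 1600)        # 10 microseconds per row
offset = time_delay_offset_us(6.0, interval)     # 6 rows of deviation
negative_offset = time_delay_offset_us(-6.0, interval)
```

The sign of the positional deviation carries through to the offset, which is what allows the subsequent step to decide which camera should be delayed.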

[0137] In some embodiments of this application, the processor 1110 is specifically configured to: when the time delay offset is positive, control the right camera to start exposure later than the left camera by the time delay offset; and when the time delay offset is negative, control the left camera to start exposure later than the right camera by the time delay offset.
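The sign-dependent adjustment might be sketched as follows. This is an illustration only: it assumes both cameras share a common nominal start time, that a negative offset is applied to the left camera by its magnitude, and that a zero offset leaves both cameras unchanged. The function schedule_exposure_start is hypothetical and does not correspond to any particular camera driver interface.

```python
from typing import Tuple


def schedule_exposure_start(base_start_us: float, delay_offset_us: float) -> Tuple[float, float]:
    """Return (left_start_us, right_start_us) for one frame.

    Positive offset: the right camera starts later than the left camera.
    Negative offset: the left camera starts later than the right camera,
    by the magnitude of the offset (an assumption of this sketch).
    """
    if delay_offset_us > 0:
        return base_start_us, base_start_us + delay_offset_us
    if delay_offset_us < 0:
        return base_start_us - delay_offset_us, base_start_us
    return base_start_us, base_start_us


right_delayed = schedule_exposure_start(1000.0, 60.0)
left_delayed = schedule_exposure_start(1000.0, -60.0)
unchanged = schedule_exposure_start(1000.0, 0.0)
```

Only one camera is ever delayed in this sketch, so the earlier camera keeps its nominal timing and the frame rate of the pair is not affected.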

[0138] In some embodiments of this application, the processor 1110 is further configured to: crop the first virtual camera image based on the field of view of the left virtual camera to obtain a first display image; and crop the second virtual camera image based on the field of view of the right virtual camera to obtain a second display image. In this case, the processor 1110 is specifically configured to determine the positional deviation between the first display image and the second display image in the image row direction.
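A field-of-view-based crop could look like the sketch below. It assumes a pinhole virtual camera whose focal lengths are expressed in pixels, a crop window centered on the principal point, and an image stored as a list of rows. How the crop window is actually chosen is not specified in this application, so every name and value here is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def crop_size_from_fov(fx: float, fy: float, hfov_deg: float, vfov_deg: float) -> Tuple[int, int]:
    """Pixel extent covered by a field of view under a pinhole model."""
    width = 2.0 * fx * math.tan(math.radians(hfov_deg) / 2.0)
    height = 2.0 * fy * math.tan(math.radians(vfov_deg) / 2.0)
    return int(round(width)), int(round(height))


def center_crop(image: Sequence[Sequence[int]], cx: float, cy: float,
                width: int, height: int) -> List[List[int]]:
    """Cut a width x height window centered on the principal point (cx, cy)."""
    left = int(round(cx - width / 2.0))
    top = int(round(cy - height / 2.0))
    return [list(row[left:left + width]) for row in image[top:top + height]]


# A 90-degree field of view at a focal length of 100 pixels spans about 200 pixels.
size = crop_size_from_fov(100.0, 100.0, 90.0, 90.0)

# Tiny 6 x 6 test image whose pixel value encodes (row, column) as row * 10 + column.
tiny = [[r * 10 + c for c in range(6)] for r in range(6)]
display = center_crop(tiny, 3.0, 3.0, 2, 2)
```

Cropping after the spatial correction means the positional deviation is measured on the region that the user actually sees, rather than on margins that are discarded.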

[0139] It should be understood that, in this embodiment, the input unit 1104 may include a graphics processing unit (GPU) 11041 and a microphone 11042. The GPU 11041 processes image data of still images or videos obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The display unit 1106 may include a display panel 11061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1107 includes at least one of a touch panel 11071 and other input devices 11072. The touch panel 11071 is also called a touch screen. The touch panel 11071 may include a touch detection device and a touch controller. Other input devices 11072 may include, but are not limited to, physical keyboards, function keys (such as volume control buttons, power buttons, etc.), trackballs, mice, and joysticks, which will not be described in detail here.

[0140] The memory 1109 can be used to store software programs and various data. The memory 1109 may primarily include a first storage area for storing programs or instructions and a second storage area for storing data. The first storage area may store the operating system and the application programs or instructions required for at least one function (such as sound playback, image playback, etc.). Furthermore, the memory 1109 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), or direct Rambus random access memory (DRRAM). The memory 1109 in this embodiment includes, but is not limited to, these and any other suitable types of memory.

[0141] Processor 1110 may include one or more processing units; optionally, processor 1110 integrates an application processor and a modem processor, wherein the application processor mainly handles operations involving the operating system, user interface, and applications, and the modem processor mainly handles wireless communication signals, such as a baseband processor. It is understood that the aforementioned modem processor may also not be integrated into processor 1110.

[0142] This application also provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, they implement the various processes of the above-described binocular fusion correction method embodiments and achieve the same technical effects. To avoid repetition, these will not be described again here.

[0143] The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes computer-readable storage media, such as read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disk.

[0144] An embodiment of this application also provides a chip, which includes a processor and a communication interface. The communication interface is coupled to the processor. The processor is used to run programs or instructions to implement the various processes of the above-described binocular fusion correction method embodiments and can achieve the same technical effects. To avoid repetition, these will not be described again here.

[0145] It should be understood that the chip mentioned in the embodiments of this application may also be referred to as a system-level chip, system chip, chip system, or system-on-a-chip, etc.

[0146] This application also provides a computer program product, which is stored in a storage medium and is executed by at least one processor to implement the various processes of the binocular fusion correction method embodiments described above, and can achieve the same technical effects. To avoid repetition, these will not be described again here.

[0147] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed, but may also include performing functions substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

[0148] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a computer software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0149] The embodiments of this application have been described above with reference to the accompanying drawings. However, this application is not limited to the specific embodiments described above, which are merely illustrative and not restrictive. Under the guidance of this application, those skilled in the art can devise many other forms without departing from the spirit of this application and the scope of the claims, and all of these forms fall within the protection scope of this application.

Claims

1. A binocular fusion correction method, characterized in that it comprises: mapping the first original image captured by the left camera and the second original image captured by the right camera in the binocular camera to the pixel coordinate systems of the corresponding left virtual camera and right virtual camera, respectively, to obtain a spatially corrected first virtual camera image and second virtual camera image, wherein the left virtual camera and the right virtual camera have a predetermined ideal relative pose; determining the positional deviation between the first virtual camera image and the second virtual camera image in the image row direction; determining, based on the positional deviation, the time delay offset between the exposure start times of the left camera and the right camera; and adjusting, based on the time delay offset, the exposure start time of at least one of the left camera and the right camera.

2. The method according to claim 1, characterized in that the step of mapping the first original image captured by the left camera and the second original image captured by the right camera in the binocular camera to the pixel coordinate systems of the corresponding left virtual camera and right virtual camera, respectively, to obtain the spatially corrected first virtual camera image and second virtual camera image comprises: determining first depth information and second depth information, respectively, wherein the first depth information is the depth information when the left camera captures the first original image, and the second depth information is the depth information when the right camera captures the second original image; inputting the pixel coordinates of the first original image and the first depth information into the first coordinate mapping model corresponding to the left camera to obtain the pixel coordinates of the first virtual camera image, wherein the first coordinate mapping model is used to map the first original image to the pixel coordinate system of the left virtual camera; and inputting the pixel coordinates of the second original image and the second depth information into a preset second coordinate mapping model to obtain the pixel coordinates of the second virtual camera image, wherein the second coordinate mapping model is used to map the second original image to the pixel coordinate system of the right virtual camera.

3. The method according to claim 2, characterized in that, before mapping the first original image captured by the left camera and the second original image captured by the right camera in the binocular camera to the pixel coordinate systems of the corresponding left virtual camera and right virtual camera, respectively, to obtain the spatially corrected first virtual camera image and second virtual camera image, the method further comprises: obtaining the calibration parameters of a physical camera and of its corresponding virtual camera, wherein the physical camera is the left camera or the right camera, the virtual camera is the left virtual camera or the right virtual camera, and the calibration parameters include the intrinsic parameters of the camera and the extrinsic parameters between the camera and the inertial measurement unit (IMU); determining, based on the intrinsic parameters of the physical camera, a first mapping function between the camera coordinate system and the pixel coordinate system of the physical camera; determining, based on the extrinsic parameters of the physical camera and the extrinsic parameters of the virtual camera, the transformation matrix between the camera coordinate system of the physical camera and the camera coordinate system of the virtual camera; determining, based on the intrinsic parameters of the virtual camera, a second mapping function between the camera coordinate system and the pixel coordinate system of the virtual camera; determining, based on the first mapping function, the transformation matrix, and the second mapping function, a fourth mapping function between the pixel coordinate system of the physical camera and the pixel coordinate system of the virtual camera; and determining the fourth mapping function as the coordinate mapping model corresponding to the physical camera.

4. The method according to claim 1, characterized in that determining the positional deviation between the first virtual camera image and the second virtual camera image in the image row direction comprises: determining a first row coordinate, in the first original image, that corresponds to the center point of the first virtual camera image; determining a second row coordinate, in the second original image, that corresponds to the center point of the second virtual camera image; and determining the difference between the first row coordinate and the second row coordinate as the positional deviation.

5. The method according to claim 1, characterized in that determining the time delay offset between the exposure start times of the left camera and the right camera based on the positional deviation comprises: determining the time interval between the exposure of successive rows of the binocular camera; and determining the product of the positional deviation and the time interval as the time delay offset.

6. The method according to claim 5, characterized in that adjusting the exposure start time of at least one of the left camera and the right camera based on the time delay offset comprises: when the time delay offset is positive, controlling the right camera to start exposure later than the left camera by the time delay offset; and when the time delay offset is negative, controlling the left camera to start exposure later than the right camera by the time delay offset.

7. The method according to any one of claims 1-6, characterized in that the field of view of the left camera is greater than that of the left virtual camera, and the field of view of the right camera is greater than that of the right virtual camera; before determining the positional deviation between the first virtual camera image and the second virtual camera image in the image row direction, the method further comprises: cropping the first virtual camera image based on the field of view of the left virtual camera to obtain a first display image; and cropping the second virtual camera image based on the field of view of the right virtual camera to obtain a second display image; and determining the positional deviation between the first virtual camera image and the second virtual camera image in the image row direction comprises: determining the positional deviation between the first display image and the second display image in the image row direction.

8. A binocular fusion correction device, characterized in that it comprises: a spatial correction module, used to map the first original image captured by the left camera and the second original image captured by the right camera in the binocular camera to the pixel coordinate systems of the corresponding left virtual camera and right virtual camera, respectively, to obtain a spatially corrected first virtual camera image and second virtual camera image, wherein the left virtual camera and the right virtual camera have a predetermined ideal relative pose; and a time correction module, used to determine the positional deviation between the first virtual camera image and the second virtual camera image in the image row direction, to determine, based on the positional deviation, the time delay offset between the exposure start times of the left camera and the right camera, and to adjust, based on the time delay offset, the exposure start time of at least one of the left camera and the right camera.

9. An augmented reality device, characterized in that it comprises a binocular camera, an IMU, and the binocular fusion correction device as described in claim 8, wherein the binocular camera includes a left camera and a right camera.

10. An electronic device, characterized in that it comprises a processor and a memory, the memory storing a program or instructions that can run on the processor, and the program or instructions, when executed by the processor, implementing the steps of the binocular fusion correction method as described in any one of claims 1-7.