Pose information processing method, device and storage medium

By dividing pose information into intrinsic, relative, and absolute extrinsic parameters, and optimizing the absolute extrinsic parameter of the main camera, the problem of inaccurate pose estimation for objects with weak texture features is solved, and the accuracy and stability of pose information are improved.

CN116894873B · Active · Publication Date: 2026-06-26 · TAOBAO CHINA SOFTWARE


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TAOBAO CHINA SOFTWARE
Filing Date: 2023-07-17
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing technologies struggle to accurately reconstruct camera poses when estimating the pose of objects with weak texture features, and the relative sliding error between the camera and the turntable is significant, affecting the accuracy of pose information.

Method used

Pose information is divided into intrinsic, relative, and absolute extrinsic parameters of the camera. Taking advantage of the characteristics of the image acquisition system, the absolute extrinsic parameter of the main camera is used as the shared absolute extrinsic parameter of multiple cameras. The absolute extrinsic parameter of the main camera is optimized by using depth information, which reduces the number of feature points required and improves the accuracy of pose information.

Benefits of technology

It improves the accuracy of pose estimation for objects with weak textures, reduces the relative sliding error between the camera and the turntable, and enhances the accuracy of pose information.


Figure CN116894873B_ABST

Abstract

The application provides a pose information processing method and device and a storage medium. The method comprises the following steps: obtaining to-be-processed pose information of a plurality of cameras in an image acquisition system, wherein the to-be-processed pose information comprises intrinsic parameters of the plurality of cameras, relative extrinsic parameters and absolute extrinsic parameters of a predetermined main camera in the plurality of cameras; obtaining a plurality of target images of a target object, wherein the plurality of target images are obtained by photographing the target object placed on a turntable by using the plurality of cameras based on the to-be-processed pose information; determining depth information of the plurality of target images according to the intrinsic parameters of the plurality of cameras and the relative extrinsic parameters; performing minimum re-projection error processing on the absolute extrinsic parameters according to the depth information to obtain optimized absolute extrinsic parameters; and determining the optimized absolute extrinsic parameters, the relative extrinsic parameters and the intrinsic parameters of the plurality of cameras as final pose information of the plurality of cameras. The application improves the pose information estimation accuracy of a weak texture object, reduces the error caused by the relative sliding between the camera and the turntable, and improves the accuracy of the pose information.


Description

Technical Field

[0001] This application relates to the field of image processing technology, and in particular to a pose information processing method, device and storage medium.

Background Technology

[0002] 3D reconstruction refers to establishing a mathematical model of a 3D object suitable for computer representation and processing. It forms the basis for processing, manipulating, and analyzing the properties of objects in a computer environment and is a key technology for creating virtual reality representations of the objective world within a computer. In computer vision, 3D reconstruction refers to the process of reconstructing 3D information from single-view or multi-view images. 3D reconstruction has significant application value in scenarios such as AR (Augmented Reality) and interactive 3D product displays.

[0003] Multi-view camera pose estimation is a necessary input for 3D object reconstruction. It marks the beginning of the process from 2D images to 3D reconstruction, and the accuracy of camera pose information directly affects the success or failure of the reconstruction. Therefore, camera calibration is necessary before performing 3D reconstruction on an object to obtain its pose information. In real-world scenarios, the camera pose estimation result depends on the texture characteristics of the object's surface. For objects with obvious texture features, pose estimation is more accurate. However, for objects with weak texture features, not enough feature points can be extracted, leading to inaccurate pose estimation and, consequently, inaccurate reconstruction.

Summary of the Invention

[0004] The main objective of this application is to provide a pose information processing method, device, and storage medium, which improves the accuracy of pose information estimation for objects with weak textures and reduces the error caused by the relative sliding between the camera and the turntable, thereby improving the accuracy of pose information.

[0005] In a first aspect, embodiments of this application provide a pose information processing method applied to an image acquisition system, the image acquisition system including a turntable and multiple cameras; the method includes: acquiring pose information to be processed from multiple cameras in the image acquisition system, the pose information to be processed including: intrinsic parameters, relative extrinsic parameters, and absolute extrinsic parameters of a predetermined master camera among the multiple cameras, wherein the relative extrinsic parameters represent the relative positional relationship between the multiple cameras, and the absolute extrinsic parameters represent the relative positional relationship between the master camera and the turntable; acquiring multiple target images of a target object, the multiple target images being captured by the multiple cameras based on the pose information to be processed on a target object placed on the turntable; determining depth information of the multiple target images according to the intrinsic parameters and the relative extrinsic parameters of the multiple cameras; performing a minimum reprojection error processing on the absolute extrinsic parameters according to the depth information to obtain optimized absolute extrinsic parameters, and determining the optimized absolute extrinsic parameters, the relative extrinsic parameters, and the intrinsic parameters of the multiple cameras as the final pose information of the multiple cameras.

[0006] In one embodiment, obtaining the pose information to be processed of multiple cameras in the image acquisition system includes: using the multiple cameras to acquire multiple calibration images of a preset calibration body placed on the turntable; calibrating the multiple cameras according to the multiple calibration images to obtain the pose information to be processed of the multiple cameras.

[0007] In one embodiment, calibrating the plurality of cameras based on the plurality of calibration images to obtain the pose information to be processed for the plurality of cameras includes: extracting first image features from the plurality of calibration images; performing feature matching on the plurality of calibration images based on the first image features; determining initial pose information for the plurality of cameras based on the first feature matching results of the plurality of calibration images, the initial pose information including: intrinsic parameters, relative extrinsic parameters of the plurality of cameras, and absolute extrinsic parameters of the main camera; and performing reprojection error minimization processing on the initial pose information to obtain the pose information to be processed for the plurality of cameras.

[0008] In one embodiment, the step of minimizing the reprojection error processing on the initial pose information to obtain the pose information to be processed of the plurality of cameras includes: performing image distortion correction processing on the initial pose information of the plurality of cameras; and using bundle adjustment to minimize the reprojection error processing on the distortion-corrected initial pose information to obtain the pose information to be processed of the plurality of cameras.

[0009] In one embodiment, determining the depth information of the plurality of target images based on the intrinsic parameters of the plurality of cameras and the relative extrinsic parameters includes: extracting second image features of the plurality of target images; performing feature matching on the plurality of target images based on the second image features; and determining the depth information of the plurality of target images based on the second feature matching results of the plurality of target images, the intrinsic parameters of the plurality of cameras, and the relative extrinsic parameters.

[0010] In one embodiment, determining the depth information of the plurality of target images based on the second feature matching result of the plurality of target images, the intrinsic parameters of the plurality of cameras, and the relative extrinsic parameters includes: performing triangulation localization processing on preset feature points based on the second feature matching result, the intrinsic parameters of the plurality of cameras, and the relative extrinsic parameters to obtain the depth information of the preset feature points in the plurality of target images.
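The triangulation step in this embodiment can be illustrated with a minimal sketch. It assumes a pinhole model, a single matched feature seen by two cameras, and a relative extrinsic parameter (R, t) that maps camera-1 coordinates to camera-2 coordinates. The midpoint method, the function names, and the data layout are illustrative assumptions; the source does not specify which triangulation algorithm is used.

```python
# Sketch only: midpoint triangulation of one matched feature seen by two cameras.
# All names, conventions and data layouts here are illustrative assumptions.

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project a pixel to a ray direction in the camera frame (pinhole model)."""
    return [(u - cx) / fx, (v - cy) / fy, 1.0]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rot_transpose_apply(R, v):
    """Return R^T v for a 3x3 rotation stored as nested lists."""
    return [sum(R[r][c] * v[r] for r in range(3)) for c in range(3)]

def triangulate_midpoint(pix1, pix2, K1, K2, R, t):
    """Triangulate a point from matched pixels pix1 (camera 1) and pix2 (camera 2).

    K1 and K2 are (fx, fy, cx, cy). The relative extrinsic (R, t) is assumed to
    map camera-1 coordinates to camera-2 coordinates: X2 = R X1 + t.
    Returns the 3D point in the camera-1 frame; its z component is the depth.
    """
    d1 = pixel_to_ray(pix1[0], pix1[1], *K1)
    d2 = rot_transpose_apply(R, pixel_to_ray(pix2[0], pix2[1], *K2))  # ray 2 in frame 1
    c2 = [-x for x in rot_transpose_apply(R, t)]  # centre of camera 2 in frame 1
    # Closest points on the two rays: minimise |s*d1 - (c2 + q*d2)|^2 over s and q.
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    e, f = dot(d1, c2), dot(d2, c2)
    det = a * c - b * b
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel; depth is not observable")
    s = (e * c - b * f) / det
    q = (b * e - a * f) / det
    p1 = [s * x for x in d1]
    p2 = [c2[i] + q * d2[i] for i in range(3)]
    return [(p1[i] + p2[i]) / 2.0 for i in range(3)]
```

With noisy matches the two rays do not intersect exactly, which is why the sketch returns the midpoint of their closest approach. A production system would more likely triangulate across all views in which the feature is matched.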

[0011] In one embodiment, before extracting the second image features of the plurality of target images, the method further includes: performing distortion correction processing on the plurality of target images based on the intrinsic parameters of the plurality of cameras.
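As a rough illustration of distortion correction driven by intrinsic parameters, the sketch below undoes a one-coefficient radial distortion for a single pixel. The radial model, the coefficient k1, and the fixed-point inversion are assumptions made for this example; the source does not state which distortion model is used.

```python
# Sketch only: undo a one-coefficient radial distortion for a single pixel.
# The radial model and the coefficient k1 are assumptions, not taken from the source.

def distort(xn, yn, k1):
    """Apply radial distortion to normalised image coordinates."""
    scale = 1.0 + k1 * (xn * xn + yn * yn)
    return xn * scale, yn * scale

def undistort_pixel(u, v, fx, fy, cx, cy, k1, iterations=20):
    """Return the undistorted pixel for a distorted pixel (u, v)."""
    xd, yd = (u - cx) / fx, (v - cy) / fy  # distorted, normalised coordinates
    xu, yu = xd, yd                        # initial guess: no distortion
    for _ in range(iterations):            # fixed-point iteration on the inverse model
        scale = 1.0 + k1 * (xu * xu + yu * yu)
        xu, yu = xd / scale, yd / scale
    return xu * fx + cx, yu * fy + cy
```

The iteration converges quickly when the distortion is mild; strongly distorted lenses would need a richer model and a more careful solver.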

[0012] In one embodiment, the step of minimizing the reprojection error of the absolute extrinsic parameters based on the depth information to obtain optimized absolute extrinsic parameters includes: minimizing the reprojection error of the absolute extrinsic parameters using bundle adjustment based on the depth information to obtain optimized absolute extrinsic parameters of the plurality of cameras.
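The objective being minimized can be sketched as follows. Given 3D points recovered from the depth information and their observed pixels in the main camera, the reprojection error is evaluated for a candidate absolute extrinsic parameter. To keep the example short, the bundle adjustment named in the source is replaced here by a brute-force scan over a single yaw angle about the turntable axis; that parameterisation, the function names, and the camera placement are all assumptions for illustration.

```python
import math

# Sketch only: reprojection error as a function of a candidate absolute extrinsic.
# The yaw-only parameterisation and the brute-force search are assumptions that
# stand in for the bundle adjustment named in the text.

def yaw_rotation(theta):
    """Rotation about the z axis, taken here as the turntable axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(point, R, t, fx, fy, cx, cy):
    """Map a turntable-frame point into the main camera and project it (pinhole)."""
    x, y, z = (sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3))
    return fx * x / z + cx, fy * y / z + cy

def reprojection_error(points, pixels, R, t, K):
    """Sum of squared pixel residuals between projected and observed points."""
    total = 0.0
    for p, (u_obs, v_obs) in zip(points, pixels):
        u, v = project(p, R, t, *K)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total

def refine_yaw(points, pixels, t, K, candidates):
    """Pick the candidate yaw with the smallest reprojection error."""
    return min(
        candidates,
        key=lambda th: reprojection_error(points, pixels, yaw_rotation(th), t, K),
    )
```

Because only the main camera's absolute extrinsic parameter is adjusted while the intrinsic and relative extrinsic parameters stay fixed, the search space is small, which is consistent with the source's claim that fewer feature points are needed.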

[0013] In one embodiment, the target object is an object that needs to be reconstructed in three dimensions.

[0014] In a second aspect, embodiments of this application provide a three-dimensional reconstruction method applied to an image acquisition system, the image acquisition system including a turntable and multiple cameras; the method includes: in response to a three-dimensional reconstruction request for a target object, acquiring pose information to be processed of the multiple cameras in the image acquisition system, the pose information to be processed including: intrinsic parameters, relative extrinsic parameters, and absolute extrinsic parameters of the multiple cameras, wherein the relative extrinsic parameters represent the relative positional relationship between the multiple cameras, and the absolute extrinsic parameters represent the relative positional relationship between the multiple cameras and the turntable; acquiring multiple target images of the target object, the multiple target images being obtained by using the multiple cameras, based on the pose information to be processed, to photograph the target object placed on the turntable; determining depth information of the multiple target images according to the intrinsic parameters of the multiple cameras and the relative extrinsic parameters; performing reprojection error minimization processing on the absolute extrinsic parameters according to the depth information to obtain optimized absolute extrinsic parameters, and determining the optimized absolute extrinsic parameters, the relative extrinsic parameters, and the intrinsic parameters of the multiple cameras as the final pose information of the multiple cameras; and performing three-dimensional reconstruction of the target object according to the final pose information and the multiple target images to generate a three-dimensional virtual model of the target object.

[0015] In a third aspect, embodiments of this application provide a three-dimensional reconstruction method applied to an image acquisition system, the image acquisition system including a turntable and multiple cameras; the method includes: in response to a three-dimensional reconstruction request for a target product, acquiring pose information to be processed of the multiple cameras in the image acquisition system, the pose information to be processed including: intrinsic parameters, relative extrinsic parameters, and absolute extrinsic parameters of the multiple cameras, wherein the relative extrinsic parameters represent the relative positional relationship between the multiple cameras, and the absolute extrinsic parameters represent the relative positional relationship between the multiple cameras and the turntable; acquiring multiple target images of the target product, the multiple target images being obtained by using the multiple cameras, based on the pose information to be processed, to photograph the target product placed on the turntable; determining depth information of the multiple target images according to the intrinsic parameters of the multiple cameras and the relative extrinsic parameters; performing reprojection error minimization processing on the absolute extrinsic parameters according to the depth information to obtain optimized absolute extrinsic parameters, and determining the optimized absolute extrinsic parameters, the relative extrinsic parameters, and the intrinsic parameters of the multiple cameras as the final pose information of the multiple cameras; and performing three-dimensional reconstruction of the target product according to the final pose information and the multiple target images to generate a three-dimensional virtual model of the target product.

[0016] In a fourth aspect, embodiments of this application provide a pose information processing device applied to an image acquisition system, the image acquisition system comprising: a turntable and multiple cameras; the device comprising:

[0017] The first acquisition module is used to acquire the pose information to be processed of multiple cameras in the image acquisition system. The pose information to be processed includes: intrinsic parameters and relative extrinsic parameters of the multiple cameras, and absolute extrinsic parameters of a predetermined master camera among the multiple cameras. The relative extrinsic parameters represent the relative positional relationship between the multiple cameras, and the absolute extrinsic parameters represent the relative positional relationship between the master camera and the turntable.

[0018] The second acquisition module is used to acquire multiple target images of the target object, wherein the multiple target images are obtained by using the multiple cameras to capture images of the target object placed on the turntable based on the pose information to be processed;

[0019] The determination module is used to determine the depth information of the multiple target images based on the intrinsic parameters of the multiple cameras and the relative extrinsic parameters;

[0020] An optimization module is used to minimize the reprojection error of the absolute extrinsic parameters based on the depth information to obtain optimized absolute extrinsic parameters, and to determine the optimized absolute extrinsic parameters, the relative extrinsic parameters, and the intrinsic parameters of the multiple cameras as the final pose information of the multiple cameras.

[0021] In one embodiment, the first acquisition module is used to acquire multiple calibration images of a preset calibration body placed on the turntable using the multiple cameras; and to calibrate the multiple cameras based on the multiple calibration images to obtain the pose information of the multiple cameras to be processed.

[0022] In one embodiment, the first acquisition module is specifically used to extract first image features from the plurality of calibration images; perform feature matching on the plurality of calibration images based on the first image features; determine the initial pose information of the plurality of cameras based on the first feature matching results of the plurality of calibration images, the initial pose information including: intrinsic parameters, relative extrinsic parameters of the plurality of cameras and absolute extrinsic parameters of the main camera; and perform reprojection error minimization processing on the initial pose information to obtain the pose information to be processed of the plurality of cameras.

[0023] In one embodiment, the first acquisition module is specifically used to perform image distortion correction processing on the initial pose information of the plurality of cameras; and to perform minimum reprojection error processing on the distortion-corrected initial pose information using bundle adjustment to obtain the pose information to be processed of the plurality of cameras.

[0024] In one embodiment, a determining module is configured to extract second image features from the plurality of target images; perform feature matching on the plurality of target images based on the second image features; and determine the depth information of the plurality of target images based on the second feature matching results of the plurality of target images, the intrinsic parameters of the plurality of cameras, and the relative extrinsic parameters.

[0025] In one embodiment, the determining module is specifically used to perform triangulation positioning processing on preset feature points based on the second feature matching result, the intrinsic parameters of the plurality of cameras and the relative extrinsic parameters, to obtain the depth information of the preset feature points in the plurality of target images.

[0026] In one embodiment, the determining module is further configured to perform distortion correction processing on the plurality of target images based on the intrinsic parameters of the plurality of cameras before extracting the second image features of the plurality of target images.

[0027] In one embodiment, the optimization module is used to minimize the reprojection error of the absolute extrinsic parameters by using bundle adjustment based on the depth information, so as to obtain the optimized absolute extrinsic parameters of the plurality of cameras.

[0028] In one embodiment, the target object is an object that needs to be reconstructed in three dimensions.

[0029] In a fifth aspect, embodiments of this application provide an electronic device, including:

[0030] At least one processor; and

[0031] A memory that is communicatively connected to the at least one processor;

[0032] The memory stores instructions that can be executed by the at least one processor, which, when executed by the at least one processor, cause the electronic device to perform the method described in any of the above aspects.

[0033] In a sixth aspect, embodiments of this application provide a cloud device, including:

[0034] At least one processor; and

[0035] A memory that is communicatively connected to the at least one processor;

[0036] The memory stores instructions that can be executed by the at least one processor, which, when executed by the at least one processor, cause the cloud device to perform the method described in any of the above aspects.

[0037] In a seventh aspect, embodiments of this application provide a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, implement the method described in any of the above aspects.

[0038] In an eighth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the method described in any of the above aspects.

[0039] The pose information processing method, device, and storage medium provided in this application are applied to a multi-camera image acquisition system. The pose information to be processed is divided into the intrinsic parameters, relative extrinsic parameters, and absolute extrinsic parameters of the cameras, and, based on the characteristics of the image acquisition system, the absolute extrinsic parameters of the main camera are used as the shared absolute extrinsic parameters of the multiple cameras. Based on the depth information of the target object images, only the absolute extrinsic parameters of the main camera are optimized, which reduces the number of feature points required to calculate the absolute extrinsic parameters and thereby improves the accuracy of pose information estimation for objects with weak textures. It also reduces the error caused by the relative sliding between the camera and the turntable, improving the accuracy of the pose information.

Description of the Drawings

[0040] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application. It is obvious that the drawings described below are some embodiments of the invention, and that those skilled in the art can obtain other drawings based on these drawings without any inventive effort.

[0041] Figure 1 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application;

[0042] Figure 2 is a schematic diagram illustrating an application scenario of a pose information processing scheme provided in an embodiment of this application;

[0043] Figure 3 is a schematic diagram of a scene architecture for an image acquisition system provided in an embodiment of this application;

[0044] Figure 4 is a schematic diagram of a scene architecture for an image acquisition system provided in an embodiment of this application;

[0045] Figure 5 is a flowchart illustrating a pose information processing method provided in an embodiment of this application;

[0046] Figure 6 is a flowchart illustrating a pose information processing method provided in an embodiment of this application;

[0047] Figure 7 is a flowchart illustrating a three-dimensional reconstruction method provided in an embodiment of this application;

[0048] Figure 8 is a flowchart illustrating a three-dimensional reconstruction method provided in an embodiment of this application;

[0049] Figure 9 is a schematic diagram of the structure of a pose information processing device provided in an embodiment of this application;

[0050] Figure 10 is a schematic diagram of the structure of a cloud device provided in an embodiment of this application.

[0051] The accompanying drawings illustrate specific embodiments of this application, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concept of this application to those skilled in the art through reference to particular embodiments.

Detailed Description

[0052] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application.

[0053] In this article, the term "and / or" is used to describe the relationship between related objects. Specifically, it means that there can be three kinds of relationships. For example, A and / or B can mean: A exists alone, A and B exist at the same time, or B exists alone.

[0054] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation portals are provided for users to choose to authorize or refuse.

[0055] To clearly describe the technical solutions of the embodiments of this application, the terms involved in this application are first defined as follows:

[0056] AR: Augmented Reality.

[0057] 3D: 3-dimensional.

[0058] 2D: 2-dimensional.

[0059] RGB image: A color mode image with three channels: red (R), green (G), and blue (B).

[0060] Depth image: Depth Map.

[0061] MVS: Multi-View Stereo, a collective term for a series of methods that perform 3D reconstruction from multiple views.

[0062] NeRF: Neural Radiance Field.

[0063] SFM: Structure from Motion.

[0064] Pose: position and orientation, that is, the position of a rigid body in space together with its orientation. The pose of a camera is the position and orientation of the camera in space, and is also called the camera extrinsic parameter.

[0065] Unmask: Its function is to replace the image values of the masked areas with arbitrary values, while the unmasked areas retain their original values.

[0066] SP: SuperPoint, Self-Supervised Interest Point Detection and Description.

[0067] SG: SuperGlue, Learning Feature Matching with Graph Neural Networks.

[0068] BA: Bundle Adjustment.

[0069] DenseBA: Dense Bundle Adjustment Layer, maps a set of flow field corrections to a set of pose and pixel-level depth updates.

[0070] As shown in Figure 1, this embodiment provides an electronic device 1, including at least one processor 11 and a memory 12. Figure 1 takes one processor as an example. The processor 11 and the memory 12 are connected via a bus 10. The memory 12 stores instructions that can be executed by the processor 11. The instructions are executed by the processor 11 to enable the electronic device 1 to perform all or part of the process of the method in the following embodiments, so as to improve the accuracy of pose information estimation for weakly textured objects and reduce the error caused by the relative sliding between the camera and the turntable, thereby improving the accuracy of pose information.

[0071] In one embodiment, the electronic device 1 may be a mobile phone, tablet computer, laptop computer, desktop computer, or a large computing system composed of multiple computers.

[0072] Figure 2 is a schematic diagram of a scene system 200 for a pose information processing scheme according to an embodiment of this application. As shown in Figure 2, the system includes a server 210 and a terminal 220, wherein:

[0073] Server 210 can be a data platform that provides pose information processing services, such as a platform for 3D reconstruction of objects. In a real-world scenario, a 3D reconstruction platform may have multiple servers 210. Figure 2 takes a single server 210 as an example.

[0074] Terminal 220 can be a computer, mobile phone, tablet, or other device used by the user to log in to the 3D reconstruction platform. There can also be multiple terminals 220. Figure 2 takes two terminals 220 as an example.

[0075] Terminal 220 and server 210 can transmit information via the Internet, enabling terminal 220 to access data on server 210. Terminal 220 and / or server 210 can be implemented by electronic device 1.

[0076] The pose information processing scheme of this application embodiment can be deployed on server 210, on terminal 220, or partially on server 210 and partially on terminal 220. The choice can be made based on actual needs in a real-world scenario, and this embodiment does not impose any limitations.

[0077] When the pose information processing scheme is deployed entirely or partially on the server 210, the call interface can be opened to the terminal 220 to provide algorithm support to the terminal 220.

[0078] The method provided in this application embodiment can be implemented by electronic device 1 executing corresponding software code, and is achieved through data interaction with a server. Electronic device 1 can be a local terminal device. When the method runs on a server, it can be implemented and executed based on a cloud interaction system, which includes a server and client devices.

[0079] In one possible implementation, the method provided by the embodiments of the present invention provides a graphical user interface through a terminal device, wherein the terminal device may be the aforementioned local terminal device or a client device in the aforementioned cloud interaction system.

[0080] The pose information processing method of the embodiments of this application can be applied to any field that requires pose estimation.

[0081] With the development of e-commerce, more and more users are choosing to purchase goods on e-commerce platforms. Users want to see three-dimensional information about products; for example, with books, users might want to see what each surface looks like. Traditional image-based displays have poor interactivity and a bad user experience. With the development of AR technology, using AR to display product information and enable user interaction has become a trend. For example, books for sale can be reconstructed in 3D to generate 3D models, which can then be displayed on e-commerce platforms, allowing buyers to view the book models online from multiple perspectives using AR technology.

[0082] 3D reconstruction refers to establishing a mathematical model of a 3D object suitable for computer representation and processing. It forms the basis for processing, manipulating, and analyzing the properties of objects in a computer environment and is a key technology for creating virtual reality representations of the objective world within a computer. In computer vision, 3D reconstruction refers to the process of reconstructing 3D information from single-view or multi-view images. 3D reconstruction has significant application value in scenarios such as AR (Augmented Reality) and interactive 3D product displays.

[0083] Multi-view camera pose estimation is a necessary input for 3D object reconstruction. It marks the beginning of the transformation from a 2D image to 3D reconstruction, and the accuracy of camera pose information directly affects the success or failure of the reconstruction. In image measurement and machine vision applications, to determine the 3D geometric position of a point on the surface of a spatial object and its corresponding point in the image, a geometric model of the camera imaging must be established. These geometric model parameters are the camera parameters. Under most conditions, these parameters must be obtained through experimentation and calculation; this process of solving for these parameters is called camera calibration.
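For reference, a common form of such a geometric model is the pinhole projection. The notation below is standard textbook notation assumed for illustration and does not appear in the source: K collects the intrinsic parameters (focal lengths f_x, f_y and principal point c_x, c_y), [R | t] is the extrinsic parameter (the camera pose), (X, Y, Z) is a point on the object surface, (u, v) is its image point, and s is a scale factor equal to the depth of the point in the camera frame.

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \, [\, R \mid t \,]
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
```

Camera calibration, in these terms, means estimating K and [R | t] from images; lens distortion, where modeled, is applied in addition to this linear projection.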

[0084] As shown in Figure 3, an image acquisition system provided in an embodiment of this application includes a turntable and multiple cameras, wherein:

[0085] The turntable is used to hold the target object or calibration object. Figure 3 takes the calibration body as an example.

[0086] The multiple cameras are used to photograph the target object or calibration object placed on the turntable, acquiring image information of the target object or calibration object. As shown in Figure 3, taking cameras C1, C2, C3, C4 and C5 as an example, the five cameras are mounted on a fixed frame, which can be a vertically placed arc-shaped frame. From top to bottom, cameras C1, C2, C3, C4 and C5 are distributed at different angles to realize multi-view image acquisition of the calibration body or target object on the turntable.

[0087] In real-world scenarios, image acquisition systems typically operate in two phases when capturing images of a type of object: a calibration phase and a measurement phase. The calibration phase involves placing an object with rich texture and a similar volume to the target object to obtain intrinsic and extrinsic parameters. The measurement phase replaces the calibration object with the target object and then uses the extrinsic parameters from the calibration phase to estimate the object's pose. To better describe the pose calculation method, extrinsic parameters can be categorized into relative and absolute extrinsic parameters.

[0088] Calibration phase: First, a calibration body is used to calibrate the multiple cameras. Specifically, the calibration body is placed on the turntable, and the focal length of each camera is adjusted for focusing. Then, the turntable is driven to rotate. While the turntable rotates the calibration body, the five cameras are each triggered to photograph the calibration body every time it rotates through a certain angle, and each trigger yields one set of calibration images. After one full rotation, multiple calibration images of the calibration body are obtained. These calibration images are used to calibrate the five cameras and obtain the intrinsic and extrinsic parameters of the five cameras.

[0089] After the calibration phase is completed, the measurement phase can proceed. For example, Figure 4 shows a schematic of the image acquisition system during the measurement phase. In this phase, the calibration body is removed from the turntable, and the target object (which can be the object to be reconstructed in 3D) is placed on the turntable. The turntable is then driven to rotate at the same speed and at the same angular intervals as in the calibration phase. While the turntable rotates the target object, the five cameras each capture an image of the target object every time it rotates through a certain angle, resulting in a set of target images. After one complete rotation, multiple target images are obtained. Based on these target images and the intrinsic and extrinsic parameters from the calibration phase, the pose of the target object is estimated, yielding pose information. This pose information includes the intrinsic parameters of the five cameras (e.g., focal length, principal point of the image plane), relative extrinsic parameters, and absolute extrinsic parameters. The relative extrinsic parameters characterize the relative positional relationships between the multiple cameras; in Figure 3, for example, they characterize the relative positional relationships between the five cameras. The absolute extrinsic parameters characterize the relative positional relationship between the multiple cameras and the turntable; in Figure 3, for example, they characterize the relative positional relationship between the five cameras and the turntable.

[0090] Traditional camera systems, such as structured light systems or vehicle-mounted platforms, have no variables in their calibration systems. Once calibration is complete, the entire system is in a relatively stable state. If the system changes, recalibration or fine-tuning is required. However, in the image acquisition system described in the embodiments of this application, the turntable involves a movement process. Although this movement process can be strictly guaranteed to have the starting and ending points consistent through a gear system, gears also have tolerance and precision issues during movement, making it difficult to guarantee that they will stop at exactly the same position every time.

[0091] In one embodiment, to improve pose accuracy during the measurement phase, an image-based pose optimization method can be added. However, pose estimation failures still occur for small or weakly textured objects. For example, while joint optimization of the poses of five cameras has a high success rate, the lack of high-precision camera intrinsic parameters leads to unsatisfactory pose optimization results.

[0092] To address the issue of low intrinsic parameter accuracy, traditional camera calibration boards can be used for calibration, yielding good intrinsic and relative extrinsic parameters. However, for thinner target objects, such as glass plates (which are typically thin and borderless), obtaining absolute extrinsic parameters becomes more challenging.

[0093] In one embodiment, a calibration plate or calibration body can be used to calibrate the entire image acquisition system, and a high-precision instrument can be added to the turntable to measure the rotation of the turntable in order to obtain absolute external parameters. This method has high requirements for the turntable measuring instrument and limits its applicability.

[0094] In real-world scenarios, the estimation of camera pose depends on the texture characteristics of the object's surface. For objects with obvious texture features, the pose estimation will be more accurate. However, for objects with weak texture features, there are too few textures to extract enough feature points, resulting in inaccurate pose estimation and consequently, inaccurate reconstruction.

[0095] To address the aforementioned issues, this application provides a pose information processing scheme applied to a multi-camera image acquisition system. By dividing the pose information to be processed into the intrinsic, relative, and absolute extrinsic parameters of the cameras, and based on the characteristics of the image acquisition system, the absolute extrinsic parameter of the main camera is used as a shared absolute extrinsic parameter for multiple cameras. Based on the depth information of the target object image, only the absolute extrinsic parameter of the main camera is optimized, reducing the number of feature points required during the calculation of the absolute extrinsic parameter. This improves the accuracy of pose information estimation for objects with weak textures and reduces the error caused by the relative sliding between the camera and the turntable, thereby improving the accuracy of the pose information.

[0096] The following detailed description of some embodiments of this application is provided in conjunction with the accompanying drawings. Where there is no conflict between the embodiments, the following embodiments and features can be combined with each other. Furthermore, the timing of the steps in the following method embodiments is merely an example and not a strict limitation.

[0097] Please refer to Figure 5, which shows a pose information processing method according to an embodiment of this application. The method can be performed by the electronic device 1 shown in Figure 1 and can be applied to the 3D reconstruction application scenarios shown in Figures 2 to 4, in order to improve the accuracy of pose information estimation for objects with weak textures and reduce errors caused by relative sliding between the camera and the turntable, thereby improving the accuracy of pose information. This embodiment takes terminal 220 as the executing device as an example, and the method includes the following steps:

[0098] Step 501: Obtain the pose information to be processed from multiple cameras in the image acquisition system. The pose information to be processed includes: intrinsic parameters, relative extrinsic parameters, and absolute extrinsic parameters of the predetermined master camera among the multiple cameras. The relative extrinsic parameters represent the relative positional relationship between the multiple cameras, and the absolute extrinsic parameters represent the relative positional relationship between the master camera and the turntable.

[0099] In this step, taking the image acquisition system shown in Figure 3 or Figure 4 as an example, the pose information to be processed can be the pose information of the multiple cameras in the image acquisition system during the calibration phase. The main camera can be pre-selected from the multiple cameras; for example, camera C2 in Figure 3 can serve as the main camera. The selection of the main camera can be based on actual needs, and this embodiment does not impose any limitations. Taking the aforementioned image acquisition system as an example, the pose information to be processed may include the intrinsic parameters (such as focal length and principal point of the image plane), relative extrinsic parameters, and absolute extrinsic parameters of the five cameras. The relative extrinsic parameters characterize the relative positional relationships between the multiple cameras; in Figure 3, for example, they characterize the relative positional relationships between the five cameras. The absolute extrinsic parameters characterize the relative positional relationship between the multiple cameras and the turntable; in Figure 3, for example, they characterize the relative positional relationship between the five cameras and the turntable. In a real-world scenario, the mounting frame is fixed, and the relative positions of cameras C1, C2, C3, C4, and C5 fixed on the frame are also fixed. Therefore, the relative extrinsic parameters of the five cameras remain unchanged during the calibration and measurement phases. However, the turntable rotates, and during the calibration and measurement phases, the positional relationship between the turntable and the cameras may undergo slight changes. These slight changes affect the absolute extrinsic parameters of the cameras, thus requiring optimization of the absolute extrinsic parameters.

[0100] In one embodiment, multiple cameras can be paired to calculate absolute extrinsic parameters. Assuming a 5-point method is used, each camera needs to calculate at least 5 feature point pairs to obtain the absolute extrinsic parameters. Calculating the absolute extrinsic parameters for each of the 5 cameras individually results in a massive amount of data and requires many feature point pairs, which is unsuitable for objects with weak textures. In real-world scenarios, the positions of the 5 stationary cameras and the turntable change in a consistent manner. Therefore, one camera can be selected as the master camera. When calculating the absolute extrinsic parameters, only the master camera's absolute extrinsic parameters are calculated to determine the absolute extrinsic parameters of the other cameras. This reduces the number of feature points required during the absolute extrinsic parameter calculation process, thereby improving the accuracy of pose estimation for objects with weak textures.
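The sharing of the master camera's absolute extrinsic parameter can be illustrated with a minimal sketch. It assumes, for illustration only, that each extrinsic parameter is stored as a 4x4 rigid transform, that the absolute extrinsic maps turntable coordinates to master-camera coordinates, and that each relative extrinsic maps master-camera coordinates to the coordinates of one camera. The function names, the rig angles, and the offsets are hypothetical and are not taken from the application.

```python
import numpy as np

def make_transform(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 rigid transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_x(angle_rad):
    """Rotation about the x axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def all_absolute_extrinsics(T_abs_main, T_rel_list):
    """Derive every camera's absolute extrinsic from the single shared one.

    T_abs_main maps turntable -> master camera (assumed convention).
    T_rel_list[i] maps master camera -> camera i (assumed convention).
    """
    return [T_rel @ T_abs_main for T_rel in T_rel_list]

# Hypothetical five-camera rig; index 1 plays the role of master camera C2,
# so its relative extrinsic is the identity.
angles = np.radians([-20.0, 0.0, 20.0, 40.0, 60.0])
offsets = [-0.2, 0.0, 0.2, 0.4, 0.6]
T_rel_list = [make_transform(rot_x(a), [0.0, dy, 0.0]) for a, dy in zip(angles, offsets)]

# Only this one transform has to be estimated when the turntable slips.
T_abs_main = make_transform(rot_x(0.1), [0.0, 0.0, 1.0])
T_abs_all = all_absolute_extrinsics(T_abs_main, T_rel_list)
```

Under these assumptions, a change in the camera-to-turntable relationship is absorbed by updating `T_abs_main` alone; the other four absolute extrinsics follow from the fixed relative extrinsics.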

[0101] In one embodiment, step 501 may specifically include: using multiple cameras to acquire multiple calibration images of a preset calibration body placed on a turntable; calibrating the multiple cameras based on the multiple calibration images to obtain the pose information of the multiple cameras to be processed.

[0102] In this embodiment, the preset calibration body corresponds to the target object that needs to be reconstructed in 3D. For example, it can be a calibration body similar in size to the target object. The calibration body can contain image markers with rich textures to ensure that the camera intrinsic parameters and relative extrinsic parameters can be successfully estimated. During the calibration stage, the positions of the five cameras on the fixed frame are adjusted according to the type and size of the target object. After adjustment, the preset calibration body is placed on the turntable, and the focal length of each camera is adjusted for focusing. Then, the turntable is driven to rotate. During the rotation of the calibration body by the turntable, the five cameras are driven to take a picture of the calibration body every time it rotates through a certain angle, resulting in a set of calibration images. For example, after rotating and taking pictures all the way, multiple calibration images of the calibration body can be obtained. These multiple calibration images are used to calibrate the five cameras to obtain the pose information of the five cameras to be processed.

[0103] In one embodiment, step 501, which calibrates multiple cameras based on multiple calibration images to obtain the pose information to be processed for the multiple cameras, may specifically include: extracting first image features from the multiple calibration images; performing feature matching on the multiple calibration images based on the first image features; determining the initial pose information of the multiple cameras based on the first feature matching results of the multiple calibration images, the initial pose information including: intrinsic parameters, relative extrinsic parameters, and absolute extrinsic parameters of the multiple cameras; and performing reprojection error minimization processing on the initial pose information to obtain the pose information to be processed for the multiple cameras.

[0104] In this embodiment, during the calibration stage, the multiple calibration images can first be unmasked to make the textures of the calibration body easier to distinguish. Then, an SP-based feature extraction network can be used to extract the first image features from the multiple calibration images. These first image features characterize the texture features of the calibration body in the calibration images. SP uses an unsupervised training method to train a network for extracting image features and feature descriptors, resulting in more accurate image feature extraction. Feature matching is then performed on the multiple calibration images based on the first image features, for which an SU-based feature matching network can be used. The SU-based feature matching network provides a 2D feature point matching method that can simulate the way a human looks back and forth between images when matching, resulting in more accurate first feature matching results. Based on the first feature matching results, keypoint adjustment and/or dense bundle adjustment methods can be used to calculate the initial pose information of the multiple cameras for the calibration images in each camera directory. The initial pose information includes at least the intrinsic parameters and relative extrinsic parameters of the multiple cameras and the absolute extrinsic parameters of the main camera. Because the initial pose information may contain noise, it can be further optimized. Specifically, reprojection error minimization can be performed on the initial pose information to obtain the pose information to be processed for the multiple cameras, which reduces the influence of noise and improves the accuracy of the pose information to be processed.

[0105] In one embodiment, step 501, which involves minimizing the reprojection error of the initial pose information to obtain the pose information to be processed for multiple cameras, may further include: performing image distortion correction processing on the initial pose information of the multiple cameras. The distortion-corrected initial pose information is then processed using bundle adjustment to minimize the reprojection error, thereby obtaining the pose information to be processed for multiple cameras.

[0106] In this embodiment, distortion is a type of aberration that creates an illusion (e.g., a straight line becomes a curve), but the image information is not lost. The presence of distortion leads to inaccurate subsequent calculation results. During the calibration stage, after obtaining the initial pose information of multiple cameras, distortion removal processing can be performed on the calibration images before bundle adjustment. Specifically, the distortion coefficients of multiple cameras can be obtained in advance. Distortion removal processing is then performed on the calibration images captured by each camera based on these camera distortion coefficients. The camera distortion coefficients can be obtained from the camera's parameter specifications or by performing distortion calibration on each camera using a predetermined distortion calibration method, such as the Zhang Zhengyou calibration method. This embodiment does not limit the method of obtaining the distortion coefficients. The multiple calibration images after distortion removal contain more realistic and accurate information, reducing the impact of image distortion on the pose information results and improving data accuracy.
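As an illustration of distortion removal from known distortion coefficients, the following minimal sketch undoes radial distortion on normalized image coordinates. The text does not specify a distortion model; the two-coefficient radial model, the fixed-point inversion, and the coefficient values used here are assumptions chosen for illustration.

```python
import numpy as np

def distort(xy, k1, k2):
    """Apply an assumed two-coefficient radial model to normalized points (N x 2)."""
    r2 = np.sum(xy ** 2, axis=1, keepdims=True)
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)

def undistort(xy_distorted, k1, k2, iters=20):
    """Invert the radial model by fixed-point iteration.

    Starts from the distorted point and repeatedly divides by the radial
    factor evaluated at the current estimate; suitable for mild distortion.
    """
    xy = xy_distorted.copy()
    for _ in range(iters):
        r2 = np.sum(xy ** 2, axis=1, keepdims=True)
        xy = xy_distorted / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return xy

# Hypothetical coefficients and points, in normalized (not pixel) coordinates.
k1, k2 = -0.1, 0.01
ideal = np.array([[0.3, -0.2], [0.0, 0.0], [-0.4, 0.5]])
observed = distort(ideal, k1, k2)
recovered = undistort(observed, k1, k2)
```

To apply this to pixel coordinates, the pixels would first be converted to normalized coordinates with the camera intrinsic parameters and converted back afterwards.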

[0107] Then, the initial pose information after distortion correction is processed using bundle adjustment (BA) with a 5-camera joint optimization approach. This minimizes the reprojection error of the initial pose information, resulting in pose information to be processed from multiple cameras. This processed pose information includes optimized camera intrinsic parameters, relative extrinsic parameters, and the absolute extrinsic parameters of the master camera. In this process, the five cameras share the absolute extrinsic parameters of the master camera. Therefore, the number of absolute extrinsic parameters is reduced to one-fifth of the number calculated by each of the five cameras individually, significantly reducing computation time and solution difficulty. This not only reduces the number of feature points required in the calculation of absolute extrinsic parameters, improving the accuracy of pose estimation for objects with weak textures, but also reduces errors caused by relative sliding between the camera and the turntable, further improving the accuracy of pose information.

[0108] Step 502: Acquire multiple target images of the target object. The multiple target images are obtained by using multiple cameras to capture images of the target object placed on the turntable based on the pose information to be processed.

[0109] In this step, the target object can be any object requiring 3D reconstruction, such as a product requiring AR interaction in an e-commerce scenario. The multiple target images are multi-view images of the target object. During the measurement phase, the calibration body can be removed from the turntable, and the target object placed on the turntable. Then, the turntable is driven to rotate at the same speed and at the same angular intervals as in the calibration phase. While the turntable rotates the target object, the five cameras are each triggered to photograph the target object every time the turntable rotates through a certain angle, resulting in a set of target images. After one complete rotation, multiple target images of the target object are obtained. Here, the five cameras use the same parameters as in the calibration phase, meaning the five cameras capture the target object according to the pose information determined in the calibration phase. The multiple target images can be read directly from the memory of the five cameras. Alternatively, the multiple target images can be stored on a remote server, and the execution terminal can retrieve them from the remote server.

[0110] Step 503: Determine the depth information of multiple target images based on the intrinsic and relative extrinsic parameters of multiple cameras.

[0111] In this step, depth information refers to the grayscale value of each pixel in the image, which can be used to characterize the distance of a point in the scene from the camera. An image containing depth information can be called a depth image. Based on the camera intrinsic parameters and relative extrinsic parameters, the depth information of multiple target images can be determined, and thus the corresponding depth image can be obtained.

[0112] In one embodiment, step 503 may specifically include: extracting second image features from multiple target images; performing feature matching on the multiple target images based on the second image features; and determining the depth information of the multiple target images based on the second feature matching results, the intrinsic parameters and relative extrinsic parameters of multiple cameras.

[0113] In this embodiment, to improve the accuracy of depth information, the target images can be preprocessed before calculating the depth information. Specifically, the multiple target images can first be unmasked to make the textures of the target object in the target images easier to distinguish. Then, an SP-based feature extraction network can be used to extract second image features from the multiple target images. The second image features characterize the texture features of the target object in the target images. SP uses an unsupervised method to train a network for extracting image features and feature descriptors, which makes the extracted texture features more accurate. Feature matching is then performed on the multiple target images based on the second image features, for which an SU-based feature matching network can be used. The SU-based feature matching network provides a 2D feature point matching method that can simulate the way a human looks back and forth between images when matching, yielding more accurate second feature matching results. Finally, based on the second feature matching results and the intrinsic and relative extrinsic parameters of the multiple cameras, the depth information of the multiple target images is determined, and the corresponding depth images are obtained.

[0114] In one embodiment, step 503, before extracting the second image features of the multiple target images, may include: performing distortion correction processing on the multiple target images based on the intrinsic parameters of the multiple cameras.

[0115] In this embodiment, the target images captured during the measurement phase may contain considerable noise due to scene and hardware influences. This noise affects the camera's imaging performance, often leading to image distortion. Distortion, a type of aberration, creates a perceptual illusion (i.e., straight lines become curves), but the image information is not lost. However, the presence of distortion can lead to inaccurate feature extraction results. Therefore, before extracting features from the target images, distortion correction processing can be performed on multiple target images. Specifically, the distortion coefficients of multiple cameras can be pre-obtained, and distortion correction processing can be performed on the target images captured by each camera based on these coefficients. These camera distortion coefficients can be obtained from the camera's specifications or through a predetermined distortion calibration method, such as the Zhang Zhengyou calibration method. The distortion-corrected target images contain more accurate and realistic information, providing accurate input data for the subsequent feature extraction process and improving the accuracy of the feature extraction results.

[0116] In one embodiment, the step 503 of determining the depth information of multiple target images based on the second feature matching results of multiple target images and the intrinsic and relative extrinsic parameters of multiple cameras may specifically include: performing triangulation positioning processing on preset feature points based on the second feature matching results and the intrinsic and relative extrinsic parameters of multiple cameras to obtain the depth information of preset feature points in multiple target images.

[0117] In this embodiment, a preset feature point can be a point in three-dimensional space. The preset feature points and their number can be selected based on actual needs, and this embodiment does not impose any limitations. The depth information of a preset feature point in the target image can be the spatial three-dimensional coordinates of that point. The depth information of the target image can be obtained by triangulation, which recovers the 3D coordinates of a feature point from its projections in multiple cameras. Triangulation can therefore be used to estimate the distance of pixels in the target image. That is, from the pixel positions of a feature point P in different frames of the target images, the coordinates of P in three-dimensional space can be calculated, which yields the depth information of P. Specifically, when a feature point is observed by a certain camera, an observation "ray" emanating from the camera center can be obtained in 3D space based on the camera intrinsic parameters, relative extrinsic parameters, and observation vector. Observations from multiple camera poses generate multiple observation rays. Ideally, these observation rays intersect at a single point in space, and that intersection is the position of the feature point in 3D space. By performing triangulation on the preset feature points, the depth information of the preset feature points in the multiple target images is obtained.
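The ray-intersection idea above can be sketched with linear (DLT) triangulation, which finds the 3D point that best satisfies the projection equations of all observing cameras. This is one common way to triangulate and is not stated to be the method used in the application; the intrinsic matrix, camera poses, and test point are hypothetical.

```python
import numpy as np

def projection_matrix(K, R, t):
    """3x4 projection matrix from intrinsics K and an extrinsic (R, t)."""
    return K @ np.hstack([R, np.reshape(t, (3, 1))])

def project(P, X):
    """Project a 3D point X with projection matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(proj_mats, pixels):
    """Linear triangulation: each view contributes two rows to A, and the
    3D point is the right singular vector of A with the smallest singular value."""
    rows = []
    for P, (u, v) in zip(proj_mats, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.stack(rows))
    X = Vt[-1]
    return X[:3] / X[3]

# Hypothetical two-camera example; camera 0 is the reference (master) frame,
# and camera 1 is related to it by a relative extrinsic (pure translation here).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P0 = projection_matrix(K, np.eye(3), [0.0, 0.0, 0.0])
P1 = projection_matrix(K, np.eye(3), [-0.2, 0.0, 0.0])

X_true = np.array([0.1, -0.05, 2.0])
pixels = [project(P0, X_true), project(P1, X_true)]
X_hat = triangulate([P0, P1], pixels)
depth_in_reference_camera = X_hat[2]
```

With noisy matches the rays no longer meet exactly, and the SVD solution gives a least-squares compromise in the algebraic sense.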

[0118] Step 504: Minimize the reprojection error of the absolute extrinsic parameters based on the depth information to obtain the optimized absolute extrinsic parameters. The optimized absolute extrinsic parameters, relative extrinsic parameters, and intrinsic parameters of multiple cameras are determined as the final pose information of multiple cameras.

[0119] In this step, the reprojection error refers to the difference between the projection of a real 3D point onto the image plane (pixels in the image) and the reprojection (calculated virtual pixels). In real-world scenarios, due to various noise interferences, the calculated value generally does not perfectly match the actual situation; that is, this difference is generally not exactly zero. Therefore, it is necessary to minimize the sum of these differences to obtain the optimal camera pose parameters and the coordinates of the 3D points. In step 501, the intrinsic and relative extrinsic parameters of multiple cameras and the absolute extrinsic parameter of the main camera have been obtained. Based on the characteristics of the image acquisition system, the intrinsic and relative extrinsic parameters of the five cameras remain unchanged during the measurement phase compared to the calibration phase. However, the rotation of the turntable will cause changes in the absolute extrinsic parameters. Therefore, the absolute extrinsic parameters can be further optimized during the measurement phase to obtain more accurate absolute extrinsic parameters. In one embodiment, the absolute extrinsic parameters of the five cameras can be optimized and calculated separately during the measurement phase, resulting in a relatively large amount of data. This is not suitable for objects with weak texture information. Therefore, during the measurement phase, the absolute extrinsic parameters of the main camera can be used as the shared absolute extrinsic parameters of the five cameras. For example, camera C2 can be used as the main camera. With the camera's intrinsic and relative extrinsic parameters fixed, the absolute extrinsic parameters of the main camera C2 are optimized by minimizing the reprojection error based on the depth information of the target image. 
This optimized absolute extrinsic parameter can then be used as the shared absolute extrinsic parameter of the five cameras. The optimized absolute extrinsic parameter, the corresponding relative extrinsic parameters of each camera, and the intrinsic parameters of each camera are then determined as the final pose information of the five cameras. In this way, optimizing the absolute extrinsic parameters requires a minimum of only five point pairs across all five cameras, rather than five point pairs for each camera, which significantly reduces the number of matching feature points required and greatly reduces the computational load. This approach is not only more suitable for thin targets with extremely weak textures but also improves computational efficiency.
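The objective described in this step can be written compactly. The notation is introduced here for illustration and does not appear in the original text: $K_i$ denotes the intrinsic parameters of camera $i$, $T_{\mathrm{rel},i}$ the relative extrinsic parameter from the main camera to camera $i$, $T_{\mathrm{abs}}$ the shared absolute extrinsic parameter of the main camera, $X_j$ a 3D feature point recovered from the depth information, $u_{ij}$ its observed pixel in camera $i$, $\mathcal{V}_i$ the set of feature points visible in camera $i$, and $\pi$ the pinhole projection.

```latex
\hat{T}_{\mathrm{abs}}
  = \arg\min_{T_{\mathrm{abs}}}
    \sum_{i=1}^{5} \sum_{j \in \mathcal{V}_i}
    \left\| u_{ij} - \pi\!\left(K_i,\; T_{\mathrm{rel},i}\, T_{\mathrm{abs}}\, X_j\right) \right\|^{2}
```

Under this reading, $K_i$ and $T_{\mathrm{rel},i}$ are held fixed, so the only unknown is the six-degree-of-freedom transform $T_{\mathrm{abs}}$, and every matched point in any of the five cameras contributes to estimating it.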

[0120] In one embodiment, step 504 may specifically include: minimizing the reprojection error of the absolute extrinsic parameters by using bundle adjustment based on the depth information, thereby obtaining the optimized absolute extrinsic parameters of multiple cameras.

[0121] In this embodiment, bundle adjustment can be used to optimize the absolute extrinsic parameters. Bundle adjustment takes the bundle of light rays formed by one image as the basic unit of adjustment and the collinearity equation of the central projection as the basic equation of adjustment. By rotating and translating each bundle of rays in space, the rays of common points between models are brought to an optimal intersection, and the entire region is optimally fitted into the known control point coordinate system. Because bundle adjustment uses the collinearity equation as its mathematical model, the observed image plane coordinates of the image points are nonlinear functions of the unknowns. After linearization, the calculation is performed according to the least squares principle, starting from an approximate solution and iterating to approach the optimal value. Using the depth information of the preset feature points on the target images obtained in step 503, bundle adjustment is used to minimize the reprojection error with respect to the absolute extrinsic parameters of the main camera, thus obtaining the optimized absolute extrinsic parameters of the multiple cameras. The optimized absolute extrinsic parameters reduce the influence of noise and improve the accuracy of the final pose information of the cameras.
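The linearize-and-iterate procedure described above can be sketched as a small Gauss-Newton loop in which only the shared absolute extrinsic parameter is free, while the intrinsic and relative extrinsic parameters stay fixed. This is a simplified stand-in for a full bundle adjustment, not the application's implementation: the 3D points are treated as known, the Jacobian is computed by finite differences, and all function names, poses, and point coordinates are hypothetical.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix for an axis-angle vector w."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    S = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * S + (1.0 - np.cos(theta)) * (S @ S)

def apply_update(T, delta):
    """Left-multiply T by a small motion delta = (rx, ry, rz, tx, ty, tz)."""
    D = np.eye(4)
    D[:3, :3] = rodrigues(delta[:3])
    D[:3, 3] = delta[3:]
    return D @ T

def residuals(T_abs, Ks, T_rels, points, observations):
    """Stacked pixel residuals over all cameras for one shared T_abs."""
    Xh = np.hstack([points, np.ones((len(points), 1))])
    res = []
    for K, T_rel, obs in zip(Ks, T_rels, observations):
        Xc = (T_rel @ T_abs @ Xh.T)[:3]       # points in this camera's frame
        uv = (K @ Xc)[:2] / Xc[2]             # pinhole projection
        res.append((uv.T - obs).ravel())
    return np.concatenate(res)

def optimize_absolute_extrinsic(T0, Ks, T_rels, points, observations, iters=10):
    """Gauss-Newton over the 6 parameters of T_abs with a numerical Jacobian."""
    T, eps = T0.copy(), 1e-6
    for _ in range(iters):
        r = residuals(T, Ks, T_rels, points, observations)
        J = np.empty((r.size, 6))
        for j in range(6):
            d = np.zeros(6)
            d[j] = eps
            r_shifted = residuals(apply_update(T, d), Ks, T_rels, points, observations)
            J[:, j] = (r_shifted - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]  # least-squares step
        T = apply_update(T, step)
    return T

# Hypothetical rig: five cameras on an arc, all looking at a point 1.5 units away.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
Ks, T_rels = [K] * 5, []
for a in np.radians([-20.0, 0.0, 20.0, 40.0, 60.0]):
    T_rel = np.eye(4)
    T_rel[:3, :3] = rodrigues(np.array([a, 0.0, 0.0]))
    T_rel[:3, 3] = [0.0, 1.5 * np.sin(a), 1.5 * (1.0 - np.cos(a))]
    T_rels.append(T_rel)

T_true = apply_update(np.eye(4), np.array([0.05, -0.02, 0.03, 0.01, -0.02, 1.5]))
points = np.array([[0.1, 0.0, 0.0], [-0.1, 0.05, 0.02], [0.0, 0.1, -0.05],
                   [0.05, -0.1, 0.08], [-0.05, -0.05, -0.1]])
observations = [(residuals(T_true, [Ki], [Tr], points, [np.zeros((5, 2))])).reshape(5, 2)
                for Ki, Tr in zip(Ks, T_rels)]

# Simulate a small slip between calibration and measurement, then recover it.
T_init = apply_update(T_true, np.array([0.01, -0.01, 0.005, 0.005, -0.003, 0.004]))
T_opt = optimize_absolute_extrinsic(T_init, Ks, T_rels, points, observations)
```

In this toy setup five 3D points seen by five cameras give fifty residuals for six unknowns, which is why sharing one absolute extrinsic parameter keeps the problem well constrained even when few features are available.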

[0122] In one embodiment, after obtaining the final pose information in step 504, some methods can be used to verify the validity of the result in order to ensure that the final pose information is more accurate.

[0123] In the experiment, for thinner target objects, such as those with the thinnest part only a few sheets of paper thick, the method of the present application embodiment can successfully optimize the pose information and thus successfully perform accurate 3D reconstruction. The obtained 3D reconstruction result has clear frontal texture without ghosting.

[0124] The aforementioned pose information processing method is applied to a multi-camera image acquisition system. By dividing the pose information to be processed into camera intrinsic parameters, relative extrinsic parameters, and absolute extrinsic parameters, and based on the characteristics of the image acquisition system, the absolute extrinsic parameters of the main camera are used as shared absolute extrinsic parameters for multiple cameras. Based on the depth information of the target object image, optimization is performed only on the absolute extrinsic parameters of the main camera, reducing the number of feature points required in the calculation of absolute extrinsic parameters. This improves the accuracy of pose information estimation for objects with weak textures, significantly increasing the success rate. Furthermore, it reduces errors caused by relative sliding between the camera and the turntable, improving the accuracy of pose information. It solves the problem of slight sliding between the object and the turntable, and even with slight inconsistencies between the calibration object and the target object, relatively good pose reconstruction results can be obtained.

[0125] Please refer to Figure 6, which shows a pose information processing method according to an embodiment of this application. The method can be performed by the electronic device 1 shown in Figure 1 and can be applied to the 3D reconstruction application scenarios shown in Figures 2 to 4, in order to improve the accuracy of pose information estimation for objects with weak textures and reduce errors caused by relative sliding between the camera and the turntable, thereby improving the accuracy of pose information. This embodiment takes terminal 220 as the executing end as an example. Compared with the previous embodiment, this embodiment uses different pose optimization strategies in the calibration and measurement phases. Based on the principle that the cameras' intrinsic and relative extrinsic parameters remain unchanged, pose calculation can also be divided into two processes: a calibration phase and a measurement phase. The method includes the following steps:

[0126] Calibration phase:

[0127] Step 601: Use multiple cameras to acquire multiple calibration images of a preset calibration object placed on a turntable.

[0128] Step 602: Extract the first image features from multiple calibration images.

[0129] Step 603: Perform feature matching on multiple calibration images based on the features of the first image.

[0130] Step 604: Based on the first feature matching results of multiple calibration images, determine the initial pose information of multiple cameras. The initial pose information includes: the intrinsic parameters and relative extrinsic parameters of the multiple cameras, and the absolute extrinsic parameters of the predetermined main camera among the multiple cameras.

[0131] Step 605: Perform image distortion correction processing on the initial pose information of multiple cameras.

[0132] Step 606: Use bundle adjustment to minimize the reprojection error of the distortion-corrected initial pose information, obtaining the pose information to be processed of the multiple cameras.

[0133] Measurement phase:

[0134] Step 607: Using multiple cameras, acquire multiple target images of the target object placed on the turntable based on the pose information to be processed.

[0135] Step 608: Perform distortion correction processing on multiple target images based on the intrinsic parameters of multiple cameras.

[0136] Step 609: Extract the second image features from multiple target images.

[0137] Step 610: Perform feature matching on multiple target images based on the second image features.

[0138] Step 611: Based on the second feature matching result, the intrinsic parameters and relative extrinsic parameters of multiple cameras, perform triangulation localization processing on the preset feature points to obtain the depth information of the preset feature points in multiple target images.

[0139] Step 612: Based on the depth information, use bundle adjustment to minimize the reprojection error of the absolute extrinsic parameters, obtaining the optimized absolute extrinsic parameters of the multiple cameras.
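The triangulation of step 611 can be sketched, under stated assumptions, as a linear (DLT) triangulation of one matched feature point seen by two cameras. The projection matrices are assumed to be built from the intrinsic parameters and the relative extrinsic parameters (`P = K [R | t]`); the function names are hypothetical, and the application does not state which triangulation variant is used.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two pixel observations.

    P1, P2: 3x4 projection matrices; x1, x2: (u, v) pixel coordinates.
    Returns the 3D point in the frame the projection matrices refer to.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]               # null-space direction of A (homogeneous point)
    return X[:3] / X[3]

def depth_in_camera(R, t, X):
    """Depth of point X along the optical axis of a camera with pose (R, t)."""
    return float((R @ X + t)[2])
```

If the first camera is taken as the reference (`R = I`, `t = 0`), the depth of a feature point is simply the z coordinate of the triangulated point. Since only relative extrinsic parameters enter here, the resulting depths do not depend on the absolute extrinsic parameters that step 612 then optimizes.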

[0140] The aforementioned pose information processing method achieves rapid pose calculation and optimization through a two-stage multi-camera joint optimization approach, solving the problem of reconstructing weak textures and ultra-thin objects. Furthermore, it eliminates the need for high-precision turntable calibration; high-precision extrinsic parameter calculation can be achieved solely through two joint optimization stages: calibration and measurement. In practical use, the calculation time has been reduced from 20 minutes to approximately 20 seconds. Memory usage has been optimized from 40GB to a few hundred megabytes, significantly improving data processing efficiency.

[0141] For details of each step of the above pose information processing method, please refer to the relevant descriptions of the foregoing embodiments, which will not be repeated here.

[0142] Please refer to Figure 7, which shows a three-dimensional reconstruction method according to an embodiment of this application. The method can be performed by the electronic device 1 shown in Figure 1 and can be applied to the 3D reconstruction application scenarios shown in Figures 2 to 4, in order to improve the accuracy of pose information estimation for objects with weak textures, reduce errors caused by relative sliding between the camera and the turntable, improve the accuracy of pose information, and thus improve the precision of 3D reconstruction. This embodiment takes terminal 220 as the executing end as an example. Compared with the previous embodiment, this embodiment takes as an example a 3D reconstruction scenario based on multi-view images acquired by the aforementioned image acquisition system. The method includes the following steps:

[0143] Step 701: In response to the 3D reconstruction request of the target object, acquire the pose information to be processed from multiple cameras in the image acquisition system. The pose information to be processed includes: intrinsic parameters, relative extrinsic parameters, and absolute extrinsic parameters of multiple cameras, wherein the relative extrinsic parameters represent the relative positional relationship between the multiple cameras, and the absolute extrinsic parameters represent the relative positional relationship between the multiple cameras and the turntable.

[0144] In this step, a 3D reconstruction request for the target object can be triggered by the user or automatically triggered by predetermined conditions. For example, in an e-commerce scenario, a merchant or relevant staff member may trigger a 3D reconstruction request for a certain product. In response to the 3D reconstruction request, the pose information to be processed from multiple cameras in the image acquisition system is first obtained. Here, the pose information to be processed is the pose information corresponding to the target object. It can be the pose information obtained in advance based on a preset calibration body by calibrating multiple cameras. For details, please refer to the description of the pose information to be processed in the previous embodiment, which will not be repeated here.

[0145] Step 702: Acquire multiple target images of the target object. These multiple target images are obtained by using multiple cameras to capture images of the target object placed on the turntable based on the pose information to be processed. For details, please refer to the description of step 502 in the foregoing embodiments.

[0146] Step 703: Determine the depth information of multiple target images based on the intrinsic and relative extrinsic parameters of multiple cameras. For details, please refer to the description of step 503 in the foregoing embodiments.

[0147] Step 704: Minimize the reprojection error of the absolute extrinsic parameters based on the depth information to obtain optimized absolute extrinsic parameters. The optimized absolute extrinsic parameters, relative extrinsic parameters, and intrinsic parameters of the multiple cameras are then used to determine the final pose information of the multiple cameras. For details, please refer to the description of step 504 in the foregoing embodiments.

[0148] Step 705: Based on the final pose information and multiple target images, perform 3D reconstruction of the target object to generate a 3D virtual model of the target object.

[0149] In this step, the multiple target images, which include multi-view images of the target object, can be used for stereo matching. For example, based on a structure-from-motion (SfM) algorithm, matching feature points can be calculated pairwise for adjacent images. Generally, two images are first used for reconstruction to calculate an initial point cloud, and subsequent images are then added one after another to extend the reconstruction. After the matching feature points are calculated, they can be combined with the final pose information to perform 3D reconstruction of the target object, resulting in a 3D virtual model of the target object.

[0150] For example, if the target object is a packaging bottle in an e-commerce scenario, the packaging bottle can be reconstructed in three dimensions based on the above data to generate a three-dimensional virtual model of the packaging bottle. This three-dimensional virtual model of the packaging bottle can then be displayed on the e-commerce shopping platform for buyers to view, thereby improving the user's interactive experience.

[0151] For details of each step of the above-described three-dimensional reconstruction method, please refer to the relevant descriptions of the foregoing embodiments, which will not be repeated here.

[0152] Please refer to Figure 8, which shows a three-dimensional reconstruction method according to an embodiment of this application. The method can be performed by the electronic device 1 shown in Figure 1 and can be applied to the 3D reconstruction application scenarios shown in Figures 2 to 4, in order to improve the accuracy of pose information estimation for objects with weak textures, reduce errors caused by relative sliding between the camera and the turntable, improve the accuracy of pose information, and thus improve the precision of 3D reconstruction. This embodiment takes terminal 220 as the executing end as an example. Compared with the previous embodiment, this embodiment takes as an example an e-commerce scenario in which a product is the target object and the aforementioned image acquisition system is used to perform 3D reconstruction of the product. The method includes the following steps:

[0153] Step 801: In response to the 3D reconstruction request for the target product, acquire the pose information to be processed from multiple cameras in the image acquisition system. The pose information to be processed includes: intrinsic parameters, relative extrinsic parameters, and absolute extrinsic parameters of multiple cameras, where the relative extrinsic parameters represent the relative positional relationship between the multiple cameras, and the absolute extrinsic parameters represent the relative positional relationship between the multiple cameras and the turntable.

[0154] Step 802: Acquire multiple target images of the target product. The multiple target images are obtained by using multiple cameras to capture images of the target product placed on the turntable based on the pose information to be processed.

[0155] Step 803: Determine the depth information of multiple target images based on the intrinsic and relative extrinsic parameters of multiple cameras.

[0156] Step 804: Minimize the reprojection error of the absolute extrinsic parameters based on the depth information to obtain the optimized absolute extrinsic parameters. The optimized absolute extrinsic parameters, relative extrinsic parameters, and intrinsic parameters of multiple cameras are determined as the final pose information of multiple cameras.

[0157] Step 805: Based on the final pose information and multiple target images, perform 3D reconstruction of the target product to generate a 3D virtual model of the target product.

[0158] For details of each step of the above-described three-dimensional reconstruction method, please refer to the relevant descriptions of the foregoing embodiments, which will not be repeated here.

[0159] Please refer to Figure 9, which shows a pose information processing device 900 according to an embodiment of this application. The device can be applied to the electronic device 1 shown in Figure 1 and to the 3D reconstruction application scenarios shown in Figures 2 to 4, in order to improve the accuracy of pose information estimation for objects with weak textures and reduce errors caused by relative sliding between the camera and the turntable, thereby improving the accuracy of pose information. The device includes: a first acquisition module 901, a second acquisition module 902, a determination module 903, and an optimization module 904. The functional principles of each module are as follows:

[0160] The first acquisition module 901 is used to acquire the pose information to be processed of multiple cameras in the image acquisition system. The pose information to be processed includes: intrinsic parameters, relative extrinsic parameters and absolute extrinsic parameters of a predetermined master camera among the multiple cameras. The relative extrinsic parameters represent the relative positional relationship between the multiple cameras and the absolute extrinsic parameters represent the relative positional relationship between the master camera and the turntable.

[0161] The second acquisition module 902 is used to acquire multiple target images of the target object. The multiple target images are obtained by using multiple cameras to capture images of the target object placed on the turntable based on the pose information to be processed.

[0162] The determination module 903 is used to determine the depth information of multiple target images based on the intrinsic and relative extrinsic parameters of multiple cameras.

[0163] The optimization module 904 is used to minimize the reprojection error of the absolute extrinsic parameters based on the depth information to obtain the optimized absolute extrinsic parameters. The optimized absolute extrinsic parameters, relative extrinsic parameters, and intrinsic parameters of multiple cameras are determined as the final pose information of multiple cameras.
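The data flow between the four modules can be sketched as follows. This is only a structural illustration: the `PoseInfo` container, the `process_pose` function, and the callable signatures are hypothetical names introduced here, and each callable stands in for the corresponding module (901 to 904) without implying how that module is implemented.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Sequence

@dataclass(frozen=True)
class PoseInfo:
    """Pose information split into the three parts named in the text."""
    intrinsics: Sequence[Any]           # one entry per camera
    relative_extrinsics: Sequence[Any]  # one entry per camera
    master_absolute_extrinsic: Any      # shared by all cameras

def process_pose(
    acquire_pose: Callable[[], PoseInfo],                 # stands in for module 901
    acquire_images: Callable[[PoseInfo], Sequence[Any]],  # stands in for module 902
    determine_depth: Callable[[Sequence[Any], Sequence[Any], Sequence[Any]], Any],  # 903
    optimize_absolute: Callable[[Any, Any], Any],         # stands in for module 904
) -> PoseInfo:
    pose = acquire_pose()
    images = acquire_images(pose)
    # Depth uses only the intrinsic and relative extrinsic parameters.
    depth = determine_depth(images, pose.intrinsics, pose.relative_extrinsics)
    optimized = optimize_absolute(pose.master_absolute_extrinsic, depth)
    # Only the absolute extrinsic parameter changes in the final pose information.
    return replace(pose, master_absolute_extrinsic=optimized)
```

Making the container immutable reflects the principle stated in the method embodiments that the intrinsic and relative extrinsic parameters remain unchanged between calibration and measurement.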

[0164] In one embodiment, the first acquisition module 901 is used to acquire multiple calibration images of a preset calibration body placed on a turntable using multiple cameras. The multiple cameras are calibrated based on the multiple calibration images to obtain the pose information of the multiple cameras to be processed.

[0165] In one embodiment, the first acquisition module 901 is specifically used to extract first image features from multiple calibration images. Feature matching is performed on the multiple calibration images based on the first image features. Based on the first feature matching results of the multiple calibration images, initial pose information of multiple cameras is determined. The initial pose information includes: intrinsic parameters, relative extrinsic parameters, and absolute extrinsic parameters of the multiple cameras. Minimization of reprojection error processing is applied to the initial pose information to obtain the pose information of the multiple cameras to be processed.

[0166] In one embodiment, the first acquisition module 901 is specifically used to perform image distortion correction processing on the initial pose information of multiple cameras. The initial pose information after distortion correction is processed using bundle adjustment to minimize reprojection error, thereby obtaining the pose information of the multiple cameras to be processed.

[0167] In one embodiment, the determination module 903 is used to extract second image features from multiple target images. Based on the second image features, feature matching is performed on the multiple target images. Based on the second feature matching results of the multiple target images, and the intrinsic and relative extrinsic parameters of multiple cameras, the depth information of the multiple target images is determined.

[0168] In one embodiment, the determination module 903 is specifically used to perform triangulation positioning processing on preset feature points based on the second feature matching result and the intrinsic and relative extrinsic parameters of multiple cameras, to obtain the depth information of the preset feature points in multiple target images.

[0169] In one embodiment, the determination module 903 is further configured to perform distortion correction processing on the multiple target images based on the intrinsic parameters of the multiple cameras before extracting the second image features of the multiple target images.
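As a rough sketch of what distortion correction based on intrinsic parameters can involve, the code below removes radial distortion from normalized image coordinates by fixed-point iteration. The two-coefficient radial model (`k1`, `k2`) is an assumption, since the application does not name a distortion model, and the sketch operates on point coordinates rather than resampling whole images.

```python
import numpy as np

def distort(xy, k1, k2):
    """Apply an assumed radial model to undistorted normalized coordinates."""
    xy = np.asarray(xy, dtype=float)
    r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)

def undistort(xy_d, k1, k2, iters=20):
    """Invert the radial model by fixed-point iteration.

    Starts from the distorted coordinates and repeatedly divides by the
    distortion factor evaluated at the current estimate.
    """
    xy_d = np.asarray(xy_d, dtype=float)
    xy = xy_d.copy()
    for _ in range(iters):
        r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
        xy = xy_d / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return xy
```

The iteration converges quickly when the distortion is mild; strongly distorted lenses may need more iterations or a different solver.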

[0170] In one embodiment, the optimization module 904 is used to minimize the reprojection error of the absolute extrinsic parameters by using bundle adjustment based on the depth information, so as to obtain the optimized absolute extrinsic parameters of multiple cameras.

[0171] In one embodiment, the target object is the object that needs to be reconstructed in three dimensions.

[0172] For a detailed description of the pose information processing device 900 described above, please refer to the description of the relevant method steps in the above embodiments. The implementation principle and technical effect are similar, and will not be repeated here in this embodiment.

[0173] Figure 10 is a schematic diagram of the structure of a cloud device 100 provided as an exemplary embodiment of this application. The cloud device 100 can be used to run the methods provided in any of the above embodiments. As shown in Figure 10, the cloud device 100 may include: a memory 1004 and at least one processor 1005. Figure 10 takes one processor as an example.

[0174] The memory 1004 is used to store computer programs and can be configured to store various other data to support operations on the cloud device 100. The memory 1004 may be an object storage service (OSS).

[0175] The memory 1004 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk or optical disk.

[0176] The processor 1005, coupled to the memory 1004, is used to execute the computer program in the memory 1004 to implement the solution provided in any of the above method embodiments. The specific functions and technical effects that can be achieved will not be described here.

[0177] Furthermore, as shown in Figure 10, the cloud device also includes other components such as a firewall 1001, a load balancer 1002, a communication component 1006, and a power supply component 1003. Figure 10 shows only some of the components schematically, which does not mean that the cloud device includes only the components shown in Figure 10.

[0178] In one embodiment, the communication component 1006 shown in Figure 10 above is configured to facilitate wired or wireless communication between the device containing the communication component 1006 and other devices. The device containing the communication component 1006 can access wireless networks based on communication standards, such as WiFi, 2G, 3G, 4G, LTE (Long Term Evolution), 5G, or combinations thereof. In one exemplary embodiment, the communication component 1006 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1006 also includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wide Band (UWB), Bluetooth, and other technologies.

[0179] In one embodiment, the power supply component 1003 shown in Figure 10 above provides power to the various components of the device in which it resides. The power supply component 1003 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the device in which the power supply component resides.

[0180] This application also provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, implement the method of any of the foregoing embodiments.

[0181] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the method of any of the foregoing embodiments.

[0182] In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative. For instance, the division of modules is only a logical functional division, and there may be other division methods in actual implementation. For example, multiple modules may be combined or integrated into another system, or some features may be ignored or not executed.

[0183] The integrated modules described above, implemented as software functional modules, can be stored in a computer-readable storage medium. These software functional modules, stored in a storage medium, include several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute some steps of the methods of the various embodiments of this application.

[0184] It should be understood that the aforementioned processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. A general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in the application can be directly manifested as execution by a hardware processor, or execution by a combination of hardware and software modules within the processor. The memory may include high-speed RAM (Random Access Memory), and may also include non-volatile memory (NVM), such as at least one disk storage device, and may also be a USB flash drive, external hard drive, read-only memory, disk, or optical disc, etc.

[0185] The aforementioned storage media can be implemented from any type of volatile or non-volatile storage device or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The storage media can be any available medium accessible to general-purpose or special-purpose computers.

[0186] An exemplary storage medium is coupled to a processor, enabling the processor to read information from and write information to the storage medium. Alternatively, the storage medium can be an integral part of the processor. Both the processor and the storage medium can reside in an Application Specific Integrated Circuit (ASIC). Alternatively, the processor and storage medium can exist as discrete components in an electronic device or host device.

[0187] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0188] The sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0189] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods of the various embodiments of this application.

[0190] The collection, storage, use, processing, transmission, provision, and disclosure of user data and other information involved in the technical solution of this application all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.

[0191] The above are merely preferred embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.

Claims

1. A method for processing pose information, characterized in that the method is applied to an image acquisition system that includes a turntable and multiple cameras, and the method includes: using the multiple cameras to acquire multiple calibration images of a preset calibration object placed on the turntable; extracting first image features from the multiple calibration images, and performing feature matching on the multiple calibration images based on the first image features; determining initial pose information of the multiple cameras based on the first feature matching results of the multiple calibration images, wherein the initial pose information includes the intrinsic parameters of the multiple cameras, the relative extrinsic parameters, and the absolute extrinsic parameters of a predetermined master camera among the multiple cameras, the relative extrinsic parameters represent the relative positional relationship between the multiple cameras, and the absolute extrinsic parameters represent the relative positional relationship between the master camera and the turntable; performing reprojection error minimization on the initial pose information to obtain the pose information to be processed of the multiple cameras; acquiring multiple target images of a target object, wherein the multiple target images are obtained by using the multiple cameras to capture images of the target object placed on the turntable based on the pose information to be processed; determining depth information of the multiple target images based on the intrinsic parameters of the multiple cameras and the relative extrinsic parameters; performing reprojection error minimization on the absolute extrinsic parameters based on the depth information to obtain optimized absolute extrinsic parameters; and determining the optimized absolute extrinsic parameters, the relative extrinsic parameters, and the intrinsic parameters of the multiple cameras as the final pose information of the multiple cameras.

2. The method according to claim 1, characterized in that performing reprojection error minimization on the initial pose information to obtain the pose information to be processed of the multiple cameras includes: performing image distortion correction processing on the initial pose information of the multiple cameras; and performing reprojection error minimization on the distortion-corrected initial pose information using bundle adjustment to obtain the pose information to be processed of the multiple cameras.

3. The method according to claim 1, characterized in that determining the depth information of the multiple target images based on the intrinsic parameters of the multiple cameras and the relative extrinsic parameters includes: extracting second image features from the multiple target images; performing feature matching on the multiple target images based on the second image features; and determining the depth information of the multiple target images based on the second feature matching results of the multiple target images, the intrinsic parameters of the multiple cameras, and the relative extrinsic parameters.

4. The method according to claim 3, characterized in that determining the depth information of the multiple target images based on the second feature matching results of the multiple target images, the intrinsic parameters of the multiple cameras, and the relative extrinsic parameters includes: performing triangulation positioning processing on preset feature points based on the second feature matching results, the intrinsic parameters of the multiple cameras, and the relative extrinsic parameters, to obtain the depth information of the preset feature points in the multiple target images.

5. The method according to claim 3, characterized in that, before extracting the second image features from the multiple target images, the method further includes: performing distortion correction processing on the multiple target images based on the intrinsic parameters of the multiple cameras.

6. The method according to claim 1, characterized in that performing reprojection error minimization on the absolute extrinsic parameters based on the depth information to obtain optimized absolute extrinsic parameters includes: based on the depth information, performing reprojection error minimization on the absolute extrinsic parameters using bundle adjustment to obtain the optimized absolute extrinsic parameters of the multiple cameras.

7. The method according to any one of claims 1-6, characterized in that the target object is an object that needs to be reconstructed in three dimensions.

8. A three-dimensional reconstruction method, characterized in that the method is applied to an image acquisition system that includes a turntable and multiple cameras, and the method includes: in response to a request for three-dimensional reconstruction of a target object, using the multiple cameras to acquire multiple calibration images of a preset calibration body placed on the turntable; extracting first image features from the multiple calibration images, and performing feature matching on the multiple calibration images based on the first image features; determining initial pose information of the multiple cameras based on the first feature matching results of the multiple calibration images, wherein the initial pose information includes the intrinsic parameters of the multiple cameras, the relative extrinsic parameters, and the absolute extrinsic parameters of a predetermined master camera among the multiple cameras, the relative extrinsic parameters represent the relative positional relationship between the multiple cameras, and the absolute extrinsic parameters represent the relative positional relationship between the master camera and the turntable; performing reprojection error minimization on the initial pose information to obtain the pose information to be processed of the multiple cameras; using the multiple cameras to acquire, based on the pose information to be processed, multiple target images of the target object placed on the turntable; determining depth information of the multiple target images based on the intrinsic parameters of the multiple cameras and the relative extrinsic parameters; performing reprojection error minimization on the absolute extrinsic parameters based on the depth information to obtain optimized absolute extrinsic parameters; determining the optimized absolute extrinsic parameters, the relative extrinsic parameters, and the intrinsic parameters of the multiple cameras as the final pose information of the multiple cameras; and performing three-dimensional reconstruction of the target object based on the final pose information and the multiple target images to generate a three-dimensional virtual model of the target object.

9. A three-dimensional reconstruction method, characterized in that the method is applied to an image acquisition system comprising a turntable and multiple cameras, and the method comprises: in response to a request for three-dimensional reconstruction of a target product, using the multiple cameras to acquire multiple calibration images of a preset calibration body placed on the turntable; extracting first image features from the multiple calibration images, and performing feature matching on the multiple calibration images based on the first image features; determining initial pose information of the multiple cameras based on the first feature matching results of the multiple calibration images, wherein the initial pose information includes the intrinsic parameters of the multiple cameras, the relative extrinsic parameters, and the absolute extrinsic parameters of a predetermined master camera among the multiple cameras, the relative extrinsic parameters represent the relative positional relationship between the multiple cameras, and the absolute extrinsic parameters represent the relative positional relationship between the master camera and the turntable; performing reprojection-error minimization on the initial pose information to obtain the pose information to be processed of the multiple cameras; using the multiple cameras to acquire, based on the pose information to be processed, multiple target images of the target product placed on the turntable; determining depth information of the multiple target images based on the intrinsic parameters of the multiple cameras and the relative extrinsic parameters; performing reprojection-error minimization on the absolute extrinsic parameters based on the depth information to obtain optimized absolute extrinsic parameters; determining the optimized absolute extrinsic parameters, the relative extrinsic parameters, and the intrinsic parameters of the multiple cameras as the final pose information of the multiple cameras; and reconstructing the target product in three dimensions based on the final pose information and the multiple target images to generate a three-dimensional virtual model of the target product.
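
The way the final pose information combines a shared absolute extrinsic with per-camera relative extrinsics can be sketched as a chain of homogeneous transforms. This is an illustrative sketch only: the claim does not state a composition order or axis convention, so the order used here (turntable rotation, then master absolute extrinsic, then relative extrinsic), the rotation about the turntable z axis, and all function names are assumptions.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 homogeneous matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def translation(tx, ty, tz):
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m

def turntable_rotation(theta):
    """Assumed turntable motion: rotation about the turntable z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def camera_pose(relative, absolute_master, theta):
    """Object-to-camera transform for one camera at one turntable angle.

    relative:        master-camera coords -> this camera's coords
                     (identity for the master camera itself)
    absolute_master: turntable coords -> master-camera coords, shared by all
    """
    return mat_mul(relative, mat_mul(absolute_master, turntable_rotation(theta)))

def transform(m, p):
    """Apply a 4x4 transform to a 3D point."""
    v = [p[0], p[1], p[2], 1.0]
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(3)]

# Made-up example values (not taken from the patent).
absolute_master = translation(0.0, 0.0, 1.0)
relative_master = identity()
relative_cam1 = translation(-0.2, 0.0, 0.0)

theta = math.radians(90.0)
point_on_turntable = [0.1, 0.0, 0.0]

in_master = transform(camera_pose(relative_master, absolute_master, theta),
                      point_on_turntable)
in_cam1 = transform(camera_pose(relative_cam1, absolute_master, theta),
                    point_on_turntable)
print(in_master, in_cam1)
```

In this arrangement only `absolute_master` changes when the absolute extrinsic is re-optimized; the relative extrinsics and intrinsics are reused unchanged, which mirrors how the claim assembles the final pose information from the three parameter groups.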

10. An electronic device, characterized in that it comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the electronic device to perform the method according to any one of claims 1-9.

11. A cloud device, characterized in that it comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the cloud device to perform the method according to any one of claims 1-9.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1-9.

Citation Information

Patent Citations

CN115423863A
