A method for performing camera calibration through AI-based geometric estimation and a camera calibration device utilizing the same.
The method addresses inefficiencies in camera calibration by utilizing AI-based geometric estimation and Gaussian splatting models to achieve high-precision calibration in diverse environments, allowing for rapid parameter convergence and intuitive 3D scene adjustment.
Patent Information
- Authority / Receiving Office: JP
- Patent Type: Patents
- Current Assignee / Owner: SUPERB AI CO LTD
- Filing Date: 2025-11-06
- Publication Date: 2026-07-02
AI Technical Summary
Existing camera calibration methods are inefficient in dynamic scenes, require additional equipment, and are sensitive to feature detection failures, limiting their applicability and precision, especially in environments with insufficient texture or repetitive patterns.
A method using AI-based geometric estimation, SfM, and Gaussian splatting models for camera calibration, enabling high-precision calibration without additional equipment by integrating neural network-based geometric estimation, SfM, and neural rendering so that the calibration parameters converge rapidly.
Enables high-precision camera calibration in various environments using monocular video, with rapid parameter convergence and intuitive adjustment of 3D scenes and camera parameters through visualization.
Figure 0007883808000001_ABST
Abstract
Description
[Technical Field]
[0001] The present invention relates to a method for performing camera calibration, and more particularly to a method for performing camera calibration through an AI-based geometric estimation model, an SfM model, and a Gaussian splatting model, and a camera calibration apparatus utilizing the same.
[Background technology]
[0002] Techniques for reconstructing 3D structures based on 2D images have long been studied in the field of computer vision. Early research began with methods that estimated depth information using images taken from two or more viewpoints, based on stereo vision and multi-view geometry. While such methods enabled precise geometric modeling by generating point clouds through feature point matching and triangulation, they had limitations, being vulnerable to various problems such as changes in lighting, lack of texture in the subject, and failures in feature point detection.
[0003] Traditional Structure-from-Motion (SfM) techniques enable relatively precise reconstruction even on large image sets, but they have the disadvantage of being sensitive to feature point detection failures and matching errors, and relying on iterative nonlinear optimization (e.g., bundle adjustment), which requires a lot of computation.
[0004] In the field of camera calibration, the traditional approach began with calculating simple projection transformations based on a pinhole camera model. While this method offers relatively high accuracy, it requires capturing a calibration pattern such as a checkerboard, which demands meticulous adjustment of the shooting angle, distance, and lighting conditions as well as repeated shooting from various positions. Consequently, it is difficult to apply to dynamic scenes or real-time calibration, and the setup process is extremely inefficient in environments where many cameras are operated simultaneously, such as large CCTV systems.
[0005] Targetless calibration techniques, proposed to overcome these limitations, are highly applicable in the field because they do not require a separate calibration pattern. However, since they still rely on handcrafted features (e.g., SIFT, ORB), their accuracy may decrease in scenes with insufficient texture or many repetitive patterns, such as glass windows.
[0006] As a result, existing technologies still have limitations in terms of applicability in various environments, level of automation, preparation time, and cost.
[Summary of the Invention]
[Problems that the invention aims to solve]
[0007] The present invention aims to solve all of the problems of the prior art described above.
[0008] Another objective of the present invention is to enable high-precision calibration using only monocular video, so that it is immediately applicable in various environments without additional equipment.
[0009] Another objective of the present invention is to combine a neural network-based geometric estimation model, an SfM model, and neural rendering based on a Gaussian splatting model to generate a reconstruction that provides high-quality initial values, thereby enabling rapid convergence of the calibration parameters.
[0010] Another objective of the present invention is to enable users to intuitively view and adjust the reconstructed 3D scene and camera parameters through visualization and fine adjustment functions via neural rendering.
[Means for solving the problem]
[0011] According to one embodiment of the present invention, there is provided a method for performing camera calibration through AI-based geometric estimation, the method including the steps of: (a) when a first 2D image to an nth 2D image (where n is an integer of 2 or more) are obtained from at least one camera capturing a specific space at different angles, a camera calibration device inputting each of the first 2D image to the nth 2D image into a neural network-based geometric estimation model, and causing the neural network-based geometric estimation model to generate a first feature map and first geometric feature information corresponding to the first 2D image (the first geometric feature information includes each of the first 3D coordinate values corresponding to each of the first pixels of the first 2D image) to an nth feature map and nth geometric feature information corresponding to the nth 2D image (the nth geometric feature information includes each of the nth 3D coordinate values corresponding to each of the nth pixels of the nth 2D image); and (b) the camera calibration device inputting the first feature map and the first geometric feature information to the nth feature map and the nth geometric feature information into a Structure-from-Motion (SfM) model, and causing the SfM model to estimate a 3D model corresponding to the specific space by referring to the first feature map and the first geometric feature information to the nth feature map and the nth geometric feature information (the 3D model includes integrated 3D coordinate values obtained by integrating the first 3D coordinate values to the nth 3D coordinate values, and an integrated feature map obtained by integrating the first feature map to the nth feature map) and to estimate the position information and orientation information of the at least one camera, thereby performing calibration for the at least one camera.
[0012] In one example, the method further includes a step in which, when a new 2D image obtained by photographing the specific space with a new camera other than the at least one camera is acquired, the camera calibration device generates a new feature map for the new 2D image, matches the new feature map to the 3D model to calculate new 3D coordinate values corresponding to each of the new pixels of the new 2D image, and calculates the position information and orientation information of the new camera by referring to the new 3D coordinate values, thereby performing calibration for the new camera.
[0013] In one example, the camera calibration device inputs the new 2D image into a convolutional neural network model, extracts new features corresponding to each of the new pixels of the new 2D image through at least one convolutional filter, determines matching features that are matched to each of the new features among the integrated features of the integrated feature map included in the 3D model, and estimates the matching 3D coordinate values corresponding to each of the matching features as the new 3D coordinate values corresponding to each of the new pixels of the new 2D image.
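The matching step described above can be pictured with a minimal sketch. The patent does not state the similarity measure or acceptance rule, so cosine similarity, the `min_sim` threshold, and the helper name `match_to_model` are all assumptions made for illustration only.

```python
import numpy as np

def match_to_model(new_feats, model_feats, model_xyz, min_sim=0.8):
    """Look up a 3D coordinate for each new-image feature (hypothetical helper).

    new_feats:   (N, C) per-pixel features of the new 2D image
    model_feats: (M, C) integrated features stored with the 3D model
    model_xyz:   (M, 3) integrated 3D coordinates, row-aligned with model_feats
    """
    a = new_feats / np.linalg.norm(new_feats, axis=1, keepdims=True)
    b = model_feats / np.linalg.norm(model_feats, axis=1, keepdims=True)
    sim = a @ b.T                                    # (N, M) cosine similarities
    best = sim.argmax(axis=1)                        # best model feature per pixel
    keep = sim[np.arange(len(a)), best] >= min_sim   # assumed acceptance rule
    return model_xyz[best], keep

# Tiny made-up example: three model features, three new-image features.
model_feats = np.eye(3)
model_xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
new_feats = np.array([[0.9, 0.1, 0.0], [0.0, 0.0, 2.0], [1.0, 1.0, 1.0]])
xyz, keep = match_to_model(new_feats, model_feats, model_xyz)
```

Pixels whose best similarity falls below the threshold are flagged in `keep` so that they can be excluded from the later pose calculation.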
[0014] In one example, the camera calibration device performs calibration for the new camera by estimating not only new external parameters, as the position information and orientation information of the new camera, but also new internal parameters of the new camera including at least one of the focal length, principal point, and skew coefficient of the new camera.
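The patent does not name a solver for recovering internal and external parameters from pixel-to-3D correspondences. As one classical stand-in, the sketch below uses the Direct Linear Transform followed by an RQ decomposition; the function names and the synthetic camera are assumptions for illustration, and the data are noise-free.

```python
import numpy as np

def dlt_projection_matrix(X, x):
    """Direct Linear Transform: 3x4 matrix P with x ~ P [X; 1], from >= 6 points."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    rows = []
    for Xi, (u, v) in zip(Xh, x):
        rows.append(np.concatenate([Xi, np.zeros(4), -u * Xi]))
        rows.append(np.concatenate([np.zeros(4), Xi, -v * Xi]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)             # null vector = P up to scale

def decompose(P):
    """Split P into intrinsics K (upper triangular), rotation R, translation t."""
    if np.linalg.det(P[:, :3]) < 0:         # fix the arbitrary overall sign
        P = -P
    F = np.flipud(np.eye(3))                # row/column reversal matrix
    Q_, R_ = np.linalg.qr((F @ P[:, :3]).T)
    K, R = F @ R_.T @ F, F @ Q_.T           # RQ decomposition: P[:, :3] = K @ R
    D = np.diag(np.sign(np.diag(K)))        # force a positive diagonal on K
    K, R = K @ D, D @ R
    t = np.linalg.solve(K, P[:, 3])
    return K / K[2, 2], R, t

# Synthetic check with a made-up camera (focal lengths, principal point, skew).
K_true = np.array([[800.0, 2.0, 320.0], [0.0, 820.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.2
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([0.1, -0.2, 5.0])
X = np.random.default_rng(0).uniform(-1.0, 1.0, (12, 3))
uvw = (X @ R_true.T + t_true) @ K_true.T
x = uvw[:, :2] / uvw[:, 2:]
K, R, t = decompose(dlt_projection_matrix(X, x))
```

With real, noisy matches one would normally normalize the coordinates first and refine the result by minimizing the reprojection error; this sketch omits both.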
[0015] In one example, in the step (b), the camera calibration device not only estimates external parameters as the position information and orientation information of the at least one camera, but also further estimates at least one of the focal length, principal point, and skew coefficient as the internal parameters of the at least one camera, thereby performing calibration for the at least one camera.
[0016] In one example, in the step (b), the camera calibration device inputs the integrated feature map and the integrated geometric feature information (the integrated geometric feature information includes the integrated 3D coordinate values) obtained by integrating the first geometric feature information to the nth geometric feature information into a Gaussian Splatting model, and uses the Gaussian Splatting model to optimize the position information and orientation information of the at least one camera and the integrated geometric feature information, thereby performing calibration for the at least one camera.
[0017] In one example, in the step (b), when the camera calibration device optimizes the position information and orientation information of the at least one camera and the integrated geometric feature information using the Gaussian splatting model, the camera calibration device takes, from the integrated geometric feature information, each of the integrated 3D coordinate values, which are obtained by integrating the first 3D coordinate values to the nth 3D coordinate values corresponding to each of the first 2D image to the nth 2D image, and each of the Gaussian parameters corresponding to each of the integrated 3D coordinate values as the quantities to be corrected; projects each of the integrated 3D coordinate values and each of the corresponding Gaussian parameters onto the 2D image plane through the projection matrix of the at least one camera and performs differentiable tile rasterization to generate a rendered 2D image; calculates the error between the rendered 2D image and the first 2D image to the nth 2D image; and iteratively corrects each of the integrated 3D coordinate values and each of the Gaussian parameters so as to minimize the error, thereby optimizing the position information and orientation information of the at least one camera and the integrated geometric feature information.
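The render, compare, and correct loop described above can be illustrated with a deliberately reduced toy. The sketch below is not the patent's rasterizer: it works on a single 1D pixel row, blends Gaussians additively instead of using tile-based alpha compositing, and corrects only the Gaussian positions by plain gradient descent. All names and numbers are made up for illustration.

```python
import numpy as np

def render(mu, color, sigma, pixels):
    """Additively splat 1D Gaussians onto a pixel row (no alpha compositing)."""
    g = np.exp(-0.5 * ((pixels[None, :] - mu[:, None]) / sigma) ** 2)  # (J, W)
    return (color[:, None] * g).sum(axis=0), g

def fit_positions(mu, color, sigma, target, pixels, lr=20.0, steps=300):
    """Gradient descent on the mean squared photometric error w.r.t. mu."""
    mu = mu.astype(float).copy()
    for _ in range(steps):
        image, g = render(mu, color, sigma, pixels)
        residual = image - target                               # (W,)
        # d(image)/d(mu_j) = color_j * g_j * (pixel - mu_j) / sigma^2
        d_image = color[:, None] * g * (pixels[None, :] - mu[:, None]) / sigma**2
        grad = 2.0 * (d_image * residual[None, :]).mean(axis=1)
        mu -= lr * grad                                         # corrective step
    return mu

pixels = np.arange(40.0)
color = np.array([1.0, 0.8])
sigma = 3.0
target, _ = render(np.array([12.0, 27.0]), color, sigma, pixels)  # "captured" row
mu_fit = fit_positions(np.array([10.0, 30.0]), color, sigma, target, pixels)
```

Because the renderer is differentiable, the same error signal could also be propagated to camera pose parameters; the toy keeps the camera fixed for brevity.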
[0018] In one example, when the camera calibration device optimizes the position information and orientation information of the at least one camera and the integrated geometric feature information using the Gaussian splatting model, the optimization is calculated by the following formula: [formula image: JPEG0007883808000002.jpg] Here, $R_i$ and $t_i$ are the external camera parameters among the camera parameters of the at least one camera, $X_j$ is each of the integrated 3D coordinate values corresponding to the 3D model, $\sigma_j$ is the scale of the Gaussian splatting model, $c_j$ is the rendering attribute of the Gaussian splatting model, $\mathcal{R}$ is the rendering function based on the Gaussian splatting model, and $\mathcal{L}$ is a loss function for calculating the error between the rendered 2D image output through rendering and the first 2D image to the nth 2D image.
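The formula itself is an image that is not reproduced in this text. One form that is consistent with the listed symbols (external parameters $R_i$, $t_i$; 3D coordinates $X_j$; scale $\sigma_j$; rendering attribute $c_j$; a rendering function, written here as $\mathcal{R}$; a loss, written here as $\mathcal{L}$) is sketched below. The symbol $I_i$ for the i-th captured 2D image is an assumption introduced for this sketch, and the exact form in the original may differ.

```latex
\min_{\{R_i,\,t_i\},\,\{X_j,\,\sigma_j,\,c_j\}}
\;\sum_{i=1}^{n}
\mathcal{L}\Big(\mathcal{R}\big(R_i,\,t_i,\,\{X_j,\,\sigma_j,\,c_j\}_{j}\big),\; I_i\Big)
```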
[0019] In one example, in the step (b), when the camera calibration device generates the 3D model using the SfM model, the 3D model is calculated by the following formula: [formula image: JPEG0007883808000003.jpg] Here, $R_i$ and $t_i$ are the external camera parameters among the camera parameters of the at least one camera, $X_j$ is each of the integrated 3D coordinate values corresponding to the 3D model, $x_{ij}$ is each of the first 2D coordinate values corresponding to the first feature map for the first 2D image to the nth 2D coordinate values corresponding to the nth feature map for the nth 2D image, $\pi$ is a projection function, and $\rho$ is a loss function for calculating the difference between each of the first 2D coordinate values to the nth 2D coordinate values and the projected 2D coordinate values, in the pixel coordinate system, obtained by projecting each of the integrated 3D coordinate values using the camera parameters of the at least one camera.
[0020] In one example, when the at least one camera is a single camera, the camera captures images at different angles while moving from a preset first position to a preset nth position, thereby generating the first 2D image to the nth 2D image.
[0021] In one example, when the at least one camera is a plurality of cameras, namely a first camera to an nth camera, the first camera to the nth camera are each installed at different positions and capture images at different angles, thereby generating the first 2D image obtained from the first camera to the nth 2D image obtained from the nth camera.
[0022] Furthermore, according to another embodiment of the present invention, there is provided a camera calibration device that performs camera calibration through AI-based geometric estimation, the device including: at least one memory for storing instructions; and at least one processor configured to execute the instructions, wherein the processor performs: (I) a process of, when a first 2D image to an nth 2D image (where n is an integer of 2 or more) of a specific space captured from at least one camera at different angles are obtained, inputting each of the first 2D image to the nth 2D image into a neural network-based geometric estimation model, and causing the neural network-based geometric estimation model to generate a first feature map and first geometric feature information corresponding to the first 2D image (the first geometric feature information includes each of the first 3D coordinate values corresponding to each of the first pixels of the first 2D image) to an nth feature map and nth geometric feature information corresponding to the nth 2D image (the nth geometric feature information includes each of the nth 3D coordinate values corresponding to each of the nth pixels of the nth 2D image); and (II) a process of inputting the first feature map and the first geometric feature information to the nth feature map and the nth geometric feature information into a Structure-from-Motion (SfM) model, and causing the SfM model to estimate a 3D model corresponding to the specific space by referring to the first feature map and the first geometric feature information to the nth feature map and the nth geometric feature information (the 3D model includes integrated 3D coordinate values obtained by integrating the first 3D coordinate values to the nth 3D coordinate values, and an integrated feature map obtained by integrating the first feature map to the nth feature map) and to estimate the position information and orientation information of the at least one camera, thereby performing calibration for the at least one camera.
[0023] In one example, (III) if a new 2D image is obtained from a new camera other than the at least one camera that captures the specific space, the processor further performs a calibration process for the new camera by generating a new feature map for the new 2D image, matching the new feature map to the 3D model to calculate new 3D coordinate values corresponding to each of the new pixels in the new 2D image, and calculating the position information and orientation information of the new camera by referring to the new 3D coordinate values.
[0024] In one example, the processor inputs the new 2D image into a convolutional neural network model, extracts new features corresponding to each of the new pixels of the new 2D image through at least one convolutional filter, determines matching features that match each of the new features in the integrated feature map included in the 3D model, and estimates the matching 3D coordinate values corresponding to each of the matching features as the new 3D coordinate values corresponding to each of the new pixels of the new 2D image.
[0025] In one example, the processor not only estimates new external parameters as the position information and orientation information of the new camera, but also estimates new internal parameters of the new camera that include at least one of the focal length, principal point, and skew coefficient of the new camera, thereby performing calibration for the new camera.
[0026] In one example, in the process (II), the processor not only estimates external parameters as the position information and orientation information of the at least one camera, but also estimates at least one of the focal length, principal point, and skew coefficient as internal parameters of the at least one camera, thereby performing calibration for the at least one camera.
[0027] In one example, in the process (II), the processor inputs the integrated feature map and the integrated geometric feature information (the integrated geometric feature information includes the integrated 3D coordinate values) obtained by integrating the first geometric feature information to the nth geometric feature information into a Gaussian splatting model, and causes the Gaussian splatting model to optimize the position information and orientation information of the at least one camera and the integrated geometric feature information, thereby performing calibration for the at least one camera.
[0028] In one example, in the process (II), when the processor optimizes the position information and orientation information of the at least one camera and the integrated geometric feature information using the Gaussian splatting model, the processor takes, from the integrated geometric feature information, each of the integrated 3D coordinate values, which are obtained by integrating the first 3D coordinate values to the nth 3D coordinate values corresponding to each of the first 2D image to the nth 2D image, and each of the Gaussian parameters corresponding to each of the integrated 3D coordinate values as the quantities to be corrected; projects each of the integrated 3D coordinate values and each of the corresponding Gaussian parameters onto the 2D image plane through the projection matrix of the at least one camera and performs differentiable tile rasterization to generate a rendered 2D image; calculates the error between the rendered 2D image and the first 2D image to the nth 2D image; and iteratively corrects each of the integrated 3D coordinate values and each of the Gaussian parameters so as to minimize the error, thereby optimizing the position information and orientation information of the at least one camera and the integrated geometric feature information.
[0029] In one example, when the processor optimizes the position information and orientation information of the at least one camera and the integrated geometric feature information using the Gaussian splatting model, the optimization is calculated by the following formula: [formula image: JPEG0007883808000004.jpg] Here, $R_i$ and $t_i$ are the external camera parameters among the camera parameters of the at least one camera, $X_j$ is each of the integrated 3D coordinate values corresponding to the 3D model, $\sigma_j$ is the scale of the Gaussian splatting model, $c_j$ is the rendering attribute of the Gaussian splatting model, $\mathcal{R}$ is a rendering function based on the Gaussian splatting model, and $\mathcal{L}$ is a loss function that calculates the error between the rendered 2D image output through rendering and the first 2D image to the nth 2D image.
[0030] In one example, in the process (II), when the processor generates the 3D model using the SfM model, the 3D model is calculated by the following formula: [formula image: JPEG0007883808000005.jpg] Here, $R_i$ and $t_i$ are the external camera parameters among the camera parameters of the at least one camera, $X_j$ is each of the integrated 3D coordinate values corresponding to the 3D model, $x_{ij}$ is each of the first 2D coordinate values corresponding to the first feature map for the first 2D image to the nth 2D coordinate values corresponding to the nth feature map for the nth 2D image, $\pi$ is a projection function, and $\rho$ is a loss function that calculates the difference between each of the first 2D coordinate values to the nth 2D coordinate values and the projected 2D coordinate values, in the pixel coordinate system, obtained by projecting each of the integrated 3D coordinate values using the camera parameters of the at least one camera.
[0031] In one example, when the at least one camera is a single camera, the camera captures images at different angles while moving from a preset first position to a preset nth position, thereby generating the first 2D image to the nth 2D image.
[0032] In one example, when the at least one camera is a plurality of cameras, namely a first camera to an nth camera, the first camera to the nth camera are each installed at different positions and capture images at different angles, thereby generating the first 2D image obtained from the first camera to the nth 2D image obtained from the nth camera.
[Effects of the Invention]
[0033] According to the present invention, high-precision calibration can be performed using only monocular video, and the invention is immediately applicable in various environments without additional equipment.
[0034] According to the present invention, a neural network-based geometric estimation model, an SfM model, and neural rendering based on a Gaussian splatting model are combined to generate a reconstruction that provides high-quality initial values, so that the calibration parameters converge rapidly.
[0035] According to the present invention, through visualization and fine adjustment functions via neural rendering, the user can intuitively check and adjust the reconstructed 3D scene and camera parameters.
[Brief explanation of the drawing]
[0036] The following drawings, attached for use in describing embodiments of the present invention, represent only a portion of the embodiments of the present invention, and a person with ordinary skill in the art to which the present invention pertains (hereinafter referred to as "a person skilled in the art") can obtain other drawings from these drawings without performing any inventive work.
[0037]
[Figure 1] A simplified illustration of a camera calibration device that performs camera calibration through AI-based geometric estimation according to one embodiment of the present invention.
[Figure 2] A simplified illustration of a method for performing camera calibration through AI-based geometric estimation according to one embodiment of the present invention.
[Figure 3] An illustration of the process of performing camera calibration through AI-based geometric estimation according to one embodiment of the present invention.
[Figure 4] A simplified illustration of the process of estimating a 3D model from images of a specific space captured from at least one camera at different angles, and of estimating the position and orientation information of the at least one camera, according to one embodiment of the present invention.
[Figure 5] A simplified illustration of the process of estimating the position and orientation information of a new camera by referring to a new 2D image acquired from the new camera, according to one embodiment of the present invention.
[Modes for carrying out the invention]
[0038] The detailed description of the present invention, as described below, refers to the accompanying drawings illustrating specific embodiments in which the present invention may be carried out. These embodiments are described in sufficient detail so that those skilled in the art can carry out the present invention. It should be understood that the various embodiments of the present invention are different from one another but do not need to be mutually exclusive. For example, the specific shapes, structures and characteristics described herein can be embodied by modifying one embodiment to another without departing from the spirit and scope of the present invention. It should also be understood that the position or arrangement of individual components within each embodiment can be modified without departing from the spirit and scope of the present invention. Therefore, the detailed description below should not be taken as restrictive, and the scope of the present invention should be understood as encompassing the scope claimed in the claims and all equivalent scopes thereto. Similar reference numerals in the drawings refer to parts that are identical or have similar functions in various aspects.
[0039] In the following, several preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings, so that a person with ordinary skill in the art to which the present invention pertains can easily implement the present invention.
[0040] Figure 1 shows a simplified camera calibration device that performs camera calibration through AI-based geometric estimation according to one embodiment.
[0041] As shown in Figure 1, the camera calibration device 100 of the present invention may include a memory 110 and a processor 120.
[0042] Here, memory 110 can store instructions to be performed by processor 120, specifically, instructions which are codes generated for the purpose of causing the camera calibration device 100 to function in a particular manner, and which can be stored in computer-accessible or computer-readable memory that can be directed to a computer or other programmable data processing equipment. The instructions can perform processes for performing the functions described in the specification of the present invention.
[0043] The processor 120 may include hardware components such as an MPU (Micro Processing Unit) or CPU (Central Processing Unit), cache memory, and a data bus. The camera calibration device 100 may also further include operational configurations and software configurations for applications that perform specific purposes.
[0044] However, this does not exclude the case where the processor 120 of the camera calibration device 100 includes an integrated processor in which a medium, processor, and memory are integrated for carrying out the present invention.
[0045] Furthermore, the camera calibration device 100 can be linked with a neural network-based geometric estimation model 200, an SfM model 300, and a Gaussian splatting model 400. Incidentally, although Figure 1 shows the neural network-based geometric estimation model 200, the SfM model 300, and the Gaussian splatting model 400 as being configured separately from the camera calibration device 100, it is not limited to this, and at least a part of the neural network-based geometric estimation model 200, the SfM model 300, and the Gaussian splatting model 400 can also be included in the camera calibration device 100 and linked together.
[0046] Here, the neural network-based geometric estimation model 200, the SfM model 300, and the Gaussian splatting model 400 may already be trained.
[0047] Furthermore, the neural network-based geometric estimation model 200, SfM model 300, and Gaussian splatting model 400 may be any previously known model, but are not limited to them.
[0048] Furthermore, the camera calibration device 100 can receive images of a specific space captured from at least one camera at different angles.
[0049] Here, if the at least one camera is a single camera, the first camera 10_1 to the nth camera 10_n can be regarded as one and the same camera, which moves from a preset first position to a preset nth position while capturing images at different angles to generate the first 2D image 11_1 to the nth 2D image 11_n, as shown in Figure 3. On the other hand, if the at least one camera is a plurality of cameras, the first camera 10_1 to the nth camera 10_n are different cameras, each installed at a different position and capturing images at a different angle, which generate the first 2D image 11_1 obtained from the first camera 10_1 to the nth 2D image 11_n obtained from the nth camera 10_n.
[0050] A method and procedure using the camera calibration device 100 according to one embodiment of the present invention, configured as described above, will be explained below with reference to Figures 2 and 3.
[0051] Figure 2 shows a simplified method for performing camera calibration through AI-based geometric estimation according to one embodiment of the present invention, and Figure 3 shows the process of performing camera calibration through AI-based geometric estimation according to one embodiment of the present invention.
[0052] First, once a first 2D image 11_1 to the nth 2D image 11_n are obtained from at least one camera, capturing a specific space at different angles, the camera calibration device 100 inputs each of the first 2D image 11_1 to the nth 2D image 11_n into a neural network-based geometric estimation model 200, which in turn generates a first feature map 220_1 and first geometric feature information 210_1 corresponding to the first 2D image 11_1 to the nth feature map 220_n and nth geometric feature information 210_n corresponding to the nth 2D image 11_n (S100).
[0053] In this case, the first geometric feature information 210_1 may include each of the first 3D coordinate values corresponding to each of the first pixels of the first 2D image 11_1, and the nth geometric feature information 210_n may include each of the nth 3D coordinate values corresponding to each of the nth pixels of the nth 2D image 11_n, but it may also include, and is not limited to, the distance from at least one camera to the 3D coordinates, the surface normal vector, etc.
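The per-image geometric feature information described above can be pictured as a small container. The sketch below is a hypothetical data layout, not the patent's: the class name, field names, and the way distance and normals are derived from the per-pixel 3D coordinates (finite differences and a cross product) are all assumptions for illustration.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GeometricFeatureInfo:          # hypothetical container, not from the patent
    points: np.ndarray               # (H, W, 3) 3D coordinate per pixel
    distance: np.ndarray             # (H, W) distance from the camera to each point
    normals: np.ndarray              # (H, W, 3) unit surface normal per pixel

def from_point_map(points, camera_center):
    """Derive distance and normals from a per-pixel 3D point map."""
    distance = np.linalg.norm(points - camera_center, axis=-1)
    d_col = np.gradient(points, axis=1)     # change along image columns
    d_row = np.gradient(points, axis=0)     # change along image rows
    n = np.cross(d_col, d_row)              # normal direction (sign is a convention)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return GeometricFeatureInfo(points, distance, n)

# Made-up example: a flat wall at z = 2 seen by a camera at the origin.
ys, xs = np.mgrid[0:4, 0:5]
points = np.stack([xs * 0.1, ys * 0.1, np.full(xs.shape, 2.0)], axis=-1)
info = from_point_map(points, np.zeros(3))
```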
[0054] Next, the camera calibration device 100 inputs the first feature map 220_1 and the first geometric feature information 210_1 to the nth feature map 220_n and the nth geometric feature information 210_n into the SfM (Structure-from-Motion, SfM) model 300, and uses the SfM model 300 to estimate a 3D model corresponding to a specific space by referring to the first feature map 220_1 and the first geometric feature information 210_1 to the nth feature map 220_n and the nth geometric feature information 210_n, thereby estimating the position information and orientation information of at least one camera, and thus performing calibration for at least one camera (S200). Here, the 3D model may include an integrated 3D coordinate value 310 which integrates the first 3D coordinate value to the nth 3D coordinate value, and an integrated feature map 320 which integrates the first feature map to the nth feature map.
[0055] Here, the position and orientation information of at least one camera may include the installation position coordinates and shooting angle of at least one camera.
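The link between external parameters and an installation position and shooting direction can be sketched as follows. The sketch assumes the common convention that a world point maps to camera coordinates as `R @ X + t` and that the camera looks along its +z axis; the patent does not state its convention, and the function name is hypothetical.

```python
import numpy as np

def camera_position_and_direction(R, t):
    """Camera center and viewing direction in world coordinates (assumed convention)."""
    center = -R.T @ t                               # point that maps to the camera origin
    direction = R.T @ np.array([0.0, 0.0, 1.0])     # camera +z axis expressed in world frame
    return center, direction

# Made-up pose: a rotation of 90 degrees about the y axis plus a translation.
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])
t = np.array([1.0, 2.0, 3.0])
center, direction = camera_position_and_direction(R, t)
```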
[0056] Furthermore, the camera calibration device 100 can perform calibration for the at least one camera by not only estimating external parameters as the position information and orientation information of the at least one camera, but also by further estimating at least one of the following as internal parameters of the at least one camera: the focal length, which is the distance between the lens center of the at least one camera and the image sensor; the principal point, which is the position of the lens center of the at least one camera projected onto the image sensor; and the skew coefficient, which is the degree to which the y-axis of the image sensor is tilted. Here, the external parameters of the at least one camera can be used when converting from the world coordinate system to the camera coordinate system, and the internal parameters of the at least one camera can be used when converting from the camera coordinate system to the pixel coordinate system.
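The two conversions named above (world to camera with the external parameters, camera to pixel with the internal parameters) can be sketched with the standard pinhole model. The function names and numeric values are illustrative assumptions.

```python
import numpy as np

def intrinsic_matrix(fx, fy, cx, cy, skew=0.0):
    """Internal parameters: focal lengths, principal point (cx, cy), skew."""
    return np.array([[fx, skew, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def project(K, R, t, X_world):
    """Project (N, 3) world points to (N, 2) pixel coordinates."""
    X_cam = X_world @ R.T + t              # world -> camera (external parameters)
    uvw = X_cam @ K.T                      # camera -> homogeneous pixel (internal)
    return uvw[:, :2] / uvw[:, 2:3]        # divide by depth

K = intrinsic_matrix(fx=100.0, fy=100.0, cx=50.0, cy=40.0)
R, t = np.eye(3), np.zeros(3)
pixels = project(K, R, t, np.array([[0.0, 0.0, 2.0], [0.2, -0.1, 2.0]]))
```

A point on the optical axis lands on the principal point, which is why the first projected pixel equals `(cx, cy)`.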
[0057] To explain in more detail, when the aforementioned 3D model is generated using the SfM model 300, it can be calculated by the following formula: [formula image: JPEG0007883808000006.jpg]
[0058] Here, Ri and ti may be a rotation matrix and a translation vector, respectively, which are external camera parameters among the camera parameters of the at least one camera; Ki may be an internal parameter of the at least one camera; Xj is each of the integrated 3D coordinate values 310 corresponding to the 3D model; and xij is each of the first 2D coordinate values corresponding to the first features of the first feature map 220_1 for the first 2D image 11_1 to the nth 2D coordinate values corresponding to the nth features of the nth feature map 220_n for the nth 2D image 11_n. π(Ki, Ri, ti, Xj) is a projection function that projects the 3D coordinate value Xj using the intrinsic and external parameters (Ki, Ri, ti) of the at least one camera, and ρ may be a loss function that calculates the difference between each of the first to nth 2D coordinate values and the corresponding projected 2D coordinate value, in the pixel coordinate system, obtained by projecting each of the integrated 3D coordinate values 310 with the camera parameters of the at least one camera.
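Read together, the symbols defined in [0058] suggest a conventional bundle-adjustment cost, plausibly of the form $\min \sum_i \sum_j \rho\big(x_{ij}, \pi(K_i, R_i, t_i, X_j)\big)$. The formula image itself is not reproduced in this text, so that form is an assumption. The sketch below evaluates such a cost for given parameters. The Huber function standing in for ρ, the helper names, and the sample data are all hypothetical.

```python
def project(K, R, t, X):
    """pi(K, R, t, X): world point -> pixel, pinhole model without distortion."""
    Xc = [sum(R[r][k] * X[k] for k in range(3)) + t[r] for r in range(3)]
    xn, yn = Xc[0] / Xc[2], Xc[1] / Xc[2]
    u = K[0][0] * xn + K[0][1] * yn + K[0][2]
    v = K[1][1] * yn + K[1][2]
    return (u, v)

def huber(r, delta=1.0):
    """Robust loss on a residual length r (an assumed choice for rho)."""
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)

def reprojection_cost(cameras, points, observations, delta=1.0):
    """Sum over observations (i, j, x_ij) of rho(||x_ij - pi(K_i, R_i, t_i, X_j)||)."""
    total = 0.0
    for i, j, (u_obs, v_obs) in observations:
        K, R, t = cameras[i]
        u, v = project(K, R, t, points[j])
        r = ((u - u_obs) ** 2 + (v - v_obs) ** 2) ** 0.5
        total += huber(r, delta)
    return total

# Hypothetical data: one camera at the origin observing two 3D points.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(K, I3, [0.0, 0.0, 0.0])]
observations = [(0, 0, (320.0, 240.0)), (0, 1, (520.0, 140.0))]

points_exact = [[0.0, 0.0, 2.0], [0.5, -0.25, 2.0]]
points_bad = [[0.0, 0.0, 2.0], [0.5, -0.25, 2.5]]  # second point pushed too deep

cost_exact = reprojection_cost(cameras, points_exact, observations)
cost_bad = reprojection_cost(cameras, points_bad, observations)
print(cost_exact, cost_bad)
```

An SfM solver would adjust the camera parameters and 3D points to drive this cost down. The sketch only shows how the cost responds when one 3D point is displaced.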
[0059] Here, the camera calibration device 100 inputs the integrated feature map 320 and the integrated geometric feature information, which is obtained by integrating the first geometric feature information 210_1 to the nth geometric feature information 210_n, into the Gaussian splatting model 400. The Gaussian splatting model 400 then optimizes the integrated geometric feature information by referring to the position and orientation information of the at least one camera and the integrated feature map 320, so that calibration for the at least one camera can be performed. Here, the integrated geometric feature information may be, but is not limited to, the integrated 3D coordinate values 310.
[0060] Furthermore, the optimization of the integrated geometric feature information, which refers to the position and orientation information of the at least one camera and the integrated feature map 320, may involve correcting the position and orientation information of the at least one camera and correcting the positions of the first to the nth 3D coordinate values.
[0061] More specifically, the camera calibration device 100 optimizes the position and orientation information of the at least one camera and the integrated geometric feature information using the Gaussian splatting model 400 as follows. The camera calibration device 100 corrects each of the integrated 3D coordinate values 310, which are the point cloud data of the 3D model formed by integrating the first 3D coordinate values to the nth 3D coordinate values corresponding to the first 2D image 11_1 to the nth 2D image 11_n, together with each of the Gaussian parameters corresponding to each of the integrated 3D coordinate values 310. It then projects each of the integrated 3D coordinate values 310 and the corresponding Gaussian parameters onto the plane of the 2D image through the projection matrix of the at least one camera, and performs differentiable tile rasterization (Differentiable Tile Rasterization) to generate a rendered 2D image. Next, it calculates the error between the rendered 2D image and the first 2D image 11_1 to the nth 2D image 11_n, and iteratively corrects each of the integrated 3D coordinate values 310 and each of the Gaussian parameters so that the error is minimized. In this way, the position and orientation information of the at least one camera and the integrated geometric feature information are optimized so that the 3D model is as similar as possible to the first 2D image 11_1 to the nth 2D image 11_n of the specific space. Here, a Gaussian parameter is a parameter that indicates the color, opacity, etc., for each pixel corresponding to each of the 3D coordinate values and may, as an example, be a feature corresponding to a pixel.
[0062] In this connection, when the Gaussian splatting model 400 is used to optimize the position and orientation information of the at least one camera and the integrated 3D coordinate values 310 of the integrated geometric feature information, the optimization can be calculated by the following formula: [formula image: JPEG0007883808000007.jpg] Here, Ri and ti may be a rotation matrix and a translation vector, respectively, which are external camera parameters among the camera parameters of the at least one camera; Xj is each of the integrated 3D coordinate values 310 corresponding to the 3D model; σj is a scale of the Gaussian splatting model 400; cj is a rendering attribute of the Gaussian splatting model 400; Ii may be an input image; R is a rendering function based on the Gaussian splatting model 400; and L may be a loss function that calculates the error between the rendered 2D image output through rendering and the first 2D image 11_1 to the nth 2D image 11_n.
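To make the roles of the rendering function R and the loss L in [0062] more tangible, the following is a deliberately simplified forward pass. It is a toy under stated assumptions, not the differentiable tile rasterizer of the specification: splats are isotropic, the image is grayscale, skew is ignored, no gradients are computed, and the dictionary keys, the L1 loss, and all numbers are hypothetical.

```python
import math

def render(K, R, t, gaussians, width, height):
    """Toy stand-in for the rendering function: isotropic splats with
    front-to-back alpha compositing. Each gaussian is a dict with
    X (3D position), sigma (scale > 0), c (gray value) and opacity."""
    splats = []
    for g in gaussians:
        Xc = [sum(R[r][k] * g["X"][k] for k in range(3)) + t[r] for r in range(3)]
        if Xc[2] <= 0:
            continue  # behind the camera
        u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
        v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
        radius = K[0][0] * g["sigma"] / Xc[2]  # screen-space footprint in pixels
        splats.append((Xc[2], u, v, radius, g["c"], g["opacity"]))
    splats.sort(key=lambda s: s[0])  # nearest splat first
    image = [[0.0] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            transmittance, value = 1.0, 0.0
            for _, u, v, radius, c, opacity in splats:
                d2 = (px - u) ** 2 + (py - v) ** 2
                alpha = opacity * math.exp(-0.5 * d2 / (radius * radius))
                value += transmittance * alpha * c
                transmittance *= 1.0 - alpha
            image[py][px] = value
    return image

def l1_loss(rendered, target):
    """Mean absolute pixel difference (an assumed choice for the loss L)."""
    n = len(rendered) * len(rendered[0])
    return sum(abs(a - b) for ra, rb in zip(rendered, target)
               for a, b in zip(ra, rb)) / n

# Hypothetical scene: one splat in front of a 16x16 camera.
K = [[20.0, 0.0, 8.0], [0.0, 20.0, 8.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scene = [{"X": [0.0, 0.0, 2.0], "sigma": 0.2, "c": 1.0, "opacity": 0.9}]

target = render(K, I3, [0.0, 0.0, 0.0], scene, 16, 16)   # plays the role of Ii
shifted = render(K, I3, [0.1, 0.0, 0.0], scene, 16, 16)  # wrong camera translation
loss_same = l1_loss(target, target)
loss_shifted = l1_loss(shifted, target)
print(loss_same, loss_shifted)
```

The loss is zero when the camera pose reproduces the input image and grows when the translation is perturbed. That dependence of the photometric error on the pose is what allows the error to be used as a signal for correcting camera parameters.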
[0063] Next, once a new 2D image 21 is obtained from at least one new camera 20 capturing a specific space, the camera calibration device 100 generates a new feature map for the new 2D image 21, matches the new feature map to a 3D model to calculate new 3D coordinate values corresponding to each new pixel of the new 2D image 21, and estimates the position and orientation information of the new camera 20 by referring to the new 3D coordinate values, thereby performing calibration for the new camera 20 (S300).
[0064] More specifically, the camera calibration device 100 inputs the new 2D image 21 into a convolutional neural network model, extracts new features corresponding to each new pixel of the new 2D image 21 through at least one convolutional filter, determines, among the integrated features of the integrated feature map 320 included in the 3D model, the matching features that match each new feature, and, by referring to the integrated 3D coordinate values 310, estimates the matching 3D coordinate values corresponding to each matching feature as the new 3D coordinate values corresponding to each new pixel of the new 2D image 21.
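The matching step in [0064] can be sketched as a nearest-neighbour lookup. The specification does not state how matching features are determined, so the cosine-similarity criterion, the threshold, the function names, and the three-dimensional toy descriptors below are all assumptions for illustration.

```python
def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def match_to_model(new_features, integrated_features, integrated_points,
                   min_similarity=0.9):
    """For each new feature, return the 3D coordinate of its best-matching
    integrated feature, or None when no match is similar enough."""
    result = []
    for f in new_features:
        scores = [cosine_similarity(f, g) for g in integrated_features]
        best = max(range(len(scores)), key=scores.__getitem__)
        result.append(integrated_points[best]
                      if scores[best] >= min_similarity else None)
    return result

# Hypothetical integrated feature map and its integrated 3D coordinate values.
integrated_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
integrated_points = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0)]

# Hypothetical features extracted from the new 2D image.
new_features = [[0.9, 0.1, 0.0], [0.0, 0.2, 1.0], [1.0, 1.0, 1.0]]

matched = match_to_model(new_features, integrated_features, integrated_points)
print(matched)
```

The third new feature is equally similar to every integrated feature and falls below the threshold, so it receives no 3D coordinate. Rejecting ambiguous matches in this way is a common precaution before the resulting 2D-3D correspondences are used for pose estimation.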
[0065] Here, the camera calibration device 100 can perform calibration on the new camera 20 by not only estimating new external parameters as positional and orientation information of the new camera 20, but also by further estimating at least one of the focal length, principal point, and asymmetry coefficient of the new camera 20 as new internal parameters of the new camera 20.
[0066] An example of the camera calibration process is described below with reference to Figures 4 and 5.
[0067] First, Figure 4 shows a simplified example of the process of estimating a 3D model by capturing images of a specific space from at least one camera at different angles, according to one embodiment of the present invention, and estimating the position and orientation information of at least one camera.
[0068] Referring to Figure 4, assume that a specific space is photographed at four different angles through at least one camera, specifically a first camera 10_1 through a fourth camera 10_4, to obtain a first 2D image 11_1 through a fourth 2D image 11_4. The first camera 10_1, which photographs at the first angle, can generate the first 2D image 11_1 for the specific space; the second camera 10_2, which photographs at the second angle, can generate the second 2D image 11_2 for the specific space; the third camera 10_3, which photographs at the third angle, can generate the third 2D image 11_3 for the specific space; and the fourth camera 10_4, which photographs at the fourth angle, can generate the fourth 2D image 11_4 for the specific space.
[0069] Here, the at least one camera may be four different cameras installed at a first position through a fourth position corresponding to a first angle through a fourth angle, which are four mutually different angles. Alternatively, a single camera may move and capture images at the first position through the fourth position corresponding to the first angle through the fourth angle. For convenience, the following description uses the example of four cameras.
[0070] Next, the camera calibration device 100 can output, through the neural network-based geometric estimation model 200, the first geometric feature information 210_1 and the first feature map 220_1 to the fourth geometric feature information 210_4 and the fourth feature map 220_4 for the first 2D image 11_1 to the fourth 2D image 11_4, respectively. Specifically, it can extract each of the nine first 3D coordinates (corresponding to ● (circle)) of the first features corresponding to the first pixels of the first 2D image 11_1 captured at the angle of the first camera 10_1; each of the nine second 3D coordinates (corresponding to ▲ (triangle)) of the second features corresponding to the second pixels of the second 2D image 11_2 captured at the angle of the second camera 10_2; each of the twelve third 3D coordinates (corresponding to ■ (square)) of the third features corresponding to the third pixels of the third 2D image 11_3 captured at the angle of the third camera 10_3; and each of the ten fourth 3D coordinates (corresponding to ★ (star)) of the fourth features corresponding to the fourth pixels of the fourth 2D image 11_4 captured at the angle of the fourth camera 10_4.
[0071] Here, the first feature map 220_1 may be a set of each ● (circle) feature, the second feature map 220_2 may be a set of each ▲ (triangle) feature, the third feature map 220_3 may be a set of each ■ (square) feature, and the fourth feature map 220_4 may be a set of each ★ (star) feature.
[0072] By inputting the first geometric feature information 210_1 to the fourth geometric feature information 210_4, which include the first 3D coordinates (corresponding to ● (circle)) to the fourth 3D coordinates (corresponding to ★ (star)), and the first feature map 220_1 to the fourth feature map 220_4 into the SfM model 300, it is possible to estimate a 3D model corresponding to the specific space in which the first 2D image 11_1 to the fourth 2D image 11_4 were captured, and thereby to estimate the position information and orientation information of the first camera 10_1 to the fourth camera 10_4. Here, the 3D model can include integrated geometric feature information generated by integrating the first geometric feature information 210_1 to the fourth geometric feature information 210_4, and an integrated feature map 320 generated by integrating the first feature map 220_1 to the fourth feature map 220_4. The integrated geometric feature information may include the integrated 3D coordinate values 310 obtained by integrating the first 3D coordinate values to the fourth 3D coordinate values.
[0073] Furthermore, the first 3D coordinates (● (circle)) corresponding to the first pixels of the first 2D image 11_1 captured at the angle of the first camera 10_1 and the second 3D coordinates (▲ (triangle)) corresponding to the second pixels of the second 2D image 11_2 captured at the angle of the second camera 10_2 may overlap depending on the shooting range at the respective camera positions; such coordinates appear in Figure 4 as an overlapping ● (circle) and ▲ (triangle). Likewise, the second 3D coordinates (▲ (triangle)) of the second 2D image 11_2 and the third 3D coordinates (■ (square)) corresponding to the third pixels of the third 2D image 11_3 captured at the angle of the third camera 10_3 may overlap depending on the shooting range at the respective camera positions; such coordinates appear in Figure 4 as an overlapping ▲ (triangle) and ■ (square). Finally, the third 3D coordinates (■ (square)) of the third 2D image 11_3 and the fourth 3D coordinates (★ (star)) corresponding to the fourth pixels of the fourth 2D image 11_4 captured at the angle of the fourth camera 10_4 may overlap depending on the shooting range at the respective camera positions; such coordinates appear in Figure 4 as an overlapping ■ (square) and ★ (star).
[0074] Next, the integrated coordinates obtained by combining these are indicated in Figure 5 by the symbol [symbol image: JPEG0007883808000008.jpg], as explained further below with reference to Figure 5.
[0075] Through this process, the positional and orientation information of each of the first camera 10_1 through the fourth camera 10_4 can be estimated.
[0076] At this time, the camera coordinates of the first camera 10_1 to the fourth camera 10_4 may differ from the world coordinates, which are coordinates in a specific space. Therefore, a matrix transformation is performed from the world coordinate system to the camera coordinate system, and the first geometric feature information 210_1 to the fourth geometric feature information 210_4 can be extracted.
[0077] In this case, because the neural network-based geometric estimation model 200 is used, the camera parameters of the at least one camera are not required in advance to infer the 3D coordinate values, unlike existing multi-view geometry approaches, so less prior data is needed.
[0078] Next, the integrated 3D coordinate values of the integrated geometric feature information generated by integrating the first geometric feature information 210_1 to the fourth geometric feature information 210_4, and the integrated feature map generated by integrating the first feature map 220_1 to the fourth feature map 220_4, are input to the Gaussian splatting model 400. The positions of the first to fourth 3D coordinate values and the corresponding Gaussian parameters are corrected, and each of the integrated 3D coordinate values and each of the corresponding Gaussian parameters are projected onto the plane of the 2D image through a projection matrix that references the intrinsic and extrinsic parameters of the at least one camera, whereupon differentiable tile rasterization (Differentiable Tile Rasterization) is performed to generate a rendered 2D image. By calculating the error between the rendered 2D image and the first 2D image 11_1 to the fourth 2D image 11_4, and then iteratively correcting each of the integrated 3D coordinate values and each of the Gaussian parameters to minimize the error, accurate camera position and orientation information can be provided for subsequently added images.
[0079] Through this process, by utilizing the SfM model 300 and the Gaussian splatting model 400 together, both the processing speed and the quality of the results can be improved, and a search space can be formed to quickly find the location of the relevant image when a new image is given in the future.
[0080] Furthermore, since such corrections are performed through sophisticated alignment utilizing a pre-built SfM database of SfM model 300, stable and repeatable high-precision calibration is possible even in complex environments with many objects.
[0081] Here, each of the first feature map 220_1 to the fourth feature map 220_4, unlike the position and orientation information of the at least one camera or the positions of the first to fourth 3D coordinate values, is not used as an optimization variable, but may instead be used as a fixed reference for maintaining the matching relationship.
[0082] Next, Figure 5 shows a simplified example of the process of estimating the positional and orientation information of a new camera by referring to a new 2D image acquired from the new camera, according to one embodiment of the present invention.
[0083] First, when the nine features indicated by the first 3D coordinates (● (circle)), the nine features indicated by the second 3D coordinates (▲ (triangle)), the twelve features indicated by the third 3D coordinates (■ (square)), and the ten features indicated by the fourth 3D coordinates (★ (star)) in Figure 4 are integrated, each of the integrated 3D coordinates, which are the integrated features in Figure 5, can be represented by the symbol [symbol image: JPEG0007883808000009.jpg], and a unified coordinate system consisting of 36 integrated features can be generated.
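The counts in [0083] add up to 9 + 9 + 12 + 10 = 40 per-image features, while the unified coordinate system holds 36. The difference of four presumably corresponds to the coordinates that overlap between neighbouring views, as described for Figure 4. The sketch below reproduces that arithmetic with a tolerance-based merge. The coordinates, the tolerance, and the way the four overlaps are distributed across the view pairs are hypothetical, since the specification gives only the totals.

```python
def merge_points(point_sets, tol=1e-6):
    """Union of 3D point sets; points within tol on every axis count as one."""
    merged = []
    for points in point_sets:
        for p in points:
            if not any(all(abs(a - b) <= tol for a, b in zip(p, q)) for q in merged):
                merged.append(p)
    return merged

# Hypothetical coordinates laid out on a line so the overlaps are easy to see.
circle = [(float(x), 0.0, 0.0) for x in range(0, 9)]      # 9 points (first camera)
triangle = [(float(x), 0.0, 0.0) for x in range(8, 17)]   # 9 points (second camera)
square = [(float(x), 0.0, 0.0) for x in range(16, 28)]    # 12 points (third camera)
star = [(float(x), 0.0, 0.0) for x in range(26, 36)]      # 10 points (fourth camera)

integrated = merge_points([circle, triangle, square, star])
total = len(circle) + len(triangle) + len(square) + len(star)
print(total, len(integrated), total - len(integrated))
```

In practice the merge would operate on estimated coordinates that never coincide exactly, which is why a tolerance rather than strict equality is used here.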
[0084] Referring to Figure 5, once a new 2D image 21, which is a new image of the specific space captured by the new camera 20, is obtained, the camera calibration device 100 can refer to the integrated 3D coordinate values 310 and the integrated feature map 320 corrected as described with reference to Figure 4. That is, it can input the new 2D image 21, as an image query, into a convolutional neural network model, extract new features corresponding to each new pixel of the new 2D image 21 through at least one convolutional filter, and generate a new feature map from them. It can then determine and match the position of each new feature in the new feature map by referring to the integrated feature map 320 and the integrated 3D coordinate values 310 corresponding to the 3D model, and estimate the new 3D coordinate values corresponding to each new pixel of the new 2D image 21.
[0085] Through this series of processes, calibration of the new camera 20 can be performed by calculating the position and orientation information of the new camera 20 with reference to the new 3D coordinate values.
[0086] The embodiments of the present invention described above are embodied in the form of program instructions that can be executed through various computer components and can be recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, etc., individually or in combination. The program instructions recorded on the computer-readable recording medium may be specifically designed and configured for the present invention, or may be publicly known and usable by those skilled in the computer software field. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROMs, RAMs, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code executed by a computer using an interpreter, etc. The hardware devices may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.
[0087] Although the present invention has been described above with reference to specific components and other details, as well as limited embodiments and drawings, these are provided only to aid in a more general understanding of the invention. The present invention is not limited to the above embodiments, and any person with ordinary skill in the art to which the invention belongs can make various modifications and variations from this description.
[0088] Therefore, the concept of the present invention should not be limited to the embodiments described above, and not only the claims set out below but also all modifications equal or equivalent to those claims fall within the scope of the concept of the present invention.
Claims
1. A method for performing camera calibration through AI-based geometric estimation, comprising: (a) once a first 2D image to an nth 2D image (where n is an integer of 2 or more) are obtained from at least one camera capturing a specific space at different angles, a camera calibration device inputting each of the first 2D image to the nth 2D image into a neural network-based geometric estimation model to generate a first feature map and first geometric feature information corresponding to the first 2D image (the first geometric feature information including each of the first 3D coordinate values corresponding to each of the first pixels of the first 2D image) to an nth feature map and nth geometric feature information corresponding to the nth 2D image (the nth geometric feature information including each of the nth 3D coordinate values corresponding to each of the nth pixels of the nth 2D image); and (b) the camera calibration device inputting the first feature map and the first geometric feature information to the nth feature map and the nth geometric feature information into an SfM (Structure-from-Motion) model, and using the SfM model to estimate, by referring to the first feature map and the first geometric feature information to the nth feature map and the nth geometric feature information, a 3D model corresponding to the specific space (the 3D model including integrated 3D coordinate values obtained by integrating the first 3D coordinate values to the nth 3D coordinate values, and an integrated feature map obtained by integrating the first feature map to the nth feature map), thereby estimating the position information and orientation information of the at least one camera and performing calibration for the at least one camera; wherein, in step (b), the camera calibration device inputs the integrated feature map and integrated geometric feature information (the integrated geometric feature information including the integrated 3D coordinate values), obtained by integrating the first geometric feature information to the nth geometric feature information, into a Gaussian splatting model, and performs calibration for the at least one camera by optimizing, using the Gaussian splatting model, the position information and orientation information of the at least one camera and the integrated geometric feature information.
2. The method according to claim 1, further comprising: (c) once a new 2D image is obtained from a new camera, other than the at least one camera, capturing the specific space, the camera calibration device performing calibration on the new camera by generating a new feature map for the new 2D image, matching the new feature map to the 3D model to calculate new 3D coordinate values corresponding to each of the new pixels in the new 2D image, and calculating the position information and orientation information of the new camera by referring to the new 3D coordinate values.
3. The method according to claim 2, characterized in that the camera calibration device inputs the new 2D image into a convolutional neural network model, extracts new features corresponding to each of the new pixels of the new 2D image through at least one convolutional filter, determines matching features that match each of the new features in the integrated feature map included in the 3D model, and estimates the matching 3D coordinate values corresponding to each of the matching features as the new 3D coordinate values corresponding to each of the new pixels of the new 2D image.
4. The method according to claim 3, characterized in that the camera calibration device not only estimates new external parameters as positional and orientation information of the new camera, but also estimates new internal parameters of the new camera that further include at least one of the focal length, principal point, and asymmetry coefficient of the new camera, thereby performing calibration on the new camera.
5. The method according to claim 1, characterized in that, in step (b), the camera calibration device not only estimates external parameters as position information and orientation information of the at least one camera, but also estimates at least one of the focal length, principal point, and asymmetry coefficient as internal parameters of the at least one camera, thereby performing calibration on the at least one camera.
6. The method according to claim 1, characterized in that, in step (b), when the camera calibration device optimizes the position and orientation information of the at least one camera and the integrated geometric feature information using the Gaussian splatting model, the camera calibration device corrects, from the integrated geometric feature information, each of the integrated 3D coordinate values obtained by integrating the first 3D coordinate values to the nth 3D coordinate values corresponding to each of the first 2D image to the nth 2D image, and each of the Gaussian parameters corresponding to each of the integrated 3D coordinate values; projects each of the integrated 3D coordinate values and each of the Gaussian parameters corresponding thereto onto the plane of the 2D image through the projection matrix of the at least one camera; generates a rendered 2D image by performing differentiable tile rasterization (Differentiable Tile Rasterization); calculates an error between the rendered 2D image and the first 2D image to the nth 2D image; and then optimizes the position information and orientation information of the at least one camera and the integrated geometric feature information by iteratively correcting each of the integrated 3D coordinate values and each of the Gaussian parameters so that the error is minimized.
7. The method according to claim 6, characterized in that, when the camera calibration device optimizes the position and orientation information of the at least one camera and the integrated geometric feature information using the Gaussian splatting model, the optimization is calculated by the following formula, where Ri and ti are external camera parameters among the camera parameters of the at least one camera, Xj is each of the integrated 3D coordinate values corresponding to the 3D model, σj is the scale of the Gaussian splatting model, cj is the rendering attribute of the Gaussian splatting model, R is a rendering function based on the Gaussian splatting model, and L is a loss function that calculates the error between the rendered 2D image output through rendering and the first 2D image to the nth 2D image.
8. The method according to claim 1, characterized in that, in step (b), when the camera calibration device generates the 3D model using the SfM model, the 3D model is calculated by the following formula, where Ri and ti are external camera parameters among the camera parameters of the at least one camera, Xj is each of the integrated 3D coordinate values corresponding to the 3D model, xij is each of the first 2D coordinate values corresponding to the first feature map for the first 2D image to the nth 2D coordinate values corresponding to the nth feature map for the nth 2D image, π is a projection function, and ρ is a loss function that calculates the difference between each of the first 2D coordinate values to the nth 2D coordinate values and the corresponding projected 2D coordinate value, in the pixel coordinate system, calculated by projecting each of the integrated 3D coordinate values with the camera parameters of the at least one camera.
9. The method according to claim 1, characterized in that, when the at least one camera is a single camera, the camera calibration device generates the first 2D image to the nth 2D image by moving between a previously set first position to an nth position and taking photographs at different angles from each other.
10. The method according to claim 1, wherein, when the at least one camera is a plurality of cameras, namely a first camera to an nth camera, the camera calibration device generates the first 2D image obtained from the first camera to the nth 2D image obtained from the nth camera by taking photographs at the mutually different angles at which the first camera to the nth camera are respectively installed.
11. A camera calibration device that performs camera calibration through AI-based geometric estimation, comprising: at least one memory that stores instructions; and at least one processor configured to execute the instructions, wherein the processor performs: (I) a process of, when a first 2D image to an nth 2D image (where n is an integer of 2 or more) are obtained from at least one camera capturing a specific space at different angles, inputting each of the first 2D image to the nth 2D image into a neural network-based geometric estimation model, and using the neural network-based geometric estimation model to generate a first feature map and first geometric feature information corresponding to the first 2D image (the first geometric feature information including each of the first 3D coordinate values corresponding to each of the first pixels of the first 2D image) to an nth feature map and nth geometric feature information corresponding to the nth 2D image (the nth geometric feature information including each of the nth 3D coordinate values corresponding to each of the nth pixels of the nth 2D image); and (II) a process of inputting the first feature map and the first geometric feature information to the nth feature map and the nth geometric feature information into an SfM (Structure-from-Motion) model, using the SfM model to estimate, by referring to the first feature map and the first geometric feature information to the nth feature map and the nth geometric feature information, a 3D model corresponding to the specific space (the 3D model including integrated 3D coordinate values obtained by integrating the first 3D coordinate values to the nth 3D coordinate values, and an integrated feature map obtained by integrating the first feature map to the nth feature map), and performing calibration for the at least one camera by estimating the position information and orientation information of the at least one camera; and wherein, in process (II), the processor inputs the integrated feature map and integrated geometric feature information (the integrated geometric feature information including the integrated 3D coordinate values), obtained by integrating the first geometric feature information to the nth geometric feature information, into a Gaussian splatting model, and performs calibration for the at least one camera by optimizing, using the Gaussian splatting model, the position information and orientation information of the at least one camera and the integrated geometric feature information.
12. The camera calibration device according to claim 11, wherein the processor further performs: (III) a process of, when a new 2D image is obtained from a new camera, other than the at least one camera, capturing the specific space, performing calibration for the new camera by generating a new feature map for the new 2D image, matching the new feature map to the 3D model to calculate new 3D coordinate values corresponding to each of the new pixels of the new 2D image, and calculating position information and orientation information of the new camera by referring to the new 3D coordinate values.
13. The camera calibration apparatus according to claim 12, characterized in that the processor inputs the new 2D image into a convolutional neural network model, extracts new features corresponding to each of the new pixels of the new 2D image through at least one convolutional filter, determines matching features that match each of the new features in the integrated feature map included in the 3D model, and estimates the matching 3D coordinate values corresponding to each of the matching features as the new 3D coordinate values corresponding to each of the new pixels of the new 2D image.
14. The camera calibration device according to claim 13, characterized in that the processor not only estimates new external parameters as position information and orientation information of the new camera, but also estimates new internal parameters of the new camera that further include at least one of the focal length, principal point, and asymmetry coefficient of the new camera.
15. The camera calibration device according to claim 11, characterized in that, in process (II), the processor not only estimates external parameters as position information and orientation information of the at least one camera, but also estimates at least one of the focal length, principal point, and asymmetry coefficient as internal parameters of the at least one camera.
16. The camera calibration device according to claim 11, characterized in that, in process (II), when the processor optimizes the position and orientation information of the at least one camera and the integrated geometric feature information using the Gaussian splatting model, the processor corrects, from the integrated geometric feature information, each of the integrated 3D coordinate values obtained by integrating the first 3D coordinate values to the nth 3D coordinate values corresponding to each of the first 2D image to the nth 2D image, and each of the Gaussian parameters corresponding to each of the integrated 3D coordinate values; projects each of the integrated 3D coordinate values and each of the Gaussian parameters corresponding thereto onto the plane of the 2D image through the projection matrix of the at least one camera; generates a rendered 2D image by performing differentiable tile rasterization (Differentiable Tile Rasterization); calculates the error between the rendered 2D image and the first 2D image to the nth 2D image; and then optimizes the position information and orientation information of the at least one camera and the integrated geometric feature information by iteratively correcting each of the integrated 3D coordinate values and each of the Gaussian parameters so that the error is minimized.
17. The camera calibration apparatus according to claim 16, characterized in that, when the processor optimizes the position information and orientation information of the at least one camera and the integrated geometric feature information using the Gaussian splatting model, the optimization is calculated using the following formula: [formula not reproduced in this text], where Ri and ti are the external parameters among the camera parameters of the at least one camera, Xj is each of the integrated 3D coordinate values corresponding to the 3D model, σj is the scale of the Gaussian splatting model, cj is the rendering attribute of the Gaussian splatting model, R is a rendering function based on the Gaussian splatting model, and L is a loss function that calculates the error between the rendered 2D image output through rendering and the first 2D image to the nth 2D image.
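The formula referred to in claim 17 does not appear in this text. One form that is consistent with the symbol definitions given in the claim is sketched below; it is an assumption, not the formula of the claim. The symbol I_i (the i-th of the first to nth 2D images) is introduced here for illustration, and the rendering function R is written as \mathcal{R} only to keep it distinct from the rotation R_i.

```latex
\min_{\{R_i,\, t_i\},\ \{X_j,\, \sigma_j,\, c_j\}}
\;\sum_{i=1}^{n}
L\!\left( \mathcal{R}\!\left(R_i,\, t_i,\, \{X_j, \sigma_j, c_j\}_{j}\right),\; I_i \right)
```

Read this way, each camera pose (R_i, t_i) and each Gaussian (X_j, σj, cj) is adjusted jointly so that the image rendered from camera i matches the captured image I_i as closely as the loss L allows.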
18. The camera calibration apparatus according to claim 11, characterized in that, in the process (II), when the processor generates the 3D model using the SfM model, the 3D model is calculated using the following formula: [formula not reproduced in this text], where Ri and ti are the external parameters among the camera parameters of the at least one camera, Xj is each of the integrated 3D coordinate values corresponding to the 3D model, xij is each of the first 2D coordinate values, corresponding to the first feature map for the first 2D image, to the nth 2D coordinate values, corresponding to the nth feature map for the nth 2D image, π is a projection function, and ρ is a loss function that calculates the difference between each of the first 2D coordinate values to the nth 2D coordinate values and the corresponding projected 2D coordinate values, expressed in the pixel coordinate system, which are calculated by projecting each of the integrated 3D coordinate values using the camera parameters of the at least one camera.
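The formula referred to in claim 18 does not appear in this text. The symbol definitions are consistent with a standard reprojection-error objective of the form: minimize, over Ri, ti, and Xj, the sum over i and j of ρ(π(Ri, ti, Xj) − xij). The sketch below evaluates such a cost under stated assumptions: a shared intrinsic matrix `K` is used inside the projection, ρ is taken to be a Huber loss applied to the pixel distance, and all names and values are hypothetical. It only evaluates the cost and does not perform the minimization.

```python
import math


def project(K, R, t, X):
    """pi: map a world point X to pixel coordinates via K [R | t]."""
    # Camera-frame coordinates: Xc = R @ X + t
    xc, yc, zc = (
        sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)
    )
    u = (K[0][0] * xc + K[0][1] * yc + K[0][2] * zc) / zc
    v = (K[1][1] * yc + K[1][2] * zc) / zc
    return (u, v)


def huber(r, delta=1.0):
    """Robust loss rho (an assumed choice): quadratic near zero, linear beyond delta."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def reprojection_cost(K, cameras, points, observations, rho=huber):
    """Sum of rho(||pi(R_i, t_i, X_j) - x_ij||) over all observations.

    cameras      : list of (R_i, t_i) pairs
    points       : list of 3D points X_j
    observations : list of (i, j, (u, v)) with x_ij = (u, v) in pixels
    """
    total = 0.0
    for i, j, (u_obs, v_obs) in observations:
        R, t = cameras[i]
        u, v = project(K, R, t, points[j])
        total += rho(math.hypot(u - u_obs, v - v_obs))
    return total


# Hypothetical two-camera, two-point scene.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
cameras = [(IDENTITY, [0.0, 0.0, 0.0]), (IDENTITY, [-1.0, 0.0, 0.0])]
points = [[0.0, 0.0, 5.0], [1.0, 1.0, 5.0]]

# Observations that agree exactly with the projections give zero cost.
exact = [(0, 0, (50.0, 50.0)), (0, 1, (70.0, 70.0)),
         (1, 0, (30.0, 50.0)), (1, 1, (50.0, 70.0))]
cost_exact = reprojection_cost(K, cameras, points, exact)

# Shifting one observation by (3, 4) pixels gives a 5-pixel residual.
noisy = [(0, 0, (50.0, 50.0)), (0, 1, (73.0, 74.0)),
         (1, 0, (30.0, 50.0)), (1, 1, (50.0, 70.0))]
cost_noisy = reprojection_cost(K, cameras, points, noisy)
```

A robust ρ such as the Huber loss limits the influence of mismatched feature points; a plain squared error would be an equally valid reading of the claim's definition.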
19. The camera calibration apparatus according to claim 11, characterized in that, when the at least one camera is a single camera, the processor generates the first 2D image to the nth 2D image from images that the single camera captures at mutually different angles while moving through preset first to nth positions.
20. The camera calibration apparatus according to claim 11, characterized in that, when the at least one camera is a plurality of cameras including a first camera to an nth camera, the processor generates the first 2D image to the nth 2D image from images that the first camera to the nth camera respectively capture at the mutually different angles at which they are installed, the first 2D image being obtained from the first camera and the nth 2D image being obtained from the nth camera.