Method for estimating three-dimensional information of object and apparatus for performing same
The method and device enhance 3D object detection by calculating 3D coordinates from 2D images, addressing speed and recall issues in existing technologies, providing accurate 3D information for autonomous vehicles.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- 42DOT INC
- Filing Date
- 2025-12-24
- Publication Date
- 2026-07-02
Description
Method for estimating three-dimensional information of an object and apparatus for performing the same
[0001] The following disclosure relates to a method for estimating three-dimensional information of an object and an apparatus for performing the same.
[0002] 2DOD (2-dimensional object detection) technology classifies objects contained in 2D images and detects the 2D locations of objects, offering the advantages of high accuracy and fast inference speed. However, in the field of autonomous driving, it may be important to obtain 3D information about objects located in front of the vehicle.
[0003] 3DOD (3-dimensional object detection) technology classifies objects using one or more images and detects their locations in a 3D world coordinate system. Although 3DOD technology is stable and yields low error in the 3D locations of objects, it may suffer from slow inference speed and low recall for small objects.
[0004] The background technology described above was possessed or acquired by the inventor in the course of deriving the disclosure of the present application, and is not necessarily prior art disclosed to the general public before the filing of this application.
[0005] One embodiment may provide a technique for estimating three-dimensional information of an object from a two-dimensional image.
[0006] One embodiment may provide a technique for estimating three-dimensional information in the current frame by using three-dimensional information of an object in the previous frame.
[0007] However, technical challenges are not limited to the technical challenges described above, and other technical challenges may exist.
[0008] A method according to one embodiment may include the operation of detecting a box corresponding to the face of another vehicle located around an ego-vehicle, the operation of obtaining a yaw angle of the other vehicle based on the contact point between the wheel of the other vehicle and the ground and the box, and the operation of calculating three-dimensional information of the other vehicle based on the type of the other vehicle, the box, and the yaw angle.
[0009] According to one embodiment, the face includes at least one face that is within the field of view of a camera mounted on the ego-vehicle, and the wheel may include at least one wheel located in the direction opposite to the face.
[0010] According to one embodiment, the three-dimensional information may include the coordinates of the three-dimensional center point of the other vehicle.
[0011] According to one embodiment, the operation of obtaining the yaw angle may include the operation of calculating the yaw angle based on the coordinates of the vertices of the box and the coordinates of the contact points.
[0012] According to one embodiment, the operation of calculating the three-dimensional information may include the operation of obtaining a width corresponding to the other vehicle based on the type of the other vehicle, and the operation of calculating the three-dimensional information of the other vehicle based on the width, the box, and the yaw angle.
[0013] According to one embodiment, the operation of acquiring the box may include the operation of acquiring a bounding box corresponding to the surface of the other vehicle using an object detection algorithm and the operation of determining the bounding box as the box.
[0014] According to one embodiment, the coordinates of the vertex and the coordinates of the contact point may be calculated by performing an inverse projection matrix operation based on correction parameters corresponding to a camera mounted on the vehicle.
[0015] According to one embodiment, a computer-readable recording medium storing one or more computer programs may include instructions for performing the method in a processor.
[0016] A device according to one embodiment may include at least one processor including a processing circuit and a memory for storing instructions. Based on the instructions being executed individually or collectively by the at least one processor, the device may detect a box corresponding to the face of another vehicle located around an ego-vehicle, obtain a yaw angle of the other vehicle based on the contact point between the wheel of the other vehicle and the ground and the box, and calculate three-dimensional information of the other vehicle based on the type of the other vehicle, the box, and the yaw angle.
[0017] According to one embodiment, the face includes at least one face that is within the field of view of a camera mounted on the ego-vehicle, and the wheel may include at least one wheel located in the direction opposite to the face.
[0018] According to one embodiment, the three-dimensional information may include the coordinates of the three-dimensional center point of the other vehicle.
[0019] According to one embodiment, the instructions may be executed individually or collectively by the at least one processor, thereby enabling the device to calculate the yaw angle based on the coordinates of the vertices of the box and the coordinates of the contact points.
[0020] According to one embodiment, based on the instructions being executed individually or collectively by the at least one processor, the device may obtain a width corresponding to the other vehicle based on the type of the other vehicle, and calculate three-dimensional information of the other vehicle based on the width, the box, and the yaw angle.
[0021] According to one embodiment, based on the instructions being executed individually or collectively by at least one processor, the device may be enabled to obtain a bounding box corresponding to the surface of the other vehicle using an object detection algorithm and determine the bounding box as the box.
[0022] According to one embodiment, the coordinates of the vertex and the coordinates of the contact point may be calculated by performing an inverse projection matrix operation based on correction parameters corresponding to a camera mounted on the vehicle.
[0023] In relation to the description of the drawings, the same or similar reference numerals may be used for identical or similar components.
[0024] FIG. 1 is a drawing for explaining an object detection system according to one embodiment.
[0025] FIG. 2 is a diagram illustrating 2D cuboid detection according to one embodiment.
[0026] FIG. 3 is a diagram illustrating detection information for another vehicle according to one embodiment.
[0027] FIG. 4 is a diagram illustrating an inverse projection matrix operation according to one embodiment.
[0028] FIG. 5 is a diagram illustrating the operation of calculating three-dimensional information of another vehicle according to one embodiment.
[0029] FIGS. 6a to 6d are drawings for explaining the operation of calculating the coordinates of a three-dimensional center point of another vehicle according to one embodiment.
[0030] FIG. 7 is a graph for explaining the estimated location of another vehicle according to one embodiment.
[0031] FIG. 8 is a flowchart illustrating a method for estimating three-dimensional information of another vehicle according to one embodiment.
[0032] FIG. 9 is a flowchart illustrating a method according to one embodiment.
[0033] FIG. 10 is a schematic block diagram of an electronic device according to one embodiment.
[0034] Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, or substitutions included in the technical concept described by the embodiments.
[0035] Terms such as "first" or "second" may be used to describe various components, but these terms should be interpreted solely for the purpose of distinguishing one component from another. For example, the first component may be named the second component, and similarly, the second component may be named the first component.
[0036] When it is stated that a component is "connected" to another component, it should be understood that it may be directly connected to or coupled with that other component, or that there may be other components in between.
[0037] Singular expressions include plural expressions unless the context clearly indicates otherwise. In this document, phrases such as “A or B,” “at least one of A and B,” “at least one of A or B,” “A, B or C,” “at least one of A, B and C,” and “at least one of A, B, or C” may each include any one of the items listed together with the corresponding phrase, or all possible combinations thereof. In this specification, terms such as “comprising” or “having” are intended to designate the existence of the described feature, number, step, action, component, part, or combination thereof, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.
[0038] Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as generally understood by those skilled in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and should not be interpreted in an ideal or overly formal sense unless explicitly defined in this specification.
[0039] As used herein, the term "module" may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be a component formed integrally, or a minimum unit of said component or a part thereof that performs one or more functions. For example, according to one embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).
[0040] As used in this document, the term "part" refers to software or hardware components, such as FPGAs or ASICs, and the "part" performs certain roles. However, the meaning of "part" is not limited to software or hardware. The "part" may be configured to reside in an addressable storage medium or configured to operate one or more processors. For example, the "part" may include components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functions provided within the components and "parts" may be combined into a smaller number of components and "parts" or further separated into additional components and "parts." Furthermore, the components and "parts" may be implemented to operate one or more CPUs within a device or secure multimedia card. Additionally, a "part" may include one or more processors.
[0041] Hereinafter, embodiments will be described in detail with reference to the attached drawings. In the description with reference to the attached drawings, identical components are given the same reference numeral regardless of the drawing number, and redundant descriptions thereof will be omitted.
[0042]
[0043] FIG. 1 is a drawing for explaining an object detection system according to one embodiment.
[0044] Referring to FIG. 1, according to one embodiment, an object detection system (10) may be a system that detects an object located around a vehicle (e.g., vehicle (110)) and estimates three-dimensional information of the object. The object detection system (10) may include a vehicle (110), a three-dimensional information estimation device (100), and a server (130). The three-dimensional information estimation device (100) may be a device that detects an object located around the vehicle (110) (e.g., an object such as another vehicle or a two-wheeled vehicle) and estimates three-dimensional information of the object. For example, the three-dimensional information estimation device (100) may estimate three-dimensional information of another vehicle located around the vehicle (110). The three-dimensional information may include information such as the width, height, length, coordinates of a 3D center point, and yaw angle of the object. The three-dimensional information estimation device (100) may be mounted on the vehicle (110) or installed outside the vehicle (110) (e.g., on a server (130)). For example, the three-dimensional information estimation device (100) may be a software module implemented on a processor (not shown) of the vehicle (110), but is not limited thereto. The vehicle (110) is a vehicle that transports people and/or goods and may include a vehicle such as an automobile. The vehicle (110) may be an autonomous vehicle.
[0045] The 3D information estimation device (100) can detect a box (e.g., the box (310) of FIG. 3) corresponding to a face of another vehicle located around an ego-vehicle (e.g., vehicle (110)). For example, the 3D information estimation device (100) can detect a box corresponding to the rear of another vehicle located in front of the vehicle (110). The face of the other vehicle may include at least one face included in the field of view of a camera mounted on the ego-vehicle (e.g., vehicle (110)). For example, when a camera mounted on the vehicle (110) captures the front of the vehicle (110), the 3D information estimation device (100) can detect a box corresponding to a face of another vehicle located in front of the vehicle (110). The 3D information estimation device (100) can obtain a bounding box corresponding to a face of another vehicle using an object detection algorithm. The faces of the other vehicle may include the front, rear, and sides (e.g., left side, right side). The object detection algorithm may include an algorithm that recognizes and detects objects, such as 2D object detection (2DOD) and 2D cuboid detection, but is not limited to the above examples. The 3D information estimation device (100) may determine the bounding box corresponding to the face of the other vehicle as the box corresponding to the face of the other vehicle (e.g., the box (310) of FIG. 3).
[0046] The 3D information estimation device (100) can obtain the yaw angle of another vehicle based on a contact point between a wheel of the other vehicle and the ground (e.g., contact point (320) in FIG. 3) and the box corresponding to the face of the other vehicle. The wheel of the other vehicle may include at least one wheel located in the direction opposite to the face of the other vehicle. For example, when the rear of the other vehicle is included in the field of view of a camera mounted on the vehicle (110), the 3D information estimation device (100) can obtain the yaw angle of the other vehicle based on a contact point between a front wheel of the other vehicle and the ground and the box corresponding to the rear of the other vehicle. The 3D information estimation device (100) can calculate the yaw angle based on the coordinates of a vertex of the box corresponding to the face of the other vehicle (e.g., box (310)) and the coordinates of the contact point between the wheel of the other vehicle and the ground (e.g., contact point (320)). For example, the 3D information estimation device (100) can calculate the yaw angle of the other vehicle based on the coordinates of the vertex corresponding to the right rear wheel of the other vehicle among the vertices of the box corresponding to the rear of the other vehicle, and the coordinates of the contact point between the right front wheel and the ground. The coordinates of the vertex of the box (e.g., box (310)) and the coordinates of the contact point (e.g., contact point (320)) may be calculated by performing an inverse projection matrix operation based on correction parameters corresponding to a camera mounted on the ego-vehicle (e.g., vehicle (110)). The inverse projection matrix operation based on correction parameters will be explained in detail later with reference to FIG. 4.
[0047] The 3D information estimation device (100) can calculate 3D information of another vehicle based on the type of the other vehicle, the box corresponding to the face of the other vehicle, and the yaw angle of the other vehicle. The 3D information may include the coordinates of the 3D center point of the other vehicle. The 3D information estimation device (100) can obtain a width corresponding to the other vehicle based on the type of the other vehicle located around the vehicle (110); that is, it can approximate the width of the other vehicle from its type. For example, the 3D information estimation device (100) can obtain the width of the other vehicle from widths classified according to a specified standard. The 3D information estimation device (100) can calculate 3D information of the other vehicle based on the width obtained from the type of the other vehicle, the box corresponding to the face of the other vehicle, and the yaw angle of the other vehicle. For example, the 3D information estimation device (100) can use correction parameters associated with a camera mounted on the vehicle (110) to set the ground as the reference plane (Z = 0), derive the coordinates of each vertex of a cuboid surrounding the other vehicle, and calculate the coordinates of the 3D center point of the other vehicle. Using a width obtained from the type of the other vehicle, rather than only the pixel width of the other vehicle in the image captured by the camera mounted on the vehicle (110), allows the 3D information estimation device (100) to calculate 3D information more accurately. The operation of the 3D information estimation device (100) calculating the coordinates of the 3D center point of the other vehicle will be explained in detail later with reference to FIGS. 3 to 5.
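As a rough illustration of deriving a cuboid and its center on the ground plane (Z = 0), the following Python sketch builds a ground-aligned cuboid from one back-projected ground corner, the yaw angle, and a type-derived width. It is not the procedure of FIGS. 6a to 6d. It assumes a world frame with X forward, Y left, and Z up; the function name `cuboid_from_rear_right_corner` is hypothetical, and the length and height inputs are hypothetical as well, since the text derives only the width from the vehicle type.

```python
import math

def cuboid_from_rear_right_corner(rr_xy, yaw, width, length, height):
    """Ground-aligned cuboid of another vehicle from its rear-right ground corner.

    Assumed world frame: X forward, Y left, Z up, ground at Z = 0.
    `width` would come from the vehicle-type lookup; `length` and `height`
    are hypothetical per-type values used only for this illustration.
    Returns (corners, center): the 8 cuboid corners and the 3D center point.
    """
    hx, hy = math.cos(yaw), math.sin(yaw)  # unit heading (toward the vehicle's front)
    nx, ny = -hy, hx                       # unit normal toward the vehicle's left side
    x0, y0 = rr_xy
    footprint = [
        (x0, y0),                                                        # rear-right
        (x0 + width * nx, y0 + width * ny),                              # rear-left
        (x0 + width * nx + length * hx, y0 + width * ny + length * hy),  # front-left
        (x0 + length * hx, y0 + length * hy),                            # front-right
    ]
    # Bottom face on the ground (Z = 0), top face at the vehicle height.
    corners = [(x, y, z) for z in (0.0, height) for (x, y) in footprint]
    # Center: half a length forward and half a width to the left of the corner.
    cx = x0 + 0.5 * length * hx + 0.5 * width * nx
    cy = y0 + 0.5 * length * hy + 0.5 * width * ny
    return corners, (cx, cy, 0.5 * height)

# Rear-right corner 15 m ahead and 1 m to the right, heading straight ahead.
corners, center = cuboid_from_rear_right_corner((15.0, -1.0), 0.0, 1.8, 4.5, 1.5)
print(center)  # approximately (17.25, -0.1, 0.75)
```

Because the footprint is a rectangle, the center computed this way coincides with the mean of the eight corners for any yaw.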
[0048] The three-dimensional information estimation device (100), vehicle (110), and server (130) can communicate using a network (not shown). For example, the network may include a Local Area Network (LAN), a Wide Area Network (WAN), a Value Added Network (VAN), a mobile radio communication network, a satellite communication network, and combinations thereof. The network is a comprehensive data communication network that enables the three-dimensional information estimation device (100) and the server (130) to communicate smoothly with each other, and may include wired internet, wireless internet, and mobile wireless communication networks. Additionally, the wireless communication network may include, for example, wireless LAN (Wi-Fi), Bluetooth, Bluetooth Low Energy, Zigbee, WFD (Wi-Fi Direct), UWB (ultra-wideband), infrared communication (IrDA, infrared Data Association), NFC (Near Field Communication), but is not limited thereto.
[0049]
[0050] FIG. 2 is a diagram illustrating 2D cuboid detection according to one embodiment.
[0051] Referring to FIG. 2, according to one embodiment, FIG. 2 may illustrate the result of performing 2D cuboid detection on an object included in the field of view of a camera mounted on a vehicle (e.g., the vehicle (110) of FIG. 1). 2D cuboid detection may involve recognizing and classifying the type of an object in a 2D image and detecting a cuboid corresponding to the object in the pixel coordinate system. For example, a cuboid (215) corresponding to a vehicle (210) included in an image of the area in front of the vehicle (110) may be detected and labeled with the recognized type (e.g., CAR (211)). The 3D information estimation device (100) can perform 2D cuboid detection to obtain a box (e.g., box (310) of FIG. 3) corresponding to at least one face of another vehicle included in the field of view of a camera mounted on the vehicle (110). The 3D information estimation device (100) can also perform 2D cuboid detection to obtain a box (e.g., object detection box (330) of FIG. 3) corresponding to the other vehicle itself.
[0052] Like 2DOD (2-dimensional object detection), 2D cuboid detection offers high recall and precision as well as fast inference speed. By detecting vertices from which the faces of the cuboid corresponding to an object can be inferred, 2D cuboid detection can derive additional 3D information, such as the yaw angle of the object. However, since 2D cuboid detection derives 3D information on the assumption that the object is located on the ground (e.g., Z = 0), significant errors may occur when the vehicle's pitch is unstable or when driving on slopes. In addition, it is difficult to derive 3D information for objects whose physical width is difficult to determine, such as objects located in an adjacent lane or other vehicles turning right.
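The sensitivity to pitch noted above can be seen with simple flat-ground geometry. In this hypothetical sketch, a camera at height h sees a ground point at an angle a below the horizon, so the flat-ground distance is h / tan(a); if the pitch assumed by the calibration is off by d, the estimate becomes h / tan(a + d). The camera height of 1.5 m, the 30 m target distance, and the function names are illustrative assumptions.

```python
import math

def flat_ground_distance(cam_height, angle_below_horizon_rad):
    """Distance to a ground point seen at a given angle below the horizon (flat ground, Z = 0)."""
    if angle_below_horizon_rad <= 0:
        raise ValueError("point must appear below the horizon")
    return cam_height / math.tan(angle_below_horizon_rad)

def distance_with_pitch_error(cam_height, true_distance, pitch_error_deg):
    """Flat-ground distance estimate when the assumed camera pitch is off by pitch_error_deg.

    A positive error over-estimates the angle below the horizon, so the
    distance is under-estimated; a negative error over-estimates it.
    """
    true_angle = math.atan2(cam_height, true_distance)
    return flat_ground_distance(cam_height, true_angle + math.radians(pitch_error_deg))

for err in (-1.0, 0.0, 1.0):
    print(f"pitch error {err:+.1f} deg -> {distance_with_pitch_error(1.5, 30.0, err):.1f} m")
```

With these values, a pitch error of plus or minus 1 degree moves the 30 m estimate to roughly 22 m or 46 m, and the relative error grows with distance because the angle below the horizon shrinks.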
[0053]
[0054] FIG. 3 is a diagram illustrating detection information for another vehicle according to one embodiment.
[0055] Referring to FIG. 3, according to one embodiment, a three-dimensional information estimation device (e.g., the three-dimensional information estimation device (100) of FIG. 1) can detect objects (e.g., other vehicles, two-wheeled vehicles) located around an ego-vehicle (e.g., the vehicle (110) of FIG. 1) and estimate three-dimensional information of the objects. The three-dimensional information may include information such as the width, height, length, coordinates of the three-dimensional center point, and yaw angle of the objects.
[0056] The three-dimensional information estimation device (100) can detect a box (e.g., box (310)) corresponding to a surface of another vehicle located around the vehicle (110). The surface of the other vehicle may include at least one surface included in the field of view of a camera mounted on the vehicle (110). For example, the three-dimensional information estimation device (100) can detect a box (310) corresponding to the rear of another vehicle located in front of the vehicle (110). As another example, when a camera mounted on the vehicle (110) photographs the area behind the vehicle (110), the three-dimensional information estimation device (100) can detect a box corresponding to the front of another vehicle located behind the vehicle (110).
[0057] The 3D information estimation device (100) can obtain a bounding box corresponding to the surface of another vehicle by using an object detection algorithm. For example, the 3D information estimation device (100) can run an object detection algorithm (e.g., a 2D cuboid detection algorithm) configured to detect the 'rear of the vehicle' and obtain the output bounding box. The 3D information estimation device (100) can determine the bounding box corresponding to the surface of another vehicle as the box (310) corresponding to the surface of that vehicle. For example, the 3D information estimation device (100) can determine the bounding box corresponding to the rear of another vehicle as the box (310) corresponding to the rear of that vehicle.
[0058] The object detection box (330) may be a bounding box that detects the other vehicle itself through an object detection algorithm. For example, the 3D information estimation device (100) may run an object detection algorithm (e.g., a 2D cuboid detection algorithm) configured to detect a 'vehicle' and obtain the output object detection box (330). The 3D information estimation device (100) may obtain the yaw angle of the other vehicle based on the contact point (e.g., contact point (320)) between a wheel of the other vehicle and the ground and the box (310). The wheel may be a wheel located in the direction opposite to at least one surface included in the field of view of the camera mounted on the vehicle (110). The 3D information estimation device (100) may calculate the yaw angle of the other vehicle based on the coordinates of a vertex of the box (310) and the coordinates of the contact point (320). For example, the 3D information estimation device (100) can calculate the yaw angle of the other vehicle based on the coordinates of the contact point (320) and of the vertex corresponding to the right rear wheel of the other vehicle among the four vertices of the box (310) (e.g., the bottom-right vertex). Since the contact point (320) lies on the ground, its Z coordinate can be set to 0. The coordinates of the vertices of the box (310) and the coordinates of the contact point (320) may be calculated by performing an inverse projection matrix operation based on correction parameters corresponding to a camera mounted on the vehicle (e.g., vehicle (110)). The inverse projection matrix operation based on correction parameters will be explained in detail later with reference to FIG. 4.
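A minimal sketch of the yaw computation, assuming both points have already been back-projected onto the ground plane (Z = 0) in a world frame with X forward and Y left. It treats the segment from the bottom-right vertex of the rear box (310) to the right front wheel contact point (320) as parallel to the other vehicle's right side, which ignores the small lateral offset between the body corner and the wheel. The function name is hypothetical.

```python
import math

def yaw_from_ground_points(rear_vertex_xy, front_contact_xy):
    """Yaw of another vehicle from two ground points (Z = 0) in the ego world frame.

    rear_vertex_xy: bottom-right vertex of the rear-face box, back-projected to the ground.
    front_contact_xy: right front wheel/ground contact point, back-projected to the ground.
    Both are assumed to lie roughly on the vehicle's right side, so the vector
    between them points along the vehicle's heading.
    Returns yaw in radians, counter-clockwise from the ego X (forward) axis.
    """
    dx = front_contact_xy[0] - rear_vertex_xy[0]
    dy = front_contact_xy[1] - rear_vertex_xy[1]
    if math.hypot(dx, dy) < 1e-6:
        raise ValueError("points coincide; yaw is undefined")
    return math.atan2(dy, dx)

# Vehicle ahead drifting slightly left: rear corner at (15, -1), front wheel at (18, -0.8).
yaw = yaw_from_ground_points((15.0, -1.0), (18.0, -0.8))
print(math.degrees(yaw))  # about 3.8 degrees
```

Using `atan2` rather than `atan(dy / dx)` keeps the correct quadrant for vehicles heading toward or across the ego-vehicle.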
[0059]
[0060] FIG. 4 is a diagram illustrating an inverse projection matrix operation according to one embodiment.
[0061] Referring to FIG. 4, according to one embodiment, an image captured using a camera mounted on a vehicle (e.g., vehicle (110) of FIG. 1) may require conversion from a pixel coordinate system to a three-dimensional world coordinate system. The world coordinate system may be a coordinate system expressed by an X-axis, a Y-axis, and a Z-axis with respect to an arbitrary origin to represent the position of an object. The camera coordinate system may be a coordinate system whose origin is the camera focal point (e.g., the center of the lens), with the forward direction as the Z-axis (e.g., the Z_c axis), the right direction as the X-axis (e.g., the X_c axis), and the downward direction as the Y-axis (e.g., the Y_c axis). The pixel coordinate system is an image coordinate system in which the top-left corner of the image is the origin, the right direction is the X-axis, and the downward direction is the Y-axis. The plane determined by the X-axis and Y-axis of the pixel coordinate system can be defined as the image plane.
[0062] A point in 3D space (e.g., [X, Y, Z]) can be projected onto a point on the image plane (e.g., P_img) along the straight line connecting it to the camera's focal point (e.g., the origin of the camera coordinate system). A normalized image coordinate system is an image coordinate system from which the influence of the camera's intrinsic parameters has been removed; it corresponds to translating the image plane so that its distance from the camera's focal point is adjusted (typically to a unit distance). The camera's intrinsic parameters may be internal parameters of the camera itself, such as focal length, principal point, and skew coefficient. The normalized coordinate system may be introduced to eliminate differences caused by factors such as the type of camera used and camera settings.
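The normalized image coordinate system can be illustrated with a short Python sketch: multiplying a pixel by the inverse of the intrinsic matrix K removes the focal length, principal point, and skew, leaving coordinates on a plane at unit distance from the focal point. The helper names and the camera values (focal length 1000 px, principal point (960, 540)) are hypothetical.

```python
import numpy as np

def intrinsic_matrix(fx, fy, cx, cy, skew=0.0):
    """Pinhole intrinsic matrix K from focal lengths, principal point, and skew (hypothetical helper)."""
    return np.array([[fx, skew, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def pixel_to_normalized(u, v, K):
    """Map pixel (u, v) to normalized image coordinates on the plane at unit distance (z = 1)."""
    p = np.linalg.solve(K, np.array([u, v, 1.0]))  # K^-1 [u, v, 1]^T without forming the inverse
    return float(p[0] / p[2]), float(p[1] / p[2])

K = intrinsic_matrix(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
print(pixel_to_normalized(1460.0, 540.0, K))  # approximately (0.5, 0.0)
```

Two cameras with different intrinsics map the same viewing direction to the same normalized coordinates, which is why the later geometric steps can be written independently of the camera model.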
[0063] The equation for converting a point in the 3D world coordinate system (e.g., [X, Y, Z]) to a point in the pixel coordinate system (e.g., [x, y]) can be as shown in Equation 1 below.
[0064] [Equation 1]
[0065] Here, using the camera's extrinsic parameters (e.g., a rotation matrix and a translation vector (t_x, t_y, t_z)), a point in the world coordinate system can be converted to the camera coordinate system, and using the intrinsic parameters, a point in the camera coordinate system can be converted to a coordinate in the pixel coordinate system.
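Under the standard pinhole camera model, the conversion described in [0063] to [0065] can be written in the form below. This is a sketch consistent with the surrounding definitions, not necessarily the exact form of Equation 1. The symbols are introduced here for illustration: K holds the intrinsic parameters (focal lengths f_x and f_y, principal point (c_x, c_y), skew γ), [R | t] holds the extrinsic parameters, s is the depth of the point along the Z_c axis, and r_1, r_2 are the first two columns of R.

```latex
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= \underbrace{\begin{bmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{K}
  \underbrace{\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}}_{[R \mid t]}
  \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
Z = 0 \;\Rightarrow\;
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= K \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}
  \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
```

On the ground plane the 3×3 matrix K [r_1 r_2 t] is invertible as long as the camera center does not lie on that plane, so X and Y can be recovered from (x, y); this is the flat-ground assumption used in [0067].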
[0066]
[0067] Extrinsic parameters of a camera are parameters that represent the relationship between the camera coordinate system and the world coordinate system, and their values may vary depending on the position and orientation in which the camera is installed. Extrinsic parameters may include pitch, yaw, roll, and the coordinates where the camera is installed (e.g., x, y, z). To obtain 3D information of an object located in 3D space from a 2D image, inverse projection matrix operations may be required. However, since depth cannot be determined from 2D image coordinates alone, it may be necessary to assume a Z-value for the ground or to obtain additional information. For example, assuming the object is placed on the ground, its X and Y coordinates can be calculated by setting its Z-value to 0. However, depth errors may increase when the vehicle drives up or down a slope or when the camera's pitch fluctuates, so additional sensor data and/or correction algorithms may be required.
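The flat-ground inverse projection described above can be sketched as a ray-plane intersection: the pixel is turned into a viewing ray with K^-1, rotated into the world frame, and intersected with the plane Z = 0. This is a hedged illustration rather than the implementation of the embodiment. It assumes a world frame with X forward, Y left, and Z up, a camera frame with X_c right, Y_c down, and Z_c forward as in [0061], no lens distortion, and hypothetical helper names and camera values (1.5 m height, 2 degrees of downward pitch).

```python
import numpy as np

def camera_pose(height, pitch_rad=0.0):
    """Hypothetical extrinsics (R, t) for a forward-looking camera `height` metres above the ground.

    World frame (assumed): X forward, Y left, Z up, ground plane Z = 0.
    Camera frame (as in [0061]): X_c right, Y_c down, Z_c forward.
    A positive pitch tilts the optical axis downward. X_cam = R @ X_world + t.
    """
    # Level camera: x_c = -Y, y_c = -Z, z_c = X.
    r_level = np.array([[0.0, -1.0, 0.0],
                        [0.0, 0.0, -1.0],
                        [1.0, 0.0, 0.0]])
    c, s = np.cos(pitch_rad), np.sin(pitch_rad)
    r_pitch = np.array([[1.0, 0.0, 0.0],   # rotation about the X_c axis
                        [0.0, c, -s],
                        [0.0, s, c]])
    R = r_pitch @ r_level
    t = -R @ np.array([0.0, 0.0, height])  # camera center sits at (0, 0, height)
    return R, t

def project(K, R, t, p_world):
    """Forward projection world -> pixel (pinhole model, no distortion)."""
    uvw = K @ (R @ p_world + t)
    return uvw[:2] / uvw[2]

def backproject_to_ground(K, R, t, u, v):
    """Inverse projection of pixel (u, v) onto the ground plane Z = 0."""
    ray_world = R.T @ np.linalg.solve(K, np.array([u, v, 1.0]))  # viewing ray in world frame
    center = -R.T @ t                                             # camera center in world frame
    if ray_world[2] >= 0.0:
        raise ValueError("pixel is on or above the horizon; its ray never meets the ground")
    scale = -center[2] / ray_world[2]  # ray parameter at which Z reaches 0
    return center + scale * ray_world

K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R, t = camera_pose(height=1.5, pitch_rad=np.deg2rad(2.0))
ground_pt = np.array([20.0, -1.8, 0.0])     # 20 m ahead, 1.8 m to the right
u, v = project(K, R, t, ground_pt)
print(backproject_to_ground(K, R, t, u, v))  # approximately [20.0, -1.8, 0.0]
```

The round trip only succeeds because the test point really lies on Z = 0; a point above the ground, or a ground that is not flat relative to the assumed pitch, would be placed at the wrong distance, which is the error source noted in the paragraph above.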
[0068]
[0069] FIG. 5 is a diagram illustrating the operation of calculating three-dimensional information of another vehicle according to one embodiment.
[0070] Referring to FIG. 5, according to one embodiment, a three-dimensional information estimation device (e.g., the three-dimensional information estimation device (100) of FIG. 1) can acquire an image captured by a camera mounted on a vehicle (e.g., the vehicle (110) of FIG. 1). For example, the three-dimensional information estimation device (100) can acquire an image of the front of the vehicle (110) captured by a camera mounted on the vehicle (110). The three-dimensional information estimation device (100) can perform 2D cuboid detection on the image captured by the camera mounted on the vehicle (110). The three-dimensional information estimation device (100) can detect an object (e.g., another vehicle) located around the vehicle (110) by performing 2D cuboid detection. For example, the three-dimensional information estimation device (100) can detect another vehicle located in front of the vehicle (110) by performing 2D cuboid detection. The three-dimensional information estimation device (100) can obtain information about other vehicles. For example, the three-dimensional information estimation device (100) can obtain the type of other vehicle, a bounding box corresponding to the other vehicle (e.g., object detection box (330) of FIG. 3), a box corresponding to the surface of the other vehicle (e.g., box (310) of FIG. 3), and a contact point between the wheel of the other vehicle and the ground (e.g., contact point (320) of FIG. 3).
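One way to hold the per-vehicle detection outputs listed above is a small record type. The Python sketch below is illustrative only: the class and field names are hypothetical, and the face box is modeled as an axis-aligned bounding box in pixel coordinates, consistent with it being obtained as a bounding box.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Pixel = Tuple[float, float]

@dataclass
class BoxPx:
    """Axis-aligned bounding box in pixel coordinates (origin top-left, y grows downward)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def vertices(self) -> List[Pixel]:
        """Four vertices: top-left, top-right, bottom-right, bottom-left."""
        return [(self.x1, self.y1), (self.x2, self.y1), (self.x2, self.y2), (self.x1, self.y2)]

    def bottom_right(self) -> Pixel:
        """Vertex used with the right front wheel contact point when computing yaw."""
        return (self.x2, self.y2)

    def pixel_width(self) -> float:
        return self.x2 - self.x1

@dataclass
class VehicleDetection:
    """Hypothetical container for the outputs of 2D cuboid detection for one vehicle."""
    vehicle_type: str      # recognized type, e.g. "light_car" or "bus"
    object_box: BoxPx      # box around the whole vehicle (cf. object detection box (330))
    face: str              # visible face, e.g. "rear"
    face_box: BoxPx        # box around the visible face (cf. box (310))
    wheel_contacts: List[Pixel] = field(default_factory=list)  # wheel/ground contacts (cf. (320))

det = VehicleDetection(
    vehicle_type="light_car",
    object_box=BoxPx(700, 450, 960, 620),
    face="rear",
    face_box=BoxPx(700, 470, 900, 620),
    wheel_contacts=[(945.0, 595.0)],
)
print(det.face_box.bottom_right(), det.face_box.pixel_width())
```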
[0071] The 3D information estimation device (100) can perform 2D cuboid detection to obtain the type of another vehicle located around the vehicle (110). For example, the 3D information estimation device (100) can recognize the type of another vehicle located in front of the vehicle (110) and classify it into categories such as passenger cars (e.g., light cars, small cars, medium cars, large cars), SUVs (sport utility vehicles) (e.g., small SUVs, large SUVs), vans (e.g., minivans, buses), trucks, and special vehicles (e.g., campervans, fire trucks, ambulances, tow trucks). Based on the obtained type, the 3D information estimation device (100) can obtain a width corresponding to the other vehicle, approximating the width of the other vehicle from its type. The 3D information estimation device (100) can obtain the width of the other vehicle from widths classified according to a predetermined standard. For example, the 3D information estimation device (100) can determine 1.6 meters as the width of the other vehicle in response to determining that the other vehicle is a light car. For example, the 3D information estimation device (100) can determine 2.3 meters as the width of the other vehicle in response to determining that the other vehicle is a bus. The widths above are assumed for convenience of explanation, and the approximate width values are not limited to these figures.
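The type-to-width approximation amounts to a lookup table. In this Python sketch the only widths are the two example values from the text (light car 1.6 m, bus 2.3 m); the category keys and the function name are hypothetical, and an unknown type either falls back to a caller-supplied default or raises an error instead of inventing a width.

```python
from typing import Optional

# Type-to-width table in metres. The two entries are the example values given in the text;
# a real table would cover every category the detector can output.
APPROX_WIDTH_M = {
    "light_car": 1.6,
    "bus": 2.3,
}

def approximate_width(vehicle_type: str, default: Optional[float] = None) -> float:
    """Return the approximate physical width for a detected vehicle type."""
    try:
        return APPROX_WIDTH_M[vehicle_type]
    except KeyError:
        if default is None:
            raise KeyError(f"no width defined for vehicle type {vehicle_type!r}") from None
        return default

print(approximate_width("light_car"))  # 1.6
print(approximate_width("bus"))        # 2.3
```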
[0072] The 3D information estimation device (100) can perform distortion correction (undistortion) and back-projection matrix operations based on the object detection box (330), the box (310), and the contact point (320) corresponding to the other vehicle. Distortion correction may be a process of removing lens distortion introduced by the camera; by correcting the distortion present in the image using distortion parameters, it allows the coordinate system transformation to be performed more accurately. Back-projection matrix operations may be used to obtain 3D information of an object from a 2D image. By performing distortion correction and back-projection matrix operations, the 3D information estimation device (100) can obtain the yaw angle of the other vehicle, the coordinates of the vertices of the box corresponding to the face of the other vehicle (e.g., the box (310) of FIG. 3) (e.g., (x_l, y_l)), and the width and height of the other vehicle in pixel units in the image. The yaw angle can be calculated based on the 3D coordinates of the vertices of the box corresponding to the face of the other vehicle and the 3D coordinates of the contact point.
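As a rough illustration of distortion correction followed by back-projection, the sketch below undistorts a pixel and intersects its viewing ray with a flat ground plane. Several assumptions are made that the description does not state: a radial-only Brown-Conrady distortion model (coefficients `k1`, `k2`) inverted by fixed-point iteration, a level camera (zero pitch and roll) at a known height above flat ground, and camera axes x right, y down, z forward. The function names are hypothetical.

```python
def undistort_normalized(xd, yd, k1, k2, iterations=10):
    """Invert a radial (k1, k2) Brown-Conrady model by fixed-point iteration.

    (xd, yd) are distorted normalized image coordinates; the result is the
    undistorted normalized point (x, y) satisfying xd = x * (1 + k1 r^2 + k2 r^4).
    """
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y


def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, k1=0.0, k2=0.0):
    """Back-project pixel (u, v) onto a flat ground plane.

    Camera frame: x right, y down, z forward; the camera is assumed level
    at `cam_height` metres above the ground. Returns (X, Z) of the ground
    point in the camera frame.
    """
    xd, yd = (u - cx) / fx, (v - cy) / fy      # distorted normalized coordinates
    xn, yn = undistort_normalized(xd, yd, k1, k2)
    if yn <= 0:
        raise ValueError("pixel is at or above the horizon; the ray never meets the ground")
    scale = cam_height / yn                     # ray (xn, yn, 1) reaches y = cam_height
    return xn * scale, scale
```

Applying `pixel_to_ground` to a box vertex and a wheel contact point gives two ground points from which a heading can be computed. With pitch or roll, the ray would first be rotated by the extrinsic rotation before intersecting the plane.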
[0073] The 3D information estimation device (100) can obtain three-dimensional information of the other vehicle based on the width obtained from the type of the other vehicle, the yaw angle, and the camera calibration parameters. The camera calibration parameters may include intrinsic parameters and extrinsic parameters. Extrinsic parameters represent the relationship between the camera coordinate system and the world coordinate system, and their values may vary depending on the position and orientation in which the camera is installed. Extrinsic parameters may include pitch, yaw, roll, and the coordinates at which the camera is installed (e.g., x, y, z). Intrinsic parameters are parameters of the camera itself, such as the focal length, principal point, and skew coefficient, and may include the distortion parameters used for distortion correction. The 3D information may include the coordinates of the 3D center point of the other vehicle.
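To show how the intrinsic parameters (focal length, principal point, skew) and extrinsic parameters (pitch, yaw, roll, installation position) combine, here is a minimal pinhole projection sketch. The rotation order `Rz(roll) @ Rx(pitch) @ Ry(yaw)` and the world axes (aligned with the camera axes x right, y down, z forward at zero angles) are assumptions for illustration, not the conventions of the description; real vehicle frames usually differ.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_from_angles(pitch, yaw, roll):
    """World-to-camera rotation Rz(roll) @ Rx(pitch) @ Ry(yaw) (assumed convention)."""
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]
    return matmul3(rz, matmul3(rx, ry))


def project_point(p_world, cam_pos, angles, fx, fy, cx, cy, skew=0.0):
    """Project a world point to pixel (u, v): extrinsics first, then intrinsics."""
    r = rotation_from_angles(*angles)
    d = [p_world[i] - cam_pos[i] for i in range(3)]          # translate to camera origin
    x, y, z = (sum(r[i][k] * d[k] for k in range(3)) for i in range(3))
    if z <= 0:
        raise ValueError("point is behind the camera")
    u = fx * x / z + skew * y / z + cx                        # intrinsic matrix K applied
    v = fy * y / z + cy
    return u, v
```

Back-projection runs this chain in reverse: remove K, rotate the ray back into the world frame, and intersect it with a known surface such as the ground.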
[0074] In 3DOD, the coordinates of the 3D center point of an object can be obtained directly, but there is a disadvantage that the inference speed is slower than in 2DOD and the recall rate is low for small objects. The 3D information estimation device (100) can obtain the coordinates of the 3D center point of another vehicle at a fast inference speed by using 2D cuboid detection performed based on a 2D image.
[0075]
[0076] FIGS. 6a to 6d are drawings for explaining the operation of calculating the coordinates of a three-dimensional center point of another vehicle according to one embodiment.
[0077] Referring to FIGS. 6a to 6d, according to one embodiment, a three-dimensional information estimation device (e.g., the three-dimensional information estimation device (100) of FIG. 1) can calculate three-dimensional information of another vehicle based on the calculated yaw angle of the other vehicle. The three-dimensional information estimation device (100) can normalize the yaw angle of the other vehicle and calculate the three-dimensional information in response to the normalized yaw angle falling within a predetermined threshold range. The three-dimensional information estimation device (100) can normalize the yaw angle based on a predetermined period (e.g., …). For example, when the yaw angle of the other vehicle is …, the 3D information estimation device (100) can normalize it to … since …. In response to the yaw angle (e.g., the normalized yaw angle) falling within the predetermined threshold range, the 3D information estimation device (100) can calculate the 3D information based on the yaw angle and the width of the other vehicle. For example, the 3D information estimation device (100) can calculate the three-dimensional information of the other vehicle in response to the absolute value of the normalized yaw angle being less than … or exceeding …. The operation of calculating the three-dimensional information when the yaw angle (e.g., the normalized yaw angle) does not fall within the predetermined threshold range will be described in detail later with reference to FIG. 8.
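The normalization and threshold test above can be sketched as follows. The period and both thresholds are not legible in the text, so they are parameters here. The default period of π is only an assumed example, and the threshold values in the usage are hypothetical.

```python
import math


def normalize_angle(theta, period=math.pi):
    """Wrap `theta` into [-period/2, period/2).

    The default period of pi is an assumption for illustration; the
    description's actual period is not recoverable from the text.
    """
    return (theta + period / 2.0) % period - period / 2.0


def width_based_applicable(theta, low, high, period=math.pi):
    """True when |normalized yaw| < low or |normalized yaw| > high.

    `low` and `high` stand in for the illegible thresholds (hypothetical values).
    """
    a = abs(normalize_angle(theta, period))
    return a < low or a > high


if __name__ == "__main__":
    print(width_based_applicable(0.05, low=0.2, high=1.3))  # True
    print(width_based_applicable(0.70, low=0.2, high=1.3))  # False
```

When the check fails, the FIG. 8 path (adaptive IPM, height ratio-based IPM, tracking) would be used instead of the width-based calculation.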
[0078] To calculate the 3D information of the other vehicle while minimizing distortion caused by the camera, the 3D information estimation device (100) can divide the calculation into cases based on the yaw angle of the other vehicle and its position in the camera coordinate system. Specifically, the 3D information estimation device (100) can calculate the coordinates of the point of the other vehicle closest to the center point of the camera and calculate the 3D information of the other vehicle based on those coordinates. FIGS. 6a to 6d illustrate how the 3D information estimation device (100) according to one embodiment calculates coordinates corresponding to a wheel of another vehicle located in front of the vehicle (e.g., the vehicle (110) of FIG. 1), based on the yaw angle of the other vehicle. Note that the equations described with reference to FIGS. 6a to 6d can also be applied to other vehicles not located in front of the vehicle (110) through a transformation of the coordinate system (e.g., a transformation to the world coordinate system).
[0079] FIG. 6a may represent the case in which the yaw angle of the other vehicle is greater than 0 and smaller than …, and the other vehicle is located to the left of the camera (e.g., X_c < 0).
[0080] Referring to FIG. 6a, according to one embodiment, the three-dimensional information estimation device (100) can use the following Equations 2 and 3 to calculate the coordinates (X_r, Z_r) corresponding to the right rear wheel of the other vehicle in the camera coordinate system.
[0081]
[0082]
[0083] Equations 2 and 3 use the focal length of the camera, the width W of the other vehicle obtained based on its type, the pixel width of the other vehicle in the image, the center coordinate of the camera (e.g., the optical center), and the image coordinate corresponding to the right rear wheel of the other vehicle in the 2D image.
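Equations 2 and 3 are not legible in this text. For intuition only, the generic pinhole relation below uses the same ingredients (focal length, known physical width, pixel width, optical center, wheel image coordinate). It assumes the visible face is roughly fronto-parallel and ignores foreshortening due to yaw, so it is not a reconstruction of Equations 2 and 3. The function name is hypothetical.

```python
def wheel_point_from_width(f, width_m, width_px, u_wheel, cx):
    """Pinhole estimate of a wheel ground point (X, Z) in the camera frame.

    Assumes the visible face is roughly fronto-parallel, so its apparent
    pixel width is about f * width_m / Z. Generic relation, not Eq. 2-3.
    """
    if width_px <= 0:
        raise ValueError("pixel width must be positive")
    z = f * width_m / width_px      # depth from the known physical width
    x = (u_wheel - cx) * z / f      # lateral offset from the optical axis
    return x, z


if __name__ == "__main__":
    # A 1.6 m wide light car spanning 80 px with f = 1000 px sits about 20 m away.
    print(wheel_point_from_width(1000.0, 1.6, 80.0, 540.0, 640.0))
```

A heading-aware variant would account for the face's projected width shrinking as the yaw angle grows, which is presumably why the description splits the calculation into the cases of FIGS. 6a to 6d.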
[0084] Based on the calculated coordinates (X_r, Z_r), the 3D information estimation device (100) can calculate the coordinates of the 3D center point of the other vehicle. The coordinates (X_r, Z_r) are in the camera coordinate system and can then be converted into coordinates in the 3D world coordinate system.
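The camera-to-world conversion mentioned above is a rigid transform. A minimal sketch, assuming the extrinsics are given as a world-to-camera rotation matrix `r_world_to_cam` and a camera position `cam_pos` in world coordinates (both hypothetical names), is p_world = Rᵀ · p_cam + C. Under a level-camera, flat-ground assumption, the wheel point in the camera frame would be (X_r, camera height, Z_r) with the y-axis pointing down.

```python
def camera_to_world(p_cam, r_world_to_cam, cam_pos):
    """Convert a camera-frame point to world coordinates: p_w = R^T p_c + C.

    `r_world_to_cam` is a 3x3 rotation (nested lists) mapping world to camera
    axes; its transpose maps camera axes back to world axes.
    """
    return tuple(
        sum(r_world_to_cam[k][i] * p_cam[k] for k in range(3)) + cam_pos[i]
        for i in range(3)
    )


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(camera_to_world((1.0, 1.5, 12.0), identity, (0.0, -1.5, 0.0)))  # (1.0, 0.0, 12.0)
```

Because rotation matrices are orthonormal, the transpose serves as the inverse, so no matrix inversion is needed.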
[0085]
[0086] FIG. 6b may represent the case in which the yaw angle of the other vehicle is greater than 0 and smaller than …, and the other vehicle is located to the right of the camera (e.g., X_c > 0).
[0087] Referring to FIG. 6b, according to one embodiment, the three-dimensional information estimation device (100) can use the following Equations 4 and 5 to calculate the coordinates (X_l, Z_l) corresponding to the left rear wheel of the other vehicle in the camera coordinate system.
[0088]
[0089]
[0090] Equations 4 and 5 use the image coordinate corresponding to the left rear wheel of the other vehicle in the 2D image.
[0091]
[0092] Based on the calculated coordinates (X_l, Z_l), the 3D information estimation device (100) can calculate the coordinates of the 3D center point of the other vehicle. The coordinates (X_l, Z_l) are in the camera coordinate system and can then be converted into coordinates in the 3D world coordinate system.
[0093]
[0094] FIG. 6c may represent the case in which the yaw angle of the other vehicle is greater than … and less than 0, and the other vehicle is located to the left of the camera (e.g., X_c < 0).
[0095] Referring to FIG. 6c, according to one embodiment, the three-dimensional information estimation device (100) can use Equations 2 and 3, described with reference to FIG. 6a, to calculate the coordinates (X_r, Z_r) corresponding to the right rear wheel of the other vehicle in the camera coordinate system. The operation of calculating the coordinates (X_r, Z_r) in FIG. 6c can be substantially the same as in the case of FIG. 6a. The calculated coordinates (X_r, Z_r) can be as follows.
[0096]
[0097]
[0098] Based on the calculated coordinates (X_r, Z_r), the 3D information estimation device (100) can calculate the coordinates of the 3D center point of the other vehicle. The coordinates (X_r, Z_r) are in the camera coordinate system and can then be converted into coordinates in the 3D world coordinate system.
[0099]
[0100] FIG. 6d may represent the case in which the yaw angle of the other vehicle is greater than … and less than 0, and the other vehicle is located to the right of the camera (e.g., X_c > 0).
[0101] Referring to FIG. 6d, according to one embodiment, the three-dimensional information estimation device (100) can use Equations 4 and 5, described with reference to FIG. 6b, to calculate the coordinates (X_l, Z_l) corresponding to the left rear wheel of the other vehicle in the camera coordinate system. The operation of calculating the coordinates (X_l, Z_l) in FIG. 6d can be substantially the same as in the case of FIG. 6b. The calculated coordinates (X_l, Z_l) can be as follows.
[0102]
[0103]
[0104] Based on the calculated coordinates (X_l, Z_l), the 3D information estimation device (100) can calculate the coordinates of the 3D center point of the other vehicle. The coordinates (X_l, Z_l) are in the camera coordinate system and can then be converted into coordinates in the 3D world coordinate system.
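The four cases of FIGS. 6a to 6d can be summarized as a small dispatch. In every case, the wheel used depends only on the side of the camera on which the other vehicle lies: left (X_c < 0) uses the right rear wheel with Equations 2 and 3, and right (X_c > 0) uses the left rear wheel with Equations 4 and 5. The boundary values X_c = 0 and yaw = 0 are not addressed in the text, so the grouping chosen for them below is an arbitrary assumption. The function name is hypothetical.

```python
def select_case(yaw, x_c):
    """Map (normalized yaw sign, lateral position X_c) to a FIG. 6 case and wheel."""
    left = x_c < 0      # X_c == 0 is not addressed in the text; grouped with "right" here
    if yaw > 0:
        figure = "6a" if left else "6b"
    else:               # yaw == 0 is likewise not addressed; grouped with negative yaw here
        figure = "6c" if left else "6d"
    wheel = "right_rear" if left else "left_rear"   # wheel nearer the camera center
    return figure, wheel


if __name__ == "__main__":
    print(select_case(0.2, -3.0))   # ('6a', 'right_rear')
    print(select_case(-0.2, 3.0))   # ('6d', 'left_rear')
```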
[0105]
[0106] FIG. 7 is a graph for explaining the estimated location of another vehicle according to one embodiment.
[0107] Referring to FIG. 7, according to one embodiment, the graph (700) may be a graph showing the results of 2D cuboid detection.
[0108] The X-axis of the graph (700) represents time, and the Y-axis may represent a depth value, which is the distance from the vehicle (e.g., the vehicle (110) of FIG. 1). The graph (700) may represent the change in distance from the vehicle (110) over time when another vehicle located in front of the vehicle (110) turns right, goes up a hill, or is located in an adjacent lane. Values (710) and (715) are values calculated using only general 2D cuboid detection, and values (730) and (735) may be values calculated by a 3D information estimation device (e.g., the 3D information estimation device (100) of FIG. 1) using the 3D information estimation method described with reference to FIG. 1 to FIG. 6d. Value (701) is ground truth data, and the closer the result is to value (701), the better the performance may be.
[0109]
[0110] FIG. 8 is a flowchart illustrating a method for estimating three-dimensional information of another vehicle according to one embodiment.
[0111] Referring to FIG. 8, according to one embodiment, a three-dimensional information estimation device (e.g., the three-dimensional information estimation device (100) of FIG. 1) may calculate three-dimensional information differently in response to the yaw angle of another vehicle not falling within the predetermined threshold range. Because a face (e.g., the rear) of the other vehicle may not be recognized depending on its yaw angle, the three-dimensional information estimation device (100) may calculate the three-dimensional information of the other vehicle differently in this case. For example, in response to the absolute value of the normalized yaw angle of the other vehicle being … or more and … or less, the three-dimensional information estimation device (100) can calculate the three-dimensional information of the other vehicle differently from the method described with reference to FIGS. 1 to 7.
[0112] In addition to the general inverse projection matrix operation (IPM), width-based IPM, and width-based IPM considering heading, the 3D information estimation device (100) may use adaptive IPM and height ratio-based IPM, and can calculate the 3D information of the other vehicle in the current frame using the 3D information of the other vehicle in the previous frame. The inverse projection matrix operation, width-based IPM, and width-based IPM considering heading may be substantially the same as the method, described with reference to FIGS. 1 to 7, by which the 3D information estimation device (100) estimates the 3D information of another vehicle.
[0113] Adaptive IPM may determine alignment with an object by assuming that the camera mounted on the ego-vehicle (e.g., the vehicle (110) of FIG. 1) faces forward and adjusting the Z-axis of the world coordinate system within a defined range. For example, the 3D information estimation device (100) can find the position that best aligns with the Z-value of the other vehicle by adjusting the Z-axis from -1.5 to 1.5. Height ratio-based IPM (inverse projection matrix) may estimate the 3D information of the other vehicle by assuming that the height of the other vehicle is constant between the previous frame (e.g., the t-th frame) and the current frame (e.g., the (t+1)-th frame). Height ratio-based IPM may be performed based on the following Equation 6. The 3D information estimation device (100) can use Equation 6 to calculate Z_{t+1}, the Z-value of the other vehicle in the (t+1)-th frame.
[0114]
[0115] In Equation 6, the terms may represent the pixel height of the other vehicle in the t-th frame, the pixel height of the other vehicle in the (t+1)-th frame, and the focal length of the camera.
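Equation 6 is not legible here. The sketch below shows the generic pinhole relation behind a constant-height assumption, using the same quantities (the two pixel heights and the focal length): the physical height H = Z_t·h_t / f is carried over, and then Z_{t+1} = f·H / h_{t+1}. It also includes a grid search over an axis offset in the stated range of -1.5 to 1.5 as one possible reading of adaptive IPM. The alignment cost, the step count, and both function names are hypothetical, since the description specifies none of them.

```python
def height_ratio_depth(z_t, h_px_t, h_px_t1, f):
    """Depth in frame t+1 assuming the vehicle's physical height is unchanged.

    Pinhole relations: H = z_t * h_px_t / f, then z_{t+1} = f * H / h_px_t1
    (equivalently z_t * h_px_t / h_px_t1). Generic, not a reconstruction of Eq. 6.
    """
    if h_px_t1 <= 0:
        raise ValueError("pixel height must be positive")
    height_m = z_t * h_px_t / f
    return f * height_m / h_px_t1


def adaptive_offset_search(cost, lo=-1.5, hi=1.5, steps=31):
    """Grid-search an axis offset in [lo, hi]; `cost` is a hypothetical alignment score."""
    candidates = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return min(candidates, key=cost)  # lower cost = better alignment


if __name__ == "__main__":
    print(height_ratio_depth(20.0, 60.0, 50.0, 1000.0))  # about 24.0: the box shrank, so it moved away
```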
[0116] The 3D information estimation device (100) can determine whether the current frame is an initial tracking frame. For example, it can determine whether any frame prior to the current frame had a yaw angle within the defined threshold range, so that 3D information could be calculated based on the width of the other vehicle. When the 3D information of the other vehicle is being calculated for the first time in the current frame, the 3D information estimation device (100) can determine the priority of the methods to be used to calculate the 3D information. For example, the 3D information estimation device (100) can determine which of the remaining operations, excluding the height ratio-based IPM, is performed first. Note that the priority is not limited to a specific order. When previous frames are available, the 3D information estimation device (100) can search for a nearby tracking result (e.g., a previous frame in which 3D information was calculated) and, based on that result, use the height ratio-based IPM. For example, the 3D information estimation device (100) can calculate the Z-value of the other vehicle in the current frame based on Equation 6 above. The calculated result can be input into a UKF (unscented Kalman filter) with a CA (constant acceleration) model to calculate the 3D information of the other vehicle. The UKF is a filtering technique for estimating the state of a nonlinear system; it extends the Kalman filter and is designed to avoid the inaccuracy introduced by linearization. The CA model is a dynamic model that assumes an object moves with constant acceleration, and the UKF can estimate the state (e.g., position, velocity) while accounting for nonlinearity. Using the UKF CA model, the 3D information estimation device (100) can estimate the state (e.g., position, velocity) of the other vehicle by assuming that it moves with constant acceleration. Note that the UKF CA model is only one example of a dynamic model that the 3D information estimation device can use to estimate the state of other vehicles, and the dynamic model is not limited to it. The estimated 3D information can be updated with the pixel information of the current frame.
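A minimal, standard-library sketch of a UKF with a constant-acceleration model follows. It tracks a one-dimensional state [Z, dZ/dt, d²Z/dt²] and treats the per-frame Z-value (for example, from height ratio-based IPM) as the measurement. The sigma-point scheme (Julier-style, κ = 1), the diagonal process noise `q`, the measurement noise `r`, and all names are illustrative assumptions rather than the tuning used in the description.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sigma_points(x, p, kappa):
    """Julier-style sigma points and weights for mean x and covariance p."""
    n = len(x)
    low = cholesky([[(n + kappa) * v for v in row] for row in p])
    pts = [list(x)]
    for j in range(n):
        col = [low[i][j] for i in range(n)]
        pts.append([x[i] + col[i] for i in range(n)])
        pts.append([x[i] - col[i] for i in range(n)])
    weights = [kappa / (n + kappa)] + [1.0 / (2.0 * (n + kappa))] * (2 * n)
    return pts, weights


def ca_model(state, dt):
    """Constant-acceleration motion: state = [z, dz/dt, d2z/dt2]."""
    z, v, a = state
    return [z + v * dt + 0.5 * a * dt * dt, v + a * dt, a]


class UkfCa:
    """Minimal UKF over a 1D CA state with a direct measurement of z."""

    def __init__(self, z0, p0=10.0, q=0.1, r=1.0, kappa=1.0):
        self.x = [z0, 0.0, 0.0]
        self.p = [[p0 if i == j else 0.0 for j in range(3)] for i in range(3)]
        self.q, self.r, self.kappa = q, r, kappa  # hypothetical noise settings

    def predict(self, dt):
        pts, w = sigma_points(self.x, self.p, self.kappa)
        prop = [ca_model(s, dt) for s in pts]      # push sigma points through the model
        self.x = [sum(wk * s[i] for wk, s in zip(w, prop)) for i in range(3)]
        self.p = [[sum(wk * (s[i] - self.x[i]) * (s[j] - self.x[j]) for wk, s in zip(w, prop))
                   + (self.q if i == j else 0.0) for j in range(3)] for i in range(3)]

    def update(self, z_meas):
        pts, w = sigma_points(self.x, self.p, self.kappa)
        zs = [s[0] for s in pts]                    # measurement function h(x) = z
        z_hat = sum(wk * zk for wk, zk in zip(w, zs))
        s_cov = sum(wk * (zk - z_hat) ** 2 for wk, zk in zip(w, zs)) + self.r
        pxz = [sum(wk * (s[i] - self.x[i]) * (zk - z_hat) for wk, s, zk in zip(w, pts, zs))
               for i in range(3)]
        gain = [c / s_cov for c in pxz]
        self.x = [self.x[i] + gain[i] * (z_meas - z_hat) for i in range(3)]
        self.p = [[self.p[i][j] - gain[i] * gain[j] * s_cov for j in range(3)]
                  for i in range(3)]
```

Because this CA model and the direct Z measurement are both linear, the filter behaves like an ordinary Kalman filter here. The unscented transform becomes useful when the motion or measurement model is nonlinear, for example when the state includes heading or when pixel coordinates are measured directly.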
[0117]
[0118] FIG. 9 is a flowchart illustrating a method according to one embodiment.
[0119] Referring to FIG. 9, according to one embodiment, operations 910 to 950 may be operations performed by the three-dimensional information estimation device (100) of FIG. 1 described with reference to FIG. 1 to FIG. 8.
[0120] According to one embodiment, operations 910 to 950 may be understood to be performed in a processor (e.g., processor (1030) of FIG. 10) of a three-dimensional information estimation device (100) described with reference to FIG. 1 (e.g., electronic device (1000) of FIG. 10).
[0121] In operation 910, the three-dimensional information estimation device (100) can detect a box (e.g., the box (310) of FIG. 3) corresponding to the surface of another vehicle located around the self-vehicle (e.g., the vehicle (110) of FIG. 1). The three-dimensional information estimation device (100) can use an object detection algorithm to obtain a bounding box corresponding to the surface of the other vehicle and determine the bounding box as the box (310). The surface of the other vehicle may include at least one surface that is included in the field of view of a camera mounted on the self-vehicle (e.g., the vehicle (110) of FIG. 1).
[0122] In operation 930, the three-dimensional information estimation device (100) can obtain the yaw angle of the other vehicle based on the contact point between the wheel of the other vehicle and the ground (e.g., the contact point (320) in FIG. 3) and the box (310). The wheel of the other vehicle may include at least one wheel located in the opposite direction of at least one surface included in the field of view of the camera mounted on the self vehicle (e.g., the vehicle (110) in FIG. 1). The three-dimensional information estimation device (100) can calculate the yaw angle based on the coordinates of the vertex of the box (310) and the coordinates of the contact point (320).
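Once the box vertex and the wheel contact point have been back-projected onto the ground, the yaw angle can be taken as the heading of the segment joining them. The sketch below assumes both points lie on the same side of the other vehicle, uses ground coordinates (X, Z) in the camera frame (x right, z forward), and measures yaw from the camera's forward axis toward its x-axis, so a vehicle driving straight ahead gives 0. This angle convention and the function name are assumptions, not taken from the description.

```python
import math


def yaw_from_ground_points(vertex_xz, contact_xz):
    """Heading (radians) of the segment from a box vertex to a wheel contact point.

    Inputs are (X, Z) ground-plane coordinates in the camera frame; 0 means the
    segment points straight along the camera's forward (z) axis.
    """
    dx = contact_xz[0] - vertex_xz[0]
    dz = contact_xz[1] - vertex_xz[1]
    return math.atan2(dx, dz)


if __name__ == "__main__":
    print(yaw_from_ground_points((1.0, 10.0), (1.0, 13.0)))  # 0.0
```

The result can then be normalized and checked against the threshold range before the width-based calculation of operation 950.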
[0123] In operation 950, the three-dimensional information estimation device (100) can calculate three-dimensional information of another vehicle based on the type, box, and yaw angle of the other vehicle. The three-dimensional information may include the coordinates of the three-dimensional center point of the other vehicle.
[0124] Operations 910 through 950 may be performed sequentially, but are not limited thereto. For example, two or more operations may be performed in parallel.
[0125]
[0126] FIG. 10 is a schematic block diagram of an electronic device according to one embodiment.
[0127] Referring to FIG. 10, according to one embodiment, an electronic device (1000) (e.g., a three-dimensional information estimation device (100) of FIG. 1) may include a memory (1010) and a processor (1030).
[0128] The memory (1010) can store instructions (or programs) executable by the processor (1030). For example, the instructions may include instructions for executing the operation of the processor (1030) and/or the operation of each component of the processor (1030).
[0129] The memory (1010) may include one or more computer-readable storage media. The memory (1010) may include non-volatile storage devices (e.g., magnetic hard disks, optical discs, floppy disks, flash memory, EPROM (electrically programmable memories), or EEPROM (electrically erasable and programmable memories)).
[0130] The memory (1010) may be a non-transitory medium. The term "non-transitory" may indicate that the storage medium is not implemented by a carrier wave or a propagated signal. However, the term "non-transitory" should not be interpreted as meaning that the memory (1010) is immobile.
[0131] The processor (1030) can process data stored in memory (1010). The processor (1030) can execute computer-readable code (e.g., software) stored in memory (1010) and instructions triggered by the processor (1030).
[0132] The processor (1030) may be a data processing device implemented in hardware having a circuit having a physical structure for executing desired operations. For example, the desired operations may include code or instructions included in a program.
[0133] For example, a data processing device implemented in hardware may include a microprocessor, a central processing unit, a processor core, a multi-core processor, a multiprocessor, an Application-Specific Integrated Circuit (ASIC), and a Field Programmable Gate Array (FPGA).
[0134] The processor (1030) can cause the electronic device (1000) to perform one or more operations by executing code and/or instructions stored in memory (1010). The operations performed by the electronic device (1000) may be substantially the same as the operations performed by the three-dimensional information estimation device (100) described with reference to FIGS. 1 through 10. Such redundant descriptions are omitted.
[0135]
[0136] The embodiments described above may be implemented as hardware components, software components, and/or combinations of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing unit may execute an operating system (OS) and software applications executed on said operating system. Additionally, the processing unit may access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, the processing unit may be described as being used as a single unit, but those skilled in the art will understand that the processing unit may include multiple processing elements and/or multiple types of processing elements. For example, the processing unit may include multiple processors or one processor and one controller. In addition, other processing configurations, such as parallel processors, are also possible.
[0137] Software may include computer programs, code, instructions, or a combination of one or more of these, and may configure a processing unit to operate as desired or instruct the processing unit independently or collectively. Software and/or data may be stored on any type of machine, component, physical device, virtual equipment, computer storage medium, or device so as to be interpreted by the processing unit or to provide instructions or data to the processing unit. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on computer-readable recording media.
[0138] The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, etc., either individually or in combination, and the program instructions recorded on the medium may be those specifically designed and configured for the embodiment or those known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter, etc.
[0139] The hardware device described above may be configured to operate as one or more software modules to perform the operation of the embodiment, and vice versa.
[0140] Although the embodiments have been described above with reference to the limited drawings, those skilled in the art can apply various technical modifications and variations based thereon. For example, suitable results may be achieved even if the described techniques are performed in a different order than described, and/or if the components of the described system, structure, device, circuit, etc. are combined or assembled in a form different from that described, or replaced or substituted by other components or equivalents.
[0141] Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims set forth below.
Claims
1. A method comprising: an operation of detecting a box corresponding to a face of another vehicle located around an ego-vehicle; an operation of obtaining a yaw angle of the other vehicle based on the box and a contact point between a wheel of the other vehicle and the ground; and an operation of calculating three-dimensional information of the other vehicle based on the type of the other vehicle, the box, and the yaw angle.
2. The method of claim 1, wherein the face includes at least one face included in the field of view of a camera mounted on the ego-vehicle, and the wheel includes at least one wheel located on the opposite side of the face.
3. The method of claim 1, wherein the three-dimensional information includes coordinates of a three-dimensional center point of the other vehicle.
4. The method of claim 1, wherein the operation of obtaining the yaw angle includes an operation of calculating the yaw angle based on coordinates of vertices of the box and coordinates of the contact point.
5. The method of claim 1, wherein the operation of calculating the three-dimensional information includes: an operation of obtaining a width corresponding to the other vehicle based on the type of the other vehicle; and an operation of calculating the three-dimensional information of the other vehicle based on the width, the box, and the yaw angle.
6. The method of claim 1, wherein the operation of obtaining the box includes: an operation of obtaining a bounding box corresponding to the face of the other vehicle using an object detection algorithm; and an operation of determining the bounding box as the box.
7. The method of claim 4, wherein the coordinates of the vertices and the coordinates of the contact point are calculated by performing an inverse projection matrix operation based on calibration parameters corresponding to a camera mounted on the ego-vehicle.
8. A computer program stored on a computer-readable recording medium in combination with hardware to execute the method of any one of claims 1 through 7.
9. A device comprising: at least one processor including processing circuitry; and a memory storing instructions, wherein, when the instructions are executed individually or collectively by the at least one processor, the device detects a box corresponding to a face of another vehicle located around an ego-vehicle, obtains a yaw angle of the other vehicle based on the box and a contact point between a wheel of the other vehicle and the ground, and calculates three-dimensional information of the other vehicle based on the type of the other vehicle, the box, and the yaw angle.
10. The device of claim 9, wherein the face includes at least one face included in the field of view of a camera mounted on the ego-vehicle, and the wheel includes at least one wheel located on the opposite side of the face.
11. The device of claim 10, wherein the three-dimensional information includes coordinates of a three-dimensional center point of the other vehicle.
12. The device of claim 10, wherein, when the instructions are executed individually or collectively by the at least one processor, the device calculates the yaw angle based on coordinates of vertices of the box and coordinates of the contact point.
13. The device of claim 10, wherein, when the instructions are executed individually or collectively by the at least one processor, the device obtains a width corresponding to the other vehicle based on the type of the other vehicle, and calculates the three-dimensional information of the other vehicle based on the width, the box, and the yaw angle.
14. The device of claim 10, wherein, when the instructions are executed individually or collectively by the at least one processor, the device obtains a bounding box corresponding to the face of the other vehicle using an object detection algorithm, and determines the bounding box as the box.
15. The device of claim 12, wherein the coordinates of the vertices and the coordinates of the contact point are calculated by performing an inverse projection matrix operation based on calibration parameters corresponding to a camera mounted on the ego-vehicle.