Object recognition device

By combining 3D information and texture information, the object recognition method solves the problem of unstable object recognition in existing technologies and achieves high-precision object recognition in complex environments.

CN116348937B · Active · Publication Date: 2026-06-26 · ASTEMO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ASTEMO LTD
Filing Date
2021-09-10
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies can only determine the category of an object based on local information when the object spans both the 3D region and the monocular region, resulting in unstable recognition.

Method used

The method combines three-dimensional information and texture information. The three-dimensional information acquisition unit and the texture information acquisition unit acquire the three-dimensional and texture information of the object. The confidence level is calculated by the confidence level calculation unit, and the object category is determined by the object category determination unit.

Benefits of technology

It achieves high-precision object recognition across overlapping and non-overlapping areas of the field of view, improving the stability and accuracy of recognition, especially under conditions such as night and rain.

✦ Generated by Eureka AI based on patent content.

Abstract

The object recognition device of the present application includes: a three-dimensional information acquisition unit (102) that acquires three-dimensional information from the field-of-view overlap region of a first sensor (100) and a second sensor (101); a texture information acquisition unit (103) that acquires texture information of the field-of-view overlap region and the field-of-view non-overlap region of the first sensor and the second sensor; an object detection unit (104) that detects an object captured in the field-of-view overlap region and the field-of-view non-overlap region based on the information acquired by the three-dimensional information acquisition unit and the texture information acquisition unit; a first confidence calculation unit (106) that calculates a confidence of a recognition result for the object based on the three-dimensional information of the field-of-view overlap region; a second confidence calculation unit (107) that calculates a confidence of a recognition result for the object based on the texture information of the field-of-view overlap region and the field-of-view non-overlap region; and an object category determination unit (108) that determines the category of the object based on the confidences.

Description

Technical Field

[0001] This invention relates to an object recognition device.

Background Technology

[0002] To achieve autonomous driving and prevent traffic accidents, there is high expectation for technologies that can recognize stereoscopic objects in wide-angle sensing areas. To achieve wide-angle sensing, Patent Document 1 discloses a method for recognizing objects in a stereo camera, wherein the stereo camera has a stereo region with overlapping fields of view and a monocular region with non-overlapping fields of view. In Patent Document 1, assuming an object moves from the monocular region to the stereo region, it determines whether the object detected in the monocular region enters the stereo region in the next frame. If it is determined to have entered, the object's category is determined based on parallax information in subsequent frames.

[0003] Existing technical documents

[0004] Patent documents

[0005] Patent Document 1: Japanese Patent Application Publication No. 2014-67198

Summary of the Invention

[0006] The technical problem that the invention aims to solve

[0007] In Patent Document 1, when an object is captured across both the stereo region and the monocular region, the object is identified based on the parallax information of the stereo region. However, in the frames in which the object is entering the stereo region, only part of the object is captured in the stereo region, so the category can be determined only from local information, and stable recognition remains a problem to be solved.

[0008] In view of the problems mentioned above, the present invention aims to provide an object recognition device that can identify objects captured across overlapping and non-overlapping areas of the field of view with high precision.

[0009] Technical means to solve the problem

[0010] The object recognition device of the present invention, which solves the above problems, is characterized by comprising: a three-dimensional information acquisition unit that acquires three-dimensional information from the overlapping area of the field of view of a first sensor and a second sensor; a texture information acquisition unit that acquires texture information of the overlapping area and the non-overlapping area of the field of view of the first sensor and the second sensor; an object detection unit that detects objects captured in the overlapping area and the non-overlapping area of the field of view based on the information acquired by the three-dimensional information acquisition unit and the texture information acquisition unit; a confidence calculation unit having a first confidence calculation unit and a second confidence calculation unit, wherein the first confidence calculation unit calculates a first confidence level as an object recognition result based on the three-dimensional information of the overlapping area of the field of view, and the second confidence calculation unit calculates a second confidence level as an object recognition result based on the texture information of the overlapping area and the non-overlapping area of the field of view; and an object category determination unit that determines the object category based on the confidence level calculated by the confidence calculation unit.

[0011] Invention Effects

[0012] According to the present invention, objects captured across overlapping and non-overlapping areas of the field of view can be identified with high precision. Other features related to the present invention will be described in this specification and with reference to the accompanying drawings. Furthermore, technical problems, technical features, and technical effects other than those described above will be clarified through the following description of embodiments.

Attached Figure Description

[0013] Figure 1 is a functional block diagram illustrating the structure of the object recognition device according to the first embodiment of the present invention.

[0014] Figure 2 is a diagram showing the positional relationship of the objects to be recognized, used to explain Action Example 1 of the object recognition device of the embodiment shown in Figure 1.

[0015] Figure 3 is a flowchart of an example of the operation of the object recognition device of the embodiment shown in Figure 1.

[0016] Figure 4 is a flowchart of the detection of three-dimensional objects by the object recognition device of the embodiment shown in Figure 1.

[0017] Figure 5 is a flowchart of the weight calculation for the recognition scores by the object recognition device of the embodiment shown in Figure 1.

[0018] Figure 6 is a flowchart of the vehicle control by the object recognition device of the embodiment shown in Figure 1.

[0019] Figure 7 is a conceptual diagram of the weight calculation based on a convolutional neural network in the object recognition device of the embodiment shown in Figure 1.

[0020] Figure 8 is a conceptual diagram of the parallax flipping processing of the object recognition device of the embodiment shown in Figure 1.

[0021] Figure 9 is a flowchart of the object detection method of the object recognition device of the embodiment shown in Figure 1.

[0022] Figure 10 is a flowchart of the method for calculating the three-dimensional position in the object recognition device of the embodiment shown in Figure 1.

Detailed Implementation

[0023] The embodiments of the present invention will be described in detail below.

[0024] (Example 1)

[0025] Figure 1 is a functional block diagram showing the structure of the object recognition device 1 in Embodiment 1.

[0026] The object recognition device 1 of this embodiment is installed, for example, in the host vehicle and performs recognition processing on objects located in front of the vehicle. The object recognition device 1 consists of a camera, a computer, a memory (RAM), a storage device, and so on. The computer operates as various functional units by executing control programs stored in the memory (RAM).

[0027] As shown in Figure 1, in the object recognition device 1, the functional units realized by the actions of the camera and the computer include a first sensor 100, a second sensor 101, a three-dimensional information acquisition unit 102, a texture information acquisition unit 103, an object detection unit 104, a confidence calculation unit 105 composed of a first confidence calculation unit 106 and a second confidence calculation unit 107, an object category determination unit 108, and a vehicle control unit 109.

[0028] The first sensor 100 and the second sensor 101 are each composed of, for example, a camera capable of acquiring images, or a millimeter-wave radar or LiDAR capable of acquiring three-dimensional information. Together, the first sensor 100 and the second sensor 101 form a structure capable of acquiring both texture (brightness) information and three-dimensional information, for example, a combination of two cameras (a so-called stereo camera), or a combination of a camera and a millimeter-wave radar. The first sensor 100 and the second sensor 101 have a field-of-view overlap region, which they share, and field-of-view non-overlap regions, which they do not share. Furthermore, the field of view in this invention is not limited to the field of view of an image, but is a broad concept representing the detection range of a sensor.

[0029] The 3D information acquisition unit 102 acquires 3D information based on information from at least one of the first sensor 100 and the second sensor 101. Specifically, when the first sensor 100 and the second sensor 101 are composed of two cameras, 3D information is acquired by performing triangulation based on the positional relationship of each camera and the internal parameters of the cameras. Furthermore, when the first sensor 100 is a millimeter-wave radar and the second sensor 101 is a camera, the measurement results from the millimeter-wave radar are acquired as 3D information.

[0030] The texture information acquisition unit 103 acquires information from at least one of the first sensor 100 and the second sensor 101 as the texture information of the object. When the first sensor 100 and the second sensor 101 are composed of two cameras, the image information acquired by the first sensor 100 and the second sensor 101 is the texture information. Furthermore, when the first sensor 100 is a millimeter-wave radar and the second sensor 101 is a camera, the image information acquired by the second sensor 101 is the texture information.

[0031] The object detection unit 104 detects objects based on information acquired by the 3D information acquisition unit 102 and the texture information acquisition unit 103. As an object detection method, the detection position can be expanded based on texture information after detection processing based on 3D information, or the detection position can be corrected based on 3D information after object detection based on texture information. As an object detection method based on 3D information, there are 3D information clustering methods. When the first and second sensors are cameras, 3D information can be projected into 2D image information to generate a distance image or disparity image, and then clustering processing based on distance or disparity can be performed to detect objects. As an object detection method based on texture information, there are methods using statistical machine learning. Alternatively, a template image corresponding to the object to be identified can be pre-calculated, and the object can be detected through template matching.

[0032] The confidence calculation unit 105 includes a first confidence calculation unit 106 that calculates an object recognition score (recognition result) based on 3D information, and a second confidence calculation unit 107 that calculates an object recognition score (recognition result) based on texture information. The first confidence calculation unit 106 calculates a first confidence score based on the 3D information of the region detected by the object detection unit 104, and performs object recognition processing based on the first confidence score. In the 3D information-based recognition processing, machine learning can be used with inputs such as the output distance of a millimeter-wave radar, point group information of a LiDAR, and distance information of a stereo camera. Alternatively, distance images and disparity images can be generated by overlaying the acquired 3D information onto a 2D image, and recognition processing can be performed based on the generated distance images and disparity images. The second confidence calculation unit 107 calculates a second confidence score based on the texture information of the region detected by the object detection unit 104, and performs object recognition processing based on the second confidence score. The object recognition processing can be template matching using a specific template, or object recognition can be performed through statistical machine learning.

[0033] The object category determination unit 108 determines the object category based on the confidence calculated by the confidence calculation unit 105. Specifically, it integrates the confidences, i.e., the recognition scores, calculated by the first confidence calculation unit 106 and the second confidence calculation unit 107, and outputs the category with the highest integrated recognition score together with the score of that category.

[0034] The vehicle control unit 109 determines the collision risk with the identified object based on its three-dimensional position, and implements vehicle control if a collision is deemed possible. Furthermore, it can select the appropriate vehicle control method based on the score output by the object category determination unit 108.

[0035] (Action Example 1)

[0036] Next, with reference to the flowchart of Figure 3, the operation of the object recognition device 1 of this embodiment in the scenario shown in Figure 2 will be described in detail. In the following operation examples, the object recognition device 1 is assumed to be configured to monitor the area in front of a vehicle. Furthermore, it is assumed that both the first sensor 100 and the second sensor 101 are cameras.

[0037] Figure 2 depicts a scene in which vehicle V100 is traveling in the overtaking lane adjacent to the host vehicle. The host vehicle is traveling in the left lane, while vehicle V100 is ahead of it in the right lane, traveling in the same direction. F101 represents the overlapping field of view of the two cameras, an area where both texture and 3D information can be acquired. F100 and F102 represent the non-overlapping fields of view of the two cameras, areas where only texture information can be acquired. R100 represents the area of vehicle V100 captured in the overlapping field of view F101, where 3D information is acquired. R101 represents the entire area of vehicle V100, captured across the overlapping field of view F101 and the non-overlapping field of view F102, and is an area where texture information can be acquired.

[0038] In Action Example 1, the object recognition device 1 sequentially performs texture acquisition processing (P101), parallax calculation processing (P102), stereo object detection processing (P103), parallax score calculation processing (P104), texture score calculation processing (P105), weight calculation processing (P106), category judgment processing (P107), and vehicle control processing (P108) to implement vehicle control for the vehicle V100.

[0039] In the texture acquisition process (P101), the images captured by the two cameras are acquired, yielding texture information for regions F100, F101, and F102 shown in Figure 2.

[0040] In the disparity calculation process (P102), a corresponding point search is performed for the two cameras, calculating the offset, i.e., the disparity, between the images of the first and second cameras. Here, a valid / invalid disparity determination is performed. During the corresponding point search, disparities with a matching score above a threshold are considered valid, while those with a matching score below a specified value are considered invalid. Furthermore, the distance from the camera is determined based on the calculated disparity information, the camera's position / pose, and the camera's internal parameters.
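
As an illustration of the validity check and distance conversion described in [0040], the following is a minimal sketch, not the patented implementation. It assumes a rectified, parallel stereo pair so that distance can be taken as focal length × baseline / disparity; the function name, parameter names, and the choice to treat a score exactly equal to the threshold as valid are all assumptions made for this example.

```python
from typing import List, Optional


def disparity_to_distance(
    disparity_px: List[float],
    match_score: List[float],
    focal_length_px: float,
    baseline_m: float,
    score_threshold: float,
) -> List[Optional[float]]:
    """Convert per-pixel disparities to distances; invalid pixels become None.

    Assumption: rectified parallel stereo, so Z = f * B / d.
    """
    distances: List[Optional[float]] = []
    for d, s in zip(disparity_px, match_score):
        # A disparity is kept only if its matching score clears the threshold.
        # Non-positive disparities are also rejected so the division is defined.
        if s >= score_threshold and d > 0.0:
            distances.append(focal_length_px * baseline_m / d)
        else:
            distances.append(None)
    return distances


if __name__ == "__main__":
    # Three pixels: the second has a low matching score and is marked invalid.
    print(disparity_to_distance([10.0, 20.0, 5.0], [0.9, 0.2, 0.8],
                                focal_length_px=1000.0, baseline_m=0.5,
                                score_threshold=0.5))
```

Keeping invalid pixels as `None` rather than dropping them preserves the pixel layout, which makes it straightforward to count valid pixels later when a disparity density is needed.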

[0041] In the stereo object detection process (P103), initial object detection based on parallax information is performed. Then, the detection box is expanded based on texture information to determine the region of the object in the image. Figure 4 shows the processing flow of the stereo object detection process (P103).

[0042] In the stereo object detection process (P103), disparity grouping processing (P201) (the first detection process) is performed first. In the disparity grouping processing, objects are detected by clustering pixels of the same disparity on the disparity image, which yields the object region R100 in the overlapping field of view F101 shown in Figure 2. Next, stereo filtering (P202) (the second detection process) is performed. In stereo filtering (P202), only the objects detected around the boundary of the overlapping field of view F101 shown in Figure 2 are extracted. Thus, only objects with a high probability of being captured across both the overlapping and non-overlapping fields of view are extracted. If an object is extracted (the condition is met), the processing region expansion process (P203) is performed on it. In the processing region expansion process (P203), a region R103 is set based on the object region R100 detected in the overlapping field of view F101 so that it contains the vehicle V100 captured across the overlapping field of view F101 and the non-overlapping field of view F102. Region R103 is set by determining the distance to the object and sizing the processing region according to that distance and the camera parameters. In the object re-detection process (P204), the object is detected by analyzing the texture information of region R103. A convolutional neural network is used as the detection method; it outputs the upper-left and lower-right image positions of the object. From this output, the region R101 of the vehicle V100 captured across the overlapping field of view F101 and the non-overlapping field of view F102 can be determined.

[0043] In the disparity score calculation process (P104), object recognition is performed based on the disparity information of the object region R100 detected by the stereo object detection process (P103). A convolutional neural network is used in object recognition to output a recognition score Score_D.

[0044] In the texture scoring calculation process (P105), object recognition is performed based on the texture information of R101 detected by the 3D object detection process (P103). A convolutional neural network is used in object recognition to output a recognition score Score_T.

[0045] In the weight calculation process (P106), the weights used to integrate the recognition score Score_D calculated by the disparity score calculation process (P104) and the recognition score Score_T calculated by the texture score calculation process (P105) are calculated. Figure 5 shows the processing flow of the weight calculation process (P106). In the disparity acquisition area and density calculation process (P301), the area of the object region R100, which is the processing area, is calculated as Area_D (a number of pixels). Additionally, the number of pixels within Area_D deemed valid during disparity calculation is calculated as Area_V. The disparity density Density_D is calculated as Area_V / Area_D. In the texture acquisition area calculation process (P302), the number of pixels Area_T in region R101 is calculated. Afterwards, the disparity and texture weight calculation process (P303) is performed, in which the weight Weight_D for Score_D and the weight Weight_T for Score_T are obtained. Weight_D and Weight_T are calculated using the following formulas.

[0046] Weight_D=(Area_D) / (Area_D+Area_T)*Density_D……(1)

[0047] Weight_T=(Area_T) / (Area_D+Area_T)……(2)

[0048] In the category determination process (P107), based on the recognition score Score_D calculated by the disparity score calculation process (P104), the recognition score Score_T calculated by the texture score calculation process (P105), and the weights Weight_D and Weight_T calculated by the weight calculation process (P106), the object's recognition score Total_Score is calculated to determine the object's category. The object's recognition score Total_Score is calculated using the following formula.

[0049] Total_Score=Score_D*Weight_D+Score_T*Weight_T……(3)

[0050] Calculate Total_Score according to formula (3) and compare it with the specified threshold. If it is above the threshold, it is determined to be an object to be identified.
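
Formulas (1) to (3) and the threshold comparison in [0050] can be traced with a short worked sketch. The function names, the handling of empty regions, and the treatment of a score exactly equal to the threshold are assumptions for illustration; only the three formulas themselves come from the text.

```python
from typing import Tuple


def fusion_weights(area_d: int, area_v: int, area_t: int) -> Tuple[float, float]:
    """Return (Weight_D, Weight_T) following formulas (1) and (2)."""
    total_area = area_d + area_t
    if total_area == 0 or area_d == 0:
        # Assumption: with no disparity region, rely on texture only
        # (or on nothing at all if there is no texture region either).
        return 0.0, (1.0 if area_t > 0 else 0.0)
    density_d = area_v / area_d                      # Density_D = Area_V / Area_D
    weight_d = area_d / total_area * density_d       # formula (1)
    weight_t = area_t / total_area                   # formula (2)
    return weight_d, weight_t


def total_score(score_d: float, score_t: float,
                weight_d: float, weight_t: float) -> float:
    """Integrated recognition score, formula (3)."""
    return score_d * weight_d + score_t * weight_t


def is_target_object(score: float, threshold: float) -> bool:
    # Assumption: a score equal to the threshold counts as a detection.
    return score >= threshold


if __name__ == "__main__":
    # Example: 1000 disparity pixels of which 500 are valid, 3000 texture pixels.
    w_d, w_t = fusion_weights(area_d=1000, area_v=500, area_t=3000)
    print(w_d, w_t)                      # 0.125 0.75
    print(total_score(0.8, 0.9, w_d, w_t))
```

Under formulas (1) and (2) the two weights sum to less than 1 whenever Density_D is below 1, so a sparse disparity map lowers the integrated score as well as shifting the balance toward texture.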

[0051] In the vehicle control process (P108), the possibility of collision with the object is determined based on the distance information to the object, and vehicle control is implemented if a collision is determined to be possible. Figure 6 shows the processing flow of the vehicle control process (P108). In the 3D position measurement process (P401), the 3D position of the object is determined based on the distance information calculated by the disparity calculation process (P102) and the detection results of the stereo object detection process (P103). Specifically, the distance to the object is taken as the median of the distance information within the detected object region R100, and the 3D position of the object is then calculated from this distance and the object's lateral and longitudinal positions on the image. In the vehicle control content determination process (P402), the recognition score Total_Score from the category determination process (P107) determines which kind of vehicle control to implement, using two thresholds: the braking threshold Brake_Thre and the warning threshold Warning_Thre, which satisfy Brake_Thre > Warning_Thre. If Total_Score > Brake_Thre, braking control is selected. If Brake_Thre > Total_Score > Warning_Thre, warning control is selected. If Warning_Thre > Total_Score, it is determined that no vehicle control will be implemented, and the vehicle control process (P108) ends. If braking control or warning control is selected in the vehicle control content determination process (P402), the vehicle control implementation judgment process (P403) is executed. In this process, the possibility of collision with the object is determined based on the object's 3D position in past and current frames. Specifically, a curve is fitted to the object's trajectory using the 3D positions from the past and current frames. Similarly, the future trajectory of the host vehicle is calculated from the current readings of the vehicle speed sensor and yaw angle sensor. If the fitted object trajectory and the host vehicle's trajectory intersect at the same point in time, a collision is deemed likely, and the vehicle control determined in the vehicle control content determination process (P402) is executed.
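
The median-based distance of P401 and the two-threshold selection of P402 can be sketched as follows. This is an illustrative sketch only: the function names and return labels are hypothetical, and because the text gives only strict inequalities, the behavior when Total_Score exactly equals a threshold (here it falls into the lower tier) is an assumption.

```python
from statistics import median
from typing import List, Optional


def object_distance(region_distances: List[Optional[float]]) -> Optional[float]:
    """Median of the valid distances inside object region R100 (P401)."""
    valid = [d for d in region_distances if d is not None]
    return median(valid) if valid else None


def select_control(total_score: float, brake_thre: float, warning_thre: float) -> str:
    """Choose the control content from Total_Score (P402).

    Returns "brake", "warning", or "none" (hypothetical labels).
    """
    if not brake_thre > warning_thre:
        raise ValueError("Brake_Thre must be greater than Warning_Thre")
    if total_score > brake_thre:
        return "brake"
    if total_score > warning_thre:
        # Assumption: a score exactly equal to Brake_Thre lands here.
        return "warning"
    return "none"


if __name__ == "__main__":
    print(object_distance([10.0, None, 12.0, 30.0]))   # 12.0
    print(select_control(0.9, brake_thre=0.8, warning_thre=0.5))  # brake
    print(select_control(0.6, brake_thre=0.8, warning_thre=0.5))  # warning
    print(select_control(0.3, brake_thre=0.8, warning_thre=0.5))  # none
```

Using the median rather than the mean keeps the distance estimate robust to a few background pixels that fall inside the object region.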

[0052] As described above, for objects captured across the overlapping and non-overlapping areas of the field of view, the object recognition device 1 of this embodiment performs recognition processing by combining the limited parallax information obtained from the overlapping area with the wide-area texture information obtained from both the overlapping and non-overlapping areas, thereby enabling more stable object recognition.

[0053] Furthermore, in 3D object detection processing, object detection is performed first based on parallax information, followed by analysis of texture information for further object detection. This allows for the limitation of areas where texture analysis, which typically has a high processing load, should be performed, thus reducing the processing burden.

[0054] Furthermore, when integrating the recognition scores based on disparity and texture, weighting is applied according to the area of the acquired disparity and the area of the acquired texture. This allows for more stable object recognition even when the areas of acquired disparity and acquired texture differ significantly, or when the confidence levels of the recognition scores themselves differ. Additionally, the calculation of disparity weights considers not only area but also disparity density, thus adaptively adjusting the weights corresponding to the disparity recognition scores to accommodate variations in effective disparity across frames, such as at night or in rainy conditions. This enables high-precision object recognition even at night and in rainy conditions.

[0055] Furthermore, when implementing vehicle control, the vehicle control content is adjusted accordingly based on the object recognition score. This prevents erroneous braking control from being executed when the confidence level of recognition is low.

[0056] (Action Example 2)

[0057] Next, the second action example of the object recognition device 1 will be explained. The processing flow of Action Example 2 is the same as that of Action Example 1 shown in Figure 3, but the processing content implemented in the weight calculation process (P106) is different. Therefore, only the weight calculation process (P106) in Action Example 2 will be explained below.

[0058] In the weight calculation process (P106) of Action Example 2, the weights of the disparity score and the texture score are calculated by statistical machine learning. Figure 7 is a conceptual diagram of the weight calculation process (P106). In Figure 7, I101 represents the texture image (luminance image) used in the texture score calculation process (P105), and I102 represents the disparity image used in the disparity score calculation process (P104). In the weight calculation process (P106), the texture information of I101 and the disparity information of I102 are input to a convolutional neural network, which outputs the weights Weight_D and Weight_T. This convolutional neural network is trained as shown in Figure 7. During training, the texture and disparity information of the luminance image I101 and the disparity image I102 are used as inputs, together with the texture-based recognition score Score_T and the disparity-based recognition score Score_D. The luminance image I101 and the disparity image I102 are input to the convolutional neural network, which outputs Weight_T and Weight_D. An integrated score is then calculated from Score_T, Score_D, and the output weights according to Equation (3). Training is carried out so as to minimize the false recognition rate of the calculated integrated score.

[0059] Action Example 2 can analyze texture and parallax information and adaptively output weights based on the analysis results. In particular, by implementing learning that considers the relationship between input information and recognition scores, it can reduce weights for patterns that the recognition process is not good at and increase weights for patterns that the recognition process is good at, thus enabling more accurate object recognition.

[0060] (Action Example 3)

[0061] Next, the third action example of the object recognition device 1 will be explained. The processing flow of Action Example 3 is the same as that of Action Example 1 shown in Figure 3, but the processing content implemented in the disparity score calculation process (P104) and the weight calculation process (P106) is different. Therefore, only the disparity score calculation process (P104) and the weight calculation process (P106) in Action Example 3 will be explained below.

[0062] In the disparity score calculation process (P104), the disparity information obtained in the overlapping area of the field of view is extended to the non-overlapping area. Figure 8 is a conceptual diagram of the disparity score calculation process (P104). In Figure 8, R200 represents the field-of-view overlap region in which disparity has been calculated, and R202 represents the part of the field-of-view non-overlap region detected in the stereo object detection process (P103). In the disparity score calculation process (P104), the lateral center position R203 of the region detected in the stereo object detection process (P103) is calculated first. The disparity in the overlap region R200 is then flipped using the lateral center position R203 as a reference. That is, the disparity image of the overlap region R200 is mirrored to the right about the lateral center position R203, extending it into the non-overlap region R202, and this yields the disparity information for region R202. Recognition processing is then performed based on the disparity information R204 obtained for the overlap and non-overlap regions.

[0063] In the weight calculation process (P106), the weight Weight_D is calculated based on the disparity information R204 obtained for the overlapping and non-overlapping regions of the field of view. The area Area_D, over which disparity is obtained, is the number of pixels in R204. Furthermore, Area_V, used in the density calculation, is calculated by also flipping the valid / invalid disparity information. Weight_D is then calculated based on the calculated Area_D and Area_V.

[0064] Action Example 3 assumes the object has left-right symmetry and performs parallax flipping processing based on the object's center position. This allows for flexible utilization of information from overlapping visual fields to obtain parallax information that is typically incalculable in non-overlapping visual fields. Consequently, recognition processing can be performed based on a wider range of parallax information, improving recognition performance.
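
The flipping step of Action Example 3 can be sketched on a small disparity grid. This is a minimal sketch under stated assumptions: the disparity image is a list of rows, missing or invalid disparities are stored as `None`, the overlap region lies to the left of the center column, and the function names are hypothetical. Because invalid pixels are mirrored as `None`, the valid / invalid information is flipped along with the values.

```python
from typing import List, Optional

Row = List[Optional[float]]


def mirror_disparity(disparity: List[Row], center_col: int) -> List[Row]:
    """Fill empty pixels right of center_col with the value mirrored about it."""
    mirrored = [row[:] for row in disparity]      # leave the input untouched
    for row in mirrored:
        for col in range(center_col + 1, len(row)):
            src = 2 * center_col - col            # column reflected about the axis
            if row[col] is None and src >= 0:
                # The source column is left of the axis and is never modified,
                # so reading and writing the same row is safe.
                row[col] = row[src]
    return mirrored


def count_valid(disparity: List[Row]) -> int:
    """Number of pixels holding a valid disparity (Area_V after flipping)."""
    return sum(1 for row in disparity for v in row if v is not None)


if __name__ == "__main__":
    # Columns 0-2 lie in the overlap region; columns 3-4 have no disparity.
    image = [[5.0, 4.0, 3.0, None, None],
             [None, 4.0, 3.0, None, None]]
    flipped = mirror_disparity(image, center_col=2)
    print(flipped)
    print(count_valid(flipped), "valid of", sum(len(r) for r in flipped))
```

In the second row the leftmost pixel is invalid, so its mirror image in the last column stays invalid, which is what lowers the density used for Weight_D.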

[0065] (Action Example 4)

[0066] Next, the fourth action example of the object recognition device 1 will be explained. The processing flow of Action Example 4 is the same as that of Action Example 1 shown in Figure 3, but the processing performed in the three-dimensional object detection process (P103) is different. Therefore, only the three-dimensional object detection process (P103) in Action Example 4 will be explained below.

[0067] Figure 9 shows the processing flow of the three-dimensional object detection process (P103) in Action Example 4. In this process, texture-based object detection is performed first, and the detection box is then corrected by analyzing the disparity information in the region where the object was detected. In the texture analysis process (P501), objects are detected by a convolutional neural network that takes texture information as input. In the object filtering process (P502), it is determined whether each detected object is contained in the field-of-view overlap region. If it is, disparity analysis is performed (P503); otherwise, the three-dimensional object detection process (P103) ends. In the disparity analysis process (P503), the disparity image is analyzed and the region detected in the texture analysis process (P501) is corrected. The analysis uses the change in disparity: pixels whose disparity differs from that of an adjacent pixel by more than a certain amount are marked. For example, the image is scanned horizontally and pixels whose distance differs from that of the adjacent pixel by more than a threshold are marked, which traces a line along the boundary between the object and the background. A line is fitted to the marked pixels and taken as the edge of the object, and the detection result of the texture analysis process (P501) is corrected accordingly. This correction brings the detection box closer to the edge of the object.
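The disparity analysis of P503 can be illustrated by the following sketch, which is given under stated assumptions rather than as the processing of the embodiment. The function names, the (left, top, right, bottom) box layout with an exclusive right bound, the least-squares fit of x as a linear function of y, and the simplification that only the right-hand boundary of the object lies in the scanned area are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]          # (x, y) pixel position
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right exclusive


def mark_disparity_edges(disparity: List[List[float]],
                         threshold: float) -> List[Point]:
    """Scan each row horizontally and mark pixels whose disparity differs
    from the left neighbour by more than the threshold."""
    marked = []
    for y, row in enumerate(disparity):
        for x in range(1, len(row)):
            if abs(row[x] - row[x - 1]) > threshold:
                marked.append((x, y))
    return marked


def fit_line_x_of_y(points: List[Point]) -> Tuple[float, float]:
    """Least-squares fit of x = a * y + b through the marked pixels."""
    if not points:
        raise ValueError("no marked pixels to fit")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    if var_y == 0:
        return 0.0, mean_x  # all points on one row: fall back to a vertical line
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov / var_y
    return a, mean_x - a * mean_y


def correct_right_edge(box: Box, a: float, b: float) -> Box:
    """Move the right edge of the box to the fitted line at the box mid-height."""
    left, top, _, bottom = box
    mid_y = (top + bottom) / 2
    return left, top, int(round(a * mid_y + b)), bottom
```

A near-vertical object edge is better described by x as a function of y than the reverse, which is why the fit is written in that form here.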

[0068] Action Example 4 corrects the object region detected by texture analysis by analyzing the disparity. Because the difference in distance between a three-dimensional object and the background is typically large, correcting the object boundary based on disparity allows the object region to be detected more accurately.

[0069] (Action Example 5)

[0070] Next, the fifth action example of the object recognition device 1 will be explained. The processing flow of Action Example 5 is the same as that of Action Example 1 shown in Figure 3, but the processing performed in the vehicle control process (P108) is different. Therefore, only the vehicle control process (P108) in Action Example 5 will be explained below.

[0071] Action Example 5 differs from Action Example 1 in how the vehicle control process (P108) measures the three-dimensional position of the object. Figure 10 shows the method for determining the three-dimensional position of the object in Action Example 5. In the disparity acquisition area and density calculation process (P601), the area Area_D over which disparity is acquired and the disparity density Density_D (density information) within that area are calculated. Next, in the texture acquisition area calculation process (P602), the area Area_T over which texture is acquired is calculated. In the ranging method determination process (P603), the method for measuring the three-dimensional position of the object is switched based on the acquisition area information or density information of the three-dimensional information used in the first confidence calculation unit 106 and the acquisition area information of the texture information used in the second confidence calculation unit 107. Specifically, the ranging method is selected by evaluating the following inequality on the disparity acquisition area Area_D, the disparity density Density_D, and the texture acquisition area Area_T.

[0072] Area_D*Density_D<α*(Area_T)……(4)

[0073] Here, α is a parameter for adjusting the relative confidence of texture and disparity, and an appropriate value is determined experimentally. When equation (4) is satisfied, the three-dimensional position of the object is measured by the texture-based ranging method; that is, texture-based ranging is selected when the extent of the acquired disparity is smaller than the specified value. When equation (4) is not satisfied, the three-dimensional position of the object is measured by the disparity-based ranging method; that is, disparity-based ranging is selected when the extent of the acquired disparity is equal to or larger than the specified value. In the disparity-based ranging method, the three-dimensional position of the object is measured from the disparity information of the field-of-view overlap region. The distance to the object is calculated from the median of the disparity information in the detected object region, and the three-dimensional position of the object is then calculated from that distance and the lateral and longitudinal positions on the image. In the texture-based ranging method, the distance to the object is determined from the detection position information of the object in the field-of-view non-overlap region. In the road surface estimation process (P605), the disparity information in the field-of-view overlap region is analyzed and the distance to the road surface is obtained for each longitudinal position on the image. In the ground contact position determination process (P606), the lower end of the object in the overlapping area of the field of view is determined to be the ground contact position. The distance to the object is then calculated from the road surface distance obtained in the road surface estimation process (P605) and the ground contact position of the object, and the three-dimensional position of the object is obtained from the calculated distance.
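The two ranging methods can be sketched as follows, as an illustration only. The conversion from disparity to distance uses the standard stereo relation Z = f × B / d, which the text does not state explicitly. The focal length `focal_length_px`, the baseline `baseline_m`, the row-indexed road distance table, the nearest-row lookup, and all function names are assumptions introduced for this sketch.

```python
from statistics import median
from typing import Dict, List


def distance_from_disparity(object_disparities: List[float],
                            focal_length_px: float,
                            baseline_m: float) -> float:
    """Disparity-based ranging: take the median disparity of the detected
    object region and convert it to a distance with Z = f * B / d."""
    d = median(object_disparities)
    if d <= 0:
        raise ValueError("median disparity must be positive")
    return focal_length_px * baseline_m / d


def distance_from_ground_contact(road_distance_by_row: Dict[int, float],
                                 box_bottom_row: int) -> float:
    """Texture-based ranging: look up the road surface distance (the output
    of P605, assumed to be given per image row) at the ground contact row,
    i.e. the lower end of the detection box (P606). The nearest available
    row is used when the exact row has no road distance."""
    nearest_row = min(road_distance_by_row,
                      key=lambda row: abs(row - box_bottom_row))
    return road_distance_by_row[nearest_row]
```

The median is robust to a few outlying disparity values inside the object region, such as background pixels that fall within the detection box.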

[0074] In Action Example 5, the object distance calculation method is adaptively switched according to the amount of disparity and texture information that can be obtained. When very little disparity can be obtained from an object and the object distance cannot be calculated accurately from disparity, the three-dimensional position of the object can still be calculated accurately by performing the distance calculation based on texture information.
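The switching rule of equation (4) can be written as a short sketch. The function name and the string return values are hypothetical; the comparison itself follows equation (4).

```python
def select_ranging_method(area_d: float, density_d: float,
                          area_t: float, alpha: float) -> str:
    """Return "texture" when equation (4) holds, i.e. when
    Area_D * Density_D < alpha * Area_T, and "disparity" otherwise."""
    if area_d * density_d < alpha * area_t:
        return "texture"
    return "disparity"
```

With alpha = 0.5 and Area_T = 1000, an object with Area_D = 100 and Density_D = 0.5 gives 50 < 500, so texture-based ranging is selected. An object with Area_D = 800 and Density_D = 0.9 gives 720, which is not below 500, so disparity-based ranging is selected.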

[0075] Alternatively, the condition of equation (4) above may be replaced with the following equation (5).

[0076] Score_D<β*(Score_T)……(5)

[0077] Here, β is a parameter for adjusting the relative confidence of texture and disparity, and an appropriate value is determined experimentally. With equation (5), the ranging method is switched according to the recognition scores. When the disparity-based recognition score is low, the disparity itself can be judged to have low confidence. Errors in distance estimation caused by erroneous disparity information can therefore be reduced.
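The variant of equation (5) can be sketched in the same way. The function name and return values are hypothetical, and the sketch assumes that, as with equation (4), satisfying the inequality selects the texture-based ranging method.

```python
def select_ranging_method_by_score(score_d: float, score_t: float,
                                   beta: float) -> str:
    """Return "texture" when equation (5) holds, i.e. when
    Score_D < beta * Score_T, and "disparity" otherwise.

    Assumption: a satisfied inequality selects texture-based ranging,
    mirroring the rule stated for equation (4)."""
    if score_d < beta * score_t:
        return "texture"
    return "disparity"
```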

[0078] Alternatively, the ranging method may be switched using the following equation (6) instead of equations (4) and (5).

[0079] Density_D≤0……(6)

[0080] When equation (6) above is not satisfied, that is, when the value of Density_D is greater than 0 and at least one valid disparity value exists, distance estimation is performed based on the disparity information of the field-of-view overlap region. Thus, even when the ground contact position cannot be determined because the object is occluded by other objects in the field-of-view non-overlap region, the three-dimensional position of the object can be determined by using the disparity information of the field-of-view overlap region.
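The rule of equation (6) can be sketched as follows. The definition of Density_D as the fraction of pixels in the object region that hold a valid disparity is an assumption made for this sketch, as are the function names and the boolean validity mask.

```python
from typing import List


def disparity_density(valid_mask: List[List[bool]]) -> float:
    """Fraction of pixels in the object region with a valid disparity
    (assumed definition of Density_D)."""
    total = sum(len(row) for row in valid_mask)
    if total == 0:
        return 0.0
    valid = sum(1 for row in valid_mask for flag in row if flag)
    return valid / total


def select_ranging_method_by_density(density_d: float) -> str:
    """Return "texture" when equation (6) holds (Density_D <= 0),
    and "disparity" as soon as any valid disparity exists."""
    return "texture" if density_d <= 0 else "disparity"
```

Under this rule a single valid disparity pixel is enough to select disparity-based ranging, so texture-based ranging is used only when no disparity is available at all.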

[0081] The invention of this application has been described above with reference to Embodiment 1, but the invention of this application is not limited to the above-described embodiments.

[0082] Various modifications to the structure and details of the invention that can be understood by those skilled in the art may be made within the scope of the invention.

[0083] The embodiments of the present invention have been described in detail above, but the present invention is not limited to the above embodiments. Various design changes can be made within the spirit of the present invention as described in the claims. For example, the above embodiments have been described in detail for ease of understanding of the present invention, but are not limited to having all the structures described. In addition, a part of the structure of a certain embodiment can be replaced with the structure of another embodiment, and the structure of another embodiment can be added to the structure of a certain embodiment. Furthermore, for a part of the structure of each embodiment, other structures can be added, deleted, or replaced.

[0084] Explanation of reference numerals in the attached figures

[0085] 100 First Sensor

[0086] 101 Second Sensor

[0087] 102 Three-Dimensional Information Acquisition Unit

[0088] 103 Texture Information Acquisition Unit

[0089] 104 Object Detection Unit

[0090] 105 Confidence Calculation Unit

[0091] 106 First Confidence Calculation Unit

[0092] 107 Second Confidence Calculation Unit

[0093] 108 Object Category Determination Unit

[0094] 109 Vehicle Control Unit (Vehicle Control Device)

Claims

1. An object recognition device, characterized by comprising: a three-dimensional information acquisition unit that acquires three-dimensional information from the field-of-view overlap region of a first sensor and a second sensor; a texture information acquisition unit that acquires texture information of the field-of-view overlap region and the field-of-view non-overlap region of the first sensor and the second sensor; an object detection unit that detects an object captured in the field-of-view overlap region and the field-of-view non-overlap region based on information acquired by the three-dimensional information acquisition unit and the texture information acquisition unit; a confidence calculation unit having a first confidence calculation unit and a second confidence calculation unit, the first confidence calculation unit calculating, as a first confidence, the confidence of the recognition result of the object based on the three-dimensional information of the field-of-view overlap region, and the second confidence calculation unit calculating, as a second confidence, the confidence of the recognition result of the object based on the texture information of the field-of-view overlap region and the field-of-view non-overlap region; and an object category determination unit that determines the category of the object based on the confidence calculated by the confidence calculation unit, wherein the object category determination unit calculates a weight for the confidence calculated by the confidence calculation unit and, based on the calculated weight, integrates the first confidence and the second confidence calculated by the first confidence calculation unit and the second confidence calculation unit to determine the category of the object.

2. The object recognition device according to claim 1, characterized in that the object category determination unit calculates the weight based on the acquisition area information or density information of the three-dimensional information used in the first confidence calculation unit and the acquisition area information of the texture information used in the second confidence calculation unit.

3. The object recognition device according to claim 1, characterized in that the object category determination unit calculates the weight using a convolutional neural network that takes as input the three-dimensional information used in the first confidence calculation unit and the texture information used in the second confidence calculation unit.

4. The object recognition device according to claim 1, characterized in that the object detection unit performs a first detection process that detects the region of the object based on information obtained by the three-dimensional information acquisition unit, and detects the object based on the texture information of a region that includes the region of the object detected by the first detection process.

5. The object recognition device according to claim 1, characterized in that the object detection unit performs a second detection process that detects the region of the object based on information obtained by the texture information acquisition unit, and corrects the detection result of the second detection process based on disparity information of a region contained within the region of the object detected by the second detection process, thereby detecting the object.

6. The object recognition device according to claim 1, characterized in that the first sensor and the second sensor are cameras that capture images.

7. The object recognition device according to claim 1, characterized in that the first confidence calculation unit extends the three-dimensional information obtained from the overlapping region of the field of view to the non-overlapping region of the field of view based on the center position of the object detected by the object detection unit, and calculates the first confidence based on the extended three-dimensional information of the overlapping region and the non-overlapping region of the field of view.

8. The object recognition device according to claim 1, characterized in that the object detection unit determines the three-dimensional position of the object based on the acquisition area information or density information of the three-dimensional information used in the first confidence calculation unit and the acquisition area information of the texture information used in the second confidence calculation unit.

9. A vehicle control device, characterized by comprising a vehicle control unit that controls a vehicle based on information from the object detection unit of the object recognition device according to claim 1, the confidence calculated by the confidence calculation unit, and the object category determined by the object category determination unit, wherein the vehicle control unit changes the vehicle control content according to the value of the confidence.