Method and system for camera-to-ground alignment

By receiving images and speed data of the vehicle environment, and using a 3D projection method to determine feature points and ground normal vectors, the problem of low camera-to-ground alignment efficiency is solved, and the effect of efficiently generating a vehicle surround view is achieved.

CN116645650B · Active · Publication Date: 2026-06-12 · GM GLOBAL TECHNOLOGY OPERATIONS LLC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GM GLOBAL TECHNOLOGY OPERATIONS LLC
Filing Date
2022-10-24
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies are computationally intensive in determining camera alignment with the ground, and are particularly inefficient in real-time execution, making it difficult to efficiently generate virtual views of the vehicle's surroundings.

Method used

By receiving image data and speed data of the vehicle environment, feature points are determined using a 3D projection method. A subset of feature points is selected as ground points, ground normal vectors are calculated and display data is generated. The ground normal vectors are optimized to determine the alignment value from the camera to the ground.

Benefits of technology

It improves the efficiency and accuracy of camera alignment with the ground, enabling the generation of high-quality vehicle surround views in real time, supporting autonomous driving and virtual scene display.

✦ Generated by Eureka AI based on patent content.


Abstract

A system and method for generating a virtual view of a scene associated with a vehicle is presented. In one implementation, the method includes receiving image data defining a plurality of images associated with an environment of the vehicle; receiving vehicle data indicative of a speed of the vehicle, wherein the vehicle data is associated with the image data; determining, by a processor, feature points within at least one image based on the vehicle data and a three-dimensional projection method; selecting, by the processor, a subset of the feature points as ground points; determining, by the processor, a ground plane based on the subset of feature points; determining, by the processor, a ground normal vector from the ground plane; determining, by the processor, a camera-to-ground alignment value based on the ground normal vector; and generating, by the processor, display data based on the ground normal vector.

Description

Technical Field

[0001] This technical field generally relates to surround view images of vehicles, and more specifically, to generating virtual views from camera images based on improved camera-to-ground alignment.

Background Technology

[0002] Modern vehicles are typically equipped with one or more optical cameras configured to provide image data to the vehicle's occupants. This image data can display a virtual scene of the environment surrounding the vehicle. This virtual scene can be generated based on multiple images captured from different perspectives. For example, the images may be taken from different image sources located at different positions around the vehicle, or from a single source rotated relative to the vehicle. Based on the camera-to-ground alignment information, the images are merged into a single perspective, such as a bird's-eye view. The methods used to determine the camera-to-ground alignment information can be computationally intensive, especially if performed in real time.

[0003] Therefore, it is desirable to provide an improved system and method for determining camera-to-ground alignment. Furthermore, other desirable features and characteristics of the invention will become apparent from the following detailed description and appended claims, taken in conjunction with the accompanying drawings and the foregoing technical field and background.

Summary of the Invention

[0004] A method and system are proposed for generating a virtual view of a scene associated with a vehicle. In one embodiment, the method includes: receiving image data defining a plurality of images associated with the environment of the vehicle; receiving vehicle data indicating the speed of the vehicle, wherein the vehicle data is associated with the image data; determining, by a processor, at least one feature point within the images based on the vehicle data and a three-dimensional projection method; selecting, by the processor, a subset of the feature points as ground points; determining, by the processor, a ground plane based on the subset of feature points; determining, by the processor, a ground normal vector from the ground plane; determining, by the processor, a camera-to-ground alignment value based on the ground normal vector; and generating display data based on the ground normal vector.
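The ground-plane, ground-normal, and alignment steps listed above can be illustrated with a minimal sketch. It is not the method of this disclosure: it assumes a camera frame with x right, y down, and z forward, uses an ordinary least-squares fit of y = a*x + b*z + c, and the function names `fit_ground_plane` and `alignment_angles` as well as the roll/pitch convention are hypothetical choices made only for illustration.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ground_plane(points):
    """Least-squares fit of y = a*x + b*z + c to camera-frame ground points.

    Assumed frame: x right, y down, z forward.
    Returns (unit_normal, camera_height); the normal points up (negative y).
    """
    n = float(len(points))
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    szz = sum(z * z for _, _, z in points)
    sxz = sum(x * z for x, _, z in points)
    sxy = sum(x * y for x, y, _ in points)
    szy = sum(z * y for _, y, z in points)

    # Normal equations A * [a, b, c]^T = rhs, solved with Cramer's rule.
    A = [[sxx, sxz, sx], [sxz, szz, sz], [sx, sz, n]]
    rhs = [sxy, szy, sy]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("points are degenerate; cannot fit a plane")

    def solve(col):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        return det3(m) / d

    a, b, c = solve(0), solve(1), solve(2)
    # Plane a*x - y + b*z + c = 0 has the (unnormalized) normal (a, -1, b).
    norm = math.sqrt(a * a + 1.0 + b * b)
    normal = (a / norm, -1.0 / norm, b / norm)
    height = abs(c) / norm  # distance from the camera origin to the plane
    return normal, height


def alignment_angles(normal):
    """Illustrative roll and pitch (radians) of the normal relative to 'up'."""
    nx, ny, nz = normal
    return math.atan2(nx, -ny), math.atan2(nz, -ny)
```

For points lying on a level plane 1.5 m below the camera, the fit returns a normal of (0, -1, 0) and a height of 1.5, and both angles are zero. A tilt of the fitted plane along z shows up as a nonzero pitch.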

[0005] In various implementations, the three-dimensional projection method uses the back projection of feature points from a two-dimensional image to a unit plane to determine the feature points from a sequence of two-dimensional images from multiple images.

[0006] In various implementations, the three-dimensional projection method also determines the feature points from a two-dimensional image sequence by determining feature offsets in a unit plane based on vehicle data and by predicting three-dimensional points of the feature points based on the feature offsets.

[0007] In various implementations, the three-dimensional projection method also determines the feature points by projecting the three-dimensional feature points back to the two-dimensional image based on the feature offset.
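The back projection, prediction, and re-projection steps described in the preceding paragraphs can be sketched as follows. This is a simplified illustration under stated assumptions, not the computation of this disclosure: it assumes an ideal pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a camera that translates purely along its optical axis by a distance derived from vehicle speed, and a known depth for the feature. The function names and the depth input are hypothetical.

```python
def back_project(u, v, fx, fy, cx, cy):
    """Map a pixel (u, v) to the unit plane z = 1 of an ideal pinhole camera."""
    return ((u - cx) / fx, (v - cy) / fy)


def predict_point(unit_xy, depth, forward_travel_m):
    """Scale a unit-plane point to 3D at an assumed depth, then express it in
    the camera frame after the camera has moved forward along its optical axis.

    forward_travel_m would typically be speed * time between frames.
    """
    x, y = unit_xy
    return (x * depth, y * depth, depth - forward_travel_m)


def project(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point back to pixel coordinates."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * X / Z + cx, fy * Y / Z + cy)


# Example: 10 m/s for 0.1 s gives 1.0 m of forward travel between frames.
unit = back_project(700.0, 500.0, 1000.0, 1000.0, 640.0, 360.0)
predicted = predict_point(unit, depth=10.0, forward_travel_m=10.0 * 0.1)
u_next, v_next = project(predicted, 1000.0, 1000.0, 640.0, 360.0)
```

With zero travel the round trip returns the original pixel. With forward travel, the predicted pixel moves away from the principal point, which gives a search location for the same feature in the next image.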

[0008] In various implementations, the selection of a subset of feature points is based on a fixed two-dimensional image road mask and a three-dimensional region.
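One way to picture a selection based on a fixed two-dimensional road mask and a three-dimensional region is the sketch below. The data layout (a binary mask indexed as `mask[row][col]` and an axis-aligned 3D box) and the name `select_ground_points` are assumptions made for illustration; the source does not specify how the mask or the region is represented.

```python
def select_ground_points(features, mask, region):
    """Keep features whose pixel lies in the road mask and whose 3D point lies
    inside an axis-aligned region.

    features: list of ((u, v), (x, y, z)) pairs
    mask:     2D list of 0/1 values indexed as mask[row][col]
    region:   ((xmin, xmax), (ymin, ymax), (zmin, zmax))
    """
    selected = []
    for (u, v), point in features:
        row, col = int(v), int(u)
        in_image = 0 <= row < len(mask) and 0 <= col < len(mask[0])
        if not in_image or not mask[row][col]:
            continue  # pixel is outside the image or outside the road mask
        if all(lo <= c <= hi for c, (lo, hi) in zip(point, region)):
            selected.append(((u, v), point))
    return selected


# Example: the lower half of a 4x4 image is treated as road.
mask = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
region = ((-2.0, 2.0), (1.0, 2.0), (1.0, 20.0))
features = [
    ((1, 3), (0.0, 1.5, 5.0)),   # inside mask and region
    ((1, 0), (0.0, 1.5, 5.0)),   # outside mask
    ((2, 2), (0.0, 1.5, 50.0)),  # outside region (too far ahead)
]
ground = select_ground_points(features, mask, region)
```

Only the first feature survives both tests in this example.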

[0009] In various implementations, the selection of a subset of feature points is based on a dynamic two-dimensional image road mask.

[0010] In various implementations, the selection of a subset of feature points is based on homography constraints.

[0011] In various embodiments, the method includes: receiving lighting data indicating lighting conditions associated with the vehicle's environment; and selecting at least one region of interest within at least one image based on the lighting data. The determination of feature points is based on the at least one region of interest.

[0012] In various implementations, the lighting data includes at least one of ambient lighting data, climate data, and time of day data.

[0013] In various embodiments, the method includes: assessing vehicle conditions to determine when the vehicle is traveling smoothly and in a straight line. In response to determining that the vehicle is traveling smoothly and in a straight line, determining the feature points.

[0014] In various implementations, the vehicle condition includes vehicle acceleration, vehicle speed, and steering wheel angle.

[0015] In various implementations, the vehicle status includes a defined distance traveled between two of a plurality of images.

[0016] In various implementations, the method includes: determining that the camera pose is unreliable based on the rotation and translation of a reference camera; and filtering feature points for images with camera poses determined to be unreliable. The determination of a subset of feature points is based on the filtered feature points.

[0017] In various implementations, the method includes optimizing ground normal vectors based on a sliding window approach associated with multiple images.

[0018] In various implementations, the optimization of the ground normal vector is also based on minimizing the transfer distance calculated based on homography pairs.
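The sliding-window idea can be illustrated with a much simpler stand-in than the optimization described above. The sketch below only averages the most recent per-frame normals and renormalizes the result. It does not minimize a homography transfer distance, and the class name `NormalSmoother` and the window size are hypothetical.

```python
import math
from collections import deque


class NormalSmoother:
    """Average the ground normals from the last `window_size` frames."""

    def __init__(self, window_size=5):
        # deque with maxlen drops the oldest normal automatically.
        self.window = deque(maxlen=window_size)

    def update(self, normal):
        """Add a per-frame normal and return the renormalized window mean."""
        self.window.append(normal)
        sums = [sum(n[i] for n in self.window) for i in range(3)]
        length = math.sqrt(sum(s * s for s in sums))
        if length == 0.0:
            raise ValueError("normals cancel out; cannot average")
        return tuple(s / length for s in sums)
```

Each call to `update` returns a unit vector, so a single noisy frame shifts the estimate less than it would without the window.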

[0019] In various implementations, the determination of the ground plane is based on constraints associated with the reference ground normal vector and the direction of motion.

[0020] In another embodiment, the system includes: a data storage element including computer-readable instructions; and a processor configured to execute the computer-readable instructions, the computer-readable instructions controlling the processor to perform operations. The operations include: receiving image data defining a plurality of images associated with a vehicle's environment; receiving vehicle data indicating vehicle speed, wherein the vehicle data is associated with the image data; determining feature points within at least one image based on the vehicle data and a three-dimensional projection method; selecting a subset of the feature points as ground points; determining a ground plane based on the subset of feature points; determining a ground normal vector from the ground plane; determining a camera-to-ground alignment value based on the ground normal vector; and generating display data based on the ground normal vector.

[0021] In various implementations, the three-dimensional projection method uses the back projection of feature points from a two-dimensional image to a unit plane to determine the feature points from a sequence of two-dimensional images from multiple images.

[0022] In various implementations, the three-dimensional projection method also determines the feature points from a two-dimensional image sequence by determining feature offsets in a unit plane based on vehicle data and predicting three-dimensional points of the feature points based on the feature offsets, and projecting the three-dimensional feature points back to the two-dimensional image based on the feature offsets.

[0023] In another embodiment, the vehicle includes: a camera configured to capture images of the vehicle's environment; and a controller having a processor configured to: receive the images; receive vehicle data indicating the vehicle's speed, wherein the vehicle data is associated with the images; determine at least one feature point within the images based on the vehicle data and a three-dimensional projection method; select a subset of the feature points as ground points; determine a ground plane based on the subset of feature points; determine a ground normal vector from the ground plane; and determine an alignment value of the camera to the ground based on the ground normal vector.

[0024] Option 1. A method for generating a virtual view of a scene associated with a vehicle, the method comprising:

[0025] Receive image data defining a plurality of images associated with the environment of the vehicle;

[0026] Receive vehicle data indicating the speed of the vehicle, wherein the vehicle data is associated with the image data;

[0027] The processor determines at least one feature point within the image based on the vehicle data and a 3D projection method;

[0028] The processor selects a subset of the feature points as ground points;

[0029] The processor determines the ground plane based on a subset of feature points;

[0030] The processor determines the ground normal vector from the ground plane;

[0031] The processor determines the camera alignment value to the ground based on the ground normal vector; and

[0032] The processor generates display data based on the ground normal vector.

[0033] Option 2. According to the method of Option 1, wherein the three-dimensional projection method uses the back projection of the feature points from the two-dimensional image to a unit plane to determine the feature points from a sequence of two-dimensional images from multiple images.

[0034] Option 3. According to the method of Option 2, wherein the three-dimensional projection method further determines the feature points from the two-dimensional image sequence by determining the feature offset in the unit plane based on the vehicle data and by predicting the three-dimensional points of the feature points based on the feature offset.

[0035] Option 4. According to the method of Option 3, wherein the three-dimensional projection method further determines the feature points by projecting the three-dimensional feature points back to the two-dimensional image based on the feature offset.

[0036] Option 5. According to the method described in Option 1, wherein the selection of a subset of feature points is based on a fixed two-dimensional image road mask and a three-dimensional region.

[0037] Option 6. The method according to Option 1, wherein the selection of a subset of feature points is based on a dynamic two-dimensional image road mask.

[0038] Option 7. The method described in Option 1, wherein the selection of a subset of feature points is based on homography constraints.

[0039] Option 8. The method according to Option 1, further comprising:

[0040] Receive lighting data indicating lighting conditions associated with the environment of the vehicle;

[0041] Based on the lighting data, at least one region of interest is selected within the at least one image; and

[0042] The determination of the feature points is based on the at least one region of interest.

[0043] Option 9. The method according to Option 8, wherein the lighting data includes at least one of ambient lighting data, climate data, and time of day data.

[0044] Option 10. The method according to Option 1, further comprising:

[0045] Assess the vehicle's condition to determine when the vehicle is traveling smoothly and in a straight line; and

[0046] Specifically, in response to determining that the vehicle is traveling smoothly and in a straight line, the determination of the feature points is performed.

[0047] Option 11. The method according to Option 10, wherein the vehicle condition includes vehicle acceleration, vehicle speed, and steering wheel angle.

[0048] Option 12. The method according to Option 10, wherein the vehicle condition includes a determined distance traveled between two of the plurality of images.

[0049] Option 13. The method according to Option 1, further comprising:

[0050] Determine that a camera pose is unreliable based on the rotation and translation of a reference camera;

[0051] For images with camera poses determined to be unreliable, filter feature points; and

[0052] The determination of the subset of feature points is based on the filtered feature points.

[0053] Option 14. The method according to Option 1, further comprising:

[0054] The ground normal vector is optimized based on a sliding window method associated with the plurality of images.

[0055] Option 15. The method according to Option 1, wherein the optimization of the ground normal vector is further based on minimizing the transfer distance calculated based on homography.

[0056] Option 16. The method according to Option 1, wherein the determination of the ground plane is based on constraints associated with the reference ground normal vector and the direction of motion.

[0057] Option 17. A computer-implemented system for generating a virtual view of a scene associated with a vehicle, the system comprising:

[0058] Data storage element, the data storage element including computer-readable instructions; and

[0059] A processor configured to execute computer-readable instructions that control the processor to perform operations, including:

[0060] Receive image data defining a plurality of images associated with the environment of the vehicle;

[0061] Receive vehicle data indicating the speed of the vehicle, wherein the vehicle data is associated with the image data;

[0062] Based on the vehicle data and the 3D projection method, at least one feature point within the image is determined;

[0063] Select a subset of the feature points as ground points;

[0064] Determine the ground plane based on a subset of feature points;

[0065] Determine the ground normal vector from the ground plane;

[0066] The alignment value of the camera to the ground is determined based on the ground normal vector; and

[0067] Display data is generated based on the ground normal vector.

[0068] Option 18. The system according to Option 17, wherein the three-dimensional projection method uses the back projection of the feature points from the two-dimensional image to a unit plane to determine the feature points from a sequence of two-dimensional images from multiple images.

[0069] Option 19. The system according to Option 18, wherein the three-dimensional projection method further determines the feature points from the two-dimensional image sequence by determining a feature offset in the unit plane based on the vehicle data, predicting three-dimensional points of the feature points based on the feature offset, and projecting the three-dimensional feature points back to the two-dimensional image based on the feature offset.

[0070] Option 20. A vehicle, comprising:

[0071] A camera, configured to capture images of the vehicle's environment; and

[0072] A controller having a processor configured to: receive the image; receive vehicle data indicating the speed of the vehicle, wherein the vehicle data is associated with the image; determine at least one feature point within the image based on the vehicle data and a three-dimensional projection method; select a subset of the feature points as ground points; determine a ground plane based on the subset of feature points; determine a ground normal vector from the ground plane; and determine a camera-to-ground alignment value based on the ground normal vector.

Attached Figure Description

[0073] Exemplary embodiments will now be described in conjunction with the accompanying drawings, wherein the same numbers denote the same elements, and wherein:

[0074] Figure 1 is a schematic diagram illustrating a vehicle according to various embodiments, the vehicle having a controller that performs functions to generate a virtual view;

[0075] Figure 2 is a data flow diagram illustrating the controller of a vehicle according to various embodiments;

[0076] Figures 3A and 3B are diagrams of image data, feature points, and regions of interest according to various embodiments;

[0077] Figures 4, 5, 6, 7, and 8 are illustrations of methods executed by a controller according to various embodiments; and

[0078] Figure 9 is a flowchart illustrating methods performed by a vehicle and a controller according to various embodiments.

Detailed Implementation

[0079] The following detailed description is exemplary in nature and is not intended to limit application or use. Furthermore, it is not intended to be bound by any express or implied theory set forth in the foregoing technical fields, background art, summary of the invention, or the following detailed description. As used herein, the term "module" refers to any hardware, software, firmware, electronic control components, processing logic, and / or processor device, individually or in any combination including, but not limited to: application-specific integrated circuits (ASICs), electronic circuits, processors (shared, dedicated, or grouped), and memory executing one or more software or firmware programs, combinational logic circuits, and / or other suitable components providing the described functionality.

[0080] Embodiments of this disclosure may be described herein in terms of functional and / or logical block components and various processing steps. It should be understood that such block components can be implemented by any number of hardware, software, and / or firmware components configured to perform specific functions. For example, embodiments of this disclosure may use various integrated circuit components, such as memory elements, digital signal processing elements, logic elements, lookup tables, etc., which can perform various functions under the control of one or more microprocessors or other control devices. Furthermore, those skilled in the art will understand that embodiments of this disclosure can be implemented in conjunction with any number of systems, and the system described herein is merely an exemplary embodiment of this disclosure.

[0081] For the sake of brevity, conventional techniques relating to signal processing, data transmission, signaling, control, and other functional aspects of the system (and its various operating components) may not be described in detail herein. Furthermore, the connecting lines shown in the various figures herein are intended to represent exemplary functional relationships and / or physical connections between the various elements. It should be noted that many alternative or additional functional relationships or physical connections may exist in embodiments of this disclosure.

[0082] Referring to Figure 1, a vehicle 10 with a surround view display system 100 is shown, according to various embodiments. Typically, the surround view display system 100 displays image data on a display 50 of the vehicle 10 to show a surround view of the external environment of the vehicle 10 from a defined viewpoint (e.g., but not limited to, a bird's-eye view). As will be discussed in more detail below, the surround view display system 100 generates display data based on a camera-to-ground alignment method and system.

[0083] As shown in Figure 1, vehicle 10 typically includes a chassis 12, a body 14, front wheels 16, and rear wheels 18. The body 14 is mounted on the chassis 12 and substantially surrounds the components of vehicle 10. The body 14 and chassis 12 may together form a frame. Wheels 16 and 18 are each rotatably connected to the chassis 12 near a corresponding corner of the body 14.

[0084] In various embodiments, vehicle 10 is an autonomous vehicle. An autonomous vehicle is, for example, a vehicle automatically controlled to transport passengers from one location to another. In the illustrated embodiment, vehicle 10 is described as a passenger car, but it should be understood that any other means of transportation, including motorcycles, trucks, sport utility vehicles (SUVs), recreational vehicles (RVs), ships, aircraft, etc., may also be used. In one exemplary embodiment, the autonomous vehicle is a Level 2 or higher level automation system. A Level 2 automation system indicates partial automation. However, in other embodiments, the autonomous vehicle may be a so-called Level 3, Level 4, or Level 5 automation system. A Level 3 automation system indicates conditional automation. A Level 4 system indicates high automation, referring to the driving mode-specific performance of the automated driving system in all aspects of a dynamic driving task, even if the human driver does not respond appropriately to intervention requests. A Level 5 system indicates full automation, referring to the full-time performance of the automated driving system in all aspects of a dynamic driving task under all road and environmental conditions that can be managed by a human driver.

[0085] However, it should be understood that vehicle 10 may also be a conventional vehicle without any autonomous driving capabilities. According to this disclosure, vehicle 10 can implement functions and methods for generating virtual views with coordinated colors.

[0086] As shown in the figure, vehicle 10 typically includes a propulsion system 20, a transmission system 22, a steering system 24, a braking system 26, a sensor system 28, an actuator system 30, at least one data storage device 32, at least one controller 34, and a communication system 36. In various embodiments, the propulsion system 20 may include an internal combustion engine, an electric motor such as a traction motor, a fuel cell propulsion system, and / or combinations thereof. The transmission system 22 is configured to transmit power from the propulsion system 20 to wheels 16 and 18 according to a selectable speed ratio. According to various embodiments, the transmission system 22 may include a step-ratio automatic transmission, a continuously variable transmission (CVT), a manual transmission, or other suitable transmission.

[0087] Braking system 26 is configured to provide braking torque to wheels 16 and 18. In various embodiments, braking system 26 may include friction brakes, brake-by-wire brakes, regenerative braking systems such as motors, and / or other suitable braking systems. Steering system 24 affects the position of wheels 16 and 18. Although described for illustrative purposes as including a steering wheel, in some embodiments covered by this disclosure, steering system 24 may not include a steering wheel.

[0088] Sensor system 28 includes one or more sensing devices 40a-40n that sense observable conditions of the external and / or internal environment of vehicle 10. Sensing devices 40a-40n may include, but are not limited to, radar, lidar, global positioning system (GPS), optical cameras, thermal cameras, ultrasonic sensors, and / or other sensors. Sensing devices 40a-40n are also configured to sense observable conditions of vehicle 10. Sensing devices 40a-40n may include, but are not limited to, speed sensors, position sensors, inertial measurement sensors, temperature sensors, pressure sensors, etc.

[0089] The actuator system 30 includes one or more actuator devices 42a-42n that control one or more vehicle features, such as, but not limited to, the propulsion system 20, the transmission system 22, the steering system 24, and the braking system 26. In various embodiments, vehicle features may also include interior and / or exterior vehicle features, such as, but not limited to, doors, trunks, and cabin features, such as air, music, lighting, etc. (not numbered).

[0090] Communication system 36 is configured to wirelessly transmit information to and from other entities 48, such as, but not limited to, other vehicles (“V2V” communication), infrastructure (“V2I” communication), remote systems, and/or personal devices (described in more detail with regard to Figure 2). In one exemplary embodiment, communication system 36 is a wireless communication system configured to communicate via a wireless local area network (WLAN) using the IEEE 802.11 standard or by using cellular data communication. However, additional or alternative communication methods, such as dedicated short-range communication (DSRC) channels, are also considered to be within the scope of this disclosure. A DSRC channel refers to a unidirectional or bidirectional short- to medium-range wireless communication channel specifically designed for automotive applications, along with a corresponding set of protocols and standards.

[0091] Data storage device 32 stores data used in the automatic control functions of vehicle 10. In various embodiments, data storage device 32 stores a defined map of the navigable environment. The defined map may include various data in addition to the road data associated with it, including altitude, climate, lighting, etc. In various embodiments, the defined map may be predefined by and obtained from a remote system (described in further detail with regard to Figure 2). For example, the defined map can be assembled by a remote system and transmitted to vehicle 10 (wirelessly and/or via wire) and stored in data storage device 32. It is understood that data storage device 32 may be part of controller 34, separate from controller 34, or part of controller 34 and part of a separate system.

[0092] The controller 34 includes at least one processor 44 and a computer-readable storage device or medium 46. The processor 44 can be any custom or commercially available processor, central processing unit (CPU), graphics processing unit (GPU), auxiliary processor among a plurality of processors associated with the controller 34, semiconductor-based microprocessor (in the form of a microchip or chipset), macroprocessor, any combination thereof, or any device typically used for executing instructions. The computer-readable storage device or medium 46 can include volatile and non-volatile memory, such as read-only memory (ROM), random access memory (RAM), and keep-alive memory (KAM). KAM is persistent or non-volatile memory that can be used to store various operational variables when the processor 44 is powered off. The computer-readable storage device or medium 46 can be implemented using any of a variety of known storage devices, such as PROM (programmable read-only memory), EPROM (erasable PROM), EEPROM (electrically erasable PROM), flash memory, or any other electrical, magnetic, optical, or combined storage device capable of storing data (some of which represents executable instructions) used by the controller 34 in controlling and performing the functions of the vehicle 10.

[0093] The instructions may include one or more separate programs, each of which includes an ordered list of executable instructions for implementing logical functions. When executed by processor 44, these instructions receive and process signals from sensor system 28, perform logic, calculations, methods, and/or algorithms for automatically controlling components of vehicle 10, and generate control signals for actuator system 30 to control components of vehicle 10 based on the logic, calculations, methods, and/or algorithms. Although only one controller 34 is shown in Figure 1, embodiments of vehicle 10 may include any number of controllers 34 that communicate via any suitable communication medium or combination of communication media and cooperate to process sensor signals, execute logic, calculations, methods, and/or algorithms, and generate control signals to automatically control the features of vehicle 10.

[0094] In various embodiments, one or more instructions of controller 34 are included in surround view display system 100, and when executed by processor 44, process image data from at least one optical camera of sensor system 28 to extract features from the images to determine the ground plane. When executed by processor 44, the instructions use the ground plane to determine camera alignment information. The image data is then assembled using the camera alignment information to form a surround view from a defined perspective. In various embodiments, sensing devices 40a to 40n include N(one or more) cameras (e.g., optical cameras configured to capture color images of the environment) that sense the external environment of vehicle 10 and generate image data. The cameras are arranged such that each covers a defined field of view of the environment surrounding the vehicle. Based on, for example, the camera's attitude and position relative to the vehicle and relative to the ground, image data from each camera is assembled into a surround view.

[0095] It will be understood that controller 34 may otherwise differ from the embodiment illustrated in Figure 1. For example, controller 34 may be coupled to or may otherwise utilize one or more remote computer systems and/or other control systems, for example, as part of one or more of the aforementioned vehicle equipment and systems. It will be understood that although this exemplary embodiment is described in the context of a fully functional computer system, those skilled in the art will recognize that the apparatus of this disclosure can be distributed as a program product with one or more types of non-transitory computer-readable signal-bearing media used to store and distribute programs and their instructions, such as non-transitory computer-readable media carrying programs and containing computer instructions stored therein, which are used to cause a computer processor (such as processor 44) to implement and execute the program. Such program products can take many forms, and this disclosure applies equally regardless of the specific type of computer-readable signal-bearing media used to perform the distribution. Examples of signal-bearing media include: recordable media (such as floppy disks, hard disk drives, memory cards, and optical disks), and transmission media (such as digital and analog communication links). It will be understood that cloud-based storage and/or other technologies may also be used in certain implementations. It will be similarly understood that the computer system of controller 34 may also otherwise differ from the embodiment shown in Figure 1, for example, in that the computer system of controller 34 may be connected to or may otherwise utilize one or more remote computer systems and/or other control systems.

[0096] Referring to Figure 2, and with continued reference to Figure 1, a data flow diagram illustrates components of the surround view display system 100 of Figure 1 according to various embodiments. It is understood that various embodiments of the surround view display system 100 according to this disclosure may include any number of modules embedded within the controller 34, which may be combined and/or further divided to similarly implement the systems and methods described herein. Furthermore, inputs to the surround view display system 100 may be received from the sensor system 28, received from other control modules (not shown) associated with the vehicle 10, and/or determined or modeled by other submodules (not shown) within the controller 34 of Figure 1. Additionally, the inputs may undergo preprocessing such as subsampling, noise reduction, normalization, feature extraction, and loss reduction. In various embodiments, the surround view display system 100 includes an enable module 102, a region selection module 104, a feature prediction module 106, a ground detection module 110, a parameter determination module 112, and a display module 114.

[0097] In various embodiments, the enable module 102 receives vehicle data 116 and camera data 118 as inputs. In various embodiments, vehicle data 116 includes data indicating vehicle speed, vehicle acceleration, and steering wheel angle. In various embodiments, camera data 118 includes a determined distance between keyframes. The enable module 102 evaluates the inputs 116 and 118 to determine whether vehicle 10 is traveling in a straight line and smoothly. For example, the enable module 102 compares each input to a predefined threshold, or to multiple thresholds that define a range, to determine whether the conditions are met. If not all conditions are met, it is determined that vehicle 10 is not traveling in a straight line or is not traveling smoothly, and enable data 120 is set to indicate that camera-to-ground alignment is not enabled. If all conditions are met, it is determined that vehicle 10 is traveling in a straight line and smoothly, and enable data 120 is set to indicate that camera-to-ground alignment is enabled.
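
The enable check described above amounts to a conjunction of threshold and range tests. The following Python sketch illustrates that logic only; the names `EnableThresholds` and `alignment_enabled` and every numeric limit are hypothetical and are not taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnableThresholds:
    # All limits are hypothetical placeholders, not values from the disclosure.
    min_speed_mps: float = 2.0
    max_speed_mps: float = 30.0
    max_abs_accel_mps2: float = 0.5
    max_abs_steering_deg: float = 2.0
    min_keyframe_distance_m: float = 0.2


def alignment_enabled(speed_mps, accel_mps2, steering_deg,
                      keyframe_distance_m, th=None):
    """Return True only when every condition holds (straight, smooth driving)."""
    th = th or EnableThresholds()
    conditions = (
        th.min_speed_mps <= speed_mps <= th.max_speed_mps,    # speed within range
        abs(accel_mps2) <= th.max_abs_accel_mps2,             # smooth travel
        abs(steering_deg) <= th.max_abs_steering_deg,         # straight travel
        keyframe_distance_m >= th.min_keyframe_distance_m,    # enough baseline
    )
    return all(conditions)
```

A single failed condition disables alignment for that cycle, which mirrors the all-or-nothing behavior of enable data 120.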

[0098] In various embodiments, the region selection module 104 receives enable data 120, image data 121, and lighting data 122 as input. In various embodiments, the image data 121 includes data defining a plurality of image frames captured by the camera. In various embodiments, the lighting data 122 includes information about the environment in which the images were captured, such as external lighting conditions, time of day, and current weather.

[0099] When enable data 120 indicates that camera-to-ground alignment is enabled, region selection module 104 identifies a region of interest (ROI) within each image frame of image data 121. An ROI is a region in which the data is sufficient to predict features. In various embodiments, region selection module 104 identifies ROIs based on lighting data, climate data, and/or time data, and generates ROI data 126 accordingly. For example, as shown in Figures 3A and 3B, each image frame 200, 202 can be divided into regions based on the time of day. Figure 3A shows two regions associated with image frame 200 captured during the day: Region 1 204 has first lighting conditions associated with it, and Region 2 206 has enhanced lighting conditions associated with it. Figure 3B shows three regions associated with image frame 202 captured at night: Region 1 208 has limited or no lighting, while Region 2 210 and Region 3 212 are identified by external lighting conditions from taillights, roadside lights, etc. It is understood that regions of interest can be identified in each image frame based on any number of conditions, and are not limited to this lighting example.

[0100] Referring back to Figure 2, in various embodiments, the feature prediction module 106 receives ROI data 126 and vehicle data 127 as input. The feature prediction module 106 uses the vehicle speed from the vehicle data 127 to predict feature points within the ROI. The feature prediction module 106 generates feature prediction data 128, which includes indications of the feature points in the ROI.

[0101] In various embodiments, the feature prediction module 106 uses a first method (referred to as a three-dimensional (3D) projection method) to predict feature points, which predicts the future position of the detected feature points within the next image k+1 based on three-dimensional (3D) projection. For example, as shown in more detail in Figure 4, the 3D projection method receives, at 300, the feature points extracted from the ROI of image k and, for each feature point i, back-projects the identified feature point onto the unit plane as follows:

[0102] At 310, [equation missing],

[0103] where K represents the matrix of intrinsic parameters associated with the camera; and [symbol missing] represents the scale value.

[0104] Subsequently, assuming a flat plane, the depth d of the feature point is determined as:

[0105] At 320, [equation missing],

[0106] where h represents the height from the center of the camera to the unit plane.

[0107] If the depth d is within range at 330, the feature offset is determined based on the camera rotation and the determined vehicle speed between frames as:

[0108] At 340, [equation missing]. When the enable condition is met at 350, the 3D feature location is then predicted at 350.

[0109] Then, at 360, the 3D features are projected back into the camera image to obtain the feature positions in camera coordinates:

[0110] At 370 and 380, [equation missing].

[0111] The predicted feature location in image k+1 is shown at 390.
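
The equations for steps 310 through 380 are not reproduced in this text, so the following Python sketch only illustrates the described sequence (back-projection onto the unit plane, flat-ground depth, motion offset, reprojection) under standard pinhole-camera assumptions. It assumes camera axes with x to the right, y down, and z forward, an optical axis parallel to a flat ground, and a per-frame rotation `R` that maps frame-k coordinates into frame k+1. The function name `predict_feature`, the depth range, and the convention choices are illustrative assumptions and should not be read as the patented formulas.

```python
import numpy as np


def predict_feature(pixel, K, cam_height, velocity_cam, dt,
                    R=np.eye(3), depth_range=(1.0, 30.0)):
    """Predict where a ground feature seen at `pixel` in image k lands in image k+1.

    Returns a length-2 array (u, v), or None when the feature is rejected.
    """
    # Back-project the pixel onto the unit plane z = 1.
    q = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    if q[1] <= 0.0:
        return None  # ray at or above the horizon never meets the ground

    # Flat-ground assumption: the ray hits the ground where y equals cam_height.
    depth = cam_height / q[1]
    if not (depth_range[0] <= depth <= depth_range[1]):
        return None  # depth out of range (hypothetical limits)

    # 3D point in frame k, then shift by the camera motion between frames.
    point_k = depth * q
    offset = np.asarray(velocity_cam, dtype=float) * dt
    point_k1 = R @ (point_k - offset)
    if point_k1[2] <= 0.0:
        return None  # point moved behind the camera

    # Project the predicted 3D point back into the image.
    projected = K @ (point_k1 / point_k1[2])
    return projected[:2]
```

For a forward-facing camera and forward motion, the predicted pixel moves away from the image center toward the bottom of the frame, as expected for ground points approaching the vehicle.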

[0112] Referring back to Figure 2, in various other embodiments, the feature prediction module 106 uses a second method (referred to as the vanishing point method) to predict feature points, which predicts the future position of the detected feature points within the next image k+1 based on the vehicle speed and the vanishing point in the image. For example, as shown in Figure 5, if the enable condition is met at 405, the vanishing point method receives, at 400, the feature points extracted from the ROI of image k.

[0113] At 410, the vanishing point method detects the vanishing points in image k that are associated with the detected feature points. At 420, one of the vanishing points is selected and its quality is checked. At 430, the epipolar line is determined based on the vanishing point and the feature points. Based on the homography matrix 480 constructed at 460 and 470, feature projection is performed along the epipolar line to generate the predicted feature location in image k+1, shown at 450.
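
The geometric idea behind this method can be sketched briefly. Under pure translation, image points move along lines through the vanishing point of the motion direction, so the line joining the vanishing point and a feature plays the role of the epipolar line. The Python sketch below illustrates only that idea; it does not reproduce the homography construction at 460 through 480, and the helper names `line_through` and `predict_along_line`, as well as the use of a known depth and forward step, are assumptions made for illustration.

```python
import numpy as np


def line_through(vp, p):
    """Homogeneous line (a, b, c) through vanishing point vp and feature p."""
    return np.cross([vp[0], vp[1], 1.0], [p[0], p[1], 1.0])


def predict_along_line(vp, p, depth, forward_step):
    """Slide feature p along the line through vp for a pure translation.

    depth:        distance of the 3D point along the optical axis in frame k
    forward_step: camera translation along the optical axis between frames
    """
    if depth <= forward_step:
        raise ValueError("point would end up at or behind the camera")
    vp = np.asarray(vp, dtype=float)
    p = np.asarray(p, dtype=float)
    scale = depth / (depth - forward_step)  # image-space expansion about vp
    return vp + scale * (p - vp)
```

The predicted point always lies on the line returned by `line_through`, so a detected match in image k+1 can be searched for, or corrected, along that line.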

[0114] Referring back to Figure 2, in various other embodiments, the feature prediction module 106 uses a third method (referred to as the epipolar method) to predict feature points, which predicts the future position of the detected feature points within the next image k+1 based on epipolar lines in the image. For example, as shown in Figure 6, the epipolar method processes k keyframes of the image frames together with the vehicle speed. The first keyframe is determined at 510, the second keyframe at 520, and so on, until the k-th keyframe is determined at 530.

[0115] Subsequently, at 540, two-dimensional (2D) matching is used to match feature points between two keyframes, and at 550, feature detection is performed. At 555, epipolar lines are determined from the 2D feature points. The predicted feature locations of the feature points in image k+1, shown at 570, are determined using the epipolar lines and the vehicle speed at 580. At 560, the epipolar lines can be used to perform any location correction.
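
As a hedged illustration of how an epipolar line constrains a predicted location, the Python sketch below builds a fundamental matrix from a known relative pose and evaluates the epipolar line for one feature. The disclosure derives its epipolar lines from matched 2D feature points; computing them from a known pose, the convention X2 = R X1 + t, and the helper names are assumptions chosen to keep the example short.

```python
import numpy as np


def skew(t):
    """Cross-product matrix so that skew(t) @ x equals np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_from_pose(K, R, t):
    """Fundamental matrix F for the convention X2 = R @ X1 + t."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv


def epipolar_line(F, p1):
    """Line (a, b, c) in image 2 on which the match of pixel p1 must lie."""
    return F @ np.array([p1[0], p1[1], 1.0])


def point_line_distance(line, p):
    """Perpendicular pixel distance from point p to a homogeneous line."""
    a, b, c = line
    return abs(a * p[0] + b * p[1] + c) / np.hypot(a, b)
```

A predicted location whose `point_line_distance` exceeds a chosen tolerance could be projected onto the line, which is one plausible reading of the location correction mentioned at 560.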

[0116] Referring back to Figure 2, the ground detection module 110 receives feature prediction data 128 as input. The ground detection module 110 selects ground features from the feature prediction data 128 and then identifies the ground plane from the selected features. The ground detection module 110 generates ground data 130 indicating the ground plane.

[0117] In various embodiments, the ground detection module 110 uses a fixed two-dimensional image road mask and a three-dimensional region of interest bounding box to select ground features. In various other embodiments, the ground detection module 110 selects ground features by estimating the road mask using semantic segmentation and machine learning techniques.

[0118] In various other embodiments, the ground detection module 110 selects ground features using geometric constraints to remove any non-ground points. For example, as shown in Figure 7, images are received, and feature extraction, feature detection, and matching are performed at 600 and 610.

[0119] Subsequently, at 620, the relative pose between the first and second image frames is recovered. At 630, the homography matrix H is calculated as:

[0120] H = [equation missing].

[0121] At 640, the ground normal vector for the feature points is then calculated using the relative pose R and non-collinear points q on the camera unit plane:

[0122] [equation missing].

[0123] When, at 650, the difference between the ground normal vector n and the reference vector is small (e.g., less than a threshold), the ground feature is selected at 660. When, at 650, the difference between the ground normal vector and the reference vector is not small (e.g., greater than the threshold), the ground feature is not selected at 670.
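
The homography and normal-vector formulas at 630 and 640 are not reproduced in this text. The Python sketch below therefore shows only two standard building blocks that fit the description: the textbook plane-induced homography for a plane n·X = h under the motion X2 = R X1 + t, and an angle test of a candidate normal against a reference vector. The function names, the sign conventions, and the 5 degree threshold are assumptions, not values from this disclosure.

```python
import numpy as np


def plane_homography(K, R, t, n, h):
    """Homography induced by the plane n . X = h (frame 1) for X2 = R @ X1 + t."""
    n = np.asarray(n, dtype=float)
    t = np.asarray(t, dtype=float)
    return K @ (R + np.outer(t, n) / h) @ np.linalg.inv(K)


def transfer(H, p):
    """Map pixel p through homography H and normalize."""
    x = H @ np.array([p[0], p[1], 1.0])
    return x[:2] / x[2]


def is_ground_normal(n, n_ref, max_angle_deg=5.0):
    """True when candidate normal n lies within max_angle_deg of n_ref."""
    n = np.asarray(n, dtype=float)
    n_ref = np.asarray(n_ref, dtype=float)
    cos_angle = n @ n_ref / (np.linalg.norm(n) * np.linalg.norm(n_ref))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(angle < max_angle_deg)
```

Features whose estimated normal fails the angle test would be discarded as non-ground points, matching the branch at 670.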

[0124] Referring back to Figure 2, in various embodiments, the ground detection module 110 performs the ground feature selection described above when the camera pose estimation is determined to be reliable. For example, the ground detection module 110 determines that the camera pose estimation is reliable when the change in camera rotation is small while the vehicle is traveling in a straight line, and when the angle between the camera movement direction and the vehicle motion vector is small.
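
The two reliability conditions above can be expressed as angle tests. The following Python sketch is one possible reading; the function names and both angular limits are hypothetical and are not specified in this disclosure.

```python
import numpy as np


def rotation_angle_deg(R):
    """Magnitude of the rotation encoded by a 3x3 rotation matrix, in degrees."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


def angle_between_deg(a, b):
    """Angle between two 3D vectors, in degrees."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def pose_is_reliable(R, t_cam, vehicle_dir, max_rot_deg=1.0, max_dir_deg=5.0):
    """Accept a pose when rotation is small and translation follows the vehicle."""
    small_rotation = rotation_angle_deg(R) <= max_rot_deg
    aligned_motion = angle_between_deg(t_cam, vehicle_dir) <= max_dir_deg
    return small_rotation and aligned_motion
```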

[0125] Once the ground features are extracted, the ground detection module 110 uses a random sample consensus (RANSAC) method, such as, but not limited to, the M-estimator sample consensus (MSAC) method or some other method, to perform ground plane fitting of the selected ground points, solving under constraints that ensure the ground normal vector has a small angle relative to the ground vector and is perpendicular to the vehicle motion vector.
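
A constrained plane fit of this kind can be sketched with plain RANSAC in Python. The sketch below is not MSAC and is not the fitting procedure of this disclosure; it assumes 3D ground candidates in camera coordinates, and the function names, tolerances, and iteration count are illustrative assumptions.

```python
import numpy as np


def fit_plane(points):
    """Least-squares plane n . X = d through points (N x 3), with unit n."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]  # direction of least variance
    return n, float(n @ centroid)


def _angle_deg(a, b):
    c = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))


def ransac_ground_plane(points, n_ref, motion_dir, iters=200, inlier_tol=0.05,
                        max_ref_angle_deg=10.0, max_motion_dev_deg=10.0, seed=0):
    """Return (n, d, inlier_mask) or None if no hypothesis meets the constraints."""
    points = np.asarray(points, dtype=float)
    n_ref = np.asarray(n_ref, dtype=float)
    motion_dir = np.asarray(motion_dir, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n, d = fit_plane(sample)
        if n @ n_ref < 0:  # resolve the sign ambiguity of the normal
            n, d = -n, -d
        # Constraint 1: normal stays close to the reference ground vector.
        if _angle_deg(n, n_ref) > max_ref_angle_deg:
            continue
        # Constraint 2: normal is roughly perpendicular to the vehicle motion.
        if abs(90.0 - _angle_deg(n, motion_dir)) > max_motion_dev_deg:
            continue
        inliers = np.abs(points @ n - d) < inlier_tol
        if best is None or inliers.sum() > best.sum():
            best = inliers
    if best is None or best.sum() < 3:
        return None
    n, d = fit_plane(points[best])  # refit on all inliers
    if n @ n_ref < 0:
        n, d = -n, -d
    return n, d, best
```

Rejecting hypotheses before counting inliers keeps walls and curbs, whose normals violate the constraints, from ever winning the vote.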

[0126] The parameter determination module 112 receives ground feature data 130 as input. The parameter determination module 112 extracts the ground normal vector from the identified ground plane and uses the extracted ground normal vector to estimate alignment data 132, which includes the camera's roll, pitch, yaw, and height.
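
How angles follow from a ground normal can be illustrated with a short Python sketch. It assumes camera axes with x to the right, y down, and z forward, a plane written as n·X = d, and particular sign conventions for roll and pitch; none of these conventions is stated in this disclosure. A ground normal alone does not change under rotation about itself, so the sketch takes yaw from a separately supplied motion direction, which is also an assumption.

```python
import math
import numpy as np


def alignment_from_ground(n, d, motion_dir=None):
    """Derive roll, pitch, optional yaw (radians) and height from a ground plane."""
    n = np.asarray(n, dtype=float)
    scale = np.linalg.norm(n)
    n, d = n / scale, d / scale  # normalize so that d is a metric distance

    pitch = math.atan2(n[2], n[1])                    # tilt in the y-z plane
    roll = math.atan2(n[0], math.hypot(n[1], n[2]))   # tilt toward the x axis

    yaw = None
    if motion_dir is not None:
        m = np.asarray(motion_dir, dtype=float)
        yaw = math.atan2(m[0], m[2])  # heading of travel relative to optical axis

    return {"roll": roll, "pitch": pitch, "yaw": yaw, "height": abs(d)}
```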

[0127] For example, as shown in Figure 8, given the camera intrinsic data 700, the keyframe ground features 710 at time k, and the camera pose 720 at time k, a sliding window of size a is established.

[0128] When the sliding window is full at 730, the ground normal vector and camera ground height are refined at 740. For example, an optimized normal vector and an optimized height are obtained from the data in the sliding window by minimizing the transfer distance through homography:

[0129] [equation missing]

[0130] and updated as:

[0131] [equation missing], and

[0132] [equation missing],

[0133] where a represents the size of the sliding window; b represents the number of features between adjacent keyframes; H represents the homography matrix; and [symbol missing] represents the damping factor. Once optimized alignment data 760 is obtained from the window at 750, the old feature data and camera pose are removed from the sliding window at 770, and the method returns to updating the window at 730. The optimized alignment data 760 is then used to determine the camera's yaw, pitch, and roll.
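
The optimization and update equations are not reproduced in this text, so the following Python sketch covers only the parts that can be stated safely: a fixed-size window that drops its oldest entry when a new keyframe arrives, and a transfer-distance cost that an optimizer could minimize over the normal vector and height. The plane convention n·X = h, the motion convention X2 = R X1 + t, the tuple layout of window entries, and the function names are assumptions; the damped update itself is not shown.

```python
from collections import deque

import numpy as np


def make_window(a):
    """Sliding window of size a; appending to a full window evicts the oldest entry."""
    return deque(maxlen=a)


def plane_homography(K, R, t, n, h):
    """Homography induced by the plane n . X = h for the motion X2 = R @ X1 + t."""
    return K @ (R + np.outer(t, n) / h) @ np.linalg.inv(K)


def transfer_cost(n, h, window, K):
    """Sum of squared pixel transfer distances over every entry in the window.

    Each entry is (R, t, pts_k, pts_k1) with matched N x 2 pixel arrays.
    """
    n = np.asarray(n, dtype=float)
    total = 0.0
    for R, t, pts_k, pts_k1 in window:
        H = plane_homography(K, R, np.asarray(t, dtype=float), n, h)
        homog = np.column_stack([pts_k, np.ones(len(pts_k))]) @ H.T
        mapped = homog[:, :2] / homog[:, 2:3]
        total += float(np.sum((mapped - pts_k1) ** 2))
    return total
```

Any general-purpose minimizer could be applied to `transfer_cost` once the window is full; a bounded `deque` makes the removal step at 770 implicit.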

[0134] Referring back to Figure 2, the display module 114 receives alignment data 132 and image data 121 as input. Based on the alignment data 132, the display module 114 assembles the image data 121 into a surround view from a defined perspective. The display module 114 generates display data 134 to display the surround view on the display 50 of the vehicle 10.

[0135] Referring to Figure 9, and with continued reference to Figures 1-8, a flowchart is provided of a method 800 for displaying surround view image data within the vehicle 10 of Figure 1 using the surround view display system 100, which implements the camera-to-ground alignment methods and systems of Figures 1 and 2. As can be understood from this disclosure, the order of operations within method 800 is not limited to the sequential execution shown in Figure 9, but may be performed in one or more varying orders as applicable and in accordance with this disclosure. In various embodiments, the method 800 may be scheduled to run based on one or more predetermined events, and/or may run continuously during the operation of the vehicle 10.

[0136] In one example, method 800 may begin at 805. Input data is obtained at 810 through 830. For example, at 810, image data depicting k frames captured by a camera of vehicle 10 is obtained. At 820, speed data and other vehicle data are obtained. At 830, lighting data associated with the vehicle's environment is obtained.

[0137] Subsequently, the enable conditions are evaluated to ensure that the vehicle 10 is traveling in a straight line and smoothly. For example, at 840, parameters including, but not limited to, speed, acceleration, steering wheel angle, and distance between keyframes are compared with thresholds and/or ranges. If one or more of the enable conditions are not met at 840, method 800 continues by obtaining new input data at 810-830.

[0138] Once the enable conditions are met at 840, camera-to-ground alignment is performed at 850 through 900, and display data is generated based on the camera-to-ground alignment at 910. For example, at 850, lighting data and image data are evaluated to determine the ROI. At 860, feature prediction is performed on the ROI, as described above, based on, for example, a 3D projection method, a vanishing point method, or an epipolar method. Subsequently, at 865, feature data associated with image frames whose camera poses are deemed reliable are used, and feature data with unreliable camera poses are filtered out.

[0139] Using filtered feature data, at 870, ground features are selected from feature points based on, for example, a fixed 2D mask and 3D ROI box, a dynamic 2D mask, or homography constraints as described above. At 880, a ground plane is determined from the selected feature points. At 890, a normal vector and camera height are calculated from the ground plane, and at 895, the normal vector and camera height are refined using, for example, a sliding window method as described above and constraints as described above.

[0140] Subsequently, at 900, camera parameters including pitch, yaw, and roll are determined. Then, at 910, display data is generated based on the camera parameters to display the image data on the display 50 of the vehicle 10, for example, according to different viewing angles. Method 800 can then end at 920.

[0141] Although at least one exemplary embodiment has been presented in the foregoing detailed description, it should be understood that numerous variations exist. It should also be understood that the exemplary embodiments or exemplary implementations are merely examples and are not intended to limit the scope, applicability, or configuration of this disclosure in any way. Rather, the foregoing detailed description will provide those skilled in the art with a convenient roadmap for implementing one or more exemplary embodiments. It should be understood that various changes can be made to the function and arrangement of the elements without departing from the scope of this disclosure as set forth in the appended claims and their legal equivalents.

Claims

1. A method for generating a virtual view of a scene associated with a vehicle, the method comprising: receiving image data defining a plurality of images associated with an environment of the vehicle; receiving vehicle data indicating a speed of the vehicle, wherein the vehicle data is associated with the image data; determining, by a processor, feature points within at least one image based on the vehicle data and a three-dimensional projection method; selecting, by the processor, a subset of the feature points as ground points; determining, by the processor, a ground plane based on the subset of feature points; determining, by the processor, a ground normal vector from the ground plane; determining, by the processor, a camera-to-ground alignment value based on the ground normal vector; and generating, by the processor, display data based on the ground normal vector; wherein the three-dimensional projection method determines the feature points from a sequence of two-dimensional images of the plurality of images by back-projecting the feature points from a two-dimensional image onto a unit plane; and wherein the three-dimensional projection method further determines the feature points from the sequence of two-dimensional images by determining a feature offset in the unit plane based on the vehicle data and predicting three-dimensional points of the feature points based on the feature offset.

2. The method according to claim 1, wherein the three-dimensional projection method further determines the feature points by projecting the feature points back into the two-dimensional image based on the feature offset.

3. The method according to claim 1, wherein the selecting of the subset of feature points is based on a fixed two-dimensional image road mask and a three-dimensional region.

4. The method according to claim 1, wherein the selecting of the subset of feature points is based on a dynamic two-dimensional image road mask.

5. The method according to claim 1, wherein the selecting of the subset of feature points is based on homography constraints.

6. The method according to claim 1, further comprising: receiving lighting data indicating lighting conditions associated with the environment of the vehicle; and selecting at least one region of interest within the at least one image based on the lighting data; wherein the determining of the feature points is based on the at least one region of interest.

7. The method according to claim 6, wherein the lighting data includes at least one of ambient lighting data, climate data, and time of day data.

8. The method according to claim 1, further comprising: evaluating vehicle conditions to determine when the vehicle is traveling smoothly and in a straight line; wherein the determining of the feature points is performed in response to determining that the vehicle is traveling smoothly and in a straight line.

9. The method according to claim 8, wherein the vehicle conditions include vehicle acceleration, vehicle speed, and steering wheel angle.

10. The method according to claim 8, wherein the vehicle conditions include a determined distance traveled between two of the plurality of images.

11. The method according to claim 1, further comprising: determining that a camera pose is unreliable based on a reference camera rotation and translation; and filtering feature points for images having a camera pose determined to be unreliable; wherein the determining of the subset of feature points is based on the filtered feature points.

12. The method according to claim 1, further comprising: optimizing the ground normal vector based on a sliding window method associated with the plurality of images.

13. The method according to claim 1, wherein the optimizing of the ground normal vector is further based on minimizing a transfer distance calculated based on homography.

14. The method according to claim 1, wherein the determining of the ground plane is based on constraints associated with a reference ground normal vector and a direction of motion.

15. A computer-implemented system for generating a virtual view of a scene associated with a vehicle, the system comprising: a data storage element including computer-readable instructions; and a processor configured to execute the computer-readable instructions, the computer-readable instructions controlling the processor to perform operations comprising: receiving image data defining a plurality of images associated with an environment of the vehicle; receiving vehicle data indicating a speed of the vehicle, wherein the vehicle data is associated with the image data; determining feature points within at least one image based on the vehicle data and a three-dimensional projection method; selecting a subset of the feature points as ground points; determining a ground plane based on the subset of feature points; determining a ground normal vector from the ground plane; determining a camera-to-ground alignment value based on the ground normal vector; and generating display data based on the ground normal vector; wherein the three-dimensional projection method determines the feature points from a sequence of two-dimensional images of the plurality of images by back-projecting the feature points from a two-dimensional image onto a unit plane; and wherein the three-dimensional projection method further determines the feature points from the sequence of two-dimensional images by determining a feature offset in the unit plane based on the vehicle data, predicting three-dimensional points of the feature points based on the feature offset, and projecting the feature points back into the two-dimensional image based on the feature offset.

16. A vehicle, comprising: a camera configured to capture images of an environment of the vehicle; and a controller having a processor configured to: receive the images; receive vehicle data indicating a speed of the vehicle, wherein the vehicle data is associated with the images; determine feature points within at least one image based on the vehicle data and a three-dimensional projection method; select a subset of the feature points as ground points; determine a ground plane based on the subset of feature points; determine a ground normal vector from the ground plane; and determine a camera-to-ground alignment value based on the ground normal vector; wherein the three-dimensional projection method determines the feature points from a sequence of two-dimensional images of the plurality of images by back-projecting the feature points from a two-dimensional image onto a unit plane; and wherein the three-dimensional projection method further determines the feature points from the sequence of two-dimensional images by determining a feature offset in the unit plane based on the vehicle data, predicting three-dimensional points of the feature points based on the feature offset, and projecting the feature points back into the two-dimensional image based on the feature offset.